Poznań University of Technology
Institute of Computing Science
Igor Kupczyński
Interactive Evolutionary Algorithm
for Robust Multi-Objective Optimization
Master's Thesis
Supervisor: prof. dr hab. inż. Roman Słowiński
Poznań 2010
Writing the master's thesis was a tremendous task, impossible to complete without the support
I have received. I would like to thank my supervisor, prof. Roman Słowiński, for his help, reviews
and encouragement.
I have spent one semester at Université Paris-Dauphine. The discussions with prof. Daniel
Vanderpooten and his colleagues from LAMSADE¹ were inspiring and as horizon-widening as the
aura of the City of Light. I would like to thank him for the opportunity.
Finally, I would like to thank one person I cannot name, the dedicated and gifted linguist, for
her inestimable comments.
¹The LAMSADE laboratory, established in 1974, is the joint laboratory of Université Paris-Dauphine and CNRS
(National Center for Scientific Research). Its central research activity lies at the interface of two fundamental
scientific areas: Computer Science and Decision Making.
Contents
1 Introduction 5
1.1 Goal and scope of the thesis . . . . . . . . . . . . . . . . . . . . 6
2 Multi-Objective Optimization 7
2.1 Interactive Approaches to MOO . . . . . . . . . . . . . . . . . . . 7
2.2 Evolutionary Approaches to MOO . . . . . . . . . . . . . . . . . . 8
2.3 Dominance-Based Rough Set Approach to MOO . . . . . . . . . . . . . 11
3 DARWIN, the Idea behind the Method 13
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Modeling of uncertainty . . . . . . . . . . . . . . . . . . . . . . 14
3.3 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Implementation of DARWIN 21
4.1 The issues of implementation . . . . . . . . . . . . . . . . . . . . 21
4.1.1 The programming environment. . . . . . . . . . . . . . . . . 21
4.1.2 The software prototype . . . . . . . . . . . . . . . . . . . 22
4.1.3 The final implementation . . . . . . . . . . . . . . . . . . . 22
4.2 Experiment framework . . . . . . . . . . . . . . . . . . . . . . 24
5 Results of Computational Experiments 29
5.1 The environment . . . . . . . . . . . . . . . . . . . . . . . . 29
5.2 The Decision Maker . . . . . . . . . . . . . . . . . . . . . . . 29
5.3 Problem selection . . . . . . . . . . . . . . . . . . . . . . . . 30
5.4 A deterministic case . . . . . . . . . . . . . . . . . . . . . . . 33
5.4.1 Single run analysis . . . . . . . . . . . . . . . . . . . . . 33
5.4.2 The computational performance on exemplary problems . . . . . . . 45
5.4.3 The influence of parameters on the method . . . . . . . . . . . . 52
5.4.4 DM’s decisions inconsistent with supposed utility function . . . . . . . 52
5.4.5 A different rule-generating algorithm . . . . . . . . . . . . . . 60
5.4.6 The best results . . . . . . . . . . . . . . . . . . . . . . 63
5.5 The case of uncertainty . . . . . . . . . . . . . . . . . . . . . . 65
5.5.1 Single run analysis . . . . . . . . . . . . . . . . . . . . . 65
5.5.2 The computational performance on exemplary problems . . . . . . . 73
5.5.3 The influence of the basic parameters on the method. . . . . . . . . 76
5.5.4 DM’s decisions inconsistent with supposed utility function . . . . . . . 78
5.5.5 DomLem parameters . . . . . . . . . . . . . . . . . . . . 80
5.5.6 AllRules algorithm . . . . . . . . . . . . . . . . . . . . . 83
5.5.7 The second-order stochastic dominance. . . . . . . . . . . . . . 88
5.5.8 The best results . . . . . . . . . . . . . . . . . . . . . . 90
5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6 Summary 97
Bibliography 99
Appendices 105
User’s manual 105
The DARWIN .mod file format 113
Compiling DARWIN from the source code 115
Chapter 1
Introduction
Optimization is a process of finding the best solution from a set of available alternatives. In
the simplest case it involves only a single objective. An objective is a problem’s goal, whose
value has to be maximized (or minimized). An optimization problem consists of a set of decision
variables; values of those variables affect the problem’s objectives. A domain of the problem —
a set of possible values that can be taken by the decision variables — may be subject to additional
constraints. An assignment of values to the decision variables is called a solution. A solution
is feasible if the values meet the constraints imposed on the problem. Therefore, the task of
optimization is to find a feasible solution, which maximizes (minimizes) the value of the problem’s
objectives.
Optimization is a field of applied mathematics which gained importance during and after
the Second World War. The drivers for development in the field are real-life problems, mainly
military or industrial. They are often large-scale and of high importance; solving a problem of this
kind usually yields a profit outweighing the costs that one has to bear to employ a formal approach.
To use optimization methods one has to formulate a problem in a mathematical way — build
a model of the problem. Therefore, it is common to make a distinction between a stakeholder
— a decision maker (DM) having expert knowledge in a problem domain and an analyst —
a mathematician or a scientist that will create the model and choose an appropriate optimization
technique.
If there is only one objective in the problem definition, then the optimization usually
results in one optimal solution, which has the best possible value of the goal. However, this is
not the case when multiple goals are considered. Usually the optimal value for one goal is far
from optimal for another; for example, consider a simple problem of choosing a laptop
computer to buy — the one with the highest performance is not the cheapest, and the cheapest will
probably not offer state-of-the-art performance. No trade-offs between objectives can be assumed
a priori. This is because the importance of each goal may differ between decision makers
— one person may choose a cheap computer, while another will opt for the highest performance
available. However, one can indicate a non-dominated set of solutions — the Pareto-frontier of
the problem. Informally, a solution is Pareto-efficient if the value of any of its criteria cannot be
improved without worsening the other values. In the laptop example this could be a list of the
cheapest computers for different performance levels.
Multi-Objective Optimization (MOO) methods are usually sophisticated, with many parameters
for the analyst to fine-tune. However, the decision maker is usually not interested in the
details of the method, but only in a single recommendation, possibly along with a justification. The
analyst will set up the method and its parameters based on his or her intuition and on research
carried out earlier. An experienced analyst can usually decide what parameter values should be
used, but it still may be impossible to set the values precisely to their best possible options. If
one can change the parameters a bit and the resulting solution is similar to the one acquired before,
then the solution is called robust. The same applies if the problem model cannot be formulated
precisely — for example, when it contains a value that may only be estimated, like the future price
of a raw material. The solution should be resistant to small fluctuations in the problem's model
parameters. Robustness in the MOO context is the ability to withstand changes in the parameters
and in the problem formulation; it is a very important quality of any MOO technique.
To give a final recommendation instead of the whole Pareto-frontier, one has to engage the decision
maker in the process. The method has to be interactive in order to gather the DM's preferences.
This can be done by showing exemplary feasible solutions and asking the decision maker to rank
them, or simply by asking about the inter-criteria trade-offs. These preferences are used to guide
the search of the solution space in the directions desired by the DM. An optimization technique
involving interaction with the decision maker is called an interactive multi-objective optimization
(IMO) technique.
An algorithm is a well-defined list of instructions for completing a task. Several researchers have
suggested that principles of evolution — particularly, the concepts of a population and survival of
the fittest individuals — can be a good model of operation for multi-criteria optimization algorithms.
Methods using these principles are called evolutionary algorithms (EAs), while the whole field
of research is evolutionary multi-objective optimization (EMO).
In this thesis, the author presents a software implementation of the DARWIN method and
the first large computational experiment with DARWIN. DARWIN is an acronym for Dominance-based
rough set Approach to handling Robust Winning solutions in INteractive multi-objective
optimization. It has been proposed by Salvatore Greco, Benedetto Matarazzo and Roman Słowiński
in [9, 13, 14, 15] for solving multi-objective optimization problems. It interacts with the decision
maker in order to find the solution which best fits his or her preferences. The preferences
are represented in the form of decision rules, which are used to guide the search for the most
preferred solutions in the solution space. An evolutionary algorithm is used as the optimization
engine. DARWIN thus combines the IMO and EMO paradigms. It allows an analyst to model
uncertainty in the problem definition, thus generating robust solutions.
1.1 Goal and scope of the thesis
The thesis consists of six chapters. Firstly, the theoretical background is presented. A detailed
description of the DARWIN method is presented in Chapter 3. Chapter 4 discusses an implemen-
tation on an IBM-PC class computer. Experiment results are shown and discussed in Chapter 5.
Finally, areas of further research are indicated along with conclusions and recommendations about
the method.
The goal of the thesis is to implement the DARWIN method, evaluate its performance on
a few MOO problems and test the influence of the method's parameters on the final result. Basic
recommendations for analysts willing to use the method should be given. A user manual
describing how to use the implementation and the file formats used by the software has
to be attached to the thesis.
Chapter 2
Multi-Objective Optimization
Often a model of a problem contains multiple objectives. This is the case when conflicting goals
cannot be easily converted into a single objective. Decision support with more than one goal under
consideration is not an easy task; however, several approaches exist. These approaches are
described in this chapter.
2.1 Interactive Approaches to MOO
Multi-Objective Optimization problems usually have multiple Pareto-optimal solutions. However,
the Decision Maker is usually interested in a single recommendation — the single solution he
or she may implement. The most preferred solution is a solution from the Pareto-frontier of the
problem for which the DM is convinced it is his or her best option.
To find the most preferred solution, an interaction with the DM is necessary. A problem solver
needs to know the Decision Maker’s preferences in order to differentiate Pareto-optimal solutions.
Without the preferences all solutions on the Pareto-frontier have to be considered equal.
The DM can build his or her global preference model before an algorithm solving the problem
starts and give this model as an input. This is called the a priori method. However, this method
has its weaknesses. It may be hard for the Decision Maker to give the full preference structure. It
is also possible that he or she will change his or her preferences after evaluating solutions received
from the problem-solver.
The interactive approach overcomes these weaknesses by involving the DM in the process. A basic
structure of the approach is shown in Figure 2.1. At first, an initial set of solutions is generated.
It can be a subset of the Pareto-frontier or just a set of feasible solutions. Then, based on the
solutions, the Decision Maker specifies his or her preferences. This can be done through a systematic
dialog, asking a series of questions, or by asking the DM to indicate "good" solutions in the set.
From the DM's answers a preference model is built. This additional preference information
guides the search towards a region indicated by the Decision Maker. This can save computational
cost, because the algorithm does not have to go through the whole search space.
Again, new solutions, probably better fitted to the DM's preferences, are generated and
shown to the DM. If he or she finds any of them satisfactory (or a stop condition is met),
then the algorithm stops. Otherwise it advances to the next iteration.
There are several types of the IMO methods (consult [26] for an in-depth description):
• Trade-off based methods. A trade-off is an exchange, a price that one is willing to pay
(in the form of a loss on some of the criteria) in order to benefit on another criterion (or criteria).
These methods ask the DM questions about the trade-offs he or she can accept, and then
a preference model is inferred from these trade-offs.
8 2 Multi-Objective Optimization
[Figure: the interactive loop — generate an initial set of solutions; show the solutions to the decision maker; if any of them is satisfactory, stop; otherwise ask the DM to indicate "good" solutions, extract preference information, improve the solution set, and repeat.]
Figure 2.1: An activity diagram for a typical interactive process
• Reference point approaches — the DM specifies bounds on values of the objective func-
tions (i.e. reference points) and then, he or she can observe the effect of the bounds on the
generated solutions.
• Classification-based methods. It is not possible to improve the value of one goal of
a Pareto-optimal solution without worsening its other goals. In the classification-based
methods the DM is asked to select the goals that can be impaired and the ones that he or
she wants to improve.
The interactive approach requires the Decision Maker's collaboration during the process; however,
the approach offers benefits strong enough to justify this dedication. Clearly, the computational
cost required is lower than in other approaches, because there is no need to evaluate the whole solution
space, just its small subset. The DM may not be able to express the global structure of his or her
preferences up front. It is also possible that his or her preferences will change along with a
change in the understanding of the problem. During the interactive process the DM has immediate
feedback — he or she may see how the decisions affect the problem's solutions.
One can say that solving a Multi-Objective Optimization problem is a constructive process,
in which the Decision Maker learns more about the problem — what kinds of solutions are possible
and how his or her choices influence the results (see [26]). As a result, not only is the most preferred
solution given, but the DM's understanding of the problem also improves.
2.2 Evolutionary Approaches to MOO
In 1859, Charles Darwin published his work "On the origin of species" [2]. He introduced a scientific
theory describing the evolution of species through a process called natural selection. According
to the theory, a trait can become more or less common in a population depending on its
effect on the survival and reproduction of the individuals bearing the trait.
[Figure: P := Initialization; until the termination conditions are met: Evaluate(P), P′ := Selection(P), P′′ := Variation(P′), P := Elitism(P, P′′).]
Figure 2.2: An evolutionary optimization procedure
This idea can be easily transferred to the optimization field. A solution to a problem — that
is, a set of values of the problem's decision variables — is a single individual in the population.
The problem is the environment — the higher the solution's evaluation on a given problem, the
better it is fitted to the environment. A better fitness means a higher chance that the traits of
a solution will be present in the next iteration (an analogue of the reproduction success rate). The first
successful applications of the idea were in the electrical engineering field (see [6]) and in
fluid mechanics (see [29, 30]).
The main differences between the classical and the evolutionary optimization (EO) are (see [3]):
• Population-based. An EO procedure uses a population of solutions (a population approach),
whereas the classical algorithms maintain one solution at a time (a point approach).
This enables an algorithm to maintain multiple optimal solutions, possibly from different parts
of the solution space. Unfortunately, it raises the memory and computational footprint of an
EO algorithm.
• Stochastic operators. An EO procedure uses stochastic operators (e.g. selection, crossover
or mutation) instead of deterministic ones.
• Gradient information. An EO procedure does not usually use gradient information directly
in performing a search. This means that the procedure is immune to local optima in the search
space. However, an EO procedure may not be competitive with a dedicated gradient-based approach.
The basic evolutionary optimization procedure is shown in Figure 2.2. The algorithm starts
by creating a population of solutions. Usually the population is created at random within the
bounds of the decision variables. Then a succession of generations starts, in which the population
is updated by a sequence of operators.
First, the population is evaluated. The evaluation means establishing a relative preference
order, that is, sorting solutions from the best to the worst. After the evaluation the algorithm
chooses solutions to fill the mating pool. The better the solution, the higher the probability of being
chosen. Then, the variation operator is applied. It is a series of steps, such as crossover or
mutation, generating a succeeding generation (an offspring) from the parents in the mating pool. The
crossover ensures that the parents' traits will be present in the next generation, while the mutation
acts as a local search in a solution's neighborhood. Finally, the elitism operator combines the old
population with the newly created offspring. Copying the best solutions from the former ensures
that the algorithm has a monotonically non-degrading value of the best solution.
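The sequence of operators described above can be sketched in Python. The real-valued encoding, the single-objective fitness (maximize −Σxᵢ², optimum at the origin) and all parameter values are illustrative assumptions of this sketch, not part of the thesis.

```python
import random

def evaluate(x):
    # Illustrative single-objective fitness: maximize -sum(x_i^2).
    return -sum(v * v for v in x)

def select(population, pool_size):
    # Binary tournament: the better of two random solutions joins the pool.
    pool = []
    for _ in range(pool_size):
        a, b = random.sample(population, 2)
        pool.append(a if evaluate(a) >= evaluate(b) else b)
    return pool

def vary(pool, mutation_rate=0.2):
    # Crossover mixes parents' traits; mutation acts as a local search.
    offspring = []
    for i in range(0, len(pool) - 1, 2):
        p1, p2 = pool[i], pool[i + 1]
        cut = random.randrange(1, len(p1))            # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [v + random.gauss(0.0, 0.2) if random.random() < mutation_rate
                 else v for v in child]               # Gaussian mutation
        offspring.append(child)
    return offspring

def elitism(parents, offspring, size):
    # Copying the best solutions keeps the best value non-degrading.
    return sorted(parents + offspring, key=evaluate, reverse=True)[:size]

def evolve(pop_size=20, dims=3, generations=50):
    population = [[random.uniform(-5.0, 5.0) for _ in range(dims)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        pool = select(population, pop_size)
        population = elitism(population, vary(pool), pop_size)
    return max(population, key=evaluate)

random.seed(1)
best = evolve()
```

Because elitism always keeps the incumbent best, the returned evaluation can never be worse than the best of the initial random population.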
[Figure: individuals plotted in the objective space (Goal 1 and Goal 2, both gains), assigned to Pareto-frontiers #1–#4, with the crowded areas marked.]
Figure 2.3: The NSGA-II evaluation
The choice of a fitness function is critical to the algorithm’s performance. In case of a problem
with a single criterion this is trivial — a value of the goal can be used. However, in the case
of the Multi-Objective Optimization, there are a number of objective functions to be optimized.
A possible approach to the problem is to use the dominance principle ([7]):
A solution x is said to dominate the other solution y, if both of the following conditions are
true:
1. The solution x is not worse than y on all objectives. Thus, the solutions are compared based
on their objective function values.
2. The solution x is strictly better than y on at least one objective.
All the solutions that are not dominated by any other solution form the Pareto-frontier
of the problem.
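The dominance principle and the resulting Pareto-frontier can be expressed directly in code. This short Python sketch assumes gain-type objectives stored as tuples; the laptop data points are made up for illustration.

```python
def dominates(x, y):
    """True if solution x dominates y; all objectives are gains."""
    return (all(a >= b for a, b in zip(x, y))
            and any(a > b for a, b in zip(x, y)))

def pareto_frontier(solutions):
    """All solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

# The laptop example: (inverted price, performance) — both gains.
laptops = [(3, 9), (9, 2), (5, 5), (2, 2)]
frontier = pareto_frontier(laptops)
```

Here only (2, 2) is dropped from the frontier, since (5, 5) is better on both objectives.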
According to [3] there are two ideal goals of the EMO:
1. Find a set of solutions which lies on the Pareto-optimal front, and
2. Find a set of solutions which is diverse enough to represent the entire range of the Pareto-
optimal front.
The most representative example of an Evolutionary Multi-objective Optimization (EMO)
algorithm is NSGA-II ([4]). The basic idea behind the algorithm is to assign each solution in
a population to one of a number of Pareto-frontiers. All non-dominated individuals are assigned
to the first Pareto-frontier and then removed from the population. The individuals that are
non-dominated after the removal are then assigned to the second frontier. The process repeats as
long as there are individuals left in the population. The lower the number of the frontier an
individual belongs to, the higher its fitness. In case of a tie, the crowding score is taken into
account — the smaller the crowd in the solution's neighborhood in the objective space, the better
the solution's evaluation. This is illustrated in Figure 2.3.
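The frontier-assignment idea can be sketched by repeatedly peeling off the non-dominated subset. This is a simplified illustration of the principle only; the actual NSGA-II uses a faster bookkeeping scheme and adds the crowding-distance tie-breaker, which is omitted here.

```python
def dominates(x, y):
    # Gain-type objectives: x dominates y iff it is no worse everywhere
    # and strictly better somewhere.
    return (all(a >= b for a, b in zip(x, y))
            and any(a > b for a, b in zip(x, y)))

def non_dominated_sort(solutions):
    """Assign solutions to successive Pareto-frontiers (index 0 is
    frontier #1) by peeling off the currently non-dominated subset."""
    remaining = list(solutions)
    fronts = []
    while remaining:
        front = [s for s in remaining
                 if not any(dominates(t, s) for t in remaining if t is not s)]
        fronts.append(front)
        remaining = [s for s in remaining if s not in front]
    return fronts
```

For example, in {(1, 1), (2, 2), (3, 1), (1, 3)} the point (1, 1) is the only dominated one, so it alone forms the second frontier.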
2.3 Dominance-Based Rough Set Approach to MOO
A rough set is an approximation of a conventional set — a pair of sets constituting a lower and an
upper approximation of the original set ([27]). The lower approximation is the set of all objects
that can be unambiguously classified as members of the target set. On the other hand, the upper
approximation is the set containing the objects that cannot be unambiguously classified as members
of the complement of the target set. The boundary region is the part of the object space belonging
to the upper approximation, but not to the lower one.
Dominance-based Rough Set Approach (DRSA) is an extension of rough set theory introduced
in [10, 11, 12]. The indiscernibility relation is replaced by the dominance relation (defined
in the previous section). DRSA is applicable in the decision support field.
DRSA can model the situations in which a finite set of objects — vectors of values in the
decision variable space — has been classified to some decision classes, such that one object belongs
to exactly one class. The classes are preference ordered. The main task of DRSA is to structure
the classification into lower and upper approximations of unions of ordered decision classes, prior
to induction of monotonic decision rules, representing the preferences of an agent who made the
classification decision. An example of the DRSA data structuring is given in Figure 2.4.
The data in the Dominance-based Rough Set Approach are often presented in a decision table. The
objects being considered are written in table rows, and the decision attributes are table columns. The
last column is the classification of the objects into decision classes. A formal definition of the decision
table can easily be found in the literature. An example is given in Table 2.1.
On the basis of the table, decision rules may be induced. They are generalized descriptions of
the knowledge represented in the table. A decision rule is a Horn clause (see [19]) of the form "if ...,
then ...". The former part is called the condition and the latter — the consequent. The condition part
compares the values of an object's attributes with given thresholds, and the consequent part represents
the object's classification if the condition part holds. Rules can be certain — based on
objects from the lower approximation of a class, possible — based on objects from the upper
approximation, or approximate — based on the boundary region. Each decision rule should be
minimal, i.e. the cardinality of its set of conditions should be minimal.
Example rules generated from Table 2.1 are as follows:
1. If Literature ≥ good then Student ≥ good,
2. If Mathematics ≤ bad and Physics ≤ medium then Student ≤ bad,
3. If Mathematics ≥ medium then Student ≥ medium (possible).
DRSA can handle uncertainty and contradictions in the data, thus it can model a wide class
of real-world decision problems. For each rule r of the form Φ → Ψ, the following measures are
defined:
Student Mathematics Physics Literature Overall class
1 good medium bad bad
2 medium medium bad medium
3 medium medium medium medium
4 medium medium medium good
5 good medium good good
6 good good good good
7 bad medium medium bad
8 bad bad medium bad
Table 2.1: An example of the decision table
[Figure: objects plotted in the gain 1 / gain 2 space, classified as "High", "Medium" or "Low", with the lower and upper approximations of "High" outlined.]
Figure 2.4: An example of the DRSA approach
• Support: supp(Φ, Ψ) = cardinality(||Φ ∧ Ψ||) — the number of objects for which the
condition of the rule holds and the object's classification is consistent with the consequent of
the rule.
• Confidence: confidence(Φ, Ψ) = supp(Φ, Ψ) / cardinality(||Φ||) — the number of objects
supporting the rule in relation to the number of objects for which the rule's condition holds.
Objects supporting rule no. 3 are {S2, S3, S4, S5, S6}, but the condition part also holds for S1.
The support is thus supp(r3) = 5 and the confidence is confidence(r3) = 5/6 ≈ 0.83.
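Both measures can be checked mechanically against Table 2.1. In this Python sketch the ordinal encoding bad < medium < good is an assumption introduced for the comparison; the table rows are copied from Table 2.1.

```python
# Support and confidence of rule no. 3
# ("If Mathematics >= medium then Student >= medium") over Table 2.1.
GRADE = {"bad": 0, "medium": 1, "good": 2}  # assumed ordinal encoding

# (Mathematics, Physics, Literature, Overall class) for students S1..S8.
TABLE = [
    ("good", "medium", "bad", "bad"),        # S1
    ("medium", "medium", "bad", "medium"),   # S2
    ("medium", "medium", "medium", "medium"),# S3
    ("medium", "medium", "medium", "good"),  # S4
    ("good", "medium", "good", "good"),      # S5
    ("good", "good", "good", "good"),        # S6
    ("bad", "medium", "medium", "bad"),      # S7
    ("bad", "bad", "medium", "bad"),         # S8
]

# Objects for which the condition part (Mathematics >= medium) holds.
condition = [row for row in TABLE if GRADE[row[0]] >= GRADE["medium"]]
# Of those, the ones whose class is consistent with the consequent.
support = sum(1 for row in condition if GRADE[row[3]] >= GRADE["medium"])
confidence = support / len(condition)
```

Running this confirms the condition holds for six students, five of which support the rule.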
To induce all rules from a decision table one can use the AllRules algorithm (an optimized
version is described in [35]). Another option is to use the DomLem algorithm ([16]), which generates
a minimal set of rules covering all the objects from a given table.
Chapter 3
DARWIN, the Idea behind the
Method
The basic idea of the DARWIN method was introduced in [14]. This idea is described in the
following sections.
3.1 Background
The method combines two different approaches — Interactive Multi-Objective Optimization (IMO,
see 2.1) and Evolutionary Multi-Objective Optimization (EMO, see 2.2).
In the IMO paradigm one wants to elicit the decision maker's preferences by involving him or her
in the process. This is done by a systematic dialog with the decision maker (the DM). Questions
are asked and the DM provides answers. Preference information is extracted on the basis of
these answers. The algorithm can then use this knowledge to produce solutions better fitted to his
or her preferences. The IMO framework is presented in Figure 2.1.
The rationale behind the interactive process is that the decision maker is interested only in
a small subset of preferred solutions or even in a single most preferred one.
This process makes it possible to gather preference information and then use this information
to construct better solutions. However, this is just a framework, so the details are left up to the
analyst. One has to consider, in particular, how to extract and store the knowledge gathered from
the DM's answers and how to use this knowledge to generate solutions better fitted to the decision
maker's preferences.
The human factor is yet another thing to consider. The DM is a human being, and thus his or her
behavior may change. The challenge here is to find out what questions should be asked and how
often, as well as how many intermediate solutions should be presented to the DM for evaluation.
DARWIN is a realisation of the IMO process. It keeps generating solutions and improving them
on the basis of DM’s feedback. It only asks the DM to mark potentially good solutions, so that
only problem-domain knowledge is required; its user does not need to have expert knowledge in
the decision support field.
Evolutionary Multi-Objective Optimization (EMO) provides a computational engine for generating
new, still better solutions in successive iterations — better in the sense of an objective function
defined in the solution space. Most EMO methods approximate the Pareto-optimal front
by a set of solutions, so one solution is better than another if the former Pareto-dominates the
latter. In the case of two equivalent solutions, other factors have to be taken into account (for
example, the crowding score of NSGA-II [4]). This is the case because, if no preference information
is given, all Pareto-optimal solutions have to be considered equivalent.
[Figure: a Pareto-frontier in the objective space (Goal 1 and Goal 2, both gains), with the dominated solutions below it and an area of the frontier preferred by the DM highlighted.]
Figure 3.1: Pareto-front and area preferred by the decision maker
It seems natural to combine the two approaches described above — Interactive Multi-Objective
Optimization and Evolutionary Multi-Objective Optimization. IMO is just a process framework
and still needs an engine to generate and improve solutions; EMO is such an engine. On the
other hand, for EMO, involving the decision maker in the procedure results in gathering preference
information. This information allows the procedure to focus on a specific region of the Pareto-front —
the one most relevant to the decision maker. In this way IMO and EMO complement each
other.
This is important because of the "human factor". If the number of solutions becomes huge, the
DM cannot effectively analyze them and find the one that fits his or her preferences best; thus
Pareto-optimality alone is not sufficiently discriminative. However, guiding the search to preferred
regions of the solution space allows the method to converge faster to good solutions. This is shown
in Figure 3.1. DARWIN uses an EMO procedure to improve generated solutions based on the
DM's preferences.
3.2 Modeling of uncertainty
It is often the case that not all numbers and coefficients are precisely known. It may be easier
for the decision maker to formulate the Multi-Objective Optimization problem giving the problem
coefficients in the form of intervals of possible values. For example, instead of saying that product
price will equal 20 units one can say it will be in the [19, 21] interval. In this situation the decision
maker is often interested in finding the best robust solution — that is good and possible in a large
part of uncertainty scenarios. DARWIN allows to give the coefficients in a form of intervals.
A vector fixing each of the problem's interval coefficients to one of its possible values
will be called a scenario of imprecision (see Figure 3.2). If intervals are allowed, it is impossible to
calculate the exact value of the problem's objectives for a given solution. To handle this, all the
considered solutions are evaluated over a set of uncertainty scenarios.
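Scenario-based evaluation can be sketched as follows: each uncertain coefficient is given as an interval, a scenario fixes every interval to a single value, and a solution is evaluated once per scenario. The coefficient names, the profit objective and uniform sampling are made-up illustrations, not DARWIN's actual problem format.

```python
import random

# Uncertain coefficients given as intervals (illustrative values).
INTERVALS = {"price": (19.0, 21.0), "cost": (4.0, 6.0)}

def draw_scenario(intervals, rng):
    """A scenario of imprecision: fix every interval coefficient to one
    value, here drawn uniformly from its interval."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in intervals.items()}

def profit(solution, scenario):
    """Objective value of `solution` (units produced) under one scenario."""
    return solution * (scenario["price"] - scenario["cost"])

rng = random.Random(42)
scenarios = [draw_scenario(INTERVALS, rng) for _ in range(1000)]
evaluations = [profit(10, s) for s in scenarios]
```

Each solution thus yields a whole distribution of objective values (one per scenario) rather than a single number.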
[Figure: intervals a = [1, 10], b = [5, 10], c = [0, 5] and an example scenario {a → 2, b → 6, c → 4}.]
Figure 3.2: Scenario — set of intervals fixed on specific values
[Figure: the density of the standard normal distribution with the 25%, 50% and 75% percentiles marked.]
Figure 3.3: Percentiles of the normal distribution
The results of this evaluation are then aggregated in the quantile space. Quantiles are points
taken at regular intervals from the cumulative distribution of a random variable. If each interval
covers 0.01 of the distribution, then the quantiles are called percentiles (percentiles of the
normal distribution are shown in Figure 3.3). Instead of presenting all of the results, the method
calculates meaningful quantiles of the result distribution for each objective. For example, percentiles
(like 1%, 25% and 50%) can be chosen. Percentiles divide ordered data into 100 equally-sized
subsets — the 1st percentile is the best of the 1% worst solutions or, alternatively, the worst of
the 99% best solutions.
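The aggregation step can be sketched by computing a few chosen quantiles from the per-scenario evaluations of one solution. The simple nearest-rank quantile definition and the sample values below are assumptions of this sketch.

```python
def quantile(values, q):
    """The q-quantile (0 < q <= 1) of a sample, by a simple nearest-rank
    rule: sort and index. Sufficient for this illustration."""
    ordered = sorted(values)
    index = max(0, int(q * len(ordered)) - 1)
    return ordered[index]

# Per-scenario objective values of one solution (illustrative sample),
# aggregated into a few meaningful quantiles.
evaluations = [132.0, 150.5, 141.2, 168.9, 155.0, 137.5, 160.1, 145.8]
summary = {q: quantile(evaluations, q) for q in (0.01, 0.25, 0.50)}
```

The dictionary `summary` is the solution's description in quantile space; these aggregated attributes are what the DM's decision rules later refer to.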
The choice of these quantiles is connected with the DM's attitude towards risk. If he or she wants to
avoid risk, then the decision will be focused on quantiles from the beginning of the distribution
(e.g. 10%). On the other hand, when he or she is interested in the best possible solution, even if
there is a risk involved, then quantiles from the end of the distribution will be inspected (e.g.
75%).
In DARWIN the DM's preferences are gathered and stored using the DRSA methodology (see 2.3).
The Dominance-Based Rough Set Approach is a framework for reasoning about partially inconsistent
preference data. DRSA already has successful applications in the IMO area (see [10, 11, 12]).
DRSA is applied in the IMO process. After the DM selects "good" solutions from the provided set,
decision rules are induced to store the preferences. These rules are given in the form "If ...
then ...". The condition part is a conjunction of conditions on attributes from the quantile space.
These attributes are compared to specific values, e.g. profit25% ≥ 100 ∧ time1% ≤ 10. The consequent
part assigns a solution to a class (at least or at most), e.g. Class at least Good. So the whole rule
would be: If profit25% ≥ 100 ∧ time1% ≤ 10 then Class at least Good.
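Matching such a rule against a solution described in quantile space amounts to checking every elementary condition. The tuple representation of a rule and the attribute names below mirror the example above but are assumptions of this sketch.

```python
# A rule's condition part as (attribute, relation, threshold) tuples;
# its consequent ("Class at least Good") would apply when all hold.
def matches(rule_conditions, solution):
    """Check whether every elementary condition of a rule holds for a
    solution described by its quantile-space attributes."""
    for attribute, relation, threshold in rule_conditions:
        value = solution[attribute]
        if relation == ">=" and not value >= threshold:
            return False
        if relation == "<=" and not value <= threshold:
            return False
    return True

# If profit_25% >= 100 and time_1% <= 10 then Class at least Good:
rule = [("profit_25%", ">=", 100), ("time_1%", "<=", 10)]

solution = {"profit_25%": 120, "time_1%": 8}
```

Here `matches(rule, solution)` holds, so the rule would classify this solution as at least Good.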
3.3 The algorithm
DARWIN can operate on Multi-Objective Optimization (MOO) problems defined as follows:
[f1(x), f2(x), . . . , fk(x)] → max (3.1)
subject to:
g1(x) ≥ b1
g2(x) ≥ b2
. . .
gm(x) ≥ bm
(3.2)
where:
x = [x1, x2, . . . , xn] is a vector of decision variables, called a solution;
f1(x), f2(x), . . . , fk(x) are objective functions, fi : X → R;
g1(x), g2(x), . . . , gm(x) are constraint functions, gj : X → R;
b1, b2, . . . , bm are real-valued right-hand sides of the constraints.
Some of the coefficients in the objective functions or constraints may be given in the form
of intervals (thus modeling ignorance: uncertainty about the real value of a coefficient). A vector
fixing a value for each interval is called a scenario of imprecision (see Figure 3.2).
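Drawing a scenario of imprecision can be sketched as follows (an illustrative Python fragment; in the actual method the interval coefficients come from the model file, and all names here are assumptions):

```python
import random

# Sketch: a coefficient is either a fixed number or an interval (low, high);
# a scenario of imprecision fixes every interval to a concrete value.

def draw_scenario(coefficients, rng):
    """Replace every interval (low, high) by a value drawn uniformly from it;
    fixed coefficients are copied unchanged."""
    scenario = {}
    for name, coeff in coefficients.items():
        if isinstance(coeff, tuple):
            low, high = coeff
            scenario[name] = rng.uniform(low, high)
        else:
            scenario[name] = coeff
    return scenario

# One fixed coefficient and two uncertain ones:
coeffs = {"a1": 3.0, "a2": (2.0, 4.0), "b": (90.0, 110.0)}
scenario = draw_scenario(coeffs, random.Random(42))
```

Generating a scenario set then amounts to repeating the draw with independent random values.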
DARWIN is composed of two nested loops: the exterior and the interior one. The former realises
the interactive process, while the latter is an EMO engine dedicated to improving solutions based on
the decision maker's preferences. This is illustrated in Figure 3.4.
The exterior loop, corresponding to the IMO interactive process, is shown in Algorithm 1.
A more detailed description of each step follows.
First, one has to generate a set of feasible solutions to the MOO problem. This can be
done using the Monte Carlo method. The Monte Carlo concept itself is not new. The concept of
Figure 3.4: Activity diagram for DARWIN. The "Perform an evolutionary optimization" step
forms the interior loop
Algorithm 1 DARWIN's exterior loop
1: X ← GenerateSolutions
2: S ← GenerateScenarios
3: loop
4:   for all x ∈ X, s ∈ S do   ▷ evaluate each solution over all of the scenarios
5:     Evaluate(x, s)
6:   end for
7:   isSatisfied ← PresentResults
8:   if isSatisfied then
9:     stop
10:  else
11:    X ← AskToMarkGood(X)
12:  end if
13:  rules ← InduceDecisionRules(X)
14:  X ← EmoProcedure(rules)   ▷ the interior loop
15: end loop
statistical sampling became popular after digital computing machines had been invented (see [25]).
The method can be described as randomly sampling the domain of a problem. In the most basic
variant, one simply picks a solution at random and checks whether it is feasible. Unfortunately, this
is impractical unless the non-feasible space is only a small part of the domain. Additional hints for
the generator, for example in the form of the analyst's suggestions, can be taken into account.
At this stage the goals and constraints are allowed to contain intervals corresponding to the
uncertainty of the model, so a set of scenarios needs to be generated. Each scenario is
a realisation of the problem with the intervals fixed to concrete values. It is worth noting that if the
problem constraints are given in an uncertain form, that is, contain coefficients in the form of
intervals, it may be impossible to determine whether a given solution is feasible. In that case,
lines 1 and 2 should be swapped and the feasibility of the solution set should be checked on the
generated scenarios.
In lines 4 to 6 each solution is evaluated over each scenario. The results of this evaluation phase
are then gathered and presented to the decision maker in line 7. The DM is a human being, though,
so in order to get valuable feedback one needs to show the data in an aggregated form. The authors
proposed presenting meaningful quantiles, for example f_1^1%(x), f_1^25%(x), f_1^50%(x),
. . . , f_k^1%(x), f_k^25%(x), f_k^50%(x) for all x ∈ X.
If the DM finds one of the presented solutions satisfactory, the problem is solved
and the algorithm ends here. If not, he or she is asked to indicate the "good" solutions
in the set (line 11). On the basis of this distinction, the method generates a set of decision rules
(line 13). These rules are then passed to the interior loop (line 14) where, using the EMO
paradigm (see 2.2), DARWIN searches the solution space. The search is driven towards
a specific region on the basis of the rules. Finally, new solutions, better fitted to the DM's
expectations, are generated and the process starts over again.
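The exterior loop can be sketched with its problem-specific steps injected as functions (a Python sketch; all names are illustrative, the real implementation is in Scala, and the stubs below only demonstrate the control flow):

```python
# Sketch of the exterior loop from Algorithm 1. Every step is passed in as a
# function, so the control flow can be exercised with trivial stubs.

def exterior_loop(generate_solutions, generate_scenarios, evaluate,
                  present_results, ask_to_mark_good, induce_rules,
                  emo_procedure, max_iterations=20):
    X = generate_solutions()
    S = generate_scenarios()
    for _ in range(max_iterations):
        # Evaluate each solution over all of the scenarios (lines 4-6).
        results = {x: [evaluate(x, s) for s in S] for x in X}
        if present_results(results):        # line 7: is the DM satisfied?
            return X
        good = ask_to_mark_good(X)          # line 11
        rules = induce_rules(good, X)       # line 13
        X = emo_procedure(rules)            # line 14: the interior loop
    return X

# Tiny smoke run with stub components (the DM is satisfied after two rounds):
state = {"shown": 0}
def _present(results):
    state["shown"] += 1
    return state["shown"] == 2

final = exterior_loop(lambda: [1, 2], lambda: [0],
                      lambda x, s: x, _present,
                      lambda X: X[:1], lambda good, X: ["rule"],
                      lambda rules: [3, 4])
```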
Algorithm 2 shows the interior loop of the DARWIN method. This loop is an EMO procedure
guided by the decision rules induced in the exterior loop from the DM's selections.
Algorithm 2 DARWIN's interior loop
1: procedure EmoProcedure(rules)
2:   X ← GenerateSolutions
3:   S ← GenerateScenarios
4:   loop
5:     for all x ∈ X, s ∈ S do   ▷ this loop calculates meaningful quantiles for each solution
6:       Evaluate(x, s)
7:     end for
8:     if termination conditions fulfilled then
9:       return X
10:    end if
11:    pScore ← CalculatePrimaryScore(X)
12:    sScore ← CalculateSecondaryScore(X)
13:    X′ ← RankSolutions(X, pScore, sScore)
14:    P ← SelectParents(X′)
15:    O ← RecombineOffspring(P)
16:    O′ ← Mutate(O)
17:    X ← MergePopulations(X′, O′)
18:  end loop
19: end procedure
The procedure starts by generating a new set of feasible solutions and possible scenarios in
lines 2 and 3. Then, in line 4, the actual evolutionary optimization starts.
Each solution is an individual and the solution set constitutes a population. Iterations of the loop
defined in line 4 mark the generations of the population. The termination condition could be,
for example, a fixed number of iterations or a fixed amount of time.
First, in every generation the population is evaluated and ranked. The process starts with the
evaluation of each solution over every scenario (lines 5–7). After the evaluation, the meaningful
quantiles are known for each solution, and the procedure can calculate a primary score for each of
the individuals. The primary score is computed as follows. Let:
rules(x) = {rule_h ∈ rules : rule_h is matched by solution x}
rules(x) is the set of rules (rule_h ∈ rules) matched by solution x ∈ X.
X(rule_h) = {x ∈ X : x matches rule_h}
For each rule_h ∈ rules, X(rule_h) is the set of solutions matching this rule.
w(rule_h) = (1 − δ)^card(X(rule_h))
Each rule rule_h gets a weight related to the number of solutions matching it. δ is the decay
of the rule weight, for example δ = 0.1. This formula assigns a higher weight to rules matched
by fewer solutions, an important property because it helps to maintain diversity with respect
to the rules.
PrimaryScore(x) = Σ_{rule_h ∈ rules(x)} w(rule_h)
Finally, PrimaryScore(x) is the primary score of a given solution x ∈ X.
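A minimal sketch of this weighting scheme (Python for illustration; the thesis's implementation is in Scala, and the function and argument names are assumptions):

```python
# Sketch of the primary-score computation: each rule is weighted by
# (1 - delta) ** card(X(rule)), i.e. rules matched by fewer solutions get
# higher weights, and a solution's score is the sum of the weights of the
# rules it matches. `matches(rule, x)` stands in for the rule-matching check.

def primary_scores(solutions, rules, matches, delta=0.1):
    match_count = {r: sum(1 for x in solutions if matches(r, x))
                   for r in rules}
    weight = {r: (1 - delta) ** match_count[r] for r in rules}
    return {x: sum(weight[r] for r in rules if matches(r, x))
            for x in solutions}

# "r1" matches every solution, "r2" matches only solution "a":
scores = primary_scores(["a", "b", "c"], ["r1", "r2"],
                        lambda r, x: r == "r1" or x == "a")
```

Note how the rarely matched rule "r2" carries a higher weight (0.9) than the universally matched "r1" (0.9³), which is exactly the diversity-preserving property described above.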
In case of a draw, a secondary score is also considered for each solution. This score is calculated
similarly to the crowding distance in the NSGA-II method ([4]). The difference is that this
score is calculated in the quantile space instead of the original objective space, i.e.
f_1^1% × f_1^25% × f_1^50% × · · · × f_k^1% × f_k^25% × f_k^50%. The procedure is shown in
Algorithm 3.
Algorithm 3 Procedure calculating crowding distance
1: procedure CalculateCrowdingDistance(X)
2:   n ← |X|   ▷ number of solutions
3:   for all x ∈ X do   ▷ initialize
4:     distance(x) ← 0
5:   end for
6:   for all o ∈ objectives do
7:     X′ ← Sort(X, o)   ▷ sort the solutions by the value of objective o
8:     distance(X′(1)) ← ∞   ▷ boundary solutions get the highest score possible
9:     distance(X′(n)) ← ∞   ▷ X′(n) is the n-th element of the ordered set X′
10:    for i ← 2, n − 1 do   ▷ o(x) is the value of objective o for solution x
11:      distance(X′(i)) ← distance(X′(i)) + [o(X′(i + 1)) − o(X′(i − 1))]
12:    end for
13:  end for
14: end procedure
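A sketch of the crowding-distance calculation in Python, following the standard NSGA-II formulation (each interior solution accumulates the spread of its neighbours per objective); all names are illustrative:

```python
# Sketch: crowding distance over a list of solutions. `objectives` is a list
# of functions, each extracting one (quantile-space) attribute of a solution.

def crowding_distance(solutions, objectives):
    n = len(solutions)
    distance = {id(x): 0.0 for x in solutions}
    for o in objectives:
        ordered = sorted(solutions, key=o)
        distance[id(ordered[0])] = float("inf")    # boundary solutions
        distance[id(ordered[-1])] = float("inf")
        for i in range(1, n - 1):                  # interior solutions
            distance[id(ordered[i])] += o(ordered[i + 1]) - o(ordered[i - 1])
    return [distance[id(x)] for x in solutions]

# Three solutions on a single objective: the middle one gets 4 - 0 = 4.
sols = [(0,), (1,), (4,)]
dists = crowding_distance(sols, [lambda s: s[0]])
```

Solutions in sparsely populated regions receive larger distances, so preferring them maintains the spread of the population.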
In line 14 of Algorithm 2 parent selection is performed. The process is a Monte Carlo procedure;
the probability of selecting a solution x ∈ X as a parent is:
Pr(x) = ((|X| − rank(x) + 1) / |X|)^γ − ((|X| − rank(x)) / |X|)^γ
where rank(x) is the rank of solution x ∈ X in the ranking made in line 13, and γ ≥ 1 is an elitism
coefficient. The higher the γ, the bigger the probability of selecting a high-ranked solution as
a parent.
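This rank-based probability can be sketched and checked to sum to one over all ranks, since the terms telescope (Python, illustrative names):

```python
# Sketch of the selection probability
#   Pr(x) = ((n - rank + 1) / n)**gamma - ((n - rank) / n)**gamma
# where rank 1 is the best solution and gamma is the elitism coefficient.

def selection_probability(rank, n, gamma=2.0):
    return ((n - rank + 1) / n) ** gamma - ((n - rank) / n) ** gamma

# Probabilities for a population of 10 solutions:
probs = [selection_probability(r, 10) for r in range(1, 11)]
```

Because the sum telescopes to (n/n)^γ − 0 = 1, the values form a valid probability distribution for any γ ≥ 1, with the mass shifted towards the top ranks as γ grows.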
In line 15 a new individual (a child) is created from two parents chosen in the previous
step, a ∈ P and b ∈ P. The child is obtained by combining the parents:
z = λa + (1 − λ)b
where λ is a random real number, 0 ≤ λ ≤ 1.
The mutation operator is applied to the offspring population. The probability of mutation for
a single individual decreases in successive generations and is calculated using the formula:
Pr(t) = η(1 − ω)^(t−1)
where t is the number of the current iteration, ω is the mutation decay rate and η is the initial
mutation probability. Suggested values are ω = 0.1 and η = 0.5.
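Both variation operators can be sketched as follows (Python for illustration; eta and omega follow the suggested values from the text, and the function names are assumptions):

```python
import random

# Sketch of the variation operators: arithmetic recombination of two parents
# and the decreasing mutation probability Pr(t) = eta * (1 - omega)**(t - 1).

def recombine(a, b, rng):
    """Combine two parent vectors: z = lambda*a + (1 - lambda)*b."""
    lam = rng.random()                     # 0 <= lambda <= 1
    return [lam * ai + (1 - lam) * bi for ai, bi in zip(a, b)]

def mutation_probability(t, eta=0.5, omega=0.1):
    """Mutation probability in generation t (t starts at 1)."""
    return eta * (1 - omega) ** (t - 1)

child = recombine([0.0, 10.0], [10.0, 0.0], random.Random(1))
```

The decay makes mutation frequent in early generations (exploration) and rare later on (exploitation).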
In this section, the DARWIN idea was explained. Its implementation is described in the next
chapter.
Chapter 4
Implementation of DARWIN
DARWIN is a high-level description of an algorithm for solving multi-objective optimization
problems: a list of steps to execute and calculations to perform. In principle,
one does not need a computer to realize the decision-making process. In practice, however, it is
almost impossible to complete all the steps without dedicated software. Moreover,
to evaluate the method's performance one needs to repeat the experiments several times.
For these reasons, the method had to be implemented as a computer program. This chapter
describes the technologies used for DARWIN's implementation, as well as the development of the
experiment framework.
4.1 The issues of implementation
4.1.1 The programming environment
DARWIN does not function in a vacuum; its environment has to be taken into account during
the development of the implementation. DARWIN realizes an interactive process, therefore a way
of communicating with the decision maker is required. Another restriction is imposed by the need
to induce decision rules from the examples of solutions evaluated by the DM.
The decision maker has to provide the problem he or she wants to solve. This can be done in a model
file. If one wants to change the default parameter values, a configuration file with the values
is also required. Moreover, during the algorithm's run, the decision maker has to be present
in order to select "good" solutions from the provided ones. Consult the user manual (6) for more
details on the file formats. The parameters are described later in this section.
Decision rules store the DM's preferences; they are a symbolic representation of the trade-offs
he or she is willing to make, as well as of the importance of each criterion. They ensure the
robustness of the resulting solutions thanks to the underlying DRSA framework. The rules are
a very important part of the method, so an algorithm to generate them is needed. In the
method's description provided in [13, 15, 9], the phase of obtaining the rules is treated as a black
box: it is assumed that a component able to generate rules from the DM's selection exists. The
details are omitted; it is up to the analyst (or the developer of an implementation) to use a software
component he or she finds suitable.
Implementing the DRSA framework, able to induce decision rules from a given set
of examples, is a complex task, both in terms of the technical challenges and of the conceptual
and scientific work that needs to be carried out. Such a software application deserves a paper
of its own. Therefore, the author decided to use an existing implementation in order to focus
on the DARWIN method. Java Rough Set (jRS), a project carried out at the Laboratory of
Intelligent Decision Support Systems at Poznań University of Technology that aims at providing
a DRSA framework implementation in Java, was chosen. It contains the DomLem algorithm [16],
able to generate a minimal set of decision rules from the examples given as its input.
jRS is written in the Java programming language¹ and runs on top of the Java Virtual Machine
(JVM [23]). The JVM is a portable platform capable of running Java bytecode. Having a Java
component (jRS in this case) effectively puts a constraint on the DARWIN implementation: it
must also be delivered as a JVM application. However, the technical details of the Java platform
are out of the scope of this paper, so they are omitted unless relevant to the DARWIN description.
Unfortunately, generating only a minimal set of rules may not be sufficient. The primary score
in the interior loop is based on the number of rules matching a given solution, therefore having
more rules may prevent a loss of gradient in the evolutionary optimization. To check whether this
effect has a serious impact on performance, another rule-generating algorithm was used: the author
implemented the AllRules algorithm [17] in the Scala programming language. Again, the algorithm's
details are out of the scope of this paper. Nevertheless, it is worth noting that the number of rules
generated by AllRules is greater than that generated by DomLem.
4.1.2 The software prototype
As his first task, the author chose to implement a prototype of the program. The goal was to check
the behavior of the method: whether it works, converges to the preferred region, is able to generate
a reasonable solution and can withstand uncertainty in the decision maker's preferences; and to find
out whether the overview given in [13, 15, 9] is accurate enough to develop working software. More
important, however, was to identify potential problems and to gain a deep knowledge of the method.
The prototype was developed in Java. Java is a very popular language and de facto an industry
standard in many fields, especially in enterprise applications. The application itself is able to solve
a subset of MOO problems. It supports uncertainty and simulates the decision maker. It is
a working implementation of the method. The goal was achieved: a working piece of
software showing that DARWIN is able to solve multi-objective optimization problems.
The prototype was not fully featured, however. It contained only a simulated
DM and lacked any user interface; the problem to solve and the DARWIN parameters were
hard-coded. Nevertheless, the code was developed in an extensible way
and contained a suite of unit tests. It is a common practice in software engineering
to discard the prototype and start developing the final application from scratch. This was the
case with DARWIN: the author's knowledge of the method changed during the prototyping and
starting from the beginning was considered the better option.
4.1.3 The final implementation
The Java language is stable, well-tested and features numerous libraries dedicated to almost every
possible application; unfortunately, it is also very verbose and lacks support for many of the recent
trends in computer programming and software development. It is a great tool for large teams,
as is usually the case in the enterprise application market. However, what is a good solution for
a team is not necessarily the best possible tool for a single developer working on a project.
The author decided to develop the implementation in Scala². Scala is a multi-paradigm
programming language combining object-oriented features with functional programming. The former
paradigm is the dominant contemporary software development practice; it allows modeling the
problem domain's objects as well as the relations between them. The latter makes
it possible to write concise code dealing with mathematical computations, a very useful
trait in scientific applications.
1Developer Resources for Java Technology — http://www.oracle.com/technetwork/java/index.html
2The Scala Programming Language — http://www.scala-lang.org/
Scala runs on the JVM, therefore it is compatible with existing Java libraries. The Scala compiler
produces Java bytecode from Scala source code. The bytecode is nearly identical to the one
generated from a Java source; the only difference is that Scala programs require the presence of
an additional run-time library, scala-library.jar. The language was designed by Martin Odersky
and started as a project at École Polytechnique Fédérale de Lausanne (EPFL). Because of the
functional elements and different syntax, Scala programs are usually shorter and easier to read
than their Java counterparts, especially when dealing with mathematical operations.
The DARWIN code is divided into packages. Their structure is shown in Figure 4.1 and their
description follows. pl.poznan.put.darwin is the root package, following the Java naming
conventions [21]. For readability, its name is omitted in the descriptions below. Packages:
evolution — a module containing classes responsible for the evolution of solutions. It con-
tains a controller driving the process, an engine of the evolutionary algorithm, evolutionary
operators (mutation and cross-over) and a parent selection strategy.
jrsintegration — an abstraction layer over the DRSA framework. This module contains classes
responsible for integration with jRS and DomLem algorithm. They are able to convert
DARWIN’s solutions to a format required by the jRS, invoke the DomLem algorithm and
later — during the interior loop — check if a given solution matches any of the generated
rules.
model — models a hierarchy of objects for an abstract MOO problem. Contains basic classes
required to represent a problem, its variables, goals and constraints, as well as classes rep-
resenting a solution and a scenario of uncertainty. A problem parser — for reading problem
instances from files, a solution evaluator — computing a value of solution against given sce-
narios and a configuration reader — setting DARWIN’s options on the basis of a file, are also
components of the model package.
simulation — a package responsible for connecting all the other parts together. It initializes
the process by asking model to read the problem and the configuration, takes care of the
interaction with the DM, invokes jrsintegration to process the preferences and obtain
the rules and, finally, interacts with the evolution controller from the evolution package. It
can also simulate the decision maker and generate the reports needed by the automated
experiment framework.
utils — additional utility classes for the other modules.
In order to reduce the number of bugs and to prevent fixed bugs from recurring, an extensive
Figure 4.1: The package structure of DARWIN
test suite accompanies each of the packages. To streamline the building process, the Scala
Build Tool (SBT, see [18]) is used to manage dependencies, run the test suite
and build the application from its source code. The code contains javadoc³ on most of the API
methods and many comments in places that could be particularly hard or tricky to understand.
Therefore, it should be possible for other developers to use and extend the code delivered as
a part of the thesis. A manual for the end user is enclosed at the end of the paper.
Many of DARWIN's parameters are mentioned throughout the paper. They are gathered and
described in Table 4.1. The parameters can be set in a configuration file; the options are split
into sections and listed in the table. The syntax of the configuration file is described in the user
manual (6).
4.2 Experiment framework
One of the goals of the thesis was to evaluate DARWIN's performance and the importance
of its parameters. To do this, a set of tools was developed. Firstly, DARWIN can be instructed
to log the details of an execution to a file (see Table 4.1). These logs consist of two reports: the
evolution report and the DM's decisions report. The former contains information about the
population's shape, the latter about the decisions made by the decision maker. The reports
are in the comma-separated values format (CSV [31]).
The algorithm is interactive, which means that constant human supervision is required. However,
one needs to repeat the experiments over and over again. For this reason, the decision maker
has been simulated. It is possible to pass a utility function for a supposed decision maker; the
algorithm will then make the decisions on its own on the basis of this supposed utility. This automates
a single run of DARWIN; however, an operator is still required to restart the program and, if
necessary, to change the execution parameters.
To bypass this limit and streamline the process, the experiment framework is provided. One
can define a base configuration along with a test plan and a computer will then proceed with the
execution of the test plan. The scripts composing the framework are written in Python, a general-purpose,
high-level programming language⁴, and glued together using Bash shell scripts⁵. A description
of how to use the framework is provided as a part of the user manual (6).
A test plan execution will typically result in a number of report files full of the runs’ details.
It would be a tremendous task to analyze them by hand. However, presenting this data in an
aggregated form as a set of charts can immensely simplify the analysis.
R is an environment for statistical computing⁶. The program, along with the ggplot library (see [32]),
has been used for automatic chart generation. The R scripts can import the CSV reports,
aggregate and process the data and finally generate charts.
First, a user prepares a test plan and sends it to the framework. The framework is repeatedly
running DARWIN with a simulated decision maker. Resulting reports are then processed by the
R scripts. As a result charts are generated.
It is possible to generate the following types of charts (they are shown in Figures 4.2 and 4.3).
• A change of the supposed utility value during the exterior loop iterations. Example shown
in Figure 4.2a. This is a basic chart for the analysis, because it shows a high-level overview
of the run from the decision maker’s perspective.
• A change in a population shape during the interior loop run. May show a convergence of
a population to a specific region. Figure 4.2b.
3Javadoc Tool — http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html
4The Python Programming Language — http://python.org/
5Bourne-Again SHell — http://www.gnu.org/software/bash/bash.html
6The R Project for Statistical Computing — http://www.r-project.org/
Main
• Solution count — the number of solutions in a population. Default: 30
• Scenario count — the number of scenarios on which the solutions will be evaluated. Default: 30
• Generation count — the number of generations in the interior loop. Default: 30
• Gamma (γ) — the coefficient of elitism; the higher γ, the higher the probability of choosing a solution with a higher rank as a parent. Default: 2.0
• Delta (δ) — the decay of rule weight (see 3.3). Default: 0.1
• Eta (η) — the initial mutation probability. Default: 0.5
• Omega (ω) — the decay rate of the mutation probability. Default: 0.1
• Outer count — the number of exterior loop iterations to be simulated. Default: 20
• Use average — whether an average in quantiles should be used instead of the maximum value. Default: false
• Mutation tries — the number of mutation tries; for some problems it may be very hard to mutate a solution into another feasible one, so try no more times than defined here. Default: 20
• Use "at most" — whether "at most" rules should be considered along with "at least" ones. Default: false
• Percentiles — which percentiles are meaningful to the decision maker. Default: 1.0, 25.0, 50.0
• Multi rules — whether the rules should be generated multiple times (see 5.5.5). Default: false
• Multi rules count — how many iterations of rule generation should be performed. Default: 3

Algorithm
• AllRules — whether the AllRules algorithm should be used instead of DomLem. Default: false
• DomLem confidence level — the minimum level of confidence of the rules generated by the DomLem algorithm. Default: 1.0

SimulatedDM
• Base good count — the number of solutions to be marked as good by the simulated DM. Default: 3
• Good count delta — a parameter for simulating inconsistencies in the DM's decisions (see 5.4.4). Default: 2
• Noise level — a parameter for inconsistency noise in the DM's decisions. Default: 0

Evolution
• Regenerate every — during the interior loop the solution set is evaluated against a scenario set; this parameter specifies after how many generations the scenario set should be regenerated. Default: 1000
• Regenerate percent — the part of the scenario set to regenerate. Default: 0.0
• Use supposed utility — whether the evolutionary algorithm should use the supposed utility instead of a rule-based score. Default: false

Reports
• Evolution report — if set, specifies where the evolution report should be written. Default: evrep.csv
• DM report — if set, specifies where the DM's decisions should be reported. Default: dmrep.csv
• Brief report — whether a brief report should be written to standard output; useful for supervising a batch run. Default: true
• Rules directory — the directory where rules generated during the run should be saved; do not save the rules if empty. Default: rules

Table 4.1: The parameters of DARWIN
• A change of the utility value along with the primary score during the interior loop — a run
of the evolutionary algorithm. Figure 4.2c.
• The change in a population of the evolutionary algorithm from the utility and the
primary score perspective. Figure 4.2d.
• A comparison chart over the exterior loop iterations — to present the utility value change of multiple
tests (possibly with different parameters) side-by-side. Each test consists of one or more runs.
Figure 4.3a.
• A comparison chart — to compare the results of several tests. Each test consists of several
runs. The results are charted on a box plot, so the analyst can easily compare a median, the
first and the third quartile and outliers of each test. Figure 4.3b.
• A chart showing decision maker’s selections. The whole population is shown and selected
individuals are marked. Figure 4.3c.
The framework is just a utility, but an important one for evaluating the performance of the
method. The implementation presented in this chapter was the basis for the DARWIN method's
evaluation. The results are given in the next chapter.
Figure 4.2: Types of charts available in the experiment framework. (a) Exterior loop iterations →
the supposed utility value; (b) the population shape change in the objective space; (c) interior loop
generations → the utility value and the primary score; (d) the change of the utility and the primary
score in a population
Figure 4.3: Types of charts available in the experiment framework. (a) The comparison of test runs
over the exterior iterations; (b) the comparison of test runs at exterior iteration 10 (box plot);
(c) the decision maker's choices
Chapter 5
Results of Computational Experiments
DARWIN is a new method and, to the author's knowledge, it has not been implemented
before. The experiments are designed to check whether the method works at all, which parameters
are important for it, and what their reasonable default values should be. To make the results
repeatable, the DM is simulated, including inconsistencies in his or her decisions. Unless
uncertainty is involved, comparisons to the exact optimal solution are provided; in tests involving
uncertainty, the results are compared to the optimization of the supposed utility function.
5.1 The environment
All tests were conducted on a personal computer with a 64-bit Intel processor and 3 GB of RAM,
running a 64-bit Linux operating system. The Java Virtual Machine was in version
1.6.0_18 and Scala in version 2.8.0. The JVM was run with the options -Xms768m -Xmx768m, thus
setting the memory available to the application to 768 MB. Tests were performed through the CLI
batch interface.
The test framework is available in order to automate the experiment process. All experiments were
repeated at least thirteen times. Data analysis and chart generation were performed using the R
environment¹. The framework is a combination of Python² and Bash³ code communicating
with the main DARWIN code and with the modules written in R.
5.2 The Decision Maker
The experiments were repeated many times. To make this possible, it was essential to simulate
the DM, which also guarantees a repeatable process across the test runs.
In the DARWIN method, interaction with the decision maker occurs by showing him or her
a list of generated solutions (usually 30) and asking him or her to indicate a few "good" ones.
To simulate the DM one needs to simulate these selections. It is assumed that the decision
maker acts according to a utility function he or she has in mind; this function will be called the
supposed utility function.
In the simulation, the algorithm sorts the received solution list according to this supposed
utility function and then selects the solutions with the highest supposed utility (usually three;
this number can be configured).
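The simulated selection can be sketched as follows (Python, illustrative names; the example utility is the 1·value1 + 2·value2 function used for the binary knapsack problem in Section 5.3):

```python
# Sketch of the simulated decision maker: sort the presented solutions by the
# supposed utility and mark the top few as "good".

def simulated_dm(solutions, supposed_utility, good_count=3):
    ranked = sorted(solutions, key=supposed_utility, reverse=True)
    return ranked[:good_count]

# Solutions given as (value1, value2) pairs; utility = value1 + 2 * value2:
solutions = [(10, 5), (3, 9), (8, 8), (1, 1)]
good = simulated_dm(solutions, lambda s: s[0] + 2 * s[1])
```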
1The R Project for Statistical Computing — http://www.r-project.org/
2The Python Programming Language — http://python.org/
3Bourne-Again SHell — http://www.gnu.org/software/bash/bash.html
Unfortunately, a real decision maker, being human, is not as predictable and repeatable as the
described process. Sections 5.4.4 and 5.5.4 present the results of introducing inconsistencies into
the DM's decisions.
For each of the following problems, the supposed utility function is also defined.
5.3 Problem selection
The area of interest of Multi-Objective Optimization is huge and consists of many potential
problems to be solved. There are multi-criteria versions of classical problems, like the minimum
spanning tree [8], the traveling salesman problem (TSP) [1] or the knapsack problem [28], as well as
artificially generated ones, like the DTLZ problems [5]. Some of them are interesting because of
their real-life applications, while others are good for experimenting and testing purposes.
It is worth noting that the ordinary single-criterion versions of the problems can be easy to
solve. However, in the multi-criteria setting one has to infer the decision maker's preferences and
approximate the supposed utility function correctly. The challenge here is not to build the best
optimization algorithm for all the problems (this is impossible according to the no-free-lunch
theorem [34]) but rather a framework for preference information extraction.
The experiments were performed using following problems:
Two-criteria binary knapsack problem
x̄ = [x1, x2, . . . , x300]
xi ∈ {0, 1}; i = 1, 2, . . . , 300
max value1: ā1 · x̄
max value2: ā2 · x̄
subject to:
weight: w̄ · x̄ ≤ b
(max) supposed utility: 1 · value1 + 2 · value2
Here x̄ is the vector of items to be chosen. The problem is binary, so each xi ∈ x̄ can be either selected (xi = 1) or not (xi = 0). There are two criteria, value1 and value2; each one is the dot product of the item vector with an associated coefficient vector (ā1 and ā2, respectively).
A knapsack constraint is given: one can choose items up to a certain total weight b, with a vector of weights w̄ associated with the items. The limit b is set so that it is possible to choose about 2/3 of the items.
The coefficients (ā1, ā2, w̄) are vectors of values drawn uniformly from the interval [0, 10).
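A minimal Python sketch of evaluating such a solution, shown on a toy instance rather than the 300-item one used in the experiments (names are illustrative):

```python
def evaluate_knapsack(x, a1, a2, w, b):
    """Evaluate a binary knapsack solution on both criteria.
    Returns (value1, value2), or None if the weight constraint is violated."""
    if sum(wi * xi for wi, xi in zip(w, x)) > b:
        return None  # infeasible: total weight exceeds the capacity b
    value1 = sum(ai * xi for ai, xi in zip(a1, x))
    value2 = sum(ai * xi for ai, xi in zip(a2, x))
    return value1, value2

def supposed_utility(values):
    """The supposed utility assumed for this problem: 1*value1 + 2*value2."""
    value1, value2 = values
    return 1 * value1 + 2 * value2
```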
Two-criteria continuous knapsack problem
x̄ = [x1, x2, . . . , x300]
xi ∈ [0, 1); i = 1, 2, . . . , 300
max value1: ā1 · x̄
max value2: ā2 · x̄
subject to:
weight: w̄ · x̄ ≤ b
(max) supposed utility: 3 · value1 − 1 · value2
This is the continuous version of the knapsack problem. The description given for the binary version also applies here. The only difference is that the items can now be selected partially (xi ∈ [0, 1) for every xi ∈ x̄).
Three-criteria binary knapsack problem
x̄ = [x1, x2, . . . , x300]
xi ∈ {0, 1}; i = 1, 2, . . . , 300
max value1: ā1 · x̄
max value2: ā2 · x̄
max value3: ā3 · x̄
subject to:
weight: w̄ · x̄ ≤ b
(max) supposed utility: 1 · value1 − 1 · value2 + 2 · value3
Two-criteria problems can be easily visualized and analyzed. However, in real-life applications there is often a need for three or more criteria. There is a qualitative leap in moving from two criteria to more, so it is worth comparing the results achieved on the three-criteria knapsack problem with those of its two-criteria counterpart.
Three-criteria DTLZ problem generated using constraint surface approach
min f1: x1
min f2: x2
min f3: x3
subject to:
0 ≤ xi ≤ 1, i = 1, 2, 3
− x1 + x2 + 0.6 ≥ 0
x1 + x3 − 0.5 ≥ 0
x1 + x2 + x3 − 1.1 ≥ 0
(max) supposed utility:
− 1 ∗ f1 − 2 ∗ f2 − 1 ∗ f3
This problem consists of three simple linear criteria. The solution space is the three-dimensional cube bounded by the 0 ≤ xi ≤ 1 constraints. To make the problem challenging, parts of the solution space are cut off by the additional constraints.
The problem was built according to the constraint-surface approach presented in [5].
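The resulting feasible region is easy to test directly; a small sketch (illustrative function name) checking the box constraints together with the three cutting planes:

```python
def is_feasible(x1, x2, x3):
    """Check membership in the unit cube minus the parts cut off
    by the three additional constraints."""
    in_cube = all(0 <= v <= 1 for v in (x1, x2, x3))
    return (in_cube
            and -x1 + x2 + 0.6 >= 0
            and x1 + x3 - 0.5 >= 0
            and x1 + x2 + x3 - 1.1 >= 0)
```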
Two-criteria robust mix problem
max profit: pA·min(xA, dA) + pB·min(xB, dB) + pC·min(xC, dC)
            − (r1A·xA + r1B·xB + r1C·xC)·p1R − (r2A·xA + r2B·xB + r2C·xC)·p2R
min time: tA·xA + tB·xB + tC·xC
where:
pA ∈ [20, 24], pB ∈ [30, 36], pC ∈ [25, 30]
dA ∈ [10, 12], dB ∈ [20, 24], dC ∈ [10, 12]
r1A ∈ [1, 1.2], r1B ∈ [2, 2.4], r1C ∈ [0.75, 0.9]
r2A ∈ [0.5, 0.6], r2B ∈ [1, 1.2], r2C ∈ [0.5, 0.6]
p1R ∈ [6, 7.2], p2R ∈ [9, 9.6]
tA ∈ [5, 6], tB ∈ [8, 9.6], tC ∈ [10, 12]
subject to:
0 ≤ xA ≤ 12
0 ≤ xB ≤ 24
0 ≤ xC ≤ 12
(max) supposed utility:
profit1% + 3·profit25% + 2·profit50% − time1% − 3·time25% − 2·time50%
The problem was described in the presentation DARWIN: Dominance-based rough set Approach to handling Robust Winning solutions in INteractive multiobjective optimization, given at the 5th International Workshop on Preferences and Decisions in Trento, 2009. It contains many coefficients given in the form of intervals; for readability’s sake they were named and defined below the criteria.
The goal is to set the quantity of each product (A, B, C) to be produced. One wants to maximize the profit and minimize the total time it takes to produce the products. pi is the price of product i ∈ {A, B, C} on the market. There is also a maximal demand the market can consume (di). Each product is made of two raw materials, r1 and r2. The quantity of each needed to produce the i-th product is given (r1i, r2i), as well as the raw-material prices (p1R, p2R). Finally, it takes time to produce a given product: tA, tB, tC.
The coefficients are given in the form of intervals, so each solution has to be evaluated on many scenarios of uncertainty. This is why no exact values are used in the supposed utility function; there are none, only a series of evaluation results. Percentiles are used instead: goal25% means the result of the best evaluation among the worst 25% of evaluations.
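The scenario sampling and percentile scoring can be sketched as follows; the uniform sampling and the exact percentile convention are assumptions made for illustration, not necessarily those of the actual implementation:

```python
import random

def sample_scenario(intervals):
    """Draw one scenario of uncertainty: every interval coefficient
    is replaced by a concrete value from its interval."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in intervals.items()}

def percentile_score(evaluations, p):
    """goal_p% for a maximized goal: the best value among the worst p%
    of the evaluation results (the "worst" being the lowest values)."""
    ordered = sorted(evaluations)            # ascending, worst first
    k = max(1, len(ordered) * p // 100)      # size of the worst-p% slice
    return ordered[k - 1]                    # best value inside that slice
```

A solution's supposed utility is then a weighted sum of such scores, e.g. profit1% + 3·profit25% + 2·profit50% − time1% − 3·time25% − 2·time50%.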
Four-criteria robust DTLZ7 problem
min fj(x): 0.1 · Σ_{i=10(j−1)+1}^{10j} xi + [0, 2(4 − j)], j = 1, 2, 3, 4
subject to:
g1(x): f4(x) + 4f1(x) − 1 ≥ 0
g2(x): f4(x) + 4f2(x) − 1 ≥ 0
g3(x): f4(x) + 4f3(x) − 1 ≥ 0
g4(x): 2f4(x) + min[f1(x) + f2(x), f1(x) + f3(x), f2(x) + f3(x)] − 1 ≥ 0
0 ≤ xi ≤ 1, i = 1, 2, . . . , 40
(max) supposed utility:
− 4·f1^60% − 3·f2^60% − 2·f3^60% − 1·f4^60% − 8·f1^30% − 6·f2^30% − 4·f3^30% − 2·f4^30%
This problem is a variation of the DTLZ7 problem from [5]. It was constructed using the constraint-surface approach described in that article; however, interval coefficients were added to the goals.
Robust DTLZ1 problem
min f1(x): [0.3, 0.7] · x1 x2 · · · xM−1 (1 + g(xM))
min f2(x): [0.3, 0.7] · x1 x2 · · · (1 − xM−1) (1 + g(xM))
. . .
min fM−1(x): [0.3, 0.7] · x1 (1 − x2) (1 + g(xM))
min fM(x): [0.3, 0.7] · (1 − x1) (1 + g(xM))
where:
g(xM) = 100 · (5 + Σ_{i=M}^{M+4} [(xi − 0.5)^2 − cos(20π(xi − 0.5))])
n = M + 4
subject to:
0 ≤ xi ≤ 1, i = 1, 2, . . . , n
(max) supposed utility:
Σ_{i=1}^{M} (−M + i − 1) · fi^25%
This is another problem suggested in [5]. It was constructed using the bottom-up approach, and intervals were added to the goal functions. M denotes the number of goals; in the experiments, problems with 4 and 10 criteria were used.
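One uncertainty scenario of these goals can be evaluated as sketched below; sampling the [0.3, 0.7] coefficient independently for each goal is an assumption made for illustration:

```python
import math
import random

def g(x_m):
    """The multimodal g function over the last five decision variables."""
    return 100 * (5 + sum((xi - 0.5) ** 2 - math.cos(20 * math.pi * (xi - 0.5))
                          for xi in x_m))

def robust_dtlz1(x, M):
    """Evaluate one scenario of the robust DTLZ1 goals for a solution x
    of length n = M + 4 (0-indexed; x[0] is x1 of the formulation)."""
    gm = 1 + g(x[M - 1:])         # the last five variables feed g
    f = []
    for j in range(1, M + 1):
        prod = 1.0
        for xi in x[:M - j]:      # the leading product x1 * ... * x_{M-j}
            prod *= xi
        if j > 1:
            prod *= 1 - x[M - j]  # the (1 - x_{M-j+1}) factor
        f.append(random.uniform(0.3, 0.7) * prod * gm)
    return f
```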
5.4 A deterministic case
5.4.1 Single run analysis
In order to observe the algorithm’s behavior, a detailed analysis of a single run is presented in this section. The problem analyzed is the two-criteria binary knapsack problem; it is simple and has few criteria, so the focus can stay on the algorithm itself. Moreover, no uncertainty in the form of interval coefficients is considered.
Although this section is illustrated only by an example of a single run on a single problem, the conclusions drawn here apply globally; compare with the further sections of this chapter.
The algorithm was run with default parameters. The exterior loop was repeated ten times, which means that ten times results (in the form of solutions) were presented to the (simulated) decision maker. The results were recorded.
The same problem was solved exactly by a mathematical programming solver4, using the supposed utility function as the optimization goal. In this way the optimal solution was obtained (for the considered instance, the best supposed utility function value is 4154.441453).
The results of the run from the exterior loop perspective are given in Figure 5.1. All of the results generated during the interior loop were recorded. On the chart one can see the best and the worst solutions in a given run (the rest lie between them).
One can see that there is an improvement in every iteration; however, the improvements get smaller and smaller. This is because the better the population of solutions, the harder it is to optimize it further.
Now the analysis of the interior loop is given. The first, third and seventh interior loops will be presented; the other runs are similar, so they are omitted here. The results are presented in Figures 5.2, 5.3 and 5.4. The iterations of the interior loop are also called generations because of their evolutionary nature.
The charts show the supposed utility function as it depends on the generation. The value of the primary score is also included for reference. To recapitulate: the better a solution fits the induced decision rules, the higher its primary score; in case of a draw (i.e. the primary score of two individuals is the same), the secondary score is considered.
The improvement happens on a per-generation basis: in every iteration the supposed utility function gets better. However, as one can see, improvement of the primary score happens mainly at the beginning. Runs like the one in Figure 5.4, where the breakthrough comes at the end, are in fact rare. Nevertheless, the algorithm should not be shut down after reaching the maximum primary score. Although the preference information extracted before the run is by then fully exploited, an improvement can still occur. More importantly, the diversification effort is still ongoing: when the primary (rule-based) score of the whole population is identical, the secondary score (distance based: the less crowded a solution is, the higher its score) starts to play the major role in the process, pushing individuals in the population to new areas of the solution space. This can result in a major breakthrough: the population may acquire traits wanted by the DM.
The rules driving the interior loops are presented below:
Iteration 1
value1 ≥ 859.86 ⇒ class ≥ GOOD
value2 ≥ 814.71 ⇒ class ≥ GOOD
Iteration 3
value1 ≥ 1023.62 ⇒ class ≥ GOOD
value2 ≥ 1011.61 ⇒ class ≥ GOOD
Iteration 7
value1 ≥ 1211.14 ⇒ class ≥ GOOD
value2 ≥ 1210.63 ⇒ class ≥ GOOD
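Matching a solution against such threshold rules can be sketched as a simple coverage count; the actual primary score in DARWIN also involves rule weights, so this is only an illustration:

```python
def rule_coverage(solution, rules):
    """Count how many "criterion >= threshold => class >= GOOD" rules
    a solution matches; solution maps criterion names to values."""
    return sum(1 for criterion, threshold in rules
               if solution[criterion] >= threshold)
```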
Note that there are only two classes, GOOD and NOT GOOD, so essentially class ≥ GOOD means class = GOOD.
In order to verify how the rules influence the algorithm’s behavior, charts were prepared (Figures 5.5a, 5.5b and 5.5c). For a two-criteria problem it is possible to present the search space on a two-dimensional chart.
4GNU Linear Programming Kit — http://www.gnu.org/software/glpk/
Iterations 1 and 3 show a typical run. Before the 10th generation the whole population moves to the area covered by the rules; in the further generations diversification occurs. The 7th iteration is a bit different, though. The evolution did not “discover” a way to match the second rule until about the 20th generation. Soon after individuals covered by both rules appeared, the whole population drifted to the region marked by both rules.
It can also be insightful to see the simulated DM’s choices and compare them with the inferred rules. They are presented in Figures 5.6a, 5.6b and 5.6c. It is the case that the choices directly affected the rules. One can also see another interesting trait of the problem: the further apart (in the objective space) the selected individuals are, the harder it is to achieve the maximal possible primary score.
Finally, charts presenting how the supposed utility and the primary score changed in the population over succeeding generations (Figures 5.7, 5.8 and 5.9) are given. They support the conclusions stated above.
Analysis of a single run is a good tool for understanding the internals of the method and inspecting its behavior. One can draw conclusions and spot potential problems and possible improvements. However, more experiments have to be conducted, and their results aggregated using statistical methods, in order to check how useful the method is and what the influence of the algorithm’s parameters is. This is done in the further sections.
[Chart: min and max supposed utility in the population vs. exterior iteration (1–10)]
Figure 5.1: Supposed utility function in the exterior loop iterations
[Chart panels: UtilGen and PrimaryScoreGen (mean, min, max) vs. generation, ext. iter. = 1]
Figure 5.2: Supposed utility function in the interior loop iterations
[Chart panels: UtilGen and PrimaryScoreGen (mean, min, max) vs. generation, ext. iter. = 3]
Figure 5.3: Supposed utility function in the interior loop iterations
[Chart panels: UtilGen and PrimaryScoreGen (mean, min, max) vs. generation, ext. iter. = 7]
Figure 5.4: Supposed utility function in the interior loop iterations
[Chart panels: value1_50.0 x value2_50.0 scatter plots colored by generation, ext. iter. = 1, 3 and 7]
Figure 5.5: Changes in the population
[Chart panels: DM Choice scatter plots (value1_50.0 x value2_50.0, good = 0/1), ext. iter. = 0, 2 and 6]
Figure 5.6: Choices made by the DM in the exterior loop
[Chart panels: Utility/Individual and Primary/Individual colored by generation, ext. iter. = 1]
Figure 5.7: Population advancing through the generations of the interior loop
[Chart panels: Utility/Individual and Primary/Individual colored by generation, ext. iter. = 3]
Figure 5.8: Population advancing through the generations of the interior loop
[Chart panels: Utility/Individual and Primary/Individual colored by generation, ext. iter. = 7]
Figure 5.9: Population advancing through the generations of the interior loop
5.4.2 The computational performance on exemplary problems
It is important to know what performance can be expected from an algorithm. When no uncertainty
is considered in a problem and one assumes the supposed utility function, then it is easy to get the
optimal solution using linear programming solver. Of course in real-world applications the supposed
utility function is not known a priori. One can test this algorithm in such an artificially created
environment and then assume that the behavior will resemble the one in real-world problems.
Results of the evaluation on the following problems are given:
• Two-criteria binary knapsack problem, optimal value = 4154.441453.
• Two-criteria continuous knapsack problem, optimal value = 32700.41689.
• Three-criteria binary knapsack problem, optimal value = 31502.10927.
• Three-criteria DTLZ problem generated using constraint surface approach, optimal value = −1.1.
The tests were repeated at least fifteen times and the results averaged. They are presented in Figure 5.10 and Table 5.1. Depending on the problem, 10 or 20 iterations of the exterior loop were simulated. Normally it would be up to the DM to stop when he or she is satisfied with the solution. However, one can safely assume that if no satisfactory solution is found by the 10th iteration, the decision maker will not want to investigate the problem with this method any further.
On both binary knapsack problems the performance is very good: the results are no further than 10% from the optimal solution after the 10th iteration. The same is true for the surface problem. However, looking at Figure 5.10d one can see a bizarre phenomenon: the supposed utility is falling in a few iterations.
On the other hand, the evolutionary algorithm knows nothing about the utility function, so this may happen. The chart (Figure 5.10d) was generated using aggregated (averaged) results; however, this behavior occurred in most of the runs. To give further insight, charts showing the evolution in the third exterior loop are presented (note that these charts are for a single run only, not aggregated). The charts are in Figures 5.11 and 5.12. As one can see, the evolutionary algorithm improves the population from its own perspective (the primary score). However, the supposed utility function is being lowered in the process.
This is the case because the generated decision rules:
1. f1 ≤ 0.14033 ⇒ class ≥ GOOD
2. f3 ≤ 0.56462 ⇒ class ≥ GOOD
are not selective enough. It is possible that switching from DomLem to another algorithm, one generating all possible rules instead of a minimal set, would help here. Still, the results achieved by the DARWIN method are good on this problem.
On the contrary, the continuous knapsack problem performs extremely poorly: only 55% of the optimum after the 10th iteration and 58%, so almost no improvement, after the 20th. Investigating single runs at length provided no more details. The problem lies in the evolutionary algorithm, more precisely in its crossover operator. The DARWIN method uses a form of distance-preserving crossover in which a child lies somewhere between its parents. However, in the continuous variant the optimal strategy is usually to take items one by one, starting with the one with the greatest value per unit, until the weight constraint is reached.
Considering the constraints (0 ≤ xi ≤ 1 for all items), most of the individuals will have their decision variables around xi ≈ 0.5 after a few generations, whereas in the optimal solutions most of the variables take either 1 or 0. That is why improvement happens so slowly here.
In the author’s opinion, changing the crossover operator would solve the problem with the continuous knapsack. Choosing the right operator for a given problem is a well-known subject in multi-objective optimization [3]; however, it is out of the scope of this thesis.
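This averaging effect can be demonstrated with a toy arithmetic ("child between its parents") crossover; the sketch below illustrates the described phenomenon and is not DARWIN's exact operator:

```python
import random

def arithmetic_crossover(p1, p2):
    """The child lies between its parents on every decision variable."""
    return [a + random.random() * (b - a) for a, b in zip(p1, p2)]

random.seed(0)
n_vars, pop_size = 300, 30
pop = [[random.random() for _ in range(n_vars)] for _ in range(pop_size)]

def avg_spread(pop):
    """Average, over variables, of the population's max - min spread."""
    return sum(max(ind[i] for ind in pop) - min(ind[i] for ind in pop)
               for i in range(n_vars)) / n_vars

initial_spread = avg_spread(pop)
for _ in range(50):  # crossover only: no selection pressure, no mutation
    pop = [arithmetic_crossover(random.choice(pop), random.choice(pop))
           for _ in range(pop_size)]

# each variable collapses toward an interior value (around 0.5 on average),
# far from the 0/1 values the optimal continuous-knapsack solution needs
mean = sum(x for ind in pop for x in ind) / (pop_size * n_vars)
```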
[Chart panels: mean supposed utility vs. external iteration for knapsack_bin_c2, knapsack_cont_c2, knapsack_bin_c3 and dtlz_surface_c3]
Figure 5.10: Performance comparison
[Chart panels: UtilGen and PrimaryScoreGen (mean, min, max) vs. generation, ext. iter. = 3]
Figure 5.11: The population in the third iteration of the exterior loop. DTLZ surface problem.
[Chart panels: Utility/Individual and Primary/Individual colored by generation, ext. iter. = 3]
Figure 5.12: The population in the third iteration of the exterior loop. DTLZ surface problem.
Exterior loop Knapsack bin 2c Knapsack cont 2c Knapsack bin 3c DTLZ surface 3c
1. 30.52% 46.79% 36.85% 7.82%
2. 23.84% 46.66% 30.77% 4.40%
3. 18.61% 46.49% 25.80% 6.79%
4. 15.96% 46.32% 20.99% 5.38%
5. 13.73% 46.16% 17.12% 5.08%
6. 12.02% 45.97% 13.57% 10.15%
7. 10.59% 45.74% 11.28% 8.33%
8. 9.20% 45.51% 9.78% 7.64%
9. 8.05% 45.29% 8.50% 5.49%
10. 7.03% 45.06% 7.22% 8.24%
11. - 44.85% 6.47% -
12. - 44.62% 5.70% -
13. - 44.38% 5.15% -
14. - 44.17% 4.67% -
15. - 43.94% 4.17% -
16. - 43.70% 3.77% -
17. - 43.46% 3.43% -
18. - 43.25% 3.12% -
19. - 43.02% 2.94% -
20. - 42.81% 2.77% -
Table 5.1: Distance from the optimum solution
For completeness, charts presenting the algorithm’s behavior when more exterior loop iterations are allowed are given (Figure 5.13, Table 5.2). After the twentieth loop the improvements are small, if any. In the author’s opinion this is acceptable: if the algorithm works well on a problem, it will be evident from the first few iterations, so there is no reason to run it much longer.
[Chart panels: mean supposed utility vs. external iteration (1–100) for knapsack_bin_c2, knapsack_bin_c3 and dtlz_surface_c3]
Figure 5.13: Supposed utility function when long runs are allowed
Exterior loop Knapsack bin 2c Knapsack bin 3c DTLZ surface 3c
1 31.08% 37.48% 14.05%
5 13.59% 17.42% 1.22%
10 7.33% 6.48% 7.19%
15 4.72% 3.39% 7.43%
20 3.70% 2.33% 6.00%
25 3.11% 2.02% 12.06%
30 3.05% 1.75% 1.05%
35 2.99% 1.56% 9.93%
40 3.05% 1.73% 16.83%
45 2.91% 1.58% 11.93%
50 2.94% 1.49% 18.44%
55 2.94% 1.65% 14.70%
60 3.15% 1.70% 14.55%
65 3.13% 1.69% 1.61%
70 3.14% 1.60% 1.16%
75 3.34% 1.60% 8.86%
80 3.31% 1.71% 2.07%
85 3.41% 1.66% 1.15%
90 3.22% 1.62% 7.24%
95 3.45% 1.59% 4.85%
100 3.66% 1.71% 3.34%
Table 5.2: Distance from the optimum solution when long runs are allowed
5.4.3 The influence of parameters on the method
The algorithm contains many parameters that can potentially affect its behavior. Of course, it is always left to the analyst to fine-tune the parameters for the specific problem to solve. Nevertheless, some guidelines are given in this section. The conclusions were drawn on the basis of experiments performed on the exemplary problems described above.
The parameters were grouped into two categories: basic ones affecting the whole method, and additional ones of less importance. The latter, however, can be used for fine-tuning to the specific problem at hand.
The author considers the basic parameters to be:
• The number of generations in the evolutionary loop: intuitively, the more the better.
• The number of individuals in the population: again, intuitively, the more the better.
Numerical data are presented in Tables 5.3 and 5.4. The results are also shown in charts (Figures 5.14, 5.15, 5.16 and 5.17). They agree with the intuition: more solutions (individuals in the population) and a longer interior loop (more generations) result in a better value of the supposed utility for all the problems.
The other parameters are:
• Delta (δ) — decay of a rule weight.
• Eta (η) — initial mutation probability.
• Gamma (γ) — coefficient of elitism.
• Omega (ω) — decay rate of the mutation.
They are used in the interior loop (see 3.3).
There is no clear pattern across the problems. In general, changing the other parameters has only a minor influence on the algorithm. This is a good thing, because the solution given to the DM is robust with respect to those parameters. Nevertheless, the analyst should try to tune the parameters on a per-problem basis.
5.4.4 DM’s decisions inconsistent with supposed utility function
In the experiments the decision maker was simulated by a computer algorithm. Its choices were based on the assumed supposed utility function; they were repeatable and always complied with the assumed function. This is not the case in a real-world situation, where the DM is a human being and the supposed utility function is not known. His or her decisions will be noisy, suboptimal, sometimes even contradictory.
For this reason it is important to measure the impact of such “noise”, i.e. inconsistencies in the DM’s choices, on the algorithm. Normally, without artificially introduced inconsistencies, the decisions are carried out as follows:
Knapsack bin 2c Knapsack cont 2c
test mean sd improvement mean sd improvement
base 3870.72 45.73 0.00% 17968.94 451.61 0.00%
generation45 3958.75 34.14 2.27% 18072.44 635.22 0.58%
generation60 3995.55 32.71 3.22% 18366.65 397.56 2.21%
solution45 3937.89 26.11 1.74% 17950.64 609.37 -0.10%
solution60 3924.30 44.31 1.38% 18341.43 325.21 2.07%
Table 5.3: Importance of base parameters
[Chart panels: mean supposed utility vs. exterior iteration and utility at ext. iter. = 10 for tests base, generation45, generation60, solution45, solution60]
Figure 5.14: Basic parameters for the two-criteria binary knapsack problem
[Chart panels: mean supposed utility vs. exterior iteration and utility at ext. iter. = 10 for tests base, generation45, generation60, solution45, solution60]
Figure 5.15: Basic parameters for the two-criteria continuous knapsack problem
[Chart panels: mean supposed utility vs. exterior iteration and utility at ext. iter. = 10 for tests base, generation45, generation60, solution45, solution60]
Figure 5.16: Basic parameters for the three-criteria binary knapsack problem
[Chart panels: mean supposed utility vs. exterior iteration and utility at ext. iter. = 10 for tests base, generation45, generation60, solution45, solution60]
Figure 5.17: Basic parameters for the three-criteria DTLZ surface problem
[Chart panels: utility at ext. iter. = 10 under the delta, eta, gamma and omega parameter settings: (a) two-criteria binary knapsack, (b) two-criteria continuous knapsack, (c) three-criteria binary knapsack, (d) DTLZ surface problem]
Figure 5.18: Other parameters’ influence
56 5 Results of Computational Experiments
Knapsack bin 3c DTLZ surface 3c
test mean sd improvement mean sd improvement
base 29236.10 355.30 0.00% -1.19 0.14 0.00%
generation45 29923.32 263.40 2.35% -1.17 0.08 -1.73%
generation60 30412.27 225.15 4.02% -1.14 0.05 -4.61%
solution45 29642.44 362.47 1.39% -1.12 0.03 -5.71%
solution60 29681.42 489.78 1.52% -1.14 0.11 -4.11%
Table 5.4: Importance of base parameters
Algorithm 4 Simulated DM indicating “good” solutions
1: procedure SelectGoodSolutions(solutions, toSelect)
2: solutions — a list of solutions generated in the interior loop
3: toSelect — how many solutions should be marked as “good”
4: for all s ∈ solutions do
5: s.value ← SupposedUtilityFunction(s)
6: end for
7: solutions ← Sort(solutions, key=value, descending=True) ▷ sort on descending supposed utility
8: return GetFirst(solutions, toSelect)
9: end procedure
and the toSelect parameter is set to 3, so three solutions are always marked as good. One can
influence the behavior of SelectGoodSolutions by changing the toSelect parameter, but this
alone is not enough to simulate human-made inconsistencies.
First, note that one does not have to select the same number of solutions every time. A fixed
number is in fact unrealistic, because the decision maker will select as many solutions as he or
she thinks appropriate. To simulate this behavior an additional parameter, toSelectDelta, was
added. The number of solutions to select is now drawn uniformly from the interval
[toSelect − toSelectDelta, toSelect + toSelectDelta].
Furthermore, the human decision maker is fallible: he or she can make a wrong decision and fail
to select the best solutions according to the supposed utility function. Another parameter,
noiseLimit, was introduced to simulate this phenomenon. After sorting the solutions by their
supposed utility values, the individuals 1, 2, . . . , noiseLimit are shuffled. Suppose one wants to
select three solutions. If noiseLimit is set to 0 (the default), the three best solutions will
be selected. However, if an experimenter sets noiseLimit to 6, then 3 out of the 6 best solutions
will be picked at random.
The noisy version of Algorithm 4 is shown below.
Algorithm 5 Simulated DM indicating “good” solutions, with simulated inconsistencies
1: procedure NoisySelectGoodSolutions(solutions, toSelect, toSelectDelta, noiseLimit)
2: for all s ∈ solutions do
3: s.value ← SupposedUtilityFunction(s)
4: end for
5: solutions ← Sort(solutions, key=value, descending=True)
6: shuffled ← ShuffleFirst(solutions, noiseLimit)
7: toSelect ← RandomFromInterval(toSelect − toSelectDelta, toSelect + toSelectDelta)
8: return GetFirst(shuffled, toSelect)
9: end procedure
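The noisy selection procedure (Algorithm 5) can be sketched in Python as follows. The supposed_utility function below is a simple stand-in (any scalarizing function would do); only the selection logic mirrors the algorithm.

```python
import random

def supposed_utility(solution):
    # Stand-in for the thesis' SupposedUtilityFunction; here just a sum
    # of the solution's objective values (an assumption for illustration).
    return sum(solution)

def noisy_select_good_solutions(solutions, to_select, to_select_delta=0,
                                noise_limit=0, rng=random):
    """Simulated DM: pick 'good' solutions with optional inconsistencies.

    to_select_delta widens the number of picks to a uniform draw from
    [to_select - delta, to_select + delta]; noise_limit shuffles the top
    noise_limit solutions before picking, modeling wrong choices.
    """
    ranked = sorted(solutions, key=supposed_utility, reverse=True)
    head = ranked[:noise_limit]
    rng.shuffle(head)                      # perturb only the best ones
    ranked[:noise_limit] = head
    k = rng.randint(to_select - to_select_delta, to_select + to_select_delta)
    return ranked[:max(k, 0)]
```

With the default to_select_delta=0 and noise_limit=0 the function degenerates to Algorithm 4 (deterministic top-k selection).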
The following tests were performed:
5.4 A deterministic case 57
Parameter no noise noise1 noise2 noise3 noise4 noise5 noise6
toSelect 3 3 3 3 5 3 3
toSelectDelta 0 2 2 2 4 4 2
noiseLimit 0 0 2 4 6 10 16
Note that noise1 to noise4 are rather minor perturbations, noise5 represents medium inconsistencies,
and noise6 is a major one, unlikely to happen in practice: half of the solutions are considered
good. If the decision maker is careful in his or her evaluations, the algorithm will most likely
have to face noise between the noise1 and noise4 levels.
The results for the exemplary problems are shown in the charts below (Figures 5.19, 5.20 and
5.21). Inconsistency tests for the continuous knapsack are omitted, since the algorithm performs
badly on it even without additional noise.
Results for the DTLZ surface problem (Tables 5.5, 5.6 and Figure 5.21) are inconclusive. This
is because of the behavior described in Section 5.4.2: the decision rules are not discriminating
enough, so the utility value often drops between the interior loops. This can be seen as a form
of noise with an even bigger impact than the one tested here, so it obscures the results for this
particular problem.
For the other problems, however, the binary knapsacks (Figures 5.19 and 5.20), the results are
clear. Minor inconsistencies do almost no harm to the results, while medium and major ones have
only a small impact: the value decreased by less than 5% after ten iterations. This means that
the solution generated by DARWIN is robust with respect to the inconsistencies in the decision
maker’s choices. Even with major inconsistencies the algorithm is able to provide the same
results, albeit a few iterations later.
test mean sd improvement
base 3867.66 37.12 0.00%
noise1 3863.02 42.34 -0.12%
noise2 3862.67 35.92 -0.13%
noise3 3854.10 46.61 -0.35%
noise4 3833.94 44.61 -0.87%
noise5 3814.34 50.59 -1.38%
noise6 3767.85 46.63 -2.58%
Table 5.5: The impact of DM’s inconsistencies on knapsack bin 2c
test mean sd improvement
base 29236.10 355.30 0.00%
noise1 29228.82 339.02 -0.02%
noise2 29063.68 422.31 -0.59%
noise3 28903.57 388.95 -1.14%
noise4 28855.74 253.07 -1.30%
noise5 28352.64 398.55 -3.02%
noise6 28163.63 383.71 -3.67%
Table 5.6: The impact of DM’s inconsistencies on knapsack bin 3c
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: base, noise1, noise2, noise3, noise4, noise5, noise6]
Figure 5.19: The impact of DM’s inconsistencies for the two-criteria binary knapsack problem
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: base, noise1, noise2, noise3, noise4, noise5, noise6]
Figure 5.20: The impact of DM’s inconsistencies for the three-criteria binary knapsack problem
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: base, noise2, noise3, noise4, noise5, noise6]
Figure 5.21: The impact of DM’s inconsistencies for the three-criteria DTLZ surface problem
test mean sd improvement
base -1.19 0.14 0.00%
noise2 -1.16 0.05 2.65%
noise3 -1.21 0.15 -1.46%
noise4 -1.14 0.04 3.88%
noise5 -1.28 0.17 -7.66%
noise6 -1.21 0.11 -1.75%
Table 5.7: The impact of DM’s inconsistencies on DTLZ surface 3c
5.4.5 A different rule-generating algorithm
The details of the rule-generating algorithm are not part of the DARWIN specification; the choice
of the algorithm is left to the implementer. In this thesis two approaches were evaluated:
the DomLem algorithm (see [16]), generating a minimal set of rules, and the AllRules algorithm
(see [17]), generating an extensive set of all rules.
One can lower the confidence of the rules generated by the DomLem algorithm (compare with
Section 2.3). A confidence of 100% may seem a good idea; however, lowering it allows more gradual
improvements of the goal function in the evolutionary algorithm, which could lead to better
results. Values of 60% and 80% were tested. In the tests conducted earlier it was assumed that
the decision maker selects only a few solutions; in this section this is also compared with
marking half of the solutions as good.
The results are presented in Tables 5.8 and 5.9 and Figures 5.22, 5.23, 5.24 and 5.25. When no
uncertainty is considered, DomLem performs better than AllRules, and it is better to select three
solutions than fifteen (a half). The confidence level, however, yields an interesting result: the
algorithm is robust with respect to it.
Knapsack bin 2c Knapsack cont 2c
algorithm good count mean sd improvement mean sd improvement
domlem 3 3909.94 47.17 0.00% 17939.27 700.25 0.00%
domlem 15 3773.55 53.12 -3.49% 17487.98 302.36 -2.52%
domlem conf 0.6 15 3783.62 46.17 -3.23% 17505.49 334.11 -2.42%
domlem conf 0.8 15 3764.69 58.01 -3.71% 17555.32 402.22 -2.14%
allrules 3 3865.25 31.93 -1.14% 17886.14 392.01 -0.30%
allrules 15 3804.01 43.96 -2.71% 17517.90 419.92 -2.35%
Table 5.8: Impact of the rule-generating algorithm on DARWIN
Knapsack bin 3c Surface 3c
algorithm good count mean sd improvement mean sd improvement
domlem 3 29282.28 227.33 0.00% -1.22 0.13 0.00%
domlem 15 27789.61 538.01 -5.10% -1.25 0.11 3.00%
domlem conf 0.6 15 28161.34 610.84 -3.83% -1.20 0.10 -1.02%
domlem conf 0.8 15 27775.05 336.66 -5.15% -1.21 0.14 -0.17%
allrules 3 29144.99 293.66 -0.47% -1.13 0.04 -7.47%
allrules 15 27979.16 772.90 -4.45% -1.24 0.11 1.73%
Table 5.9: Impact of the rule-generating algorithm on DARWIN
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: allrules_15, allrules_3, domlem_15, domlem_15_conf06, domlem_15_conf08, domlem_3]
Figure 5.22: Rule-generating algorithms comparison for the two-criteria binary knapsack problem
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: allrules_15, allrules_3, domlem_15, domlem_15_conf06, domlem_15_conf08, domlem_3]
Figure 5.23: Rule-generating algorithms comparison for the two-criteria continuous knapsack problem
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: allrules_15, allrules_3, domlem_15, domlem_15_conf06, domlem_15_conf08, domlem_3]
Figure 5.24: Rule-generating algorithms comparison for the three-criteria binary knapsack problem
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: allrules_15, allrules_3, domlem_15, domlem_15_conf06, domlem_15_conf08, domlem_3]
Figure 5.25: Rule-generating algorithms comparison for the three-criteria DTLZ surface problem
5.4.6 The best results
One may be interested in the best performance achieved during the experiment runs. The results
are presented in Table 5.10 and Figure 5.26. All the test problems performed very well, with a
minimal distance from the optimal solution. For the reasons mentioned above, the continuous
knapsack problem is an exception and its performance is not satisfactory; changing the crossover
operator should improve the results considerably.
The settings resulting in these performances are presented in Table 5.11. It is worth noting that
when no uncertainty is considered, selecting only three solutions in each exterior loop iteration leads
to the best results. Also in three out of four cases the DomLem algorithm is superior to AllRules.
Exterior loop Knapsack 2c bin Knapsack 2c cont Knapsack 3c bin Surface
1 28.33% 43.51% 31.94% 13.02%
2 21.48% 43.30% 27.72% 12.66%
3 14.99% 43.12% 23.10% 12.66%
4 11.16% 42.90% 19.54% 3.77%
5 9.91% 42.72% 15.64% 3.77%
6 9.82% 42.46% 12.38% 3.77%
7 8.04% 42.22% 10.43% 0.10%
8 6.83% 42.05% 8.76% 0.06%
9 4.84% 41.79% 6.98% 0.06%
10 3.69% 41.58% 6.19% 11.73%
11 3.19% 41.33% 5.01% 20.38%
12 2.56% 41.03% 4.50% 16.12%
13 2.55% 40.78% 3.63% 3.81%
14 2.55% 40.62% 3.60% 2.14%
15 2.55% 40.45% 3.00% 4.79%
16 2.69% 40.27% 2.81% 15.72%
17 2.69% 40.03% 2.74% 1.97%
18 2.61% 39.84% 2.32% 7.38%
19 2.41% 39.71% 1.84% 0.48%
Table 5.10: The best runs for the test problems. Distance from the optimum
Problem Algorithm Number of solutions marked as good
Knapsack 2c bin Domlem, consistency = 100% 3
Knapsack 2c cont Domlem, consistency = 100% 3
Knapsack 3c bin Domlem, consistency = 100% 3
Surface All rules 3
Table 5.11: Settings resulting in the best performance for the test problems
[Charts: min/max supposed utility per exterior iteration; panels: (a) Two-criteria binary knapsack, (b) Two-criteria continuous knapsack, (c) Three-criteria binary knapsack, (d) DTLZ surface problem]
Figure 5.26: The best runs for the test problems
5.5 The case of uncertainty
In the previous sections the algorithm’s results were analyzed under the assumption that the
problems were well-specified, with no room for uncertainty. This was a way to test the algorithm’s
behavior and to get an idea of its properties.
It is often impossible to specify the exact values of each coefficient in the problem. One may
want to specify some of the coefficients in the form of intervals of the possible values. Then he or
she is interested in a robust solution taking into account many possible scenarios of imprecision.
This is possible with the DARWIN method.
Now the tests performed earlier will be extended to “robust” problems, i.e. those with interval
coefficients. Note that it is now impossible to compare a result with the optimal solution, even
assuming the supposed utility function. However, a point of reference is needed in order to know
what performance can be expected from the method. The author decided to compare the results
with direct optimization of the supposed utility function.
The reference point was generated as follows. The whole exterior loop (see 3.3) is left intact,
but in the interior loop the goal function for evolutionary optimization changes: instead of
optimizing the primary and secondary scores, the supposed utility function itself is optimized.
This way it is possible to measure how well the DM’s preferences are inferred by the DARWIN
method, and the preference gathering is definitely the most important and unique part of this
method.
As in the previous sections, all tests were repeated at least fifteen times and the results were
averaged, unless stated otherwise.
5.5.1 Single run analysis
Before evaluating performance on the exemplary problems it is a good idea to analyze a single run
on a single problem; it can reveal interesting properties or potential problems of the algorithm.
The problem being evaluated is the two-criteria mix problem (the problem presented at the
original DARWIN lecture given at the 5th International Workshop on Preferences and Decisions,
Trento, April 2009). Note that although the problem has only two criteria in its definition, it
has to be considered in a space with more dimensions. This is because the supposed utility
function is defined not in the original two-criteria space (max profit, min time) but rather in
a quantile space.
The function is defined as follows:

max: profit_1% + 3 · profit_25% + 2 · profit_50% − time_1% − 3 · time_25% − 2 · time_50%

So the space has six dimensions: p_1% × p_25% × p_50% × t_1% × t_25% × t_50%; t stands for time
and p for profit.
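Under two assumptions not taken from the thesis — that scenarios are obtained by instantiating each interval coefficient uniformly at random, and that a simple nearest-rank quantile is used — the supposed utility above can be sketched as:

```python
import random

def quantile(values, q):
    """Nearest-rank q-quantile, 0 < q < 1 (a simplifying assumption)."""
    s = sorted(values)
    return s[min(int(q * len(s)), len(s) - 1)]

def sample_scenario(interval_coeffs, rng=random):
    """Instantiate each interval coefficient [lo, hi] uniformly at random
    (an assumed scenario model, not taken from the thesis)."""
    return [rng.uniform(lo, hi) for lo, hi in interval_coeffs]

def supposed_utility_mix(profits, times):
    """Supposed utility of one solution, given its profit and time values
    under each sampled scenario of the interval coefficients."""
    p = lambda q: quantile(profits, q)
    t = lambda q: quantile(times, q)
    return (p(0.01) + 3 * p(0.25) + 2 * p(0.50)
            - t(0.01) - 3 * t(0.25) - 2 * t(0.50))
```

In the degenerate case with no uncertainty (all scenarios identical) the utility reduces to 6 · profit − 6 · time, since the quantile weights sum to 6 on each side.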
The basic performance of a single run is presented in Figure 5.27. As one can see, the algorithm
needs only four iterations to reach the maximum value, a point where no further improvement
occurs5.
Improvements of the supposed utility function and the primary score are shown in Figures 5.28
and 5.29, whereas the shape of the population is shown in Figures 5.30 and 5.31. One can see that
it is easy to generate a good solution to the problem: a solution with the best result in a given
run appears in the first few generations.
With a six-dimensional objective space it is impossible to show a section of this space on
a two-dimensional chart. However, these dimensions are not completely independent. There is
a correlation between the quantiles of the same objective. This can be observed in Figure 5.32 —
5This is not an optimal solution to the problem. By their nature (intervals and scenarios of uncertainty), the
problems from this section do not have an optimal value.
[Chart: min/max supposed utility per exterior iteration]
Figure 5.27: Supposed utility function in exterior loop iterations for the mix-problem
showing the movement of the population during the generations on a section of the objective
space, or even better in Figure 5.33, which shows the solutions marked by the decision maker as
“good”.
As expected, the solutions chosen by the DM are near the Pareto front. However, not all of them
are part of the front; knowing that there are four other goals besides profit_50% and time_50%
explains the phenomenon.
[Charts: mean/min/max supposed utility and primary score per generation, exterior iteration 1]
Figure 5.28: Supposed utility function and primary score improvements in an example run of the interior loop
[Charts: mean/min/max supposed utility and primary score per generation, exterior iteration 3]
Figure 5.29: Supposed utility function and primary score improvements in an example run of the interior loop
[Charts: utility and primary score per individual at generations 0, 10, 20 and 30; exterior iteration 1]
Figure 5.30: Changes of the population shape in an example run of the interior loop
[Charts: utility and primary score per individual at generations 0, 10, 20 and 30; exterior iteration 3]
Figure 5.31: Changes of the population shape in an example run of the interior loop
[Charts: population scatter in the profit_50% × time_50% plane at generations 0, 10, 20 and 30; exterior iterations 1 and 3]
Figure 5.32: Two-dimensional section of the objective space
[Charts: solutions in the profit_50% × time_50% plane with the DM’s “good” choices highlighted; exterior iterations 0 and 2]
Figure 5.33: Two-dimensional section of the objective space
5.5.2 The computational performance on exemplary problems
To see how well DARWIN can infer the decision maker’s preferences, a comparison with direct
optimization of the supposed utility function is given. For this section the experiments were
performed on all exemplary problems with interval coefficients: the mix problem (also labeled
the presentation problem), dtlz1 (four- and ten-criteria versions) and dtlz7.
The problems were solved with the default (base) values of the configuration options as well as
with an increased number of generations (from 30 to 60). Tests whose names end with rules use
the primary and secondary scores; tests whose names end with supp use the supposed utility
function value as the objective in the evolutionary optimization.
The results are shown in Table 5.12 and presented in Figures 5.34, 5.35, 5.36 and 5.37. There
was almost no difference for the mix problem (less than 1%) and a minor difference for the dtlz7
problem (less than 5%). Unfortunately, the differences for the dtlz1 problems (both four- and
ten-criteria) were large and noticeable: about 50% after ten iterations. The results for dtlz1
are far below expectations; the default parameters are not appropriate for the problem. A better
result is still possible here, but it requires tuning the parameters.
Note: When box plots are given, they are always charted for the tenth iteration. The same
applies to comparison tables.
test optimization supposed rules distance
mix base 6683.14 6678.95 0.06%
mix gen60 6680.05 6661.41 0.28%
dtlz1 c4 base 7.01 12.81 45.29%
dtlz1 c4 gen60 6.48 15.18 57.28%
dtlz1 c10 base 7.60 13.73 44.68%
dtlz1 c10 gen60 5.61 18.00 68.85%
dtlz7 c4 base 104.72 108.74 3.69%
dtlz7 c4 gen60 104.95 110.28 4.83%
Table 5.12: Performance comparison on robust multi-objective problems
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: mix_base_rules, mix_base_supp, mix_gen60_rules, mix_gen60_supp]
Figure 5.34: Performance comparison on the mix problem
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: dtlz1_c4_base_rules, dtlz1_c4_base_supp, dtlz1_c4_gen60_rules, dtlz1_c4_gen60_supp]
Figure 5.35: Performance comparison on the four-criteria DTLZ1 problem
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: base_rules, base_supp, gen60_rules, gen60_supp]
Figure 5.36: Performance comparison on the ten-criteria DTLZ1 problem
[Charts: mean utility per exterior iteration and utility box plot at iteration 10; series: base_rules, base_supp, gen60_rules, gen60_supp]
Figure 5.37: Performance comparison on the four-criteria DTLZ7 problem
5.5.3 The influence of the basic parameters on the method
The importance of basic parameters was evaluated again in the robust environment. The numerical
results are presented in Table 5.13 and 5.14. Also charts are included to visualize the relations
(Figure 5.38).
Unfortunately, no clear pattern emerges. For the mix problem the solution is robust: changing
the basic parameters changes the value of the result by no more than 1% after ten iterations. For
the DTLZ7 problem the situation is similar; the differences are bigger now, but still less than 5%.
The differences are bigger and important for the DTLZ1 problems (four- and ten-criteria). However,
the standard deviations are of the same order of magnitude as the mean values of the results, so
the results are inconclusive here and it is impossible to make a recommendation based solely on
them.
Mix problem DTLZ1 c4
test mean sd improvement mean sd improvement
base 6665.60 57.27 0.00% -9.99 6.83 0.00%
generation45 6661.72 47.52 -0.06% -16.04 12.47 60.63%
generation60 6661.14 45.08 -0.07% -9.63 7.78 -3.61%
scenario45 6617.60 47.33 -0.72% -13.12 11.37 31.34%
scenario60 6639.54 33.98 -0.39% -13.95 8.50 39.68%
solution45 6705.68 52.96 0.60% -10.43 8.33 4.44%
solution60 6685.43 56.78 0.30% -11.68 8.23 16.97%
Table 5.13: Importance of base parameters in robust environment
DTLZ1 c10 DTLZ7 c4
test mean sd improvement mean sd improvement
base -15.30 11.20 0.00% -109.23 5.87 0.00%
generation45 -15.92 13.27 4.03% -109.23 6.96 0.00%
generation60 -21.63 12.55 41.37% -108.33 10.87 -0.82%
scenario45 -17.32 11.15 13.17% -111.98 5.34 2.52%
scenario60 -16.67 13.30 8.91% -112.54 3.45 3.03%
solution45 -13.89 9.63 -9.22% -107.88 6.39 -1.23%
solution60 -9.55 6.81 -37.62% -109.98 6.37 0.69%
Table 5.14: Importance of base parameters in robust environment
[Charts: utility box plots at exterior iteration 10 for tests base, generation45, generation60, scenario45, scenario60, solution45, solution60; panels: (a) Mix problem, (b) DTLZ1 four-criteria, (c) DTLZ1 ten-criteria, (d) DTLZ7 four-criteria]
Figure 5.38: Effect of params in DM’s decisions
5.5.4 DM’s decisions inconsistent with supposed utility function
In order to check the impact of inconsistencies in DM’s decisions, experiments were performed.
The results are presented in Figure 5.39 and in the tables 5.15 and 5.16.
Again, one has to analyze the results excluding the DTLZ1 problem, because the standard
deviation of its series of results is of the same order of magnitude as the value itself.
In the mix and DTLZ7 problems the effect of inconsistencies in the DM’s decisions becomes an
important factor only for medium inconsistencies (noise5), where the results differ by about 8%,
and for major inconsistencies (noise6), where the difference is about 12% for the mix problem
and about 8% for the DTLZ7 problem.
The situation is similar to the impact on problems defined without uncertainty: even if one adds
interval coefficients to the problem, the solution is still robust with respect to the
inconsistencies in the DM’s decisions.
Mix problem DTLZ1 c4
test mean sd improvement mean sd improvement
base 6678.95 53.86 0.00% -12.81 8.38 0.00%
noise1 6662.93 36.54 -0.24% -7.70 8.48 39.89%
noise2 6638.77 70.33 -0.60% -12.61 11.36 1.55%
noise3 6636.90 53.88 -0.63% -12.73 10.82 0.61%
noise4 6537.39 137.71 -2.12% -13.65 7.84 -6.55%
noise5 6093.31 811.58 -8.77% -17.27 11.32 -34.82%
noise6 5858.94 924.72 -12.28% -13.72 12.45 -7.09%
Table 5.15: The impact of inconsistencies in robust environment
DTLZ1 c10 DTLZ7 c4
test mean sd improvement mean sd improvement
base -13.73 8.08 0.00% -108.86 4.00 0.00%
noise1 -17.34 9.07 -26.23% -109.48 7.03 -0.57%
noise2 -20.58 13.18 -49.83% -115.45 7.33 -6.06%
noise3 -21.52 11.72 -56.70% -112.61 5.84 -3.45%
noise4 -21.47 9.29 -56.36% -112.38 6.51 -3.24%
noise5 -24.86 13.03 -81.01% -117.84 6.78 -8.26%
noise6 -24.89 11.79 -81.21% -117.45 7.21 -7.89%
Table 5.16: The impact of inconsistencies in robust environment
[Charts: utility box plots at exterior iteration 10 for tests base, noise1, noise2, noise3, noise4, noise5, noise6; panels: (a) Mix problem, (b) DTLZ1 four-criteria, (c) DTLZ1 ten-criteria, (d) DTLZ7 four-criteria]
Figure 5.39: Effect of inconsistencies in DM’s decisions
80 5 Results of Computational Experiments
5.5.5 DomLem parameters
As mentioned in Section 4.1.1, the choice of the rule-generation algorithm is left to the analyst. In this work both DomLem and AllRules were used; this section analyzes DomLem.
One can set different confidence values for the rules — runs with 100%, 80% and 60% were tried. The decision maker can also mark a varying number of solutions as “good” — 3 and 15 were used in the tests. DomLem returns a minimal set of rules covering all the examples, which usually means that only a small number of rules is generated. However, DomLem is a heuristic algorithm, so running it with swapped columns in the decision table results in different rules. This feature was exploited to generate more rules: the algorithm was run three times with different attribute orderings. This test is called multirules.
Numerical results are given in Tables 5.17 and 5.18 and visualized in Figures 5.40, 5.41, 5.42 and 5.43.
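The multirules idea — exploiting the order-sensitivity of a heuristic rule inducer — can be sketched as follows. This is an illustrative toy, not the actual DomLem implementation (which induces DRSA decision rules in Scala): `induce_rules` is a hypothetical greedy sequential-covering inducer whose tie-breaking depends on the attribute order, so taking the union of rule sets obtained under several orderings yields more rules, as in the multirules test.

```python
def induce_rules(examples, good, attrs):
    """Toy greedy covering, a stand-in for DomLem: repeatedly pick a
    single condition "attr >= threshold" that covers as many uncovered
    "good" examples as possible while excluding every other example.
    Ties between attributes are broken by the order of `attrs`."""
    bad = [e for e in examples if e not in good]
    uncovered = list(good)
    rules = []
    while uncovered:
        best = None
        for a in attrs:                      # attribute order decides ties
            for e in uncovered:
                t = e[a]
                if any(b[a] >= t for b in bad):
                    continue                 # would cover a "bad" example
                covered = [g for g in uncovered if g[a] >= t]
                if best is None or len(covered) > len(best[2]):
                    best = (a, t, covered)
        if best is None:
            break                            # inconsistent data -- give up
        a, t, covered = best
        rules.append((a, t))
        uncovered = [g for g in uncovered if g not in covered]
    return rules

good = [{"f1": 5, "f2": 5}, {"f1": 6, "f2": 4}]
bad = [{"f1": 1, "f2": 1}]
examples = good + bad
# "Multirules": run the heuristic under different attribute orderings
# and take the union of the resulting rule sets.
orderings = [["f1", "f2"], ["f2", "f1"]]
multirules = set().union(*(set(induce_rules(examples, good, o)) for o in orderings))
```

Each ordering produces a single, equally valid covering rule; the union contains both, giving the evolutionary loop a richer preference model.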
Again, it is impossible to give clear guidelines on which parameter values should be chosen. For the mix problem and the four-criteria DTLZ7 problem the solutions are robust — the differences are no more than 1% and 5%, respectively. However, the situation changes for the DTLZ1 problem. For the four-criteria variant the multirules option yields a great improvement, especially if the DM selects three good solutions in each iteration — almost 60%. Selecting fifteen good solutions with the confidence level set to 80% also results in an almost 40% improvement.
For the ten-criteria variant, selecting fifteen solutions and certain (confidence = 100%) rules results in over 25% improvement, and choosing multirules improves the result by 13% and 16%. The best results for this variant are achieved by selecting 15 solutions and setting the confidence to 60%, which improves the results by almost 57%.
Mix problem DTLZ1 c4
test good count mean sd improvement mean sd improvement
domlem 3 6676.25 50.34 0.00% -16.45 13.80 0.00%
domlem 15 6649.69 37.74 -0.40% -19.51 10.97 -18.59%
domlem conf 0.6 3 6655.11 43.51 -0.32% -17.44 9.48 -5.99%
domlem conf 0.6 15 6695.45 59.81 0.29% -19.06 15.29 -15.87%
domlem conf 0.8 3 6662.44 56.97 -0.21% -20.61 16.91 -25.27%
domlem conf 0.8 15 6662.55 66.35 -0.21% -10.28 7.88 37.52%
domlem multirules 3 6680.08 33.44 0.06% -6.74 5.60 59.00%
domlem multirules 15 6658.61 27.97 -0.26% -12.83 10.38 22.00%
Table 5.17: Influence of the DomLem parameters in robust environment
DTLZ1 c10 DTLZ7 c4
test good count mean sd improvement mean sd improvement
domlem 3 -21.05 17.52 0.00% -108.94 4.01 0.00%
domlem 15 -15.50 9.28 26.35% -109.28 4.96 -0.31%
domlem conf 0.6 3 -22.44 8.82 -6.60% -112.68 4.15 -3.43%
domlem conf 0.6 15 -9.09 7.39 56.82% -109.28 7.95 -0.31%
domlem conf 0.8 3 -22.66 14.83 -7.67% -108.83 5.28 0.10%
domlem conf 0.8 15 -26.76 17.14 -27.14% -108.53 7.29 0.38%
domlem multirules 3 -18.14 14.80 13.81% -112.03 5.79 -2.83%
domlem multirules 15 -17.71 10.34 15.86% -110.17 5.48 -1.13%
Table 5.18: Influence of the DomLem parameters in robust environment
[Figure 5.40: Influence of the DomLem parameters on the mix problem — mean utility vs. exterior iteration and utility at ext.iter.=10 for the series domlem, domlem_15good, domlem_conf06, domlem_conf06_15, domlem_conf08, domlem_conf08_15, domlem_multirules and domlem_multirules_15]
[Figure 5.41: Influence of the DomLem parameters on the four-criteria DTLZ1 problem — mean utility vs. exterior iteration and utility at ext.iter.=10 for the same eight DomLem series]
[Figure 5.42: Influence of the DomLem parameters on the ten-criteria DTLZ1 problem — mean utility vs. exterior iteration and utility at ext.iter.=10 for the same eight DomLem series]
[Figure 5.43: Influence of the DomLem parameters on the four-criteria DTLZ7 problem — mean utility vs. exterior iteration and utility at ext.iter.=10 for the same eight DomLem series]
5.5 The case of uncertainty 83
5.5.6 AllRules algorithm
The second option for the rule generator is the AllRules algorithm. The version used in this work has no configuration parameters, but one can still change the number of solutions marked as good. Results are presented in Table 5.19 and Figures 5.44, 5.45 and 5.46. For the mix and DTLZ7 problems the differences between 3 and 15 good solutions are small and insignificant. On the other hand, for the DTLZ1 problem the difference is over 60% when marking 15 out of 30 solutions as good. Unfortunately, because of its huge memory footprint, it was impossible to run AllRules on the ten-criteria DTLZ1 problem.
A comparison of the two approaches — DomLem and AllRules — was also made in order to check whether generating more rules improves the results. For the three problems the best series for DomLem and AllRules were chosen and compared. The results are given in Tables 5.20, 5.21 and 5.22, as well as charted in Figures 5.47, 5.48 and 5.49.
In general DomLem performed a bit better; however, for DTLZ1 choosing AllRules results in a huge improvement (about 70%).
problem test good count mean stddev improvement
mix problem allrules 3 6658.39 69.98 0.00%
mix problem allrules 15 6572.14 79.91 -1.30%
dtlz1 c4 allrules 3 -13.05 11.80 0.00%
dtlz1 c4 allrules 15 -4.52 3.05 65.36%
dtlz7 c4 allrules 3 -111.80 3.68 0.00%
dtlz7 c4 allrules 15 -110.07 7.84 1.55%
Table 5.19: AllRules — number of solutions chosen and its influence on the result
outer domlem conf=0.6, good=15 stddev all rules, good=15 stddev improvement
1 6399.34 208.67 6334.75 350.80 -1.01%
2 6516.66 108.81 6502.02 211.88 -0.22%
3 6555.59 94.60 6552.46 100.80 -0.05%
4 6603.19 88.61 6574.42 71.22 -0.44%
5 6607.81 90.20 6591.54 59.81 -0.25%
6 6623.12 70.84 6620.31 87.77 -0.04%
7 6626.90 55.62 6628.49 82.09 0.02%
8 6661.15 78.86 6640.54 82.18 -0.31%
9 6686.50 68.39 6656.77 71.28 -0.44%
10 6695.45 59.81 6658.39 69.98 -0.55%
11 6697.46 54.06 6673.27 60.59 -0.36%
12 6697.46 54.06 6675.63 58.30 -0.33%
13 6693.87 59.44 6684.04 56.11 -0.15%
14 6693.59 59.62 6689.49 48.43 -0.06%
15 6698.10 57.73 6701.54 47.81 0.05%
16 6698.10 57.73 6701.42 47.77 0.05%
17 6697.50 57.20 6698.24 55.56 0.01%
18 6707.60 46.91 6707.24 62.70 -0.01%
19 6707.98 47.62 6707.24 62.70 -0.01%
Table 5.20: AllRules and DomLem comparison on the mix problem
[Figure 5.44: AllRules — number of solutions chosen and its influence on the mix problem (series all_rules and allrules_15good)]
[Figure 5.45: AllRules — number of solutions chosen and its influence on the four-criteria DTLZ1 problem (series all_rules and allrules_15good)]
[Figure 5.46: AllRules — number of solutions chosen and its influence on the four-criteria DTLZ7 problem (series all_rules and allrules_15good)]
[Figure 5.47: AllRules and DomLem comparison on the mix problem (series all_rules and domlem_conf06_15)]
[Figure 5.48: AllRules and DomLem comparison on the four-criteria DTLZ1 problem (series allrules_15good and domlem)]
[Figure 5.49: AllRules and DomLem comparison on the four-criteria DTLZ7 problem (series allrules_15good and domlem_conf08_15)]
outer domlem conf=1.0, good=3 stddev all rules, good=15 stddev improvement
1 -57.10 34.30 -15.91 5.86 72.15%
2 -33.28 26.61 -10.40 4.77 68.75%
3 -26.73 19.71 -9.07 4.41 66.07%
4 -22.57 16.49 -8.29 3.94 63.26%
5 -20.96 15.87 -7.16 3.29 65.85%
6 -19.56 14.92 -6.10 2.99 68.81%
7 -18.31 14.40 -5.65 3.23 69.13%
8 -18.05 14.28 -5.31 3.47 70.60%
9 -17.09 14.12 -4.81 3.27 71.84%
10 -16.45 13.80 -4.52 3.05 72.51%
11 -16.03 13.39 -4.21 3.19 73.74%
12 -15.59 13.21 -4.18 3.47 73.16%
13 -15.11 13.22 -3.79 3.65 74.90%
14 -14.91 13.20 -3.24 3.55 78.25%
15 -14.55 13.16 -2.97 3.51 79.56%
16 -14.23 12.72 -2.85 3.49 79.96%
17 -13.70 12.21 -2.57 3.36 81.20%
18 -13.05 11.27 -2.64 3.45 79.78%
19 -12.57 10.18 -2.31 3.14 81.58%
Table 5.21: AllRules and DomLem comparison on the four-criteria DTLZ1 problem
outer domlem conf=0.8, good=15 stddev all rules, good=15 stddev improvement
1 -117.67 8.00 -118.23 9.66 -0.48%
2 -116.53 7.36 -116.87 8.12 -0.30%
3 -109.91 10.01 -113.30 8.15 -3.09%
4 -108.71 7.37 -112.06 7.68 -3.08%
5 -108.78 7.33 -112.41 7.96 -3.34%
6 -108.78 7.33 -111.40 7.33 -2.41%
7 -109.84 8.30 -111.02 7.81 -1.08%
8 -108.84 7.36 -110.07 7.84 -1.13%
9 -108.53 7.29 -110.07 7.84 -1.41%
10 -108.53 7.29 -110.07 7.84 -1.41%
11 -108.38 7.08 -111.30 6.30 -2.70%
12 -108.38 7.08 -111.31 6.31 -2.70%
13 -108.27 6.94 -111.31 6.31 -2.81%
14 -106.42 8.30 -111.31 6.31 -4.60%
15 -106.42 8.30 -111.20 6.20 -4.49%
16 -106.42 8.30 -110.69 5.62 -4.01%
17 -106.42 8.30 -109.62 7.46 -3.01%
18 -106.42 8.30 -109.62 7.46 -3.01%
19 -105.54 8.65 -109.36 7.37 -3.62%
Table 5.22: AllRules and DomLem comparison on the four-criteria DTLZ7 problem
5.5.7 The second-order stochastic dominance
Stochastic dominance is a form of partial ordering over a set of gambles, where a gamble is a probability distribution over possible outcomes. This section focuses on first- and second-order stochastic dominance.
The first-order dominance For two gambles X and Y, X has first-order stochastic dominance over Y if, for every outcome, X gives at least as high a probability of receiving at least that outcome as Y does, and for some outcome X gives a strictly higher probability. In DARWIN, this dominance can be checked by comparing the highest values in the meaningful quantiles.
The second-order dominance For two gambles X and Y, X has second-order stochastic dominance over Y if X has a mean value at least as high as Y's and involves less risk. In DARWIN, this corresponds to comparing the mean values in the quantiles instead of the highest ones.
An example illustrating the difference is shown in Figure 5.50: solution X dominates Y only when second-order stochastic dominance is considered.
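The two quantile-based checks can be sketched as follows. This is an illustrative Python sketch rather than the actual Scala implementation; the quantile levels and helper names are assumptions. For each meaningful quantile we look at the worst fraction of scenario outcomes and summarize it either by its highest value (first-order check) or by its mean (second-order check); one solution dominates another on a gain-type objective if it is at least as good at every quantile and strictly better at some.

```python
import math

QUANTILES = (0.01, 0.25, 0.50, 0.75, 0.99)  # levels as in Figure 5.50

def profile(values, stat, quantiles=QUANTILES):
    """For each quantile q, summarize the worst ceil(q*n) scenario
    outcomes with `stat`: max gives the highest value in the quantile
    (first-order check), mean gives the second-order check."""
    s = sorted(values)
    return [stat(s[:max(1, math.ceil(q * len(s)))]) for q in quantiles]

def mean(xs):
    return sum(xs) / len(xs)

def dominates(x, y, stat):
    """x dominates y (gain-type) if its quantile profile is at least
    as good everywhere and strictly better somewhere."""
    px, py = profile(x, stat), profile(y, stat)
    return all(a >= b for a, b in zip(px, py)) and px != py

# Hypothetical scenario outcomes: X does not first-order dominate Y,
# but it does second-order dominate it (less risk, equal overall mean).
x = [10, 20, 30, 40]
y = [0, 25, 30, 45]
```

With `stat=max` the comparison fails at the median quantile (20 vs 25), while with `stat=mean` X is at least as good everywhere — mirroring the situation depicted in Figure 5.50.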
[Figure 5.50: First- and second-order stochastic dominance comparison — values of solutions X and Y across scenarios, with the highest value and the mean reported for the 1%, 25%, 50%, 75% and 99% quantiles]
In the second-order dominance, worse scenarios gain in importance; this dominance is therefore more risk-averse than the first-order one. This is because bad evaluations contribute to the mean values of all quantiles, while good evaluations contribute only to the mean values of the high quantiles.
It is possible to set up DARWIN to use the mean in quantiles instead of the highest value, and thus make use of the second-order stochastic dominance.
Figure 5.51 shows a histogram. It presents the number of scenarios resulting in values in given ranges for one exemplary solution (the solution is different for first-order and second-order stochastic dominance). For example, nine out of thirteen scenarios resulted in a profit between 1050 and 1100 under first-order stochastic dominance. The points are connected to give an overall idea of the probability distributions.
The solutions were obtained by running DARWIN with default options (in particular, with the DomLem algorithm). The exterior loop was run three times, i.e. the Decision Maker marked “good” solutions three times, preferring solutions with the highest 50%-quantile profit value. The experiment was conducted twice — first with the highest value in each quantile (first-order dominance) and then with the mean in each quantile (second-order dominance).
The charts confirm the intuition. For profit, the criterion most important to the DM, it is clear that one can achieve better results with first-order dominance — note the long tail on the right-hand side of the distribution. However, the distribution for second-order dominance has fewer “bad” solutions; it is symmetric, with the peak value in the middle. Therefore, the latter is better suited to a risk-averse Decision Maker and the former to a risk-seeking one.
[Figure 5.51: An exemplary probability distribution of goals' values — the number of scenarios per value range for profit (gain) and time (loss), under first- and second-order stochastic dominance]
In the second-order stochastic dominance the “bad” solutions carry more weight, so they were eliminated in favor of average ones. The same holds for the time criterion (the lower the better). However, minimizing time was not a priority for the DM; as a result, the first-order distribution contains no better solutions, because they would imply worsening the profit goal.
5.5.8 The best results
As in the section where no uncertainty was considered, one may be interested in the best performance achieved during the experiments. The results cannot be compared directly to optimal values, because for an uncertain problem there is no such thing as an optimal solution. Instead, the comparison is made to the supposed utility optimization, and series of runs, rather than single runs, are compared. The set of parameter values producing these results is also given. The analysis is made on a per-problem basis.
The results are presented in Tables 5.23, 5.24, 5.25 and 5.26 and in Figures 5.52, 5.53, 5.54 and 5.55. For all the problems it was possible to match the performance of the supposed utility optimization. The ten-criteria DTLZ1 was the hardest problem, while its four-criteria version was the easiest one.
The parameter values are as follows:
problem algorithm good count confidence
mix problem DomLem 15 0.6
DTLZ1 four-criteria AllRules 15 n/a
DTLZ1 ten-criteria DomLem 15 0.6
DTLZ7 four-criteria DomLem 15 0.8
They can serve as guidelines for an analyst solving other problems with DARWIN.
outer mean supposed utility stddev domlem conf 0.6 15 stddev improvement
1 6551.03 105.13 6399.34 208.67 -2.32%
2 6618.31 59.98 6516.66 108.81 -1.54%
3 6627.03 55.99 6555.59 94.60 -1.08%
4 6641.97 56.68 6603.19 88.61 -0.58%
5 6646.64 52.71 6607.81 90.20 -0.58%
6 6657.24 43.55 6623.12 70.84 -0.51%
7 6660.01 42.18 6626.90 55.62 -0.50%
8 6674.68 41.26 6661.15 78.86 -0.20%
9 6678.98 42.18 6686.50 68.39 0.11%
10 6678.98 42.18 6695.45 59.81 0.25%
11 6682.55 38.42 6697.46 54.06 0.22%
12 6688.63 42.88 6697.46 54.06 0.13%
13 6688.88 42.58 6693.87 59.44 0.07%
14 6688.88 42.58 6693.59 59.62 0.07%
15 6691.59 40.16 6698.10 57.73 0.10%
16 6695.68 41.14 6698.10 57.73 0.04%
17 6701.35 39.70 6697.50 57.20 -0.06%
18 6701.35 39.70 6707.60 46.91 0.09%
19 6710.87 40.02 6707.98 47.62 -0.04%
Table 5.23: The best run for the mix problem
[Figure 5.52: The best runs for the mix problem (series domlem_conf06_15 and supp)]
[Figure 5.53: The best runs for the four-criteria DTLZ1 problem (series allrules_15good and supp)]
[Figure 5.54: The best runs for the ten-criteria DTLZ1 problem (series domlem_conf06_15 and supp)]
[Figure 5.55: The best runs for the four-criteria DTLZ7 problem (series domlem_conf08_15 and supp)]
outer mean supposed utility stddev allrules 15 stddev improvement
1 -18.34 9.82 -15.91 5.86 13.27%
2 -13.24 8.62 -10.40 4.77 21.43%
3 -11.94 8.74 -9.07 4.41 24.04%
4 -10.80 8.53 -8.29 3.94 23.17%
5 -9.86 8.33 -7.16 3.29 27.38%
6 -9.56 8.47 -6.10 2.99 36.18%
7 -9.25 8.55 -5.65 3.23 38.85%
8 -8.83 8.49 -5.31 3.47 39.88%
9 -8.74 8.50 -4.81 3.27 44.96%
10 -8.23 8.38 -4.52 3.05 45.08%
11 -8.12 8.38 -4.21 3.19 48.16%
12 -7.76 8.32 -4.18 3.47 46.08%
13 -7.62 8.40 -3.79 3.65 50.26%
14 -7.31 8.42 -3.24 3.55 55.63%
15 -7.13 8.26 -2.97 3.51 58.30%
16 -7.02 8.20 -2.85 3.49 59.39%
17 -6.88 8.09 -2.57 3.36 62.60%
18 -6.83 8.12 -2.64 3.45 61.33%
19 -6.78 8.09 -2.31 3.14 65.86%
Table 5.24: The best run for the four-criteria DTLZ1
outer mean supposed utility stddev domlem conf 0.6 15 stddev improvement
1 -22.07 11.81 -50.99 24.76 -131.05%
2 -15.26 9.24 -24.22 16.77 -58.72%
3 -13.29 8.49 -18.70 13.78 -40.68%
4 -12.03 8.08 -15.25 10.53 -26.76%
5 -11.41 8.04 -13.13 7.69 -15.07%
6 -10.60 7.64 -12.11 7.46 -14.22%
7 -10.18 7.69 -11.78 7.96 -15.73%
8 -10.01 7.77 -10.44 8.29 -4.26%
9 -9.78 7.71 -9.72 7.36 0.68%
10 -9.56 7.65 -9.09 7.39 4.92%
11 -9.14 7.74 -8.74 7.47 4.33%
12 -9.00 7.77 -12.46 11.18 -38.41%
13 -8.69 7.69 -11.64 10.83 -33.90%
14 -8.56 7.67 -11.51 10.91 -34.43%
15 -8.47 7.63 -11.06 10.23 -30.56%
16 -8.17 7.42 -10.53 10.90 -28.81%
17 -7.91 7.08 -9.89 10.52 -25.05%
18 -7.80 7.00 -10.00 10.74 -28.17%
19 -7.66 6.94 -9.92 10.79 -29.55%
Table 5.25: The best run for the ten-criteria DTLZ1
outer mean supposed utility stddev domlem conf 0.8 15 stddev improvement
1 -120.28 8.04 -117.67 8.00 2.17%
2 -115.98 5.90 -116.53 7.36 -0.47%
3 -114.93 5.79 -109.90 10.01 4.37%
4 -112.55 4.80 -108.71 7.37 3.41%
5 -111.33 5.18 -108.78 7.33 2.31%
6 -110.97 4.92 -108.78 7.33 1.98%
7 -109.70 3.75 -109.84 8.30 -0.12%
8 -109.46 3.74 -108.84 7.36 0.57%
9 -109.37 3.90 -108.53 7.29 0.77%
10 -109.00 3.85 -108.53 7.29 0.43%
11 -108.80 4.10 -108.38 7.08 0.39%
12 -108.72 4.12 -108.38 7.08 0.31%
13 -108.51 3.90 -108.27 6.94 0.23%
14 -107.81 3.34 -106.42 8.30 1.29%
15 -106.40 3.46 -106.42 8.30 -0.02%
16 -105.89 3.30 -106.42 8.30 -0.50%
17 -105.89 3.30 -106.42 8.30 -0.50%
18 -104.88 3.70 -106.42 8.30 -1.46%
19 -103.57 3.67 -105.54 8.65 -1.91%
Table 5.26: The best run for the four-criteria DTLZ7
5.6 Conclusions
The experiments performed confirmed that the DARWIN method proposed in [13, 15, 9] is a good tool for solving multi-objective optimization problems, even when the problem formulation contains uncertainty in the form of intervals of possible values.
The presented implementation is the first implementation of the DARWIN method that permits performing computational experiments of this scale. The first task was to analyze the behavior of the algorithm on a simple two-criteria problem without uncertainty. These first evaluations allowed the author to learn the properties and behavior of the algorithm.
Having gained confidence that DARWIN works, it was possible to advance to the next task — evaluating its performance on a richer set of problems. It was decided to start without allowing uncertainty; the rationale was to prove that the algorithm can solve a category of problems with good overall performance. Ignoring the uncertainty made it possible to compare DARWIN's results with optimal ones, calculated by a linear programming solver.
The algorithm itself has many parameters, so the next logical step was to check their importance. The experiments performed so far assumed that the decision maker is infallible, consistent and acts according to a well-defined utility function. Since this is not the case in real-life situations, the impact of inconsistencies in the DM's decisions was also measured.
Knowing the characteristics, the performance and the impact of the parameters and of inconsistent DM's decisions, one can check whether the algorithm retains similar characteristics on problems with interval coefficients. Being unsatisfied with the performance on the DTLZ1 problems, the author checked the influence of the DomLem parameters on the result and included a second rule-generating algorithm — AllRules. As a result, a rich set of data containing performance characteristics was gathered.
To conclude, the DARWIN method can handle a large variety of MOO problems very well. The resulting solution is robust with respect to the algorithm parameters, to inconsistencies in the Decision Maker's choices and, finally, to the uncertainty in the problem definition. Nevertheless, the analyst should be careful and investigate the problem instead of using the method blindly. Some problems are hard for DARWIN to solve (e.g. DTLZ1). Even for these problems satisfactory results can be achieved; however, one has to fine-tune the parameters and carefully observe their impact on the result.
The method is simple to use for the decision maker, even if he or she has no knowledge of the decision support field. However, for some problems the analyst's assistance may be necessary in order to select appropriate parameter values.
Chapter 6
Summary
This thesis presented a novel approach to multi-objective optimization. The DARWIN method is an interactive procedure utilizing an evolutionary algorithm to optimize a population of solutions. The idea of DARWIN was first proposed by Salvatore Greco, Benedetto Matarazzo and Roman Słowiński in [13, 15, 9]; however, this thesis provides the first implementation and numerical results.
The novelty of the method consists in combining an IMO process with an EMO procedure. DARWIN not only optimizes the population of solutions, but also drives the optimization towards regions preferred by the decision maker. To do so, preference information has to be gathered, which is done by asking the DM a series of simple questions: a list of feasible solutions is presented and he or she is asked to indicate the “good” ones among them. The procedure uses the Dominance-based Rough Set Approach and the DomLem or AllRules algorithms to obtain a set of “if ..., then ...” decision rules. DARWIN is the first MOO technique using such decision rules.
The condition part of each rule corresponds to a dominance cone in the objective space, built on a subset of objectives. If a given solution matches the condition part of a rule, it is considered “good” with respect to this rule. The higher the number of rules matched, the higher the fitness score in the evolutionary optimization, and thus the bigger the chance to “survive” and advance to the next generation.
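The rule-matching fitness described above can be sketched as follows. This is an illustrative Python sketch, not the actual Scala implementation; the rule representation is an assumption. Each rule's condition part lists lower bounds on a subset of gain-type objectives — its dominance cone — and a solution's fitness is simply the number of rules it matches:

```python
# A rule's condition part: {objective_index: threshold}. A solution
# matches the rule when every listed objective reaches its threshold,
# i.e. the solution lies inside the rule's dominance cone.
def matches(rule, objectives):
    return all(objectives[i] >= t for i, t in rule.items())

def fitness(objectives, rules):
    """Number of "good" decision rules the solution satisfies -- the
    score driving selection in the evolutionary loop."""
    return sum(matches(r, objectives) for r in rules)

rules = [{0: 5.0, 2: 1.0},  # if f1 >= 5 and f3 >= 1, then "good"
         {1: 3.0}]          # if f2 >= 3, then "good"
```

A solution matching both rules scores 2, one matching a single rule scores 1, and a solution outside every cone scores 0 and is unlikely to survive selection.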
DARWIN allows one to use intervals of possible values in the problem formulation. Multiple scenarios are then tested, and the objective space is transformed — it is no longer possible to give a single value of an objective. One has to reason in terms of meaningful quantiles of each objective. This makes it possible to take into account the decision maker's attitude towards risk.
Two characteristics of the DARWIN method ensure robustness of the generated solutions.
Firstly, each solution is tested on multiple scenarios of uncertainty, so its characteristics are known
even considering fluctuations in the problem parameters. Secondly, the decision rules generated
by the DRSA framework are immune to inconsistencies in the decision maker’s choices. Therefore,
the algorithm can withstand inconsistencies in his or her decision and still guide the search towards
the preferred regions.
Performed computational experiments confirm the author’s intuition that the method can be
used to solve a class of multi-objective optimization problems. If there is no uncertainty involved
— the exact values of all problem coefficients are known — the resulting solutions are not further
than 10% from the optimal ones. When uncertainty is allowed, a comparison to the optimal solution is impossible, because no optimal solution can be computed for a problem that is not well-defined. However, the evolutionary optimization based on the DRSA decision rules can be compared with one using the supposed utility function. The behavior of both optimizations is similar, and the conclusion is that the preference information extracted from the DM guides the algorithm to the same regions as the supposed utility optimization does. The results also showed that DARWIN is immune to changes in parameters and, moreover, to inconsistencies in the decision maker's choices.
The implementation was done in the Scala programming language, which runs on top of the Java Virtual Machine (JVM). The JVM is a well-tested platform, enormously popular in enterprise-class solutions and available on all popular hardware and software architectures. This ensures portability of the code as well as stability of the developed system. Creating a component for the Java platform also increases the possibility of reusing it in further projects and solutions.
Future work may include performing more computational experiments. A catalogue of problems, along with a description of DARWIN's behavior and the results achievable on them, could be prepared. One could also modify the evolutionary optimization framework and check whether, for example, introducing other crossover operators leads to different behavior on some of the problems.
In the current implementation the Monte Carlo procedure uses a uniform distribution of values within the intervals. An extension adding the possibility of selecting other distributions could be developed.
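Such an extension could be sketched as follows. This is a hypothetical sketch, not the actual Scala implementation: the sampling function takes the distribution as a parameter, defaulting to the uniform draw used now, and the intervals shown are made up for illustration.

```python
import random

def sample_scenarios(intervals, n, draw=random.uniform):
    """Build n scenarios by sampling every uncertain coefficient from
    its interval. `draw(lo, hi)` is the per-coefficient distribution;
    the default mirrors the current uniform Monte Carlo procedure."""
    return [[draw(lo, hi) for lo, hi in intervals] for _ in range(n)]

intervals = [(0.8, 1.2), (10.0, 14.0)]   # hypothetical interval coefficients
uniform_scen = sample_scenarios(intervals, 100)
# The proposed extension: plug in another distribution, e.g. the
# standard library's triangular draw (peaked at the interval midpoint).
tri_scen = sample_scenarios(intervals, 100, draw=random.triangular)
```

Keeping the distribution as a plain function parameter means each coefficient could even get its own distribution without touching the rest of the procedure.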
The goal of the thesis was to describe, implement and test the DARWIN method. The implementation was prepared and experiments were conducted. Recommendations on parameter values, on the problems that can be solved and on the expected performance are given. To summarize, it can be stated that the goals of the thesis have been achieved.
Bibliography
[1] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Al-
gorithms. No. 35.2. MIT Press and McGraw-Hill, 2001, ch. The traveling-salesman problem,
pp. 1027–1033.
[2] Darwin, C. On the origin of species. John Murray, London, 1859.
[3] Deb, K. Introduction to Evolutionary Multiobjective Optimization. 59–96.
[4] Deb, K., Agrawal, S., Pratap, A., and Meyarivan, T. A Fast Elitist Non-dominated
Sorting Genetic Algorithm for Multi-objective Optimisation: NSGA-II. In PPSN VI: Proceed-
ings of the 6th International Conference on Parallel Problem Solving from Nature (London,
UK, 2000), Springer-Verlag, pp. 849–858.
[5] Deb, K., Thiele, L., Laumanns, M., and Zitzler, E. Scalable multi-objective optimization test problems. In Congress on Evolutionary Computation (CEC'2002) (2002), pp. 825–830.
[6] Fogel, L. J. On the organization of intellect. PhD thesis, University of California, Los
Angeles, 1964.
[7] Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1989.
[8] Graham, R. L., and Hell, P. On the history of the minimum spanning tree problem.
IEEE Ann. Hist. Comput. 7, 1 (1985), 43–57.
[9] Greco, S., Matarazzo, B., and Słowiński, R. DARWIN-PCT: Dominance-based rough set Approach to handling Robust Winning solutions in INteractive multiobjective optimization using Pairwise Comparison Tables. In Proceedings of the 25th Mini-EURO Conference on Uncertainty and Robustness in Planning and Decision Making, URPDM 2010 (Coimbra, Portugal, 15-17 April 2010).
[10] Greco, S., Matarazzo, B., and Słowiński, R. Rough sets theory for multicriteria decision analysis. European Journal of Operational Research 129, 1 (2001), 1–47.
[11] Greco, S., Matarazzo, B., and Słowiński, R. Handbook of Data Mining and Knowledge Discovery. Oxford University Press, New York, 2002, ch. Multicriteria classification by dominance-based rough set approach.
[12] Greco, S., Matarazzo, B., and Słowiński, R. Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques. Springer-Verlag, New York, 2005, ch. Rough set based decision support.
100 Bibliography
[13] Greco, S., Matarazzo, B., and Slowi´nski, R. Preferences and Decisions: Models and
Applications. Springer, Berlin, 2010 (to appear), ch. Dominance-based Rough Set Approach
to Interactive Evolutionary Multiobjective Optimization.
[14] Greco, S., Matarazzo, B., and Slowi´nski, R. DARWIN: Dominance-based rough set
Approach to handling Robust Winning solutions in INteractive multiobjective optimization.
In Proc. 5th International Workshop on Preferences and Decisions (Trento, April 6-8, 2009),
pp. 34–41.
[15] Greco, S., Matarazzo, B., and Slowi´nski, R. Interactive Evolutionary Multiobjective
Optimization using Dominance-based Rough Set Approach. In Proc. WCCI 2010, IEEE World
Congress on Computational Intelligence (Barcelona, Spain, July, 18-23, 2010), pp. 3026–3033.
[16] Greco, S., Matarazzo, B., Slowi´nski, R., and Stefanowski, J. An Algorithm for
Induction of Decision Rules Consistent with the Dominance Principle. In RSCTC ’00: Re-
vised Papers from the Second International Conference on Rough Sets and Current Trends in
Computing (London, UK, 2001), Springer-Verlag, pp. 304–313.
[17] Greco, S., Matarazzo, B., Slowi´nski, R., and Stefanowski, J. Variable consistency
model of dominance-based rough sets approach. In RSCTC ’00: Revised Papers from the
Second International Conference on Rough Sets and Current Trends in Computing (London,
UK, 2001), Springer-Verlag, pp. 170–181.
[18] Harrah, M. sbt: A build tool for Scala.
[19] Horn, A. On sentences which are true of direct unions of algebras. Journal of Symbolic
Logic, 16 (1951), 14–21.
[20] Keen, K. J. Graphics for Statistics and Data Analysis with R. Chapman & Hall/CRC, 2010.
[21] King, P., Naughton, P., DeMoney, M., Kanerva, J., Walrath, K., and Hommel,
S. Code Conventions for the Java Programming Language.
[22] Lazenby, D. Cygwin: For Windows NT. Linux J., 14.
[23] Lindholm, T., and Yellin, F. Java Virtual Machine Specification. Addison-Wesley Long-
man Publishing Co., Inc., Boston, MA, USA, 1999.
[24] Loeliger, J. Version Control with Git: Powerful Tools and Techniques for Collaborative
Software Development. O’Reilly Media, Inc., 2009.
[25] Metropolis, N. The beginning of the Monte Carlo method. Los Alamos Science Special
Issue dedicated to Stanislaw Ulam (1987), 125–130.
[26] Miettinen, K., Ruiz, F., and Wierzbicki, A. P. Introduction to Multiobjective Optimization: Interactive Approaches. In Multiobjective Optimization: Interactive and Evolutionary Approaches, Springer, Berlin, 2008, pp. 27–57.
[27] Pawlak, Z. Rough sets. International Journal of Computer and Information Sciences 11 (1982), 341–356. doi:10.1007/BF01001956.
[28] Puchinger, J., Raidl, G. R., and Pferschy, U. The multidimensional knapsack problem:
Structure and algorithms. INFORMS J. on Computing 22, 2 (2010), 250–265.
[29] Rechenberg, I. Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der
biologischen Evolution. PhD thesis, Technical University of Berlin, 1965.
[30] Schwefel, H. P. Kybernetische Evolution als Strategie der experimentellen Forschung in der Strömungstechnik. Master’s thesis, Technical University of Berlin, 1965.
[31] Shafranovich, Y. Common format and mime type for comma-separated values (csv) files.
Tech. Rep. RFC 4180, IETF, October 2005.
[32] Wickham, H. ggplot2: elegant graphics for data analysis. Springer New York, 2009.
[33] Wirth, N. What can we do about the unnecessary diversity of notation for syntactic defini-
tions? Commun. ACM 20, 11 (1977), 822–823.
[34] Wolpert, D. H., and Macready, W. G. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1, 1 (1997), 67–82.
[35] Żurawski, M. Algorithms of induction of decision rules for multicriteria and multiattribute decision problems. Master’s thesis, Poznań University of Technology, 2001.
Appendices
User’s manual
Content of the included CD
A CD is attached to the thesis. The CD contains this thesis in an electronic form, the DARWIN distribution and the DARWIN source code. A short description of the content follows.
• Distribution — the distribution of DARWIN. To run it, double-click the file darwin-full.jar.
• Source — the source code of the software.
• thesis.pdf — an electronic version of the thesis.
A graphical user interface for DARWIN
Starting the GUI application
A runnable jar with all the required dependencies is provided; see the disc attached to the thesis. The distribution is also available on-line at http://github.com/downloads/puszczyk/DarwinDS/distribution.zip. The Java Runtime Environment (JRE) 1.5 or later is required. To run the application, double-click the file darwin-full.jar in a graphical user interface or type the following command in a terminal:
$ java -jar darwin-full.jar
Figure 1 shows the main window of DARWIN.
Figure 1: The main screen of DARWIN
A set of problems (the ones with uncertain values) is included with the distribution.
Solving exemplary problems
DARWIN offers an intuitive and easy-to-use interface for solving MOO problems. The main
window is presented in Figure 1. To solve a problem, one needs to execute the following steps:
1. Click the “Open” button in the main window.
2. Choose a problem file using the provided file selector (Figure 2). The problem has to be described in the DARWIN .mod (model) file format.
3. The main window now shows the name of the problem (Figure 3). Click “Solve”.
4. A window presenting the generated solutions appears (Figure 4). Mark the preferred solutions as “good” using the checkboxes in the “Is good” column. Click “Next iteration” to start the evolutionary optimization procedure.
5. A window with solution details can be invoked by selecting a solution and clicking the “Solution details” button (Figure 5).
6. After the solutions have been selected, an evolutionary optimization begins (Figure 6). New solutions are generated and one can again mark the “good” solutions in the new set.
7. If the generated solutions are satisfactory, one can stop the algorithm by clicking “End the algorithm” (Figure 7). If there are solutions marked as good, they will be saved.
Figure 2: Selecting a problem to solve
Figure 3: Starting the algorithm
Figure 4: Marking the “good” subset of generated solutions
Figure 5: Decision variables for a given solution
Figure 6: An evolutionary optimization is performed
Figure 7: Finish the problem analysis
Advanced options
The DARWIN GUI offers additional options to customize the process. One can modify the algo-
rithm parameters in a configuration dialog (Figure 8). To open it, click the “Options” button in
the main window. The options are described below.
The options, grouped by the tab on which they appear, are:

Main tab:
• The number of solutions — the number of solutions in a population (default: 30).
• The number of scenarios — the number of scenarios on which the solutions will be evaluated (default: 30).
• The number of generations — the number of generations in the interior loop (default: 30).
• Percentiles — which percentiles are meaningful to the decision maker (default: 1.0, 25.0, 50.0).
• Use an average in quantiles — whether an average in quantiles should be used instead of the maximum value (default: false).

The Algorithm tab:
• Use All Rules instead of DomLem — whether the AllRules algorithm should be used instead of the default one (default: false).
• The DomLEM confidence level — the minimal level of confidence which the rules generated by the DomLem algorithm should have (default: 0.6).
• Compare using the supposed utility function — whether the evolutionary algorithm should use the supposed utility instead of a rule-based score (default: false).

Fine Tuning tab:
• Delta — the decay of the rule weight (see 3.3) (default: 0.1).
• Gamma — the elitism coefficient; the higher the gamma, the higher the probability of choosing a solution with a higher rank as a parent (default: 2.0).
• Eta — the initial mutation probability (default: 0.5).
• Omega — the decay rate of the mutation probability (default: 0.1).

Reports tab:
• The reports directory — a directory where reports should be saved (default: ./reports).
• Save the evolutionary report — whether the evolutionary report should be saved in a file (default: false).
• Save the decision maker’s report — whether the DM’s report should be saved in a file (default: false).
• The rules directory — a directory where rules should be saved (default: ./rules).
• Save rules — whether the decision rules generated during a run should be saved on disk (default: false).

GUI Parameters tab:
• Digits after a dot — the number of digits displayed after the decimal point (default: 2).
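The roles of Gamma, Eta and Omega lend themselves to a small numerical sketch. The exponential-decay form of the mutation probability and the rank-power selection weights below are illustrative assumptions only; the exact formulas used by DARWIN are defined in Section 3.3 of the thesis.

```python
# Illustrative sketch of the fine-tuning parameters. The decay form and
# the rank-selection weighting are ASSUMPTIONS for illustration; the
# exact definitions are given in Section 3.3 of the thesis.

def mutation_probability(eta, omega, generation):
    """Mutation probability starting at eta and decaying at rate omega."""
    return eta * (1.0 - omega) ** generation

def selection_weights(ranks, gamma):
    """Rank-based parent-selection weights (rank 1 = best). The higher
    gamma, the more the best-ranked solutions dominate the draw."""
    raw = [1.0 / r ** gamma for r in ranks]
    total = sum(raw)
    return [w / total for w in raw]

p0 = mutation_probability(0.5, 0.1, 0)    # initial probability
p10 = mutation_probability(0.5, 0.1, 10)  # decayed after 10 generations
weights = selection_weights([1, 2, 3], 2.0)
```

With the default values, the mutation probability starts at Eta and shrinks each generation, while a larger Gamma skews parent selection towards better-ranked solutions.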
The window presenting a list of generated solutions offers additional features. The list of
solutions can be sorted on a given criterion by clicking the column header (Figure 9). Another
useful option is the history of solutions. The user can review solutions generated in earlier iterations,
return to them and change his or her selections. This is possible using the history toolbar (see Figure 10).
Figure 8: DARWIN configuration options
Figure 9: Sorting solutions on a given criterion
Figure 10: Reviewing the history of generated solutions
The DARWIN .mod file format
MOO problems for DARWIN are specified in external files. These files are called the DARWIN
model files and usually have the .mod extension. The application contains a parser for the model
files. The parser grammar, in the Extended Backus-Naur Form (EBNF, see [33]), is given below.
problem ::= { line }
line ::= statement , ";"
statement ::= utility_function |
variable |
goal |
constraint
variable ::= "var" , "[", [ var_type_constraint ], floating_point_number , ",",
floating_point_number ,
"]", identifier
var_type_constraint ::= "(B)" | "(I)"
goal ::= ("min" | "max"), identifier , ":", expression
constraint ::= identifier , ":", expression , (">=" | "<="), expression
utility_function ::= "!dec:", expression
expression ::= term , ("+" | "-"), expression |
term
term ::= factor , ("*" | "/"), term |
factor
factor ::= floating_point_number |
("+" | "-" | "ln"), factor |
quantile |
aggregate |
identifier |
interval |
"(", expression , ")"
quantile ::= "<", identifier , ",", floating_point_number , ">"
interval ::= "[", identifier , ":", floating_point_number , ",",
floating_point_number ,
"]"
aggregate ::= aggregate_operator , "(", expression , [{ ",", expression }], ")"
aggregate_operator ::= "sum" | "min" | "cos"
floating_point_number ::= ? A floating point number with an optional sign ?
identifier ::= ? A valid Java identifier ?
Comments are handled by a preprocessor: (1) line comments, starting with #, e.g.
#a-line-comment, and (2) block comments, enclosed in /* */. Whitespace is ignored.
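The arithmetic core of this grammar can be sketched as a small recursive-descent parser, one function per production. This is an illustrative sketch, not the parser shipped with DARWIN: it handles only numbers, unary signs, the four operators and parentheses, omitting ln, identifiers, intervals, quantiles and aggregates; the comment stripping mimics the preprocessor described above. Note that, transcribed literally, the right-recursive productions make chained "-" and "/" associate to the right.

```python
import re

def strip_comments(text):
    # Preprocessor sketch: drop /* ... */ block comments, then # line comments.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return re.sub(r"#[^\n]*", "", text)

def parse_expression(tokens, pos=0):
    # expression ::= term , ("+" | "-") , expression | term
    value, pos = parse_term(tokens, pos)
    if pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        rhs, pos = parse_expression(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos

def parse_term(tokens, pos):
    # term ::= factor , ("*" | "/") , term | factor
    value, pos = parse_factor(tokens, pos)
    if pos < len(tokens) and tokens[pos] in "*/":
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value * rhs if op == "*" else value / rhs
    return value, pos

def parse_factor(tokens, pos):
    # factor ::= floating_point_number | ("+" | "-") , factor | "(" , expression , ")"
    token = tokens[pos]
    if token == "(":
        value, pos = parse_expression(tokens, pos + 1)
        return value, pos + 1  # skip the closing ")"
    if token in "+-":
        value, pos = parse_factor(tokens, pos + 1)
        return (value if token == "+" else -value), pos
    return float(token), pos + 1

def evaluate(source):
    tokens = re.findall(r"\d+(?:\.\d+)?|[()+\-*/]", strip_comments(source))
    value, _ = parse_expression(tokens)
    return value
```

For instance, evaluate("1 + 2 * 3") respects the precedence encoded by the expression/term/factor layering.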
A problem file consists of lines; each line contains a statement ended by a semicolon (;).
A statement can define the supposed utility function, a variable, a goal or a constraint. The
variables define the problem domain. They can be continuous (with double floating-point
precision), binary (indicated by a (B) flag) or integer ((I) flag). Upper and lower limits of the
variable value have to be provided. Each variable has a name and can be referred to by this name
in succeeding statements. Goals are the optimization criteria to be minimized or maximized. They
are named expressions and can therefore also be referred to later by name. Constraints are named
inequalities. The inequality is not strict and both its sides can be expressions. Finally, the supposed
utility function is an expression prefixed with !dec:. The function is useful only if one wants to
simulate the decision maker.
An example file with the mix-problem definition is included below.
var[0, 13] xa;
var[0, 25] xb;
var[0, 13] xc;
max profit: [pa:20, 24] * min(xa , [da:10, 12])
+ [pb:30, 36] * min(xb , [db:20, 24])
+ [pc:25, 30] * min(xc , [dc:10, 12])
- [p1r:6, 7.2] * ([r1a:1, 1.2]* xa + [r1b:2, 2.4]* xb + [r1c :0.75 , 0.9]* xc)
- [p2r:9, 9.6] * ([r2a :0.5, 0.6]* xa + [r2b:1, 1.2]* xb + [r2c :0.5, 0.6]* xc);
min time: [ta:5, 6]*xa + [tb:8, 9.6]* xb + [tc:10, 12]* xc;
c1: xa <= 12;
c2: xb <= 24;
c3: xc <= 12;
c4: xa >= 0;
c5: xb >= 0;
c6: xc >= 0;
!dec: <profit , 0.01> + 3 * <profit , 0.25> + 2 * <profit , 0.50>
-1 * <time , 0.01> - 3 * <time , 0.25> - 2 * <time , 0.50 >;
The expressions utilize intervals (e.g. [pa:20, 24]) and the supposed utility function uses
the meaningful percentiles (e.g. the 25th percentile of profit is written <profit, 0.25>).
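To make the quantile notation concrete, the sketch below evaluates <goal, p> terms numerically as rank-based percentiles over a set of scenario outcomes and combines them with the weights of the supposed utility function from the example. The scenario values and the exact percentile convention (rank on the sorted list) are illustrative assumptions, not thesis data.

```python
# Illustrative sketch: evaluating <goal, p> quantile terms over scenarios.
# The scenario outcomes below are made-up numbers, and the rank-based
# percentile convention is an assumption chosen for illustration.

def percentile(values, p):
    """The p-quantile (0 < p <= 1) of a list, by rank on the sorted values."""
    ordered = sorted(values)
    index = max(0, int(round(p * len(ordered))) - 1)
    return ordered[index]

# Outcomes of the two goals for one solution under several scenarios.
profit = [100.0, 120.0, 95.0, 110.0, 130.0, 105.0]
time = [50.0, 55.0, 48.0, 60.0, 52.0, 58.0]

# The supposed utility function from the example .mod file above:
utility = (percentile(profit, 0.01) + 3 * percentile(profit, 0.25)
           + 2 * percentile(profit, 0.50)
           - 1 * percentile(time, 0.01) - 3 * percentile(time, 0.25)
           - 2 * percentile(time, 0.50))
```

Each <goal, p> term collapses the uncertain, scenario-dependent goal value to a single number, which is what makes the utility comparable across solutions.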
Compiling DARWIN from the source
code
It is possible to build a DARWIN binary from the source code. This can be done from the command
line on Windows, Linux and Mac OS X. Instructions are given below.
1. Get the source code. The sources are included on the disc attached to the thesis. Alternatively,
the most recent version is stored in a git repository (see [24]) and can be cloned by typing
the command:
$ git clone git://github.com/puszczyk/DarwinDS.git
2. Prepare the build tool. DARWIN uses sbt (see [18]), a build tool for Scala. Detailed setup
instructions are given on the project’s webpage. The easiest way is to install Scala (system-
dependent; consult http://scala-lang.org for details) and then prepare a script for running
sbt. Download sbt-launch.jar from the sbt website: http://simple-build-tool.googlecode.com/
files/sbt-launch-0.7.4.jar
Somewhere in the PATH put a script with the following content:
java -Xmx512M -jar `dirname $0`/sbt-launch.jar "$@"
or (on the Windows operating system):
set SCRIPT_DIR=%~dp0
java -Xmx512M -jar "%SCRIPT_DIR%sbt-launch.jar" %*
Make the file executable. It is further assumed that the script is named sbt.
3. Build the DARWIN binaries. The source code is located in the Darwin directory. First get
the required dependencies:
$ cd DarwinDS/Darwin
$ sbt
> download-jars
Then run the set of unit tests to check that everything went fine up to this point:
> test
This should end with a message:
[info] All tests PASSED.
[success] Successful.
Now DARWIN can be compiled:
> compile
And run:
> run
To create a .jar package type:
> package
This will generate DARWIN binaries in a subdirectory named target.
The experiment framework
Instead of through the graphical user interface, DARWIN can be run in a console-based batch mode. This is
useful for running a set of experiments. A batch run can be invoked by the experiment framework
located in the Experiment directory. The directory contains a few useful scripts written in Bash
and Python. As such, it requires Mac OS X, a Linux distribution, or a Windows operating system
with Cygwin installed (see [22]). The utility scripts are described below:
• runTestPlan.py — a script for running a set of tests. As input it requires the location of the
DARWIN directory, a problem file, a DARWIN configuration file and a test plan specification.
The problem file is a file in the DARWIN .mod file format. The configuration is an .ini file
structured as follows:
[section]
option1 = value1
option2 = value2
(...)
optionN = valueN
Required sections and options are described in Table 4.1. The test plan specification is a
text file with the following structure:
>test_name_1/number -of -runs
[section -to -override]
option -to -override1 = value1
(...)
>test_name_N/number -of -runs
(...)
Each test is run number-of-runs times. For each test one can specify a list of options to
be overridden. This is useful for testing the influence of a parameter. An example test plan is
presented below.
>delta005 /15
[main]
delta = 0.05
>delta015 /15
[main]
delta = 0.15
>delta020 /15
[main]
delta = 0.20
>delta040 /15
[main]
delta = 0.40
This plan will run DARWIN fifteen times for each delta value. Run reports are saved in
a directory named after the current date and time. Test names are the names of subdirecto-
ries.
• genCharts.sh — a script for charting run reports. It requires the R environment (see [20])
with the ggplot2 and reshape libraries installed. The charts to be generated are included in
the Chart directory of the DARWIN distribution. The usage is:
$ genCharts.sh --brief/--full criteria-no best-val charts-dir testplan-out-dir
--brief/--full decides whether all charts or only the overview chart should be generated.
criteria-no is the number of criteria in the problem. best-val is the optimal value of the
problem; if it cannot be given, it should be set to 0. charts-dir is the directory with the R charts
to be generated. Finally, testplan-out-dir is the output directory of the runTestPlan.py
script. The available types of charts are presented in Figure 4.2.
The script will generate distinct charts for each test run. If one wants a comparison between
the tests, then ./genSummaryChart.sh should be used instead.
• genSummaryChart.sh — a script for charting a comparison between different test runs. The
usage is similar to genCharts.sh, but the number of criteria can be omitted, because it is
irrelevant to this task. The usage is:
$ ./genSummaryChart.sh best-val charts-dir testplan-out-dir problem-name
The available types of charts are presented in Figure 4.3.
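The option-override mechanism of runTestPlan.py described above can be sketched with Python's standard configparser. This is an illustrative sketch, not the script itself; the section and option names are taken from the example test plan.

```python
import configparser

def apply_overrides(base_text, overrides):
    """Merge per-test option overrides into a copy of a base .ini config.

    `overrides` maps section name -> {option: value}, mirroring the
    [section-to-override] blocks of one test plan entry.
    """
    config = configparser.ConfigParser()
    config.read_string(base_text)
    for section, options in overrides.items():
        if not config.has_section(section):
            config.add_section(section)
        for option, value in options.items():
            config.set(section, option, value)
    return config

base = """
[main]
delta = 0.10
gamma = 2.0
"""

# The test plan entry '>delta005/15' overrides delta in section [main]:
merged = apply_overrides(base, {"main": {"delta": "0.05"}})
```

Options not mentioned in a test plan entry keep their values from the base configuration file.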

igor-kupczynski-msc-put-thesis

  • 1.
    Pozna´n University ofTechnology Institute of Computing Science Igor Kupczy´nski Interactive Evolutionary Algorithm for Robust Multi-Objective Optimization Master’s Thesis Supervisor: prof. dr hab. in˙z. Roman Slowi´nski Pozna´n 2010
  • 3.
    1 Writing the master’sthesis was a tremendous task, impossible to complete without the support I have received. I would like to thank my supervisor, prof. Roman Slowi´nski, for his help, reviews and encouragement. I have spent one semester at Universit´e Paris-Dauphine. The discussions with prof. Daniel Vanderpooten and his colleagues from LAMSADE1 were inspiring and as horizon-widening as the aura of the City of Light. I would like to thank him for the opportunity. Finally, I would like to thank one person I cannot name, the dedicated and gifted linguist, for her inestimable comments. 1The LAMSADE laboratory, established in 1974, is the joint laboratory of Universit´e Paris-Dauphine and CNRS (National Center for Scientific Research). Its central research activity lies at the interface of two fundamental scientific areas: Computer Science and Decision Making.
  • 5.
    Contents 1 Introduction 5 1.1Goal and scope of the thesis . . . . . . . . . . . . . . . . . . . . 6 2 Multi-Objective Optimization 7 2.1 Interactive Approaches to MOO . . . . . . . . . . . . . . . . . . . 7 2.2 Evolutionary Approaches to MOO . . . . . . . . . . . . . . . . . . 8 2.3 Dominance-Based Rough Set Approach to MOO . . . . . . . . . . . . . 11 3 DARWIN, the Idea behind the Method 13 3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.2 Modeling of uncertainty . . . . . . . . . . . . . . . . . . . . . . 14 3.3 The algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 16 4 Implementation of DARWIN 21 4.1 The issues of implementation . . . . . . . . . . . . . . . . . . . . 21 4.1.1 The programming environment. . . . . . . . . . . . . . . . . 21 4.1.2 The software prototype . . . . . . . . . . . . . . . . . . . 22 4.1.3 The final implementation . . . . . . . . . . . . . . . . . . . 22 4.2 Experiment framework . . . . . . . . . . . . . . . . . . . . . . 24 5 Results of Computational Experiments 29 5.1 The environment . . . . . . . . . . . . . . . . . . . . . . . . 29 5.2 The Decision Maker . . . . . . . . . . . . . . . . . . . . . . . 29 5.3 Problem selection . . . . . . . . . . . . . . . . . . . . . . . . 30 5.4 A deterministic case . . . . . . . . . . . . . . . . . . . . . . . 33 5.4.1 Single run analysis . . . . . . . . . . . . . . . . . . . . . 33 5.4.2 The computational performance on exemplary problems . . . . . . . 45 5.4.3 The influence of parameters on the method . . . . . . . . . . . . 52 5.4.4 DM’s decisions inconsistent with supposed utility function . . . . . . . 52 5.4.5 A different rule-generating algorithm . . . . . . . . . . . . . . 60 5.4.6 The best results . . . . . . . . . . . . . . . . . . . . . . 63 5.5 The case of uncertainty . . . . . . . . . . . . . . . . . . . . . . 65 5.5.1 Single run analysis . . . . . . . . . . . . . . . . . . . . . 65 5.5.2 The computational performance on exemplary problems . . . . . 
. . 73 5.5.3 The influence of the basic parameters on the method. . . . . . . . . 76 5.5.4 DM’s decisions inconsistent with supposed utility function . . . . . . . 78 5.5.5 DomLem parameters . . . . . . . . . . . . . . . . . . . . 80 5.5.6 AllRules algorithm . . . . . . . . . . . . . . . . . . . . . 83
  • 6.
    4 Contents 5.5.7 Thesecond-order stochastic dominance. . . . . . . . . . . . . . 88 5.5.8 The best results . . . . . . . . . . . . . . . . . . . . . . 90 5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 95 6 Summary 97 Bibliography 99 Appendices 105 User’s manual 105 The DARWIN .mod file format 113 Compiling DARWIN from the source code 115
  • 7.
    Chapter 1 Introduction Optimization isa process of finding the best solution from a set of available alternatives. In the simplest case it involves only a single objective. An objective is a problem’s goal, whose value has to be maximized (or minimized). An optimization problem consists of a set of decision variables; values of those variables affect the problem’s objectives. A domain of the problem — a set of possible values that can be taken by the decision variables — may be subject to additional constraints. An assignment of values to the decision variables is called a solution. A solution is feasible if the values meet the constraints imposed on the problem. Therefore, the task of optimization is to find a feasible solution, which maximizes (minimizes) the value of the problem’s objectives. Optimization is a field of applied mathematics, which has gained importance during and after the Second World War. The driver for developments in the field are real-life, mainly military or industrial problems. They are often large-scale and of high importance; solving a problem of this kind usually yields a profit outranking the costs that one has to bear to employ a formal approach. To use optimization methods one has to formulate a problem in a mathematical way — build a model of the problem. Therefore, it is common to make a distinction between a stakeholder — a decision maker (DM) having expert knowledge in a problem domain and an analyst — a mathematician or a scientist that will create the model and choose an appropriate optimization technique. If there is only one, single objective in the problem definition, then the optimization usually results in a one optimal solution which has the best possible value of the goal. However, this is not the case when multiple goals are considered. 
Usually the optimal value for one goal is far from being an optimal for the other; for example, consider a simple problem of choosing a laptop computer to buy — the one with the highest performance is not the cheapest and the cheapest will probably not offer a state-of-the-art performance. No trade-offs between objectives can be assumed a priori. This is because the importance of each goal may be different for different decision makers — someone can choose a cheap computer, while the other will opt for the highest performance available. However, one can indicate a non-dominated set of solutions — the Pareto-frontier of the problem. Informally, a solution is Pareto-efficient if a value of any of its criteria can not be improved without worsening the other values. In the laptop example it could be a list of the cheapest computers for different performances. Multi-Objective Optimization (MOO) methods are usually sophisticated, with many parame- ters for the analyst to fine-tune them. However, the decision maker is usually not interested in the details of the method, but only in a single recommendation, possibly along with a justification. The analyst will set up the method and its parameters based on his or her intuition and on a research carried out earlier. An experienced analyst can usually decide what parameters’ values should be used, but still it may be impossible to set the values precisely to the best possible options. If one can change parameters a bit and the resulting solution is similar to the one acquired before,
  • 8.
    6 1 Introduction thenthe solution is called robust. The same applies if the problem model can not be formulated precisely — for example, it contains a value that may be only estimated, like the future price of a raw material. The solution should be resistant to small fluctuations in the problem’s model parameters. The robustness in MOO context is the ability to withstand changes in the parameters and in the problem formulation; it is a very important quality of any MOO technique. To give final recommendation instead of the Pareto-frontier one has to engage the decision maker in the process. The method has to be interactive in order to gather the DM’s preferences. It can be done by showing exemplary feasible solutions and asking the decision maker to rank them or simply by asking about the inter-criteria trade-offs. These preferences are used to guide the search of the solution space in the directions desired by the DM. An optimization technique involving interaction with the decision maker is called the interactive multi-objective optimization (IMO) technique. The algorithm is a well-defined list of instructions for completing a task. Several researchers suggested that principles of the evolution — particularly, the concept of population and survival of the fittest individuals can be a good model of operation for multi-criteria optimization algorithms. Methods using these principles are called the evolutionary algorithms (EAs) while the whole field of research is the evolutionary multi-objective optimization (EMO). In this paper, the author presents a software implementation of the DARWIN method and the first large computational experiment with DARWIN. DARWIN is an acronym for Dominance- based rough set Approach to handling Robust Winning solutions in INteractive multi-objective op- timization. It has been proposed by Salvatore Greco, Benedetto Matarazzo and Roman Slowi´nski in [9, 13, 14, 15] for solving multi-objective optimization problems. 
It interacts with the deci- sion maker in order to find the solution which best fits his or her preferences. The preferences are represented in the form of decision rules. They are used for guiding the search of the most preferred solutions in the solution space. An evolutionary algorithm is used as an optimization engine. Therefore, DARWIN combines IMO and EMO paradigms. It allows an analyst to model uncertainty in the problem definition, thus generating robust solutions. 1.1 Goal and scope of the thesis The thesis consists of six chapters. Firstly, the theoretical background is presented. A detailed description of the DARWIN method is presented in Chapter 3. Chapter 4 discusses an implemen- tation on an IBM-PC class computer. Experiment results are shown and discussed in Chapter 5. Finally, areas of further research are indicated along with conclusions and recommendations about the method. The goal of the thesis is to implement the DARWIN method, evaluate its performance on a few MOO problems and test the influence of method’s parameters on the final result. Basic recommendations for the analysts willing to use the method should be given. A user manual describing how to use the implementation and what are the file formats used by the software has to be attached to the thesis.
  • 9.
    Chapter 2 Multi-Objective Optimization Oftena model of a problem contains multiple objectives. This is the case when conflicting goals cannot be easily converted to a single objective. The decision support when more than one goal is considered to be not an easy task to perform, however, several approaches exists. These approches are described in the chapter. 2.1 Interactive Approaches to MOO The Multi-Objectie Optimization problems usually have multiple Pareto-optimal solutions. How- ever, the Decision Maker is usually interested in a single recommendation — the single solution he or she may implement. The most-preferred solution is a solution from the Pareto-frontier of the problem, for which the DM is convinced it is his or her best option. To find the most preferred solution, an interaction with the DM is necessary. A problem solver needs to know the Decision Maker’s preferences in order to differentiate Pareto-optimal solutions. Without the preferences all solutions on the Pareto-frontier have to be considered equal. The DM can build his or her global preference model before an algorithm solving the problem starts and give this model as an input. This is call the a priori method. However, this method has its weaknesses. It may be hard for the Decision Maker to give the full preference structure. It is possible also, that he or she will change his or her preferences after evaluating solutions received from the problem-solver. The interactive approach overcome this weaknesses by involving the DM in the process. A basic structure of the approach is shown in Figure 2.1. At first, an initial set of solutions is generated. It can be a subset of the Pareto-frontier or just a set of feasible solutions. Then, based on the solutions, the Decision Maker specifies his or her preferences. It can be done by a systematic dialog, asking a series of questions or asking the DM to indicate “good” solutions among the set. From the DM’s answers a preference model is built. 
This additional preference information guides the search towards a region indicated by the Decision Maker. This can save the computa- tional cost, because the algorithm doesn’t have to go through the whole search space. Again, new solutions, probably better fitted to the DM’s preferences are generated and the algorithm shows them to him or her. If he or she finds it satisfactory (or a stop condition is met) then the algorithm stops. Otherwise it advances to the next iteration. There are several types of the IMO methods (consult [26] for an in-depth description): • Trade-off based methods. A trade-off is an exchange, a price that one is willing to pay (in form of lost on some of the criteria), in order to benefit on another criterion (or criteria). These methods ask the DM questions about the trade-offs he or she can accept and then, a preference model is inferred based on the tread-offs.
  • 10.
    8 2 Multi-ObjectiveOptimization Generate an initial set of solutions Show the solutions to the decision maker Is any of the solutions satisfatory? [Yes] Ask the DM to indicate "good" solutions Extract preference information Improve the solution set [No] Figure 2.1: An activity diagram for a typical interactive process • Reference point approaches — the DM specifies bounds on values of the objective func- tions (i.e. reference points) and then, he or she can observe the effect of the bounds on the generated solutions. • Classification-based methods. It is not possible to improve a value of a goal of a solution from the Pareto-frontier without worsening other goals of the solutions. In the classification- based methods the DM is asked to select goals that can be impaired and the ones that he or she wants to improve. The interactive approach requires the Decision Maker’s collaboration during the process, how- ever the approach offers strong benefits to justify this dedication. Clearly, the computational cost required is lower than in other approaches, because there is no need to evaluate whole solution space, just its small subset. The DM may not be able to express a global structure of his or her preferences up front. It is also possible that his or her preferences will change along with the change in understanding of the problem. During the interactive process the DM has an immediate feedback — he or she may see how the decisions are affecting problem solutions. One can say that solving a Multi-Objective Optimization problem is a constrictive process, where the Decision Maker learns more about the problem — what kind of solutions are possible and how his of her choices influences the results (see [26]). As a result, not only the most preferred solution is given, but also the problem understanding by the DM is better. 2.2 Evolutionary Approaches to MOO In 1859, Charles Darwin published his work “On the origin of species” [2]. 
He introduced a scientific theory describing the evolution of species through the process called natural selection. According to the theory, a trait can become more or less common in the population depending on its effect upon the survival and reproduction of the individuals bearing the trait.
Figure 2.2: An evolutionary optimization procedure

This idea can easily be transferred to the field of optimization. A solution to a problem — that is, a set of values of the problem's decision variables — is a single individual in the population. The problem is the environment — the higher a solution's evaluation on a given problem, the better it is fitted to the environment. A better fitness means a higher chance that the traits of a solution will be present in the next iteration (an analogue of a reproduction success rate). The first successful applications of the idea were in the electrical engineering field (see [6]) and in fluid mechanics (see [29, 30]). The main differences between classical and evolutionary optimization (EO) are (see [3]):

• Population-based. An EO procedure uses a population of solutions (a population approach), whereas classical algorithms maintain one solution at a time (a point approach). This enables an algorithm to maintain multiple optimal solutions, possibly from different parts of the solution space. Unfortunately, it raises the memory and computational footprint of an EO algorithm.

• Stochastic operators. An EO procedure uses stochastic operators (e.g. selection, crossover or mutation) instead of deterministic ones.

• Gradient information. An EO procedure does not usually use gradient information directly in performing a search. This makes the procedure less susceptible to local optima in the search space. However, the EO procedure may not be competitive with a dedicated gradient-based approach.

The basic evolutionary optimization procedure is shown in Figure 2.2. The algorithm starts by creating a population of solutions. Usually the population is created at random within the bounds of the decision variables. Then a succession of generations starts.
The population is updated by a sequence of operators. First, the population is evaluated. The evaluation establishes a relative preference order, that is, it sorts the solutions from the best to the worst. After the evaluation the algorithm chooses solutions to fill the mating pool. The better the solution, the higher its probability of being chosen. Then the variation operator is applied. It is a series of steps, such as crossover or mutation, generating the succeeding generation (an offspring) from the parents in the mating pool. The crossover ensures that the parents' traits will be present in the next generation, while the mutation acts as a local search in a solution's neighborhood. Finally, the elitism operator combines the old population with the newly created offspring. Copying the best solutions from the former ensures that the algorithm has a monotonically non-degrading value of the best solution.
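The generational cycle described above can be sketched in a few lines of code. The following is a minimal, self-contained illustration — not DARWIN's actual implementation — of the evaluate–select–vary–elitism loop from Figure 2.2, applied to a simple single-criterion maximization problem; all names and parameter values are illustrative only:

```java
import java.util.*;

public class EvoSketch {
    static final Random RNG = new Random(42);

    // Illustrative single-criterion fitness with its maximum at x = 3.
    static double fitness(double x) { return -(x - 3.0) * (x - 3.0); }

    public static void main(String[] args) {
        int popSize = 20;
        List<Double> pop = new ArrayList<>();
        for (int i = 0; i < popSize; i++) pop.add(RNG.nextDouble() * 10.0); // random init
        double initialBest =
            fitness(Collections.max(pop, Comparator.comparingDouble(EvoSketch::fitness)));

        for (int gen = 0; gen < 200; gen++) {
            // Selection: binary tournaments fill the mating pool.
            List<Double> pool = new ArrayList<>();
            for (int i = 0; i < popSize; i++) {
                double a = pop.get(RNG.nextInt(popSize)), b = pop.get(RNG.nextInt(popSize));
                pool.add(fitness(a) >= fitness(b) ? a : b);
            }
            // Variation: arithmetic crossover of two parents plus occasional Gaussian mutation.
            List<Double> offspring = new ArrayList<>();
            for (int i = 0; i < popSize; i++) {
                double p1 = pool.get(RNG.nextInt(popSize)), p2 = pool.get(RNG.nextInt(popSize));
                double child = 0.5 * (p1 + p2);
                if (RNG.nextDouble() < 0.1) child += RNG.nextGaussian() * 0.5;
                offspring.add(child);
            }
            // Elitism: the best old solution always survives into the next generation.
            offspring.set(0, Collections.max(pop, Comparator.comparingDouble(EvoSketch::fitness)));
            pop = offspring;
        }
        double finalBest =
            fitness(Collections.max(pop, Comparator.comparingDouble(EvoSketch::fitness)));
        // Elitism guarantees the best fitness never degrades across generations.
        System.out.println(finalBest >= initialBest);
    }
}
```

Note how the elitism step alone is enough to guarantee the monotonically non-degrading best value mentioned above, regardless of what the stochastic operators do.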
Figure 2.3: The NSGA-II evaluation

The choice of a fitness function is critical to the algorithm's performance. In the case of a problem with a single criterion this is trivial — the value of the goal can be used. However, in the case of Multi-Objective Optimization, there are a number of objective functions to be optimized. A possible approach to the problem is to use the dominance principle ([7]): A solution x is said to dominate another solution y if both of the following conditions are true:

1. The solution x is not worse than y on all objectives. Thus, the solutions are compared based on their objective function values.

2. The solution x is strictly better than y on at least one objective.

All the solutions that are not dominated by any other solution form the Pareto-frontier of the problem. According to [3] there are two ideal goals of EMO:

1. Find a set of solutions which lies on the Pareto-optimal front, and

2. Find a set of solutions which is diverse enough to represent the entire range of the Pareto-optimal front.

The most representative example of an Evolutionary Multi-objective Optimization (EMO) algorithm is NSGA-II ([4]). The basic idea behind the algorithm is to assign each solution in a population to one of a number of Pareto-frontiers. All non-dominated individuals are assigned to the first Pareto-frontier and then removed from the population. All individuals that are non-dominated after the removal are then assigned to the second frontier. The process repeats until no individuals remain in the population. The lower the number of the frontier an individual belongs to, the higher its fitness.
In case of a tie the crowding score is taken into account — the less crowded the solution's neighborhood in the objective space, the better the solution's evaluation. This is illustrated in Figure 2.3.
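The dominance check and the layer-by-layer front assignment described above can be sketched as follows. This is an illustrative implementation (assuming maximization on every objective), not NSGA-II's actual code; the sample population is made up for the example:

```java
import java.util.*;

public class ParetoSketch {
    // x dominates y iff x is no worse on all objectives and strictly better on at least one.
    static boolean dominates(double[] x, double[] y) {
        boolean strictlyBetter = false;
        for (int i = 0; i < x.length; i++) {
            if (x[i] < y[i]) return false;          // worse on some objective
            if (x[i] > y[i]) strictlyBetter = true; // strictly better on some objective
        }
        return strictlyBetter;
    }

    // Front assignment: repeatedly peel off the non-dominated solutions.
    static int[] frontNumbers(double[][] pop) {
        int n = pop.length;
        int[] front = new int[n];
        boolean[] assigned = new boolean[n];
        int current = 1, left = n;
        while (left > 0) {
            List<Integer> thisFront = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if (assigned[i]) continue;
                boolean dominated = false;
                for (int j = 0; j < n; j++)
                    if (!assigned[j] && dominates(pop[j], pop[i])) { dominated = true; break; }
                if (!dominated) thisFront.add(i);
            }
            for (int i : thisFront) { front[i] = current; assigned[i] = true; }
            left -= thisFront.size();
            current++;
        }
        return front;
    }

    public static void main(String[] args) {
        // Hypothetical two-objective population: {4,4} is dominated by {5,5},
        // and {1,1} is dominated by everything except itself.
        double[][] pop = { {1, 9}, {5, 5}, {9, 1}, {4, 4}, {1, 1} };
        System.out.println(Arrays.toString(frontNumbers(pop)));
    }
}
```

The first three solutions are mutually non-dominated and land on front 1, while the dominated ones fall to fronts 2 and 3.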
2.3 Dominance-Based Rough Set Approach to MOO

A rough set is an approximation of a conventional set — a pair of sets being a lower and an upper approximation of the original set ([27]). The lower approximation is the set of all objects that can be unambiguously classified as members of the target set. On the other hand, the upper approximation is the set containing the objects that cannot be unambiguously classified as members of the complement of the target set. The boundary region is the part of the space belonging to the upper approximation but not to the lower one.

The Dominance-based Rough Set Approach (DRSA) is an extension of rough set theory introduced in [10, 11, 12]. The indiscernibility relation is replaced by the dominance relation (defined in the former section). DRSA is applicable in the decision support field. It can model situations in which a finite set of objects — vectors of values in the decision variable space — has been classified into decision classes, such that each object belongs to exactly one class. The classes are preference-ordered. The main task of DRSA is to structure the classification into lower and upper approximations of unions of ordered decision classes, prior to the induction of monotonic decision rules representing the preferences of the agent who made the classification decisions. An example of DRSA data structuring is given in Figure 2.4.

The data in the Dominance-based Rough Set Approach are often presented in a decision table. The objects being considered are written in the table rows, the attributes are the table columns; the last column is the decision attribute — the classification of the objects into decision classes. A formal definition of the decision table can easily be found in the literature. An example is given in Table 2.1. On the basis of the table, decision rules may be induced. They are a generalized description of the knowledge represented in the table.
A decision rule is a Horn clause (see [19]) in the form “if ..., then ...”. The former part is called the condition and the latter — the consequent. The condition part compares the values of an object's attributes with given thresholds, and the consequent part represents the object's classification if the condition part holds. Rules can be certain — based on objects from the lower approximation of a class, possible — based on objects from the upper approximation, or approximate — based on the boundary region. Each decision rule should be minimal, i.e. the cardinality of its set of conditions should be minimal. Example rules generated from Table 2.1 are as follows:

1. If Literature ≥ good then Student ≥ good,

2. If Mathematics ≤ bad and Physics ≤ medium then Student ≤ bad,

3. If Mathematics ≥ medium then Student ≥ medium (possible).

DRSA can handle uncertainty and contradictions in the data, thus it can model a wide class of real-world decision problems. For each rule r given in the form Φ → Ψ, the following measures are defined:

Student | Mathematics | Physics | Literature | Overall class
--------|-------------|---------|------------|--------------
1       | good        | medium  | bad        | bad
2       | medium      | medium  | bad        | medium
3       | medium      | medium  | medium     | medium
4       | medium      | medium  | medium     | good
5       | good        | medium  | good       | good
6       | good        | good    | good       | good
7       | bad         | medium  | medium     | bad
8       | bad         | bad     | medium     | bad

Table 2.1: An example of the decision table
Figure 2.4: An example of the DRSA approach

• Support: supp(Φ, Ψ) = card(||Φ ∧ Ψ||) — the number of objects for which the condition of the rule holds and the object's classification is consistent with the consequent of the rule.

• Confidence: confidence(Φ, Ψ) = supp(Φ, Ψ) / card(||Φ||) — the number of objects supporting the rule in relation to the number of objects for which the rule's condition holds.

For example, the objects supporting rule no. 3 are {S2, S3, S4, S5, S6}, while the condition part holds also for S1. The support is thus supp(r3) = 5 and the confidence is confidence(r3) = 5/6 ≈ 0.83.

To induce all rules from the decision table one can use the AllRules algorithm (an optimized version is described in [35]). Another option is to use the DomLem algorithm ([16]), which generates a minimal set of rules covering all the objects from a given table.
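The two measures can be computed directly from the decision table. The sketch below encodes Table 2.1 ordinally (bad = 0, medium = 1, good = 2 — an encoding chosen here for illustration, not part of DRSA itself) and evaluates rule 1, “If Literature ≥ good then Student ≥ good”:

```java
public class RuleMeasures {
    // Table 2.1 encoded ordinally: bad = 0, medium = 1, good = 2.
    // Columns: Mathematics, Physics, Literature, Overall class.
    static final int[][] TABLE = {
        {2, 1, 0, 0},  // S1
        {1, 1, 0, 1},  // S2
        {1, 1, 1, 1},  // S3
        {1, 1, 1, 2},  // S4
        {2, 1, 2, 2},  // S5
        {2, 2, 2, 2},  // S6
        {0, 1, 1, 0},  // S7
        {0, 0, 1, 0},  // S8
    };

    public static void main(String[] args) {
        // Rule 1: "If Literature >= good then Student >= good".
        int covered = 0, supporting = 0;
        for (int[] s : TABLE) {
            if (s[2] >= 2) {                 // condition part holds
                covered++;
                if (s[3] >= 2) supporting++; // consequent also holds
            }
        }
        double confidence = (double) supporting / covered;
        System.out.println("supp = " + supporting + ", confidence = " + confidence);
    }
}
```

Only students 5 and 6 have Literature = good, and both are classified as good overall, so the rule is certain: its confidence is 1.0.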
Chapter 3

DARWIN, the Idea behind the Method

The basic idea of the DARWIN method was introduced in [14]. It is described in the following sections.

3.1 Background

The method combines two different approaches — Interactive Multi-Objective Optimization (IMO, see 2.1) and Evolutionary Multi-Objective Optimization (EMO, see 2.2). In the IMO paradigm one wants to elicit the decision maker's preferences by involving him or her in the process. This is done through a systematic dialog with the decision maker (the DM). Questions are asked and the DM provides answers. Preference information is extracted on the basis of these answers. The algorithm can then use this knowledge to produce solutions better fitted to his or her preferences. The IMO framework is presented in Figure 2.1.

The rationale behind the interactive process is that the decision maker is interested only in a small subset of preferred solutions, or even in a single most preferred one. The process makes it possible to gather preference information and then use this information to construct better solutions. However, this is just a framework, so the details are left up to the analyst. In particular, one has to decide how to extract and store the knowledge gathered from the DM's answers, and how to use this knowledge to generate solutions better fitted to the decision maker's preferences. The human factor is yet another thing to consider. The DM is a human being and thus his or her behavior may change. The challenge here is to find out what questions should be asked and how often, as well as how many intermediate solutions should be presented to the DM for evaluation.

DARWIN is a realisation of the IMO process. It keeps generating solutions and improving them on the basis of the DM's feedback. It only asks the DM to mark potentially good solutions, so that only problem-domain knowledge is required; its user does not need expert knowledge in the decision support field.
Evolutionary Multi-Objective Optimization (EMO) provides a computational engine for generating new, ever better solutions in successive iterations — better in the sense of an objective function defined in the solution space. Most EMO methods approximate the Pareto-optimal front by a set of solutions, so one solution is better than another if the former Pareto-dominates the latter. In the case of two equivalent solutions, other factors have to be taken into account (for example the crowding score of NSGA-II [4]). This is because, if no preference information is given, all Pareto-optimal solutions have to be considered equivalent.
Figure 3.1: Pareto-front and the area preferred by the decision maker

It seems natural to combine the two described approaches — Interactive Multi-Objective Optimization and Evolutionary Multi-Objective Optimization. IMO is just a process framework and still needs an engine to generate and improve solutions; EMO is such an engine. On the other hand, for EMO, involving the decision maker in the procedure results in gathering preference information. This information allows the procedure to focus on a specific region of the Pareto-front — the one most relevant to the decision maker. In this way IMO and EMO complement each other. This is important because of the “human factor”. If the number of solutions becomes huge, the DM cannot effectively analyze them and find the one that fits his or her preferences best; thus Pareto-optimality alone is not sufficiently discriminative. However, guiding the search towards preferred regions of the solution space allows the method to converge faster to good solutions. This is shown in Figure 3.1. DARWIN uses an EMO procedure to improve the generated solutions based on the DM's preferences.

3.2 Modeling of uncertainty

It is often the case that not all numbers and coefficients are precisely known. It may be easier for the decision maker to formulate the Multi-Objective Optimization problem by giving the problem coefficients in the form of intervals of possible values. For example, instead of saying that a product's price will equal 20 units, one can say it will be in the [19, 21] interval. In this situation the decision maker is often interested in finding the best robust solution — one that is good and feasible in a large part of the uncertainty scenarios. DARWIN allows the coefficients to be given in the form of intervals.
A vector fixing each of the problem's interval coefficients at one of its possible values will be called a scenario of imprecision (see Figure 3.2). If intervals are allowed, it is impossible to calculate the exact values of the problem's objectives for a given solution. To handle this, all the considered solutions are evaluated over a set of uncertainty scenarios.
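Scenario generation can be sketched as sampling every interval coefficient independently. The following illustration — a sketch, not DARWIN's actual code — uses the intervals from Figure 3.2 (a = [1, 10], b = [5, 10], c = [0, 5]) and draws each value uniformly; uniform sampling is an assumption made here for simplicity:

```java
import java.util.*;

public class ScenarioSketch {
    public static void main(String[] args) {
        Random rng = new Random();
        // Interval coefficients of the problem, as in Figure 3.2.
        Map<String, double[]> intervals = new LinkedHashMap<>();
        intervals.put("a", new double[]{1, 10});
        intervals.put("b", new double[]{5, 10});
        intervals.put("c", new double[]{0, 5});

        boolean allInside = true;
        for (int k = 0; k < 100; k++) {  // generate 100 scenarios
            // A scenario fixes every interval at one concrete value, e.g. {a -> 2, b -> 6, c -> 4}.
            Map<String, Double> scenario = new LinkedHashMap<>();
            for (Map.Entry<String, double[]> e : intervals.entrySet()) {
                double lo = e.getValue()[0], hi = e.getValue()[1];
                scenario.put(e.getKey(), lo + rng.nextDouble() * (hi - lo));
            }
            // Sanity check: every fixed value lies within its interval.
            for (Map.Entry<String, double[]> e : intervals.entrySet()) {
                double v = scenario.get(e.getKey());
                if (v < e.getValue()[0] || v > e.getValue()[1]) allInside = false;
            }
        }
        System.out.println(allInside);
    }
}
```

Each generated map is one scenario of imprecision; a solution is then evaluated once per scenario.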
Figure 3.2: A scenario — a set of intervals (a = [1, 10], b = [5, 10], c = [0, 5]) fixed at specific values, e.g. {a → 2, b → 6, c → 4}

Figure 3.3: Percentiles of the normal distribution (the 25%, 50% and 75% percentiles are marked)
The results of this evaluation are then aggregated into a quantile space. Quantiles are points taken at regular intervals from the cumulative distribution of a random variable. If each interval covers 0.01 of the distribution, then the quantiles are called percentiles (percentiles of the normal distribution are shown in Figure 3.3). Instead of presenting all of the results, the method calculates meaningful quantiles of the result distribution for each objective. For example, percentiles (like 1%, 25% and 50%) can be chosen. Percentiles divide the ordered data into 100 equally-sized subsets — the 1% percentile is the best of the 1% worst outcomes or, alternatively, the worst of the 99% best outcomes.

The choice of these quantiles is connected with the DM's attitude towards risk. If he or she wants to avoid risk, then his or her decision will be focused on quantiles from the beginning of the distribution (e.g. 10%). On the other hand, when he or she is interested in the best possible solution even if there is a risk involved, then quantiles from the end of the distribution will be inspected (e.g. 75%).

In DARWIN the DM's preferences are gathered and stored using the DRSA methodology (see 2.3). The Dominance-Based Rough Set Approach is a framework for reasoning about partially inconsistent preference data. DRSA already has successful applications in the IMO area (see [10, 11, 12]). It is applied in the IMO process as follows. After the DM selects “good” solutions from the provided set, decision rules are induced to store the preferences. These rules are given in the form “If ... then ...”. The condition part is a conjunction of conditions on attributes from the quantile space. These attributes are compared to specific values, e.g. profit^25% ≥ 100 ∧ time^1% ≤ 10. The consequent part assigns a solution to a class (at least or at most), e.g. “Class at least Good”. So the whole rule would be: If profit^25% ≥ 100 ∧ time^1% ≤ 10 then Class at least Good.
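Reading a quantile off a set of per-scenario evaluations amounts to sorting the values and indexing at the requested fraction. The sketch below uses a nearest-rank convention — one of several conventions, chosen here for illustration — on a hypothetical solution evaluated over 100 scenarios:

```java
import java.util.*;

public class QuantileSketch {
    // Nearest-rank quantile: the value at position ceil(q * n) of the ordered data.
    static double quantile(double[] values, double q) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(q * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // Hypothetical profits of one solution over 100 scenarios: 1, 2, ..., 100.
        double[] profit = new double[100];
        for (int i = 0; i < 100; i++) profit[i] = i + 1;
        System.out.println(quantile(profit, 0.01));  // worst 1% of outcomes
        System.out.println(quantile(profit, 0.25));  // 25th percentile
        System.out.println(quantile(profit, 0.50));  // median
    }
}
```

The triple (profit^1%, profit^25%, profit^50%) is exactly the kind of aggregated description of a solution that DARWIN presents to the DM instead of all 100 raw results.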
3.3 The algorithm

DARWIN operates on Multi-Objective Optimization (MOO) problems defined as follows:

[f1(x), f2(x), . . . , fk(x)] → max (3.1)

subject to:

g1(x) ≥ b1
g2(x) ≥ b2
. . .
gm(x) ≥ bm (3.2)

where:

x = [x1, x2, . . . , xn] is a vector of decision variables, called a solution;
f1(x), f2(x), . . . , fk(x) are objective functions, fi : X → R;
g1(x), g2(x), . . . , gm(x) are constraint functions, gj : X → R;
b1, b2, . . . , bm are real-valued right-hand sides of the constraints.

It is possible to give some of the coefficients in the objective functions or constraints in the form of intervals (thus modeling ignorance — uncertainty about the real value of a coefficient). A vector of fixed values, one for each interval, is called a scenario of imprecision (see Figure 3.2).

DARWIN is composed of two nested loops — an exterior and an interior one. The former is a realisation of the interactive process and the latter is an EMO engine dedicated to improving solutions based on the decision maker's preferences. This is illustrated in Figure 3.4. The exterior loop, corresponding to the IMO interactive process, is shown in Algorithm 1. A more detailed description of each step follows.

First, one has to generate a set of feasible solutions to the MOO problem. This can be done using the Monte Carlo method. The Monte Carlo concept itself is not new. A concept of
Figure 3.4: Activity diagram for DARWIN. The “Perform an evolutionary optimization” step forms the interior loop

Algorithm 1 DARWIN's exterior loop
1: X ← GenerateSolutions
2: S ← GenerateScenarios
3: loop
4:   for all x ∈ X, s ∈ S do    ▷ evaluate each solution over all of the scenarios
5:     Evaluate(x, s)
6:   end for
7:   isSatisfied ← PresentResults
8:   if isSatisfied then
9:     stop
10:  else
11:    X ← AskToMarkGood(X)
12:  end if
13:  rules ← InduceDecisionRules(X)
14:  X ← EmoProcedure(rules)    ▷ the interior loop
15: end loop
statistical sampling became popular after digital computing machines had been invented (see [25]). The method can be described as randomly sampling the domain of a problem. In the most basic variant one can just pick a solution at random and check whether it is feasible. Unfortunately, this will be ineffective unless the non-feasible space is only a small part of the domain. Additional hints for the generator, for example in the form of the analyst's suggestions, can be taken into account.

At this stage goals and constraints are allowed to contain intervals corresponding to the uncertainty of the model. Thus a set of scenarios needs to be generated. Each of these scenarios is a realisation of the problem with fixed values of the intervals. It is worth noting that if the problem constraints are given in an uncertain form — that is, containing coefficients in the form of intervals — it could be impossible to determine whether a given solution is feasible. If that is the case, then lines 1 and 2 should be swapped and the feasibility of the solution set should be checked on the generated scenarios.

In lines 4 to 6 each solution is evaluated over each scenario. The results of this evaluation phase are then gathered and presented to the decision maker in line 7. The DM is a human being, though, so in order to get valuable feedback one needs to show the data in an aggregated form. The authors proposed that meaningful quantiles be presented, for example f_1^1%(x), f_1^25%(x), f_1^50%(x), . . . , f_k^1%(x), f_k^25%(x), f_k^50%(x) for all x ∈ X. If the DM finds a satisfactory solution in the presented set, then the problem is solved and the algorithm ends here. If not, however, he or she is asked to indicate the “good” solutions in the set (line 11). On the basis of this distinction, the method generates a set of decision rules (line 13).
These rules are then passed to the interior loop (line 14) where — by using the EMO paradigm (see 2.2) — DARWIN searches the solution space. The search is driven towards a specific region on the basis of the rules. Finally, new solutions, better fitted to the DM's expectations, are generated and the process starts over again.

Algorithm 2 shows the interior loop of the DARWIN method. This loop is an EMO procedure guided by the decision rules induced in the exterior loop from the DM's selections.

Algorithm 2 DARWIN's interior loop
1: procedure EmoProcedure(rules)
2:   X ← GenerateSolutions
3:   S ← GenerateScenarios
4:   loop
5:     for all x ∈ X, s ∈ S do    ▷ this loop calculates meaningful quantiles for each solution
6:       Evaluate(x, s)
7:     end for
8:     if termination conditions fulfilled then
9:       return X
10:    end if
11:    pScore ← CalculatePrimaryScore(X)
12:    sScore ← CalculateSecondaryScore(X)
13:    X′ ← RankSolutions(X, pScore, sScore)
14:    P ← SelectParents(X′)
15:    O ← RecombineOffspring(P)
16:    O′ ← Mutate(O)
17:    X ← MergePopulations(X′, O′)
18:  end loop
19: end procedure

The procedure starts by generating a new set of feasible solutions and possible scenarios in lines 2 and 3. Then, in line 4, the actual evolutionary optimization starts. Each solution is an individual; the solution set constitutes a population. Iterations of the loop defined in line 4 mark the generations of the population. The termination condition could be, for example, a fixed number of iterations or a fixed amount of time.
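The CalculatePrimaryScore step rewards solutions for the induced rules they match, weighting each rule by (1 − δ)^card(X(rule)) as detailed below. The following sketch illustrates that scheme; the solutions (points in a hypothetical quantile space {profit^25%, time^1%}), the rules and the value of δ are all made up for the example and are not DARWIN's actual data:

```java
import java.util.*;
import java.util.function.Predicate;

public class PrimaryScoreSketch {
    public static void main(String[] args) {
        double delta = 0.1;  // decay of rule weight, as suggested in the text
        // Hypothetical solutions described by {profit_25%, time_1%}.
        double[][] solutions = { {120, 8}, {150, 12}, {90, 5} };
        // Hypothetical induced rules, e.g. "profit_25% >= 100 and time_1% <= 10".
        List<Predicate<double[]>> rules = List.of(
            s -> s[0] >= 100 && s[1] <= 10,
            s -> s[0] >= 140
        );
        // Each rule's weight is (1 - delta)^card(X(rule)), so rules matched by
        // fewer solutions weigh more, maintaining diversity with respect to rules.
        double[] weights = new double[rules.size()];
        for (int r = 0; r < rules.size(); r++) {
            long card = Arrays.stream(solutions).filter(rules.get(r)::test).count();
            weights[r] = Math.pow(1 - delta, card);
        }
        // A solution's primary score is the sum of the weights of the rules it matches.
        for (double[] s : solutions) {
            double score = 0;
            for (int r = 0; r < rules.size(); r++)
                if (rules.get(r).test(s)) score += weights[r];
            System.out.println(score);
        }
    }
}
```

Here each rule is matched by exactly one solution, so both carry weight 0.9; the third solution matches no rule and scores zero.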
First, in every generation the population is evaluated and ranked. The process starts with the evaluation of each solution over every scenario (lines 5–7). After the evaluation, meaningful quantiles are known for each solution. Then the procedure can calculate a primary score for each of the individuals. The primary score is computed as follows. Let:

rules(x) = {rule_h ∈ rules : rule_h is matched by solution x}

rules(x) is the set of rules (rule_h ∈ rules) matched by solution x ∈ X.

X(rule_h) = {x ∈ X : x is matching rule_h}

For each rule_h ∈ rules, X(rule_h) is the set of solutions matching this rule.

w(rule_h) = (1 − δ)^card(X(rule_h))

Each rule (rule_h) gets a weight related to the number of solutions matching it. δ is the decay of the rule weight, for example δ = 0.1. This formula assigns a higher weight to rules matched by fewer solutions — an important property, because it allows diversity to be maintained with respect to the rules.

PrimaryScore(x) = Σ_{rule_h ∈ rules(x)} w(rule_h)

Finally, PrimaryScore(x) is the primary score of a given solution (x ∈ X). In case of a tie, a secondary score is also considered for each solution. This score is calculated similarly to the crowding distance score in the NSGA-II method ([4]). The difference lies in the fact that this score is calculated in the quantile space instead of the original objective space, i.e. f_1^1% × f_1^25% × f_1^50% × · · · × f_k^1% × f_k^25% × f_k^50%. The procedure is shown in Algorithm 3.
Algorithm 3 Procedure calculating the crowding distance
1: procedure CalculateCrowdingDistance(X)
2:   n ← |X|    ▷ number of solutions
3:   for all x ∈ X do    ▷ initialize
4:     distance(x) ← 0
5:   end for
6:   for all o ∈ objectives do
7:     X′ ← Sort(X, o)    ▷ sort solutions by the value of objective o
8:     distance(X′(1)) ← ∞    ▷ boundary solutions get the highest score possible
9:     distance(X′(n)) ← ∞    ▷ X′(n) is the n-th element of the ordered set X′
10:    for i ← 2, n − 1 do    ▷ o(x) is the value of objective o for solution x
11:      distance(X′(i)) ← distance(X′(i)) + [o(X′(i + 1)) − o(X′(i − 1))]
12:    end for
13:  end for
14: end procedure

In line 14 of Algorithm 2 the parents are selected. The process is a Monte Carlo procedure; the probability of selecting a solution x ∈ X as a parent is:

Pr(x) = ((|X| − rank(x) + 1) / |X|)^γ − ((|X| − rank(x)) / |X|)^γ

where rank(x) is the rank of a solution x ∈ X in the ranking made in line 13, and γ ≥ 1 is an elitism coefficient. The higher the γ, the bigger the probability of selecting a high-ranked solution as a parent.

In line 15 a new individual (a child) is created using two of the parents chosen in the previous step — a ∈ P, b ∈ P. The child is obtained by combining the parents together:

z = λa + (1 − λ)b

where λ is a random real-valued number, 0 ≤ λ ≤ 1.
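A useful property of the rank-based selection probability Pr(x) = ((n − rank + 1)/n)^γ − ((n − rank)/n)^γ is that the terms telescope over ranks 1..n, so the probabilities always form a valid distribution. The sketch below verifies this for an illustrative population size and γ (both values are assumptions for the example):

```java
public class SelectionSketch {
    // Probability of choosing the solution ranked `rank` (1 = best) as a parent
    // in a population of n solutions, with elitism coefficient gamma >= 1.
    static double pr(int rank, int n, double gamma) {
        return Math.pow((double) (n - rank + 1) / n, gamma)
             - Math.pow((double) (n - rank) / n, gamma);
    }

    public static void main(String[] args) {
        int n = 10;
        double gamma = 2.0;  // higher gamma favors top-ranked parents more strongly
        double sum = 0;
        for (int rank = 1; rank <= n; rank++) sum += pr(rank, n, gamma);
        // The series telescopes: the probabilities sum to 1.
        System.out.println(Math.abs(sum - 1.0) < 1e-12);
        // Rank 1 is more likely to be selected than rank n.
        System.out.println(pr(1, n, gamma) > pr(n, n, gamma));
    }
}
```

For n = 10 and γ = 2, the best-ranked solution is selected with probability 0.19 and the worst with 0.01, illustrating how γ controls the selection pressure.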
The mutation operator is applied to the offspring population. The probability of mutation for a single individual decreases in successive generations and can be calculated using the formula:

Pr(t) = ε(1 − ω)^(t−1)

where t is the number of the current iteration, ω is the mutation decay rate, and ε is the initial mutation probability. The suggested values are ω = 0.1, ε = 0.5.

In this section, the idea of DARWIN was explained. Its implementation is described in the next chapter.
Chapter 4

Implementation of DARWIN

DARWIN is a high-level description of an algorithm that can be used to solve a multi-objective optimization problem. It is a list of steps to execute and calculations to perform. Therefore, one does not strictly need a computer to realize the decision-making process. However, in practice it is almost impossible to complete all the steps without a dedicated software application. Moreover, to evaluate the method's performance one needs to repeat the experiments several times.

For the reasons stated above it was necessary to implement the method as a computer program. This chapter describes the technologies used for DARWIN's implementation, as well as the development of an experiment framework.

4.1 The issues of implementation

4.1.1 The programming environment

The environment in which DARWIN has to function is not empty; thus it had to be taken into account during the development of the implementation. DARWIN is a realization of an interactive process, therefore a way of communicating with the decision maker is required. Another restriction is imposed by the need to induce decision rules from the examples of solutions evaluated by the DM.

The decision maker has to provide the problem he or she wants to solve. This can be done in a model file. If one wants to change the default parameters' values, then a configuration file with the values is also required. Moreover, during the algorithm's run, the presence of the decision maker is needed in order to select “good” solutions from the provided ones. Consult the user manual (Chapter 6) for more details on the file formats. The parameters are described later in this section.

Decision rules store the DM's preferences; therefore they are a symbolic representation of the trade-offs he or she is willing to make, as well as of the importance of each criterion. They ensure the robustness of the resulting solutions thanks to the underlying DRSA framework. The rules are a very important part of the method.
Thus, a way — an algorithm — to generate them is needed. In the method's description provided in [13, 15, 9], the phase of obtaining the rules is treated as a black box. It is assumed that a component able to generate rules from the DM's selection exists. The details are omitted — it is up to the analyst (or to a developer of an implementation) to use a software component he or she finds suitable.

The implementation of the DRSA framework, able to induce decision rules from a given set of examples, is a complex task, both in terms of possible technical challenges as well as the conceptual and scientific work that needs to be carried out. Such a software application deserves a paper of its own. Therefore, the author decided to use an existing implementation in order to focus on the DARWIN method. Java Rough Set (jRS) — a project carried out at the Laboratory of Intelligent Decision Support Systems at Poznań University of Technology that aims at providing
a DRSA framework implementation in Java — was chosen. It contains the DomLem algorithm [16], able to generate a minimal set of decision rules from the examples given as its input.

jRS is written in the Java programming language1 and runs on top of the Java Virtual Machine (JVM [23]). The JVM is a portable platform capable of running Java bytecode. Having a Java component (jRS in this case) effectively puts a constraint on the DARWIN implementation — it is required that the implementation is also delivered as a JVM application. However, the technical details of the Java platform are out of the scope of this paper, so they are omitted unless relevant to the description of DARWIN.

Unfortunately, generating only a minimal set of rules may not be sufficient. The primary score in the interior loop is based on the number of rules matching a given solution. Therefore, having more rules may prevent a loss of gradient in the evolutionary optimization. To check whether this effect has a serious impact on the performance, another rule-generating algorithm was employed: the author implemented the AllRules algorithm [17] in the Scala programming language. Again, the algorithm's details are out of the scope of this paper. Nevertheless, it is worth noting that the number of rules generated by AllRules is greater than the number generated by DomLem.

4.1.2 The software prototype

As his first task, the author chose to implement a prototype of the program. The goal was to check the behavior of the method — whether it works, converges to the preferred region, is able to generate a reasonable solution and can withstand uncertainty in the decision maker's preferences; and to find out whether the overview given in [13, 15, 9] is accurate enough to develop working software. More important, however, was to identify potential problems and to gain a deep knowledge of the method. The prototype was developed in Java.
Java is a very popular language and a de facto industry standard in many fields, especially in enterprise applications. The application itself is able to solve a subset of MOO problems. It supports uncertainty and it simulates the decision maker. It is a working implementation of the method. The goal was achieved — a working piece of software was developed, showing that DARWIN is able to solve multi-objective optimization problems.

The prototype was not fully featured, however. It contained only a simulated DM and lacked any user interface; the problem to solve and the DARWIN parameters were hard-coded. Nevertheless, the code was developed in an extensible way and contained a suite of unit tests. Still, it is a common practice in software engineering to discard the prototype and start developing the final application from scratch. This was the case with DARWIN — the author's knowledge of the method changed during the prototyping, and starting from the beginning was considered the better option.

4.1.3 The final implementation

The Java language is stable, well-tested and features numerous libraries dedicated to almost every possible application; unfortunately, it is also very verbose and lacks support for many of the recent trends in computer programming and software development. It is a great tool for large teams, as is usually the case in the enterprise application market. However, what is a good solution for a team is not necessarily the best possible tool for a single developer working on a project.

The author decided to develop the implementation in Scala2. Scala is a multi-paradigm programming language combining object-oriented features with functional programming. The former paradigm is the dominant contemporary software development practice; it allows the programmer to model the problem domain's objects as well as the relations between them.
The latter, however, makes it possible to write concise code dealing with mathematical computations. This is a very useful trait in scientific applications.

1Developer Resources for Java Technology — http://www.oracle.com/technetwork/java/index.html
2The Scala Programming Language — http://www.scala-lang.org/
Scala runs on the JVM, therefore it is compatible with existing Java libraries. The Scala compiler produces Java bytecode from Scala source code. The bytecode is nearly identical to the one generated from a Java source; the only difference is that Scala programs require the presence of an additional run-time library — scala-library.jar. The language was designed by Martin Odersky and started as a project at École Polytechnique Fédérale de Lausanne (EPFL). Because of its functional elements and different syntax, Scala programs are usually shorter and easier to read than their Java counterparts, especially when dealing with mathematical operations.

The DARWIN code is divided into packages. Their structure is shown in Figure 4.1 and their description follows. pl.poznan.put.darwin is the root package, following Java naming conventions [21]. For readability, its name is omitted in the following description. Packages:

evolution — a module containing classes responsible for the evolution of solutions. It contains a controller driving the process, an engine of the evolutionary algorithm, evolutionary operators (mutation and cross-over) and a parent selection strategy.

jrsintegration — an abstraction layer over the DRSA framework. This module contains classes responsible for integration with jRS and the DomLem algorithm. They are able to convert DARWIN's solutions to the format required by jRS, invoke the DomLem algorithm and later — during the interior loop — check if a given solution matches any of the generated rules.

model — models a hierarchy of objects for an abstract MOO problem. Contains basic classes required to represent a problem, its variables, goals and constraints, as well as classes representing a solution and a scenario of uncertainty.
A problem parser (reading problem instances from files), a solution evaluator (computing the value of a solution against given scenarios) and a configuration reader (setting DARWIN's options on the basis of a file) are also components of the model package.

simulation — a package responsible for connecting all the other parts together. It initializes the process by asking model to read the problem and the configuration; takes care of the interaction with the DM; invokes jrsintegration to process the preferences and to obtain the rules; and finally interacts with the evolution controller from the evolution package. It can also simulate the decision maker and generate reports if needed for the automated experiment framework.

utils — additional utility classes for the other modules.

[Figure 4.1: The package structure of DARWIN — the packages allrules, darwin, evolution, gui, jrsintegration, model, problem, solution, simulation and utils under pl.poznan.put]

In order to reduce the number of bugs and to eliminate the recurrence of fixed bugs, an extensive
test suite accompanies each of the packages. In order to streamline the building process, the Scala build tool SBT (see [18]) is used to manage dependencies, to take care of running the test suite and to build the application from its source code. The code contains javadoc3 on most of the API methods and a lot of comments in places that could be particularly hard or tricky to understand. Therefore it should be possible for other developers to use and extend the code delivered as a part of the thesis. A manual for the end-user is enclosed at the end of the paper. Many DARWIN parameters are mentioned throughout the paper. They are gathered and described in Table 4.1. The parameters can be set in a configuration file; the options are split into sections, as listed in the table. The syntax of the configuration file is described in the user manual (Chapter 6).

4.2 Experiment framework

One of the goals of the thesis was to evaluate DARWIN's performance and the importance of its parameters. To do this, a set of tools was developed. Firstly, DARWIN can be instructed to log the details of an execution to a file (see Table 4.1). These logs consist of two reports — the evolution report and the DM's decisions report. The former contains information about the population's shape, the latter about the decisions made by the decision maker. The reports are in the comma-separated values format (CSV [31]). The algorithm is interactive, which means that constant human supervision is required. However, one needs to repeat the experiments over and over again. For this reason the decision maker has been simulated. It is possible to pass a utility function for a supposed decision maker; the algorithm will then make decisions on its own on the basis of this supposed utility. This automates a single run of DARWIN; however, an operator is still required to start the program over and, if necessary, to change the execution parameters.
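Such CSV reports are straightforward to post-process. The following Python sketch aggregates an evolution report into per-iteration minima and maxima of the utility value; the column names (exterior_iteration, value) are assumptions of this illustration and may differ from the actual report layout described in the user manual.

```python
import csv
from collections import defaultdict

def summarize_report(path):
    """Aggregate an evolution report (CSV) into per-iteration (min, max)
    of the 'value' column. Column names are hypothetical."""
    per_iter = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            per_iter[int(row["exterior_iteration"])].append(float(row["value"]))
    return {it: (min(vs), max(vs)) for it, vs in sorted(per_iter.items())}
```

A summary of this form is exactly what the min/max charts of the exterior loop are drawn from.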
To bypass this limit and streamline the process, the experiment framework is provided. One can define a base configuration along with a test plan, and the computer will then proceed with the execution of the test plan. The scripts composing the framework are written in Python — a general-purpose, high-level programming language4 — and glued together using Bash shell scripts5. A description of how to use the framework is provided as a part of the user manual (Chapter 6). A test plan execution will typically result in a number of report files full of the runs' details. It would be a tremendous task to analyze them by hand. However, presenting this data in an aggregated form, as a set of charts, can immensely simplify the analysis. R is an environment for statistical computing6. The program, along with the ggplot library (see [32]), has been used for automatic chart generation. The R scripts can import the CSV reports, aggregate and process the data, and finally generate a chart. First, a user prepares a test plan and sends it to the framework. The framework repeatedly runs DARWIN with a simulated decision maker. The resulting reports are then processed by the R scripts and, as a result, charts are generated. It is possible to generate the following types of charts (they are shown in Figures 4.2 and 4.3).

• A change of the supposed utility value during the exterior loop iterations. An example is shown in Figure 4.2a. This is a basic chart for the analysis, because it shows a high-level overview of the run from the decision maker's perspective.

• A change in the population shape during an interior loop run. May show the convergence of a population to a specific region. Figure 4.2b.

3Javadoc Tool — http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html
4The Python Programming Language — http://python.org/
5Bourne-Again SHell — http://www.gnu.org/software/bash/bash.html
6The R Project for Statistical Computing — http://www.r-project.org/
Main
  Solution count — The number of solutions in a population. Default: 30
  Scenario count — The number of scenarios on which the solutions will be evaluated. Default: 30
  Generation count — The number of generations in the interior loop. Default: 30
  Gamma (γ) — The coefficient of elitism. The higher γ, the higher the probability of choosing a solution with a higher rank as a parent. Default: 2.0
  Delta (δ) — The decay of rule weight (see 3.3). Default: 0.1
  Eta (η) — The initial mutation probability. Default: 0.5
  Omega (ω) — The decay rate of the mutation probability. Default: 0.1
  Outer count — The number of exterior loop iterations to be simulated. Default: 20
  Use average — Whether an average in quantiles should be used instead of the maximum value. Default: false
  Mutation tries — The number of mutation tries. For some problems it may be very hard to mutate a solution into another feasible one; try no more times than defined here. Default: 20
  Use "at most" — Whether "at most" rules should be considered along with "at least" rules. Default: false
  Percentiles — Which percentiles are meaningful to the decision maker. Default: 1.0, 25.0, 50.0
  Multi rules — Whether the rules should be generated multiple times (see 5.5.5). Default: false
  Multi rules count — How many iterations of rule generation should be performed. Default: 3

Algorithm
  AllRules — Whether the AllRules algorithm should be used instead of DomLem. Default: false
  DomLem confidence level — The minimum level of confidence the rules generated by the DomLem algorithm should have. Default: 1.0

SimulatedDM
  Base good count — The number of solutions to be marked as good by the simulated DM. Default: 3
  Good count delta — A parameter for simulating inconsistencies in the DM's decisions. See 5.4.4. Default: 2
  Noise level — A parameter for simulating noise in the DM's decisions. Default: 0

Evolution
  Regenerate every — During the interior loop the solution set is evaluated against a scenario set. This parameter specifies how often the scenario set should be changed — in how many generations a regeneration of the set should occur. Default: 1000
  Regenerate percent — The part of the scenario set to regenerate. Default: 0.0
  Use supposed utility — Whether the evolutionary algorithm should use the supposed utility instead of a rule-based score. Default: false

Reports
  Evolution report — If set, specifies where the evolution report should be written. Default: evrep.csv
  DM report — If set, specifies where the DM's decisions should be reported. Default: dmrep.csv
  Brief report — Whether a brief report should be written to standard output. Useful in supervising a batch run. Default: true
  Rules directory — The directory where rules generated during the run should be saved; the rules are not saved if empty. Default: rules

Table 4.1: The parameters of DARWIN
• A change of the utility value along with the primary score during the interior loop — a run of the evolutionary algorithm. Figure 4.2c.

• The change in a population of the evolutionary algorithm from the utility and the primary score perspective. Figure 4.2d.

• A comparison chart — to present the utility value change over the exterior loop iterations for multiple tests (possibly with different parameters) side-by-side. Each test consists of one or more runs. Figure 4.3a.

• A comparison chart for a chosen exterior iteration — to compare the results of several tests. Each test consists of several runs. The results are shown on a box plot, so the analyst can easily compare the median, the first and third quartiles and the outliers of each test. Figure 4.3b.

• A chart showing the decision maker's selections. The whole population is shown and the selected individuals are marked. Figure 4.3c.

The framework is just a utility, but an important one for evaluating the performance of the method. The implementation presented in this chapter was the basis for the evaluation of the DARWIN method. The results are given in the next chapter.
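The framework's central idea — expanding a base configuration and a test plan into a full series of runs — can be sketched in a few lines. This is a hypothetical Python reimplementation for illustration, not the thesis scripts themselves; the parameter names are invented.

```python
import itertools

def expand_test_plan(base, variations):
    """Expand a base configuration plus per-parameter value lists into
    the full list of run configurations (Cartesian product).
    The parameter names used by callers are hypothetical."""
    keys = list(variations)
    runs = []
    for combo in itertools.product(*(variations[k] for k in keys)):
        cfg = dict(base)
        cfg.update(zip(keys, combo))
        runs.append(cfg)
    return runs
```

Each resulting configuration would then be handed to a single automated DARWIN run with the simulated DM, and its reports collected for the R scripts.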
[Figure 4.2: Type of charts available in the experiment framework — (a) Exterior loop iterations → the supposed utility value; (b) The population shape change in the objective space; (c) Interior loop generations → the utility value, the primary score; (d) The change of the utility and the primary score in a population]
[Figure 4.3: Type of charts available in the experiment framework — (a) The comparison of test runs over the exterior iterations; (b) The comparison of test runs at a chosen exterior iteration (box plot); (c) The decision maker choices]
Chapter 5
Results of Computational Experiments

DARWIN is a new method and, to the author's knowledge, it has not been implemented before. The experiments are designed to check whether the method works at all, which parameters are important for it and what their reasonable default values should be. To make the results repeatable, the DM is simulated; inconsistencies in his or her decisions are simulated as well. Unless uncertainty is involved, comparisons to the exact optimal solution are provided. In tests involving uncertainty, the results are compared to direct optimization of the supposed utility function.

5.1 The environment

All tests were conducted on a personal computer with a 64-bit Intel processor and 3 GB of RAM, running a 64-bit Linux operating system. The Java Virtual Machine was in version 1.6.0_18 and Scala in version 2.8.0. The JVM was run with the options -Xms768m -Xmx768m, thus setting the memory available to the application to 768 MB. The tests were performed through the CLI batch interface. The test framework is available in order to automate the experiment process. All experiments were repeated at least thirteen times. Data analysis and chart generation were performed using the R environment1. The framework is a combination of Python2 and Bash3 code communicating with the main DARWIN code and with the modules written in R.

5.2 The Decision Maker

The experiments were repeated many times. In order to make this possible, it was essential to simulate the DM. Simulating the decision maker guarantees a repeatable process across the test runs. In the DARWIN method, interaction with the decision maker occurs by showing him or her a list of generated solutions (usually 30) and asking him or her to indicate a few "good" ones. To simulate the DM, one needs to simulate his or her selections. It is assumed that the decision maker acts according to a utility function he or she has in mind. This function will be called the supposed utility function.
In the simulation process, the algorithm sorts the received list of solutions according to this supposed utility function. Then it selects the solutions with the highest supposed utility (usually three; this can be configured).

1The R Project for Statistical Computing — http://www.r-project.org/
2The Python Programming Language — http://python.org/
3Bourne-Again SHell — http://www.gnu.org/software/bash/bash.html
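The selection step described above can be sketched in a few lines of Python. This is a hypothetical illustration of the simulated DM, not the thesis code itself.

```python
def simulate_dm_choice(solutions, supposed_utility, good_count=3):
    """Simulated DM: sort the presented solutions by the supposed
    utility function and mark the top `good_count` of them as good."""
    ranked = sorted(solutions, key=supposed_utility, reverse=True)
    return ranked[:good_count]
```

For a two-criteria problem, a solution could be a (value1, value2) pair and the supposed utility e.g. 1·value1 + 2·value2, matching the binary knapsack problem defined later.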
Unfortunately, a real decision maker, being human, is not as predictable and repeatable as the described process. Sections 5.4.4 and 5.5.4 present the results of introducing inconsistencies into the DM's decisions. The supposed utility function is also defined for each of the following problems.

5.3 Problem selection

The area of interest of Multi-Objective Optimization is huge and consists of many potential problems to be solved. There are multi-criteria versions of classical problems, like the minimum spanning tree [8], the traveling salesman problem (TSP) [1] or the knapsack problem [28], as well as artificially generated ones, like the DTLZ problems [5]. Some of them are interesting because of their real-life applications, while others are good for experimenting and testing purposes. It is worth noting that the ordinary single-criterion versions of the problems can be easy to solve. However, in a multi-criteria setting one has to infer the decision maker's preferences and approximate the supposed utility function correctly. The challenge here is not to build the best optimization algorithm for all the problems (this is impossible according to the no-free-lunch theorem [34]) but rather a framework for the extraction of preference information. The experiments were performed using the following problems.

Two-criteria binary knapsack problem

x̄ = [x1, x2, . . . , x300], xi ∈ {0, 1}, i = 1, 2, . . . , 300
max value1: ā1 · x̄
max value2: ā2 · x̄
subject to:
weight: w̄ · x̄ ≤ b
(max) supposed utility: 1 · value1 + 2 · value2

Here x̄ is a vector of items to be chosen. The problem is binary, so each xi ∈ x̄ can be either selected (xi = 1) or not (xi = 0). There are two criteria, value1 and value2; each one is a sum of the selected items multiplied by the associated coefficients (vectors ā1 and ā2). A knapsack constraint is given: one can choose items up to a certain weight (b), with a vector of weights ( w̄ ) associated with the items.
The limit is defined so that it is possible to choose about 2/3 of the items. The weights (ā1, ā2, w̄) are uniformly distributed vectors of values in the [0, 10) interval.

Two-criteria continuous knapsack problem

x̄ = [x1, x2, . . . , x300], xi ∈ [0, 1), i = 1, 2, . . . , 300
max value1: ā1 · x̄
max value2: ā2 · x̄
subject to:
weight: w̄ · x̄ ≤ b
(max) supposed utility: 3 · value1 − 1 · value2
This is the continuous version of the knapsack problem. The description given for the binary version also applies here; the only difference is that the items can now be partially selected (∀xi∈x̄ : xi ∈ [0, 1)).

Three-criteria binary knapsack problem

x̄ = [x1, x2, . . . , x300], xi ∈ {0, 1}, i = 1, 2, . . . , 300
max value1: ā1 · x̄
max value2: ā2 · x̄
max value3: ā3 · x̄
subject to:
weight: w̄ · x̄ ≤ b
(max) supposed utility: 1 · value1 − 1 · value2 + 2 · value3

Two-criteria problems can be easily visualized and analyzed. However, in real-life applications there is often a need for three or more criteria. There is a leap in moving from two to multiple criteria, so it is worth comparing the results achieved on the three-criteria knapsack problem with those on its two-criteria counterpart.

Three-criteria DTLZ problem generated using the constraint surface approach

min f1: x1
min f2: x2
min f3: x3
subject to:
0 ≤ xi ≤ 1, i = 1, 2, 3
− x1 + x2 + 0.6 ≥ 0
x1 + x3 − 0.5 ≥ 0
x1 + x2 + x3 − 1.1 ≥ 0
(max) supposed utility: − 1 · f1 − 2 · f2 − 1 · f3

This problem consists of three simple linear criteria. The solution space is a three-dimensional cube bounded by the 0 ≤ xi ≤ 1 constraints. To make the problem challenging, parts of the solution space are cut off by the additional constraints. The problem was built according to the constraint surface approach presented in [5].
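The effect of the cutting constraints is easy to see in code. A minimal feasibility check for the constraint-surface problem above can be sketched as:

```python
def dtlz_cs_feasible(x1, x2, x3):
    """Feasibility check for the three-criteria constraint-surface
    problem: the unit cube with three half-spaces cut off."""
    in_cube = all(0.0 <= v <= 1.0 for v in (x1, x2, x3))
    return (in_cube
            and -x1 + x2 + 0.6 >= 0
            and x1 + x3 - 0.5 >= 0
            and x1 + x2 + x3 - 1.1 >= 0)
```

For instance, the cube's center (0.5, 0.5, 0.5) remains feasible, while the corners (1, 0, 0) and (0, 0, 0) are cut off — which is what makes simultaneous minimization of all three criteria non-trivial.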
Two-criteria robust mix problem

max profit: pA·min(xA, dA) + pB·min(xB, dB) + pC·min(xC, dC) − (r¹A·xA + r¹B·xB + r¹C·xC)·p¹R − (r²A·xA + r²B·xB + r²C·xC)·p²R
min time: tA·xA + tB·xB + tC·xC
where:
pA ∈ [20, 24], pB ∈ [30, 36], pC ∈ [25, 30]
dA ∈ [10, 12], dB ∈ [20, 24], dC ∈ [10, 12]
r¹A ∈ [1, 1.2], r¹B ∈ [2, 2.4], r¹C ∈ [0.75, 0.9]
r²A ∈ [0.5, 0.6], r²B ∈ [1, 1.2], r²C ∈ [0.5, 0.6]
p¹R ∈ [6, 7.2], p²R ∈ [9, 9.6]
tA ∈ [5, 6], tB ∈ [8, 9.6], tC ∈ [10, 12]
subject to:
0 ≤ xA ≤ 12
0 ≤ xB ≤ 24
0 ≤ xC ≤ 12
(max) supposed utility: profit1% + 3 · profit25% + 2 · profit50% − time1% − 3 · time25% − 2 · time50%

The problem was described in the presentation "DARWIN: Dominance-based rough set Approach to handling Robust Winning solutions in INteractive multiobjective optimization", given at the 5th International Workshop on Preferences and Decisions in Trento, 2009. It contains a lot of coefficients given in the form of intervals; for readability's sake they were named and defined below the criteria. The goal is to set the quantity of each product (A, B, C) to be produced. One wants to maximize the profit and minimize the total time it takes to produce the products. pi is the market price of product i ∈ {A, B, C}; there is also a maximal demand the market can consume (di). Each product is made from two raw materials, r¹ and r². The quantities needed to produce the i-th product are defined (r¹i, r²i), as well as the raw material prices (p¹R, p²R). Finally, it takes time to produce a given product — tA, tB, tC. The coefficients are given in the form of intervals, so each solution has to be evaluated on many scenarios of uncertainty. This is why no exact values are used in the supposed utility function — there are no exact values, only a series of evaluation results; percentiles are used instead. goal25% means the result of the best evaluation among the worst 25% of evaluations.
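The scenario-based evaluation of this problem can be sketched as follows. Uniform sampling of the intervals and the percentile helper are assumptions of this sketch (the percentile is written for a maximized goal such as profit); the thesis's own scenario model is described in Section 3.2.

```python
import random

INTERVALS = {  # coefficient intervals of the robust mix problem
    "pA": (20, 24), "pB": (30, 36), "pC": (25, 30),
    "dA": (10, 12), "dB": (20, 24), "dC": (10, 12),
    "r1A": (1, 1.2), "r1B": (2, 2.4), "r1C": (0.75, 0.9),
    "r2A": (0.5, 0.6), "r2B": (1, 1.2), "r2C": (0.5, 0.6),
    "p1R": (6, 7.2), "p2R": (9, 9.6),
    "tA": (5, 6), "tB": (8, 9.6), "tC": (10, 12),
}

def draw_scenario(rnd):
    """One scenario: a concrete value for every interval coefficient
    (uniform sampling is an assumption of this sketch)."""
    return {k: rnd.uniform(lo, hi) for k, (lo, hi) in INTERVALS.items()}

def evaluate_mix(x, s):
    """Profit and time of production quantities x = (xA, xB, xC)
    under scenario s."""
    xA, xB, xC = x
    profit = (s["pA"] * min(xA, s["dA"]) + s["pB"] * min(xB, s["dB"])
              + s["pC"] * min(xC, s["dC"])
              - (s["r1A"] * xA + s["r1B"] * xB + s["r1C"] * xC) * s["p1R"]
              - (s["r2A"] * xA + s["r2B"] * xB + s["r2C"] * xC) * s["p2R"])
    time = s["tA"] * xA + s["tB"] * xB + s["tC"] * xC
    return profit, time

def percentile(values, p):
    """goal_p%: the best result among the worst p% of evaluations,
    for a maximized goal (worst = smallest)."""
    worst_first = sorted(values)
    k = max(1, int(len(worst_first) * p / 100.0))
    return worst_first[k - 1]
```

A solution would be evaluated on (say) 30 drawn scenarios, and the supposed utility computed from the resulting profit and time percentiles.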
Four-criteria robust DTLZ7 problem

min fj(x): 0.1 · Σi=10(j−1)+1..10j xi + [0, 2 · (4 − j)], j = 1, 2, 3, 4
subject to:
g1(x): f4(x) + 4f1(x) − 1 ≥ 0
g2(x): f4(x) + 4f2(x) − 1 ≥ 0
g3(x): f4(x) + 4f3(x) − 1 ≥ 0
g4(x): 2 · f4(x) + min[f1(x) + f2(x), f1(x) + f3(x), f2(x) + f3(x)] − 1 ≥ 0
0 ≤ xi ≤ 1 for all i
(max) supposed utility: − 4 · f1^60% − 3 · f2^60% − 2 · f3^60% − 1 · f4^60% − 8 · f1^30% − 6 · f2^30% − 4 · f3^30% − 2 · f4^30%

This problem is a variation of the DTLZ7 problem from [5]. It was constructed using the constraint surface approach described in that article; however, interval coefficients were added to the goals.

Robust DTLZ1 problem

min f1(x): [0.3, 0.7] · x1x2 . . . xM−1(1 + g(xM))
min f2(x): [0.3, 0.7] · x1x2 . . . (1 − xM−1)(1 + g(xM))
. . .
min fM−1(x): [0.3, 0.7] · x1(1 − x2)(1 + g(xM))
min fM(x): [0.3, 0.7] · (1 − x1)(1 + g(xM))
where:
g(x) = 100 · (5 + Σi=M..M+4 [(xi − 0.5)² − cos(20π(xi − 0.5))])
n = M + 4
subject to:
0 ≤ xi ≤ 1, i = 1, 2, . . . , n
(max) supposed utility: Σi=1..M (−M + i − 1) · fi^25%

Another problem suggested in [5]. It was constructed using the bottom-up approach; intervals were added to the goal functions. M indicates the number of goals. In the experiments, problems with 4 and 10 criteria were used.

5.4 A deterministic case

5.4.1 Single run analysis

In order to observe the algorithm's behavior, a detailed analysis of a single run is presented in this section. The problem being analyzed is the two-criteria binary knapsack problem. The problem is simple and the number of criteria small, in order to keep the focus on the algorithm. Moreover, no uncertainty in the form of interval coefficients is considered.
Although this section is illustrated only by an example of a single run on a single problem, the conclusions drawn here apply globally; compare with the further sections of this chapter. The algorithm was run with the default parameters. The exterior loop was repeated ten times. This means that ten times a set of solutions was presented to the (simulated) decision maker, and the results were recorded. The same problem was solved in an exact way by a mathematical programming solver4, using the supposed utility function as the optimization goal. In this way the optimal solutions were obtained (for the considered instance the best supposed utility function value is 4154.441453). The results of the run from the exterior loop perspective are given in Figure 5.1. All of the results generated during the interior loop runs were recorded. On the chart one can see the best and the worst solutions in a given run (the rest lie between them). One can see that in every iteration there is an improvement; however, the improvements are getting smaller and smaller. This is because the better the solutions' population, the harder it is to optimize it further. Now the analysis of the interior loop is given. The first, third and seventh interior loops will be presented; the other runs are similar, so they were omitted in the paper. The results are presented in Figures 5.2, 5.3 and 5.4. The iterations of the interior loop are also called generations because of their evolutionary nature. On the charts one can see the supposed utility function as a function of the generation. The value of the primary score is also included for reference. To recapitulate — the better the solution fits the induced decision rules, the higher the primary score; in the case of a draw (i.e. when the primary score for two individuals is the same) the secondary score is considered.
The improvement happens on a per-generation basis — in every generation the supposed utility improves. However, as one can see, the improvement of the primary score happens mainly at the beginning. Runs like the one in Figure 5.4, where the breakthrough comes at the end of the run, are in fact rare. Nevertheless, the algorithm should not be shut down after reaching the maximum primary score. Although the preference information extracted before the run is by then exploited, an improvement can still occur. More importantly, the diversification effort is then ongoing — when the primary (rule-based) score of the whole population is identical, the secondary (distance-based) score starts to play the major role in the process (the less crowded the solution, the higher the score), pushing individuals in the population towards new areas of the solution space. This can result in a major breakthrough — the population may acquire traits wanted by the DM. The rules driving the interior loops are presented below.

Iteration 1:
value1 ≥ 859.86 ⇒ class ≥ GOOD
value2 ≥ 814.71 ⇒ class ≥ GOOD

Iteration 3:
value1 ≥ 1023.62 ⇒ class ≥ GOOD
value2 ≥ 1011.61 ⇒ class ≥ GOOD

Iteration 7:
value1 ≥ 1211.14 ⇒ class ≥ GOOD
value2 ≥ 1210.63 ⇒ class ≥ GOOD

Note that there are only two classes — GOOD and NOT GOOD — so essentially class ≥ GOOD means class = GOOD. In order to verify how the rules influence the algorithm's behavior, charts were prepared (Figures 5.5a, 5.5b and 5.5c). For a two-criteria problem it is possible to present the search space on a two-dimensional chart.

4GNU Linear Programming Kit — http://www.gnu.org/software/glpk/
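Counting how many of the induced "at least" rules a solution matches is the essence of the primary score. The simplified Python sketch below illustrates this for rules of the form shown above; the actual implementation lives in the jrsintegration package and additionally weights the rules (with decay δ), which is omitted here.

```python
def primary_score(solution, rules):
    """Count the 'at least' decision rules matched by a solution.
    Each rule is a list of (criterion, threshold) conditions that
    must all hold. Rule weighting and its decay are omitted."""
    return sum(
        1 for rule in rules
        if all(solution[crit] >= thr for crit, thr in rule)
    )
```

With the iteration-1 rules, a solution with value1 = 900 and value2 = 800 matches only the first rule, while one exceeding both thresholds matches both.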
Iterations 1 and 3 show a typical run. Before the 10th generation the whole population moves to the area covered by the rules; in the further generations diversification occurs. The 7th iteration is a bit different, though. The evolution had not "discovered" a way to match the second rule before the 20th generation. Soon after individuals matching both rules appeared, the whole population drifted to the region marked out by both rules. It can also be insightful to see the simulated DM's choices and compare them with the inferred rules. They are presented in Figures 5.6a, 5.6b and 5.6c. The choices directly affected the rules. One can also see another interesting trait of the problem: the further apart (in the objective space) the selected individuals are, the harder it is to achieve the maximal possible primary score. Finally, charts presenting how the supposed utility and the primary score changed in the population over the succeeding generations are given (Figures 5.7, 5.8 and 5.9). They support the conclusions stated above. The analysis of a single run is a good tool for understanding the internals of the method and inspecting its behavior. One can draw conclusions and spot potential problems and possible improvements. However, more experiments have to be conducted, and their results aggregated using statistical methods, in order to check how useful the method is and what the influence of the algorithm's parameters is. This is done in the further sections.
[Figure 5.1: Supposed utility function in the exterior loop iterations]
[Figure 5.2: Supposed utility function in the interior loop iterations (exterior iteration 1)]
[Figure 5.3: Supposed utility function in the interior loop iterations (exterior iteration 3)]
[Figure 5.4: Supposed utility function in the interior loop iterations (exterior iteration 7)]
[Figure 5.5: Changes in the population (exterior iterations 1, 3 and 7)]
Figure 5.6: Choices made by the DM in exterior loop. (Chart panels: solutions in the value1_50.0 × value2_50.0 space for ext.iter.=0, 2 and 6, with the solutions marked "good" highlighted.)
Figure 5.7: Population advancing through the generations of interior loop. (Chart panels: Utility/Individual and Primary/Individual for ext.iter.=1, colored by generation.)
Figure 5.8: Population advancing through the generations of interior loop. (Chart panels: Utility/Individual and Primary/Individual for ext.iter.=3, colored by generation.)
Figure 5.9: Population advancing through the generations of interior loop. (Chart panels: Utility/Individual and Primary/Individual for ext.iter.=7, colored by generation.)
5.4 A deterministic case

5.4.2 The computational performance on exemplary problems

It is important to know what performance can be expected from an algorithm. When no uncertainty is considered in a problem and the supposed utility function is assumed, the optimal solution is easy to obtain with a linear programming solver. Of course, in real-world applications the supposed utility function is not known a priori. One can, however, test the algorithm in such an artificially created environment and assume that its behavior will resemble the one observed on real-world problems. Results of the evaluation on the following problems are given:

• Two-criteria binary knapsack problem, optimal value = 4154.441453.
• Two-criteria continuous knapsack problem, optimal value = 32700.41689.
• Three-criteria binary knapsack problem, optimal value = 31502.10927.
• Three-criteria DTLZ problem generated using the constraint surface approach, optimal value = −1.1.

The tests were repeated at least fifteen times and the results averaged. They are presented in Figure 5.10 and Table 5.1. Depending on the problem, 10 or 20 iterations of the exterior loop were simulated. Normally it would be up to the DM to stop when he or she is satisfied with the solution. However, one can safely assume that if no satisfactory solution is found by the 10th iteration, the decision maker will not want to investigate the problem with this method any further.

On both binary knapsack problems the performance is very good: the results are no further than 10% away from the optimal solution after the 10th iteration. The same is true for the surface problem. However, Figure 5.10d reveals a curious phenomenon: the supposed utility decreases in a few runs. This is possible because the evolutionary algorithm knows nothing about the utility function. The chart (Figure 5.10d) was generated from aggregated (averaged) results, but the decrease occurred in most of the runs.
To give further insight, charts showing the evolution in the third exterior loop iteration are presented (note that these charts are for a single run only, not aggregated). The charts are in Figures 5.11 and 5.12. As one can see, the evolutionary algorithm improves the population from its own perspective (the primary score factor). However, the supposed utility function is lowered in the process. This happens because the generated decision rules:

1. f1 ≤ 0.14033 ⇒ class ≥ GOOD
2. f3 ≤ 0.56462 ⇒ class ≥ GOOD

are not selective enough. It is possible that switching from DomLem to another algorithm, one generating all possible rules instead of a minimal set, would help here. Still, the results achieved by the DARWIN method on this problem are good.

The continuous knapsack problem, on the contrary, performs extremely poorly: only 55% of the optimum is reached after the 10th iteration and 58%, so almost no improvement, after the 20th. Investigating single runs at length provided no more details. The problem lies in the evolutionary algorithm, more precisely in its crossover operator. The DARWIN method uses a form of distance-preserving crossover in which a child lies somewhere between its parents. In the continuous variant, however, the optimal strategy is to take items one by one, starting with the one with the greatest value per unit of weight, until the weight constraint is reached. Considering the constraints (∀xi∈items : 0 ≤ xi ≤ 1), most individuals will have their decision variables close to xi ≈ 0.5 after a few generations, whereas in the optimal solutions most variables take the value 0 or 1. That is why improvement happens so slowly here. In the author's opinion, changing the crossover operator would solve the problem with the continuous knapsack. Choosing the right operator for a given problem is a well-known subject in multi-objective optimization [3]; however, it is out of the scope of this thesis.
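To illustrate why such a crossover struggles on the continuous knapsack, consider the following self-contained sketch. The instance, the greedy routine and the averaging crossover below are illustrative stand-ins, not DARWIN's actual operators: greedy selection by value per unit of weight, which is optimal for the fractional knapsack, leaves almost every variable at 0 or 1, whereas a child-between-parents crossover pulls the genes of the whole population toward a common value.

```python
import random

# Hypothetical fractional knapsack instance: (value, weight) pairs.
items = [(60.0, 10.0), (100.0, 20.0), (120.0, 30.0)]
capacity = 50.0

def greedy_fractional(items, capacity):
    """Optimal for the continuous knapsack: take items in decreasing
    value-per-unit-of-weight order; at most one variable ends up fractional."""
    x = [0.0] * len(items)
    order = sorted(range(len(items)),
                   key=lambda i: items[i][0] / items[i][1], reverse=True)
    left = capacity
    for i in order:
        if left <= 0:
            break
        x[i] = min(1.0, left / items[i][1])
        left -= x[i] * items[i][1]
    return x

print(greedy_fractional(items, capacity))  # [1.0, 1.0, 0.6666666666666666]

# A simplified "child lies between its parents" crossover: each child gene is
# a convex combination of parent genes, so the per-locus spread can only shrink
# and the population collapses toward its mean.
random.seed(0)
pop = [[random.random() for _ in range(20)] for _ in range(30)]
init_spread = max(ind[0] for ind in pop) - min(ind[0] for ind in pop)
for _ in range(10):
    pop = [[(a + b) / 2 for a, b in zip(random.choice(pop), random.choice(pop))]
           for _ in range(30)]
final_spread = max(ind[0] for ind in pop) - min(ind[0] for ind in pop)
print(init_spread, final_spread)  # the first-gene spread shrinks sharply
```

After a few generations every gene sits near one common value (about 0.5 under the 0 ≤ xi ≤ 1 bounds), far from the mostly 0/1 optimum, which matches the slow improvement observed above.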
Figure 5.10: Performance comparison. (Chart panels: mean supposed utility vs. exterior iteration for knapsack_bin_c2, knapsack_cont_c2, knapsack_bin_c3 and dtlz_surface_c3.)
Figure 5.11: The population in the third iteration of the exterior loop. DTLZ surface problem. (Chart panels: UtilGen and PrimaryScoreGen for ext.iter.=3; mean, min and max values per generation.)
Figure 5.12: The population in the third iteration of the exterior loop. DTLZ surface problem. (Chart panels: Utility/Individual and Primary/Individual for ext.iter.=3, colored by generation.)
Exterior loop   Knapsack bin 2c   Knapsack cont 2c   Knapsack bin 3c   DTLZ surface 3c
 1.             30.52%            46.79%             36.85%             7.82%
 2.             23.84%            46.66%             30.77%             4.40%
 3.             18.61%            46.49%             25.80%             6.79%
 4.             15.96%            46.32%             20.99%             5.38%
 5.             13.73%            46.16%             17.12%             5.08%
 6.             12.02%            45.97%             13.57%            10.15%
 7.             10.59%            45.74%             11.28%             8.33%
 8.              9.20%            45.51%              9.78%             7.64%
 9.              8.05%            45.29%              8.50%             5.49%
10.              7.03%            45.06%              7.22%             8.24%
11.              -                44.85%              6.47%             -
12.              -                44.62%              5.70%             -
13.              -                44.38%              5.15%             -
14.              -                44.17%              4.67%             -
15.              -                43.94%              4.17%             -
16.              -                43.70%              3.77%             -
17.              -                43.46%              3.43%             -
18.              -                43.25%              3.12%             -
19.              -                43.02%              2.94%             -
20.              -                42.81%              2.77%             -
Table 5.1: Distance from the optimum solution

For completeness, charts presenting the algorithm's behavior when more exterior loop iterations are allowed are also given (Figure 5.13, Table 5.2). After the twentieth iteration the improvements are small, if any. In the author's opinion this is a desirable property: there is then no reason not to stop the algorithm early. If the algorithm works well on a problem, this is evident from the first few iterations.
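The percentages in Tables 5.1 and 5.2 are relative distances from the known optima. The exact formula is not stated in the text, so the following reconstruction (a relative gap, using absolute values since the DTLZ optimum is negative) is an assumption:

```python
def relative_gap(optimal_value, achieved_value):
    """Assumed distance metric: relative gap between an achieved
    supposed-utility value and the known optimum."""
    return abs(optimal_value - achieved_value) / abs(optimal_value)

# Two-criteria binary knapsack, optimum 4154.441453 (from Section 5.4.2):
gap = relative_gap(4154.441453, 3000.0)
print(f"{gap:.2%}")  # prints 27.79%
```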
Figure 5.13: Supposed utility function when long runs are allowed. (Chart panels: mean value vs. exterior iteration, up to 100 iterations, for knapsack_bin_c2, knapsack_bin_c3 and dtlz_surface_c3.)
Exterior loop   Knapsack bin 2c   Knapsack bin 3c   DTLZ surface 3c
  1             31.08%            37.48%            14.05%
  5             13.59%            17.42%             1.22%
 10              7.33%             6.48%             7.19%
 15              4.72%             3.39%             7.43%
 20              3.70%             2.33%             6.00%
 25              3.11%             2.02%            12.06%
 30              3.05%             1.75%             1.05%
 35              2.99%             1.56%             9.93%
 40              3.05%             1.73%            16.83%
 45              2.91%             1.58%            11.93%
 50              2.94%             1.49%            18.44%
 55              2.94%             1.65%            14.70%
 60              3.15%             1.70%            14.55%
 65              3.13%             1.69%             1.61%
 70              3.14%             1.60%             1.16%
 75              3.34%             1.60%             8.86%
 80              3.31%             1.71%             2.07%
 85              3.41%             1.66%             1.15%
 90              3.22%             1.62%             7.24%
 95              3.45%             1.59%             4.85%
100              3.66%             1.71%             3.34%
Table 5.2: Distance from the optimum solution when long runs are allowed
5 Results of Computational Experiments

5.4.3 The influence of parameters on the method

The algorithm contains many parameters that can potentially affect its behavior. It is, of course, always left to the analyst to fine-tune the parameters for the specific problem to be solved. Nevertheless, this section gives some guidelines. The conclusions were drawn on the basis of experiments performed on the exemplary problems described above. The parameters were grouped into two categories: basic ones affecting the whole method, and additional ones of less importance. The latter can, however, be used for fine-tuning to a specific problem. The author considers the basic parameters to be:

• Number of generations in the evolutionary loop — intuitively, the more the better.
• Number of individuals in the population — again intuitively, the more the better.

Numerical data are presented in Tables 5.3 and 5.4. The results are also shown in charts (Figures 5.14, 5.15, 5.16 and 5.17). They agree with the intuition: more solutions (individuals in the population) and a longer interior loop (more generations) result in a better value of the supposed utility for all the problems.

The other parameters are:

• Delta (δ) — decay of a rule weight.
• Eta (η) — initial mutation probability.
• Gamma (γ) — coefficient of elitism.
• Omega (ω) — decay rate of the mutation.

They are used in the interior loop (see Section 3.3). There is no clear pattern across the problems. In general, changing these parameters has only a minor influence on the algorithm. This is a good property, because the solution given to the DM is robust with respect to them. Nevertheless, the analyst should try to tune the parameters on a per-problem basis.

5.4.4 DM's decisions inconsistent with supposed utility function

In the experiments the decision maker was simulated by a computer algorithm. Its choices were based on the assumed supposed utility function; they were repeatable and always complied with that function.
This is not the case in a real-world situation, where the DM is a human being and the supposed utility function is not known. His or her decisions will be noisy, not always the best, sometimes even contradictory. For this reason it is important to measure the impact of such "noise", i.e. inconsistencies in the DM's choices, on the algorithm. Normally, without artificially introduced inconsistencies, the decisions are made as described by Algorithm 4.

                 Knapsack bin 2c                Knapsack cont 2c
test             mean      sd      improvement  mean       sd      improvement
base             3870.72   45.73    0.00%       17968.94   451.61   0.00%
generation45     3958.75   34.14    2.27%       18072.44   635.22   0.58%
generation60     3995.55   32.71    3.22%       18366.65   397.56   2.21%
solution45       3937.89   26.11    1.74%       17950.64   609.37  -0.10%
solution60       3924.30   44.31    1.38%       18341.43   325.21   2.07%
Table 5.3: Importance of base parameters
Figure 5.14: Basic parameters for the two-criteria binary knapsack problem. (Chart panels: mean utility vs. exterior iteration, and utility at ext.iter.=10, for the base, generation45, generation60, solution45 and solution60 settings.)
Figure 5.15: Basic parameters for the two-criteria continuous knapsack problem. (Same panels.)
Figure 5.16: Basic parameters for the three-criteria binary knapsack problem. (Chart panels: mean utility vs. exterior iteration, and utility at ext.iter.=10, for the base, generation45, generation60, solution45 and solution60 settings.)
Figure 5.17: Basic parameters for the three-criteria DTLZ surface problem. (Same panels.)
Figure 5.18: Other parameters' influence. (Chart panels (a)–(d): utility at ext.iter.=10 under varied delta, eta, gamma and omega settings for the two-criteria binary knapsack, two-criteria continuous knapsack, three-criteria binary knapsack and DTLZ surface problems.)
                 Knapsack bin 3c                DTLZ surface 3c
test             mean       sd      improvement  mean    sd     improvement
base             29236.10   355.30   0.00%       -1.19   0.14    0.00%
generation45     29923.32   263.40   2.35%       -1.17   0.08   -1.73%
generation60     30412.27   225.15   4.02%       -1.14   0.05   -4.61%
solution45       29642.44   362.47   1.39%       -1.12   0.03   -5.71%
solution60       29681.42   489.78   1.52%       -1.14   0.11   -4.11%
Table 5.4: Importance of base parameters

Algorithm 4 Simulated DM indicating "good" solutions
1: procedure SelectGoodSolutions(solutions, toSelect)
2:     solutions — a list of solutions generated in the interior loop
3:     toSelect — how many solutions should be marked as "good"
4:     for all s ∈ solutions do
5:         s.value ← SupposedUtilityFunction(s)
6:     end for
7:     solutions ← Sort(solutions, key=value, descending=True)   ▷ sort on descending supposed utility
8:     return GetFirst(solutions, toSelect)
9: end procedure

The toSelect parameter is set to 3, so three solutions are always marked as good. One can influence the behavior of SelectGoodSolutions by changing the toSelect parameter, but this is not enough to simulate human-made inconsistencies. First, note that one does not have to select the same number of solutions every time. In fact, a fixed number is unrealistic, because the decision maker will select as many solutions as he or she thinks appropriate. To simulate this behavior an additional parameter was added: toSelectDelta. The number of solutions to be selected is then drawn with uniform distribution from [toSelect − toSelectDelta, toSelect + toSelectDelta]. Furthermore, the human decision maker is fallible: he or she can make a wrong decision and not select the best solutions according to the supposed utility function. Another parameter, noiseLimit, was introduced to simulate this phenomenon. After sorting the solutions on the supposed utility value, the individuals 1, 2, . . . , noiseLimit are shuffled. Suppose one wants to select three solutions.
If the noiseLimit parameter is set to 0 (the default), the three best solutions will be selected. However, if an experimenter sets noiseLimit to 6, then 3 of the 6 best solutions will be picked at random. The noisy version of Algorithm 4 is shown below.

Algorithm 5 Simulated DM indicating "good" solutions, with noise
1: procedure NoisySelectGoodSolutions(solutions, toSelect, toSelectDelta, noiseLimit)
2:     for all s ∈ solutions do
3:         s.value ← SupposedUtilityFunction(s)
4:     end for
5:     solutions ← Sort(solutions, key=value, descending=True)
6:     shuffled ← ShuffleFirst(solutions, noiseLimit)
7:     toSelect ← RandomFromInterval(toSelect − toSelectDelta, toSelect + toSelectDelta)
8:     return GetFirst(shuffled, toSelect)
9: end procedure

The following tests were performed:
Parameter      no noise  noise1  noise2  noise3  noise4  noise5  noise6
toSelect       3         3       3       3       5       3       3
toSelectDelta  0         2       2       2       4       4       2
noiseLimit     0         0       2       4       6       10      16

Note that noise1 to noise4 are rather minor, noise5 represents medium inconsistencies, and noise6 is a major one, unlikely to happen in practice: half of the solutions are considered good. If the decision maker is careful in his or her evaluations, the algorithm will probably face noise between levels 1 and 4.

The results for the exemplary problems are shown in the charts below (Figures 5.19, 5.20 and 5.21). Inconsistency tests for the continuous knapsack are omitted, since the algorithm performs badly on it even without additional noise. The results for the DTLZ surface problem (Table 5.7 and Figure 5.21) are inconclusive. This is because of the behavior described in Section 5.4.2: the decision rules are not discriminating enough, so the utility value often drops between the interior loops. This can be seen as a form of noise with an even bigger impact than the one tested here, and it obscures the results for this particular problem. For the binary knapsack problems, however (Figures 5.19 and 5.20), the results are clear. Minor inconsistencies do almost no harm to the results, while medium and major ones have only a small impact: the value decreased by less than 5% after ten iterations. This means that the solution generated by DARWIN is robust with respect to inconsistencies in the decision maker's choices. Even with major inconsistencies the algorithm is able to provide the same results, only a few iterations later.
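Algorithms 4 and 5 can be sketched in a few lines of Python. The code below is an illustrative reconstruction, not the thesis implementation; `utility` stands in for the supposed utility function and the solution list is hypothetical:

```python
import random

def noisy_select_good_solutions(solutions, utility, to_select,
                                to_select_delta=0, noise_limit=0):
    """Sketch of Algorithm 5: a simulated, fallible DM marking solutions
    as "good". With the default parameters it degenerates to Algorithm 4."""
    ranked = sorted(solutions, key=utility, reverse=True)
    head = ranked[:noise_limit]   # shuffle the noise_limit best solutions
    random.shuffle(head)          # to simulate wrong choices
    ranked[:noise_limit] = head
    # The DM does not always mark the same number of solutions as good:
    k = random.randint(to_select - to_select_delta, to_select + to_select_delta)
    return ranked[:k]

sols = list(range(30))  # hypothetical solution ids, utility = identity
chosen = noisy_select_good_solutions(sols, utility=lambda s: s, to_select=3)
print(chosen)  # no noise: simply the three best, [29, 28, 27]

noisy = noisy_select_good_solutions(sols, lambda s: s, to_select=3,
                                    to_select_delta=2, noise_limit=6)
print(noisy)   # between 1 and 5 solutions drawn from among the six best
```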
test     mean      sd     improvement
base     3867.66   37.12   0.00%
noise1   3863.02   42.34  -0.12%
noise2   3862.67   35.92  -0.13%
noise3   3854.10   46.61  -0.35%
noise4   3833.94   44.61  -0.87%
noise5   3814.34   50.59  -1.38%
noise6   3767.85   46.63  -2.58%
Table 5.5: The impact of DM's inconsistencies on knapsack bin 2c

test     mean       sd      improvement
base     29236.10   355.30   0.00%
noise1   29228.82   339.02  -0.02%
noise2   29063.68   422.31  -0.59%
noise3   28903.57   388.95  -1.14%
noise4   28855.74   253.07  -1.30%
noise5   28352.64   398.55  -3.02%
noise6   28163.63   383.71  -3.67%
Table 5.6: The impact of DM's inconsistencies on knapsack bin 3c
Figure 5.19: The impact of DM's inconsistencies for the two-criteria binary knapsack problem. (Chart panels: mean utility vs. exterior iteration, and utility at ext.iter.=10, for the base and noise1–noise6 settings.)
Figure 5.20: The impact of DM's inconsistencies for the three-criteria binary knapsack problem. (Same panels.)
Figure 5.21: The impact of DM's inconsistencies for the three-criteria DTLZ surface problem. (Chart panels: mean utility vs. exterior iteration, and utility at ext.iter.=10, for the base and noise2–noise6 settings.)

test     mean    sd     improvement
base     -1.19   0.14    0.00%
noise2   -1.16   0.05    2.65%
noise3   -1.21   0.15   -1.46%
noise4   -1.14   0.04    3.88%
noise5   -1.28   0.17   -7.66%
noise6   -1.21   0.11   -1.75%
Table 5.7: The impact of DM's inconsistencies on DTLZ surface 3c
5.4.5 A different rule-generating algorithm

The details of the rule-generating algorithm are not part of the DARWIN specification; the choice is left to the implementer. In this thesis two approaches were evaluated: the DomLem algorithm (see [16]), generating a minimal set of rules, and the AllRules algorithm (see [17]), generating an extensive set of all rules.

One can also lower the confidence of the rules generated by the DomLem algorithm (compare with Section 2.3). 100% confidence may seem a good idea; however, lowering it allows more gradual improvements of the goal function in the evolutionary algorithm, which could lead to better results. The values 60% and 80% were tested.

In the tests conducted earlier it was assumed that the decision maker selects only a few solutions. In this section this is also compared with marking half of the solutions as good. The results are presented in Tables 5.8 and 5.9 and Figures 5.22, 5.23, 5.24 and 5.25. When no uncertainty is considered, DomLem performs better than AllRules, and it is better to select three solutions than fifteen (a half). Confidence, however, yields an interesting result: the algorithm is robust with respect to it.
                                 Knapsack bin 2c               Knapsack cont 2c
algorithm         good count     mean      sd     improvement  mean       sd      improvement
domlem            3              3909.94   47.17   0.00%       17939.27   700.25   0.00%
domlem            15             3773.55   53.12  -3.49%       17487.98   302.36  -2.52%
domlem conf 0.6   15             3783.62   46.17  -3.23%       17505.49   334.11  -2.42%
domlem conf 0.8   15             3764.69   58.01  -3.71%       17555.32   402.22  -2.14%
allrules          3              3865.25   31.93  -1.14%       17886.14   392.01  -0.30%
allrules          15             3804.01   43.96  -2.71%       17517.90   419.92  -2.35%
Table 5.8: Impact of the rule-generating algorithm on DARWIN

                                 Knapsack bin 3c                Surface 3c
algorithm         good count     mean       sd      improvement  mean    sd     improvement
domlem            3              29282.28   227.33   0.00%       -1.22   0.13    0.00%
domlem            15             27789.61   538.01  -5.10%       -1.25   0.11    3.00%
domlem conf 0.6   15             28161.34   610.84  -3.83%       -1.20   0.10   -1.02%
domlem conf 0.8   15             27775.05   336.66  -5.15%       -1.21   0.14   -0.17%
allrules          3              29144.99   293.66  -0.47%       -1.13   0.04   -7.47%
allrules          15             27979.16   772.90  -4.45%       -1.24   0.11    1.73%
Table 5.9: Impact of the rule-generating algorithm on DARWIN
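The confidence of a rule such as "f1 ≤ 0.14033 ⇒ class ≥ GOOD" is the fraction of the solutions covered by its condition that actually belong to the at-least-GOOD class. A minimal sketch (the data and the threshold here are hypothetical, not taken from the experiments):

```python
def rule_confidence(solutions, condition, is_good):
    """Confidence of a decision rule: the share of solutions covered by its
    condition that the DM actually marked as good."""
    covered = [s for s in solutions if condition(s)]
    if not covered:
        return 0.0
    return sum(1 for s in covered if is_good(s)) / len(covered)

# Hypothetical (f1 value, marked-good-by-the-DM) pairs:
data = [(0.10, True), (0.12, True), (0.13, False), (0.20, False), (0.30, False)]
conf = rule_confidence(data, condition=lambda s: s[0] <= 0.14,
                       is_good=lambda s: s[1])
print(conf)  # 2 of the 3 covered solutions are good: 0.666...
```

At the 100% threshold DomLem would reject such a rule; at a relaxed 60% threshold it would be kept, which is how lowering the confidence admits less selective rules.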
Figure 5.22: Rule-generating algorithms comparison for the two-criteria binary knapsack problem. (Chart panels: mean utility vs. exterior iteration, and utility at ext.iter.=10, for allrules_15, allrules_3, domlem_15, domlem_15_conf06, domlem_15_conf08 and domlem_3.)
Figure 5.23: Rule-generating algorithms comparison for the two-criteria continuous knapsack problem. (Same panels.)
Figure 5.24: Rule-generating algorithms comparison for the three-criteria binary knapsack problem. (Chart panels: mean utility vs. exterior iteration, and utility at ext.iter.=10, for allrules_15, allrules_3, domlem_15, domlem_15_conf06, domlem_15_conf08 and domlem_3.)
Figure 5.25: Rule-generating algorithms comparison for the three-criteria DTLZ surface problem. (Same panels.)
5.4.6 The best results

One may be interested in the best performances achieved during the experiment runs. The results are presented in Table 5.10 and Figure 5.26. All the test problems except one performed extremely well, with a minimal distance from the optimal solution. The exception, for the reasons mentioned above, is the continuous knapsack problem, whose performance is not satisfactory; changing the crossover operator should greatly improve its results. The settings resulting in these performances are presented in Table 5.11. It is worth noting that when no uncertainty is considered, selecting only three solutions in each exterior loop iteration leads to the best results. Also, in three out of four cases the DomLem algorithm is superior to AllRules.

Exterior loop   Knapsack 2c bin   Knapsack 2c cont   Knapsack 3c bin   Surface
 1              28.33%            43.51%             31.94%            13.02%
 2              21.48%            43.30%             27.72%            12.66%
 3              14.99%            43.12%             23.10%            12.66%
 4              11.16%            42.90%             19.54%             3.77%
 5               9.91%            42.72%             15.64%             3.77%
 6               9.82%            42.46%             12.38%             3.77%
 7               8.04%            42.22%             10.43%             0.10%
 8               6.83%            42.05%              8.76%             0.06%
 9               4.84%            41.79%              6.98%             0.06%
10               3.69%            41.58%              6.19%            11.73%
11               3.19%            41.33%              5.01%            20.38%
12               2.56%            41.03%              4.50%            16.12%
13               2.55%            40.78%              3.63%             3.81%
14               2.55%            40.62%              3.60%             2.14%
15               2.55%            40.45%              3.00%             4.79%
16               2.69%            40.27%              2.81%            15.72%
17               2.69%            40.03%              2.74%             1.97%
18               2.61%            39.84%              2.32%             7.38%
19               2.41%            39.71%              1.84%             0.48%
Table 5.10: The best runs for the test problems. Distance from the optimum

Problem            Algorithm                    Number of solutions marked as good
Knapsack 2c bin    DomLem, confidence = 100%    3
Knapsack 2c cont   DomLem, confidence = 100%    3
Knapsack 3c bin    DomLem, confidence = 100%    3
Surface            AllRules                     3
Table 5.11: Settings resulting in the best performance for the test problems
Figure 5.26: The best runs for the test problems. (Chart panels (a)–(d): min and max supposed utility vs. exterior iteration for the two-criteria binary knapsack, two-criteria continuous knapsack, three-criteria binary knapsack and DTLZ surface problems.)
5.5 The case of uncertainty

In the previous sections the algorithm's results were analyzed under the assumption that the problems were well-specified, with no room for uncertainty. This was a way to test the algorithm's behavior and to get an idea of its properties. In practice, it is often impossible to specify the exact value of every coefficient of a problem. One may instead want to specify some of the coefficients as intervals of possible values, and then look for a robust solution taking many possible scenarios of imprecision into account. This is possible with the DARWIN method. The tests performed earlier will now be extended to "robust" problems, i.e. problems with interval coefficients.

Note that it is now impossible to compare a result with the optimal solution, even assuming the supposed utility function. However, a point of reference is needed in order to know what performance can be expected from the method. The author decided to compare the results against direct optimization of the supposed utility function. The reference result is generated as follows: the exterior loop (see Section 3.3) is left intact, but in the interior loop the goal function of the evolutionary optimization changes. Instead of optimizing the primary and secondary score, the supposed utility function itself is optimized. This way it is possible to measure how well the DM's preferences are inferred by the DARWIN method, and preference inference is the most important and unique part of this method. As in the previous sections, all tests were repeated at least fifteen times and the results were averaged, unless stated otherwise.

5.5.1 Single run analysis

Before evaluating the performance on the exemplary problems it is a good idea to analyze a single run on a single problem. It can reveal interesting properties or potential problems of the algorithm.
The problem being evaluated is the two-criteria mix problem (the problem presented in the original DARWIN lecture given at the 5th International Workshop on Preferences and Decisions, Trento, April 2009). Note that although the problem has only two criteria in its definition, it has to be considered in a space with more dimensions. This is because the supposed utility function is defined not in the original two-criteria space (max profit, min time) but in a quantile space. The function is defined as follows:

max: profit_1% + 3 * profit_25% + 2 * profit_50% - time_1% - 3 * time_25% - 2 * time_50%

So the space has six dimensions: p_1% × p_25% × p_50% × t_1% × t_25% × t_50%, where t stands for time and p for profit.

The basic performance of a single run is presented in Figure 5.27. As one can see, the algorithm needs only four iterations to reach the maximum value, a point where no further improvement occurs5. Improvements of the supposed utility function and the primary score are shown in Figures 5.28 and 5.29, whereas the shape of the population is shown in Figures 5.30 and 5.31. One can see that it is easy to generate a good solution to the problem: a solution with the best result in a given run appears in the first few generations. With a six-dimensional objective space it is impossible to show a section of the space on a single two-dimensional chart. However, the dimensions are not completely independent; there is a correlation between the quantiles of the same objective. This can be observed in Figure 5.32,

5 This is not an optimal solution to the problem. By their nature (intervals and scenarios of uncertainty), the problems from this section do not have an optimal value.
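The supposed utility function above translates directly into code. The following sketch (names are illustrative, not taken from the DARWIN implementation) evaluates it for a solution whose quantile values have already been estimated from the sampled scenarios:

```python
def supposed_utility(profit_q, time_q):
    """Supposed utility of the mix problem, defined over quantiles of
    the profit (maximized) and time (minimized) distributions.
    profit_q and time_q map a quantile level to the objective's value
    at that quantile, e.g. {0.01: ..., 0.25: ..., 0.50: ...}."""
    return (profit_q[0.01] + 3 * profit_q[0.25] + 2 * profit_q[0.50]
            - time_q[0.01] - 3 * time_q[0.25] - 2 * time_q[0.50])

# A solution whose profit quantiles are (100, 150, 200) and whose time
# quantiles are (10, 20, 30) evaluates to:
u = supposed_utility({0.01: 100, 0.25: 150, 0.50: 200},
                     {0.01: 10, 0.25: 20, 0.50: 30})
print(u)  # 100 + 450 + 400 - 10 - 60 - 60 = 820
```

This also makes the six-dimensionality explicit: the genotype is evaluated through six quantile statistics, not through the raw (profit, time) pair.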
Figure 5.27: Supposed utility function in exterior loop iterations for the mix problem (minimal and maximal result value per iteration)
Figure 5.32 shows the movement of the population during the generations on a section of the objective space; even better, Figure 5.33 shows the solutions marked by the decision maker as "good". As expected, the solutions chosen by the DM lie near the Pareto front. However, not all of them are part of the front; knowing that there are four other goals besides profit_50% and time_50% explains this phenomenon.
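A solution that looks dominated in the two-dimensional (profit_50%, time_50%) section may still be non-dominated once all six quantile objectives are compared. A minimal Pareto-dominance check (a generic sketch, not DARWIN's internal routine) makes this concrete:

```python
def dominates(a, b, maximize):
    """True if solution a Pareto-dominates solution b.
    a, b are tuples of objective values; maximize[i] is True when
    objective i is maximized, False when it is minimized."""
    no_worse = all((x >= y) if m else (x <= y)
                   for x, y, m in zip(a, b, maximize))
    better = any((x > y) if m else (x < y)
                 for x, y, m in zip(a, b, maximize))
    return no_worse and better

# Illustrative numbers: in the (profit_50%, time_50%) section alone,
# b looks better than a on both shown criteria, yet a third quantile
# objective makes a non-dominated overall.
a = (900, 300, 500)   # profit_50%, time_50% (min), profit_25%
b = (950, 280, 450)
print(dominates(b, a, (True, False, True)))  # False: a is better on profit_25%
```

This is why DM-selected points off the visible two-dimensional front are not necessarily mistakes.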
Figure 5.28: Supposed utility function and primary score improvements in an example run of the interior loop (exterior iteration 1)
Figure 5.29: Supposed utility function and primary score improvements in an example run of the interior loop (exterior iteration 3)
Figure 5.30: Changes of the population shape in an example run of the interior loop (exterior iteration 1)
Figure 5.31: Changes of the population shape in an example run of the interior loop (exterior iteration 3)
Figure 5.32: Two-dimensional section of the objective space (profit_50% × time_50%, exterior iterations 1 and 3)

Figure 5.33: Two-dimensional section of the objective space (solutions marked as "good" by the DM, exterior iterations 0 and 2)
5.5.2 The computational performance on exemplary problems

To see how well DARWIN can infer the decision maker's preferences, a comparison with the supposed utility function is given. In this section the experiments were performed on all exemplary problems with interval coefficients: the mix problem (also labeled the presentation problem), dtlz1 (in four- and ten-criteria versions) and dtlz7. The problems were solved with the default (base) values of the configuration options, as well as with an increased number of generations (from 30 to 60). Tests whose names end with "rules" use the primary and secondary score; tests whose names end with "supp" use the supposed utility function value as the objective of the evolutionary optimization. The results are shown in Table 5.12 and presented in Figures 5.34, 5.35, 5.36 and 5.37.

There was almost no difference for the mix problem (less than 1%) and a minor difference for the dtlz7 problem (less than 5%). Unfortunately, the differences for the dtlz1 problems (both four- and ten-criteria) were large and noticeable: about 50% after ten iterations. The results for dtlz1 are far below expectations; the default parameters are not appropriate for this problem. Better results are still possible, but they require tuning the parameters.

Note: when box plots are given, they are always charted for the tenth iteration. The same applies to comparison tables.

test        optimization   supposed   rules     distance
mix         base           6683.14    6678.95    0.06%
mix         gen60          6680.05    6661.41    0.28%
dtlz1 c4    base              7.01      12.81   45.29%
dtlz1 c4    gen60             6.48      15.18   57.28%
dtlz1 c10   base              7.60      13.73   44.68%
dtlz1 c10   gen60             5.61      18.00   68.85%
dtlz7 c4    base            104.72     108.74    3.69%
dtlz7 c4    gen60           104.95     110.28    4.83%

Table 5.12: Performance comparison on robust multi-objective problems
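The distance column of Table 5.12 is consistent with taking the absolute difference between the two runs and scaling it by the larger of the two magnitudes. Under that assumption (the thesis does not restate the formula here), it can be computed as:

```python
def relative_distance(supposed, rules):
    """Relative gap between the supposed-utility reference run and the
    rule-driven DARWIN run. Assumed formula: |a - b| / max(|a|, |b|),
    which reproduces the tabulated values."""
    a, b = abs(supposed), abs(rules)
    return abs(a - b) / max(a, b)

print(f"{relative_distance(6683.14, 6678.95):.2%}")  # mix, base: 0.06%
print(f"{relative_distance(7.01, 12.81):.2%}")       # dtlz1 c4, base: 45.28%
```

Scaling by the larger magnitude keeps the measure in [0, 1) regardless of whether the rule-driven run under- or over-shoots the reference.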
Figure 5.34: Performance comparison on the mix problem (mean utility per exterior iteration; box plot for the tenth iteration)

Figure 5.35: Performance comparison on the four-criteria DTLZ1 problem
Figure 5.36: Performance comparison on the ten-criteria DTLZ1 problem

Figure 5.37: Performance comparison on the four-criteria DTLZ7 problem
5 Results of Computational Experiments

5.5.3 The influence of the basic parameters on the method

The importance of the basic parameters was evaluated again, this time in the robust environment. The numerical results are presented in Tables 5.13 and 5.14; charts visualizing the relations are included as well (Figure 5.38). Unfortunately, no clear pattern emerges. For the mix problem the solution is robust: changing the basic parameters changes the value of the result by no more than 1% after ten iterations. For the DTLZ7 problem the situation is similar; the differences are bigger, but still below 5%. The differences are larger and important for the DTLZ1 problems (four- and ten-criteria). However, the standard deviations are of the same order of magnitude as the mean values of the results, so the results are inconclusive here and it is impossible to make a recommendation based solely on them.

               Mix problem                      DTLZ1 c4
test           mean      sd      improvement   mean     sd      improvement
base           6665.60   57.27     0.00%        -9.99    6.83     0.00%
generation45   6661.72   47.52    -0.06%       -16.04   12.47    60.63%
generation60   6661.14   45.08    -0.07%        -9.63    7.78    -3.61%
scenario45     6617.60   47.33    -0.72%       -13.12   11.37    31.34%
scenario60     6639.54   33.98    -0.39%       -13.95    8.50    39.68%
solution45     6705.68   52.96     0.60%       -10.43    8.33     4.44%
solution60     6685.43   56.78     0.30%       -11.68    8.23    16.97%

Table 5.13: Importance of the base parameters in the robust environment

               DTLZ1 c10                       DTLZ7 c4
test           mean     sd      improvement    mean      sd      improvement
base           -15.30   11.20     0.00%        -109.23    5.87     0.00%
generation45   -15.92   13.27     4.03%        -109.23    6.96     0.00%
generation60   -21.63   12.55    41.37%        -108.33   10.87    -0.82%
scenario45     -17.32   11.15    13.17%        -111.98    5.34     2.52%
scenario60     -16.67   13.30     8.91%        -112.54    3.45     3.03%
solution45     -13.89    9.63    -9.22%        -107.88    6.39    -1.23%
solution60      -9.55    6.81   -37.62%        -109.98    6.37     0.69%

Table 5.14: Importance of the base parameters in the robust environment
Figure 5.38: Effect of the basic parameters ((a) Mix problem; (b) DTLZ1 four-criteria; (c) DTLZ1 ten-criteria; (d) DTLZ7 four-criteria), box plots for the tenth exterior iteration
5.5.4 DM's decisions inconsistent with the supposed utility function

In order to check the impact of inconsistencies in the DM's decisions, a series of experiments was performed. The results are presented in Figure 5.39 and in Tables 5.15 and 5.16. Again, one has to analyze the results excluding the DTLZ1 problem, because the standard deviation of its series of results is of the same order of magnitude as the values themselves. For the mix and DTLZ7 problems, the effect of inconsistencies in the DM's decisions becomes an important factor only for medium inconsistencies (noise5), where the results differ by about 8%, and for major inconsistencies (noise6), where the difference is about 12% for the mix problem and about 8% for the DTLZ7 problem. The situation is similar to the impact observed on the problems defined without uncertainty: even after adding interval coefficients to the problem, the solution remains robust with respect to inconsistencies in the DM's decisions.

         Mix problem                       DTLZ1 c4
test     mean      sd       improvement   mean     sd      improvement
base     6678.95    53.86     0.00%       -12.81    8.38     0.00%
noise1   6662.93    36.54    -0.24%        -7.70    8.48    39.89%
noise2   6638.77    70.33    -0.60%       -12.61   11.36     1.55%
noise3   6636.90    53.88    -0.63%       -12.73   10.82     0.61%
noise4   6537.39   137.71    -2.12%       -13.65    7.84    -6.55%
noise5   6093.31   811.58    -8.77%       -17.27   11.32   -34.82%
noise6   5858.94   924.72   -12.28%       -13.72   12.45    -7.09%

Table 5.15: The impact of inconsistencies in the robust environment

         DTLZ1 c10                        DTLZ7 c4
test     mean     sd      improvement    mean      sd     improvement
base     -13.73    8.08     0.00%        -108.86   4.00     0.00%
noise1   -17.34    9.07   -26.23%        -109.48   7.03    -0.57%
noise2   -20.58   13.18   -49.83%        -115.45   7.33    -6.06%
noise3   -21.52   11.72   -56.70%        -112.61   5.84    -3.45%
noise4   -21.47    9.29   -56.36%        -112.38   6.51    -3.24%
noise5   -24.86   13.03   -81.01%        -117.84   6.78    -8.26%
noise6   -24.89   11.79   -81.21%        -117.45   7.21    -7.89%

Table 5.16: The impact of inconsistencies in the robust environment
Figure 5.39: Effect of inconsistencies in the DM's decisions ((a) Mix problem; (b) DTLZ1 four-criteria; (c) DTLZ1 ten-criteria; (d) DTLZ7 four-criteria), box plots for the tenth exterior iteration
5.5.5 DomLem parameters

As mentioned in Section 4.1.1, the choice of the rule-generation algorithm is left to the analyst. In this thesis both DomLem and AllRules were used; in this section the DomLem algorithm is analyzed. One can set different confidence values for the rules: runs with 100%, 80% and 60% were tried. The decision maker can also mark a different number of solutions as "good": 3 and 15 were used in the tests.

DomLem returns a minimal set of rules covering all the examples, which usually means that only a small number of rules is generated. However, DomLem is a heuristic algorithm, so running it with swapped columns in the decision table results in different rules. This feature was exploited to generate more rules: the algorithm was run three times with different attribute orderings. This test is called multirules.

The numerical results are given in Tables 5.17 and 5.18, and charts visualizing the data are shown in Figures 5.40, 5.41, 5.42 and 5.43. Again, it is impossible to give clear guidelines on which parameter values should be chosen. For the mix problem and the four-criteria DTLZ7 problem the solutions are robust: the differences are no more than 1% and 5%, respectively. However, the situation changes for the DTLZ1 problem. For the four-criteria variant, choosing the multirules option yields a great improvement, especially if the DM selects three good solutions in each iteration (almost 60% improvement); selecting fifteen good solutions and setting the confidence level to 80% results in an almost 40% improvement. For the ten-criteria variant, selecting fifteen solutions and certain (confidence = 100%) rules results in over 25% improvement, and choosing multirules improves the result by 13% to 16%. The best results for this problem are achieved by selecting fifteen solutions and setting the confidence to 60%, which improves the results by about 55%.
                     Mix problem                           DTLZ1 c4
test              good count   mean      sd      improvement   mean     sd      improvement
domlem            3            6676.25   50.34     0.00%       -16.45   13.80     0.00%
domlem            15           6649.69   37.74    -0.40%       -19.51   10.97   -18.59%
domlem conf 0.6   3            6655.11   43.51    -0.32%       -17.44    9.48    -5.99%
domlem conf 0.6   15           6695.45   59.81     0.29%       -19.06   15.29   -15.87%
domlem conf 0.8   3            6662.44   56.97    -0.21%       -20.61   16.91   -25.27%
domlem conf 0.8   15           6662.55   66.35    -0.21%       -10.28    7.88    37.52%
domlem multirules 3            6680.08   33.44     0.06%        -6.74    5.60    59.00%
domlem multirules 15           6658.61   27.97    -0.26%       -12.83   10.38    22.00%

Table 5.17: Influence of the DomLem parameters in the robust environment

                     DTLZ1 c10                          DTLZ7 c4
test              good count   mean     sd      improvement   mean      sd     improvement
domlem            3            -21.05   17.52     0.00%       -108.94   4.01     0.00%
domlem            15           -15.50    9.28    26.35%       -109.28   4.96    -0.31%
domlem conf 0.6   3            -22.44    8.82    -6.60%       -112.68   4.15    -3.43%
domlem conf 0.6   15            -9.09    7.39    56.82%       -109.28   7.95    -0.31%
domlem conf 0.8   3            -22.66   14.83    -7.67%       -108.83   5.28     0.10%
domlem conf 0.8   15           -26.76   17.14   -27.14%       -108.53   7.29     0.38%
domlem multirules 3            -18.14   14.80    13.81%       -112.03   5.79    -2.83%
domlem multirules 15           -17.71   10.34    15.86%       -110.17   5.48    -1.13%

Table 5.18: Influence of the DomLem parameters in the robust environment
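The multirules trick described above can be sketched as follows. `induce_rules` is a stand-in for the actual DomLem call (its signature here is hypothetical); the point is only that shuffling the attribute order changes which minimal cover a heuristic inducer finds, and the covers can be merged:

```python
import random

def multirules(decision_table, attributes, induce_rules, runs=3, seed=0):
    """Run a rule inducer several times with shuffled attribute
    orderings and merge the resulting minimal covers into one larger
    rule set, exploiting the inducer's sensitivity to column order."""
    rng = random.Random(seed)
    merged = set()
    for _ in range(runs):
        order = list(attributes)
        rng.shuffle(order)
        merged |= set(induce_rules(decision_table, order))
    return merged
```

Each run still yields only a minimal cover; it is the union over runs that produces the larger rule set used by the multirules tests.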
Figure 5.40: Influence of the DomLem parameters on the mix problem

Figure 5.41: Influence of the DomLem parameters on the four-criteria DTLZ1 problem
Figure 5.42: Influence of the DomLem parameters on the ten-criteria DTLZ1 problem

Figure 5.43: Influence of the DomLem parameters on the four-criteria DTLZ7 problem
5.5.6 AllRules algorithm

The second option for the rule generator is the AllRules algorithm. The version used in this thesis has no configuration parameters, but one can obviously still change the number of solutions marked as good. The results are presented in Table 5.19 and in Figures 5.44, 5.45 and 5.46. For the mix and DTLZ7 problems the differences between 3 and 15 good solutions are small and insignificant. On the other hand, for the DTLZ1 problem the difference is over 60% when marking 15 out of 30 solutions as good. Unfortunately, because of its huge memory footprint, it was impossible to run AllRules on the ten-criteria DTLZ1 problem.

A comparison between the two approaches, DomLem and AllRules, was also made in order to check whether generating more rules improves the results. For each of the three problems the best series for DomLem and for AllRules were chosen and compared. The results are given in Tables 5.20, 5.21 and 5.22, and charted in Figures 5.47, 5.48 and 5.49. In general DomLem performed slightly better; however, for DTLZ1 choosing AllRules results in a huge improvement (about 70%).
problem       test       good count   mean      stddev   improvement
mix problem   allrules   3            6658.39    69.98     0.00%
mix problem   allrules   15           6572.14    79.91    -1.30%
dtlz1 c4      allrules   3             -13.05    11.80     0.00%
dtlz1 c4      allrules   15             -4.52     3.05    65.36%
dtlz7 c4      allrules   3            -111.80     3.68     0.00%
dtlz7 c4      allrules   15           -110.07     7.84     1.55%

Table 5.19: AllRules — number of solutions chosen and its influence on the result

outer   domlem conf=0.6, good=15   stddev   all rules, good=15   stddev   improvement
 1      6399.34                    208.67   6334.75              350.80   -1.01%
 2      6516.66                    108.81   6502.02              211.88   -0.22%
 3      6555.59                     94.60   6552.46              100.80   -0.05%
 4      6603.19                     88.61   6574.42               71.22   -0.44%
 5      6607.81                     90.20   6591.54               59.81   -0.25%
 6      6623.12                     70.84   6620.31               87.77   -0.04%
 7      6626.90                     55.62   6628.49               82.09    0.02%
 8      6661.15                     78.86   6640.54               82.18   -0.31%
 9      6686.50                     68.39   6656.77               71.28   -0.44%
10      6695.45                     59.81   6658.39               69.98   -0.55%
11      6697.46                     54.06   6673.27               60.59   -0.36%
12      6697.46                     54.06   6675.63               58.30   -0.33%
13      6693.87                     59.44   6684.04               56.11   -0.15%
14      6693.59                     59.62   6689.49               48.43   -0.06%
15      6698.10                     57.73   6701.54               47.81    0.05%
16      6698.10                     57.73   6701.42               47.77    0.05%
17      6697.50                     57.20   6698.24               55.56    0.01%
18      6707.60                     46.91   6707.24               62.70   -0.01%
19      6707.98                     47.62   6707.24               62.70   -0.01%

Table 5.20: AllRules and DomLem comparison on the mix problem
Figure 5.44: AllRules — number of solutions chosen and its influence on the mix problem

Figure 5.45: AllRules — number of solutions chosen and its influence on the four-criteria DTLZ1 problem
Figure 5.46: AllRules — number of solutions chosen and its influence on the four-criteria DTLZ7 problem

Figure 5.47: AllRules and DomLem comparison on the mix problem
Figure 5.48: AllRules and DomLem comparison on the four-criteria DTLZ1 problem

Figure 5.49: AllRules and DomLem comparison on the four-criteria DTLZ7 problem
outer   domlem conf=1.0, good=3   stddev   all rules, good=15   stddev   improvement
 1      -57.10                    34.30    -15.91               5.86     72.15%
 2      -33.28                    26.61    -10.40               4.77     68.75%
 3      -26.73                    19.71     -9.07               4.41     66.07%
 4      -22.57                    16.49     -8.29               3.94     63.26%
 5      -20.96                    15.87     -7.16               3.29     65.85%
 6      -19.56                    14.92     -6.10               2.99     68.81%
 7      -18.31                    14.40     -5.65               3.23     69.13%
 8      -18.05                    14.28     -5.31               3.47     70.60%
 9      -17.09                    14.12     -4.81               3.27     71.84%
10      -16.45                    13.80     -4.52               3.05     72.51%
11      -16.03                    13.39     -4.21               3.19     73.74%
12      -15.59                    13.21     -4.18               3.47     73.16%
13      -15.11                    13.22     -3.79               3.65     74.90%
14      -14.91                    13.20     -3.24               3.55     78.25%
15      -14.55                    13.16     -2.97               3.51     79.56%
16      -14.23                    12.72     -2.85               3.49     79.96%
17      -13.70                    12.21     -2.57               3.36     81.20%
18      -13.05                    11.27     -2.64               3.45     79.78%
19      -12.57                    10.18     -2.31               3.14     81.58%

Table 5.21: AllRules and DomLem comparison on the four-criteria DTLZ1 problem

outer   domlem conf=0.8, good=15   stddev   all rules, good=15   stddev   improvement
 1      -117.67                    8.00     -118.23              9.66     -0.48%
 2      -116.53                    7.36     -116.87              8.12     -0.30%
 3      -109.91                   10.01     -113.30              8.15     -3.09%
 4      -108.71                    7.37     -112.06              7.68     -3.08%
 5      -108.78                    7.33     -112.41              7.96     -3.34%
 6      -108.78                    7.33     -111.40              7.33     -2.41%
 7      -109.84                    8.30     -111.02              7.81     -1.08%
 8      -108.84                    7.36     -110.07              7.84     -1.13%
 9      -108.53                    7.29     -110.07              7.84     -1.41%
10      -108.53                    7.29     -110.07              7.84     -1.41%
11      -108.38                    7.08     -111.30              6.30     -2.70%
12      -108.38                    7.08     -111.31              6.31     -2.70%
13      -108.27                    6.94     -111.31              6.31     -2.81%
14      -106.42                    8.30     -111.31              6.31     -4.60%
15      -106.42                    8.30     -111.20              6.20     -4.49%
16      -106.42                    8.30     -110.69              5.62     -4.01%
17      -106.42                    8.30     -109.62              7.46     -3.01%
18      -106.42                    8.30     -109.62              7.46     -3.01%
19      -105.54                    8.65     -109.36              7.37     -3.62%

Table 5.22: AllRules and DomLem comparison on the four-criteria DTLZ7 problem
5.5.7 The second-order stochastic dominance

Stochastic dominance is a form of partial ordering over a set of gambles, where a gamble is a probability distribution over possible outcomes. In this section the author focuses on first- and second-order stochastic dominance.

The first-order dominance. For two gambles X and Y, X has first-order stochastic dominance over Y if, for every possible outcome, X gives at least as high a probability of receiving at least that outcome as Y does, and for at least one outcome X gives a strictly higher probability of receiving at least that outcome. In DARWIN, this dominance can be checked by comparing the highest values in the meaningful quantiles.

The second-order dominance. For two gambles X and Y, X has second-order stochastic dominance over Y if X has at least as high a mean value as Y and involves less risk. In DARWIN, this corresponds to comparing the mean values in the quantiles instead of the highest ones.

An example is shown in Figure 5.50: the solution X dominates Y only when second-order stochastic dominance is considered.

Figure 5.50: First- and second-order stochastic dominance comparison (values of solutions X and Y in the 1%, 25%, 50%, 75% and 99% quantiles; for each quantile both the highest value and the mean over scenarios are marked)

In the second-order dominance, worse scenarios gain importance; this dominance is therefore more risk-averse than the first-order one. This is the case because bad evaluations contribute to the mean values of all quantiles, while good evaluations contribute only to the mean values of the high quantiles. It is possible to set up DARWIN to use the mean in quantiles instead of the highest value.
Thus, one can make use of the second-order stochastic dominance. Figure 5.51 shows a histogram presenting the number of scenarios resulting in values in given ranges, for one exemplary solution (a different solution for first-order and for second-order stochastic dominance). For example, nine out of thirteen scenarios resulted in a profit between 1050 and 1100 for the first-order stochastic dominance. The points are connected to give an overall idea of the probability distributions. The solutions were obtained by running DARWIN with the default options (in particular, with the DomLem algorithm). The exterior loop was run three times, i.e. the decision maker marked "good" solutions three times; the DM preferred solutions with the highest profit_50% value. The experiment was conducted twice: the first time with the highest value in a quantile (first-order dominance) and the second time with the mean in quantiles (second-order dominance). The charts confirm the intuition. For profit, the criterion most important to the DM, it is clear that one can achieve better results with the first-order dominance, as shown by the
long tail on the right-hand side of the distribution.

Figure 5.51: An exemplary probability distribution of the goals' values (left panel: profit (gain); right panel: time (loss); vertical axis: the number of scenarios)

However, the distribution for the second-order dominance has fewer "bad" solutions. It is symmetric, with the peak value in the middle. Therefore, the latter is better suited for a risk-averse Decision Maker and the former for a risk-seeking one. In the second-order stochastic dominance the "bad" solutions are more important, therefore they were eliminated in favor of average ones. This is also the case for the time criterion (the lower, the better). However, minimizing the time was not a priority for the DM; as a result, there are no better solutions in the first-order distribution, because they would imply worsening the profit goal.
5.5.8 The best results

As in the section where no uncertainty was considered, one may be interested in the best performance achieved during the experiments. The results cannot be directly compared to optimal values, because under uncertainty there is no such thing as an optimal solution. Thus, the comparison is made to the supposed utility optimization. Also, instead of a single run, series of runs are the subject of this comparison. The set of parameter values producing these results is also given. The analysis is made problem by problem.

The results are presented in tables 5.23, 5.24, 5.25 and 5.26 and in figures 5.52, 5.53, 5.54 and 5.55. For all the problems it was possible to match the performance of the supposed utility optimization. Ten-criteria DTLZ1 was the hardest problem, while the four-criteria version was the easiest one. The parameter values are as follows:

    problem               algorithm   good count   confidence
    mix problem           DomLem      15           0.6
    DTLZ1 four-criteria   AllRules    15           n/a
    DTLZ1 ten-criteria    DomLem      15           0.6
    DTLZ7 four-criteria   DomLem      15           0.8

They can serve as guidelines for an analyst solving other problems with DARWIN.

             supposed utility      domlem conf 0.6, 15 good
    outer     mean     stddev       mean      stddev      improvement
    1       6551.03    105.13     6399.34     208.67      -2.32%
    2       6618.31     59.98     6516.66     108.81      -1.54%
    3       6627.03     55.99     6555.59      94.60      -1.08%
    4       6641.97     56.68     6603.19      88.61      -0.58%
    5       6646.64     52.71     6607.81      90.20      -0.58%
    6       6657.24     43.55     6623.12      70.84      -0.51%
    7       6660.01     42.18     6626.90      55.62      -0.50%
    8       6674.68     41.26     6661.15      78.86      -0.20%
    9       6678.98     42.18     6686.50      68.39       0.11%
    10      6678.98     42.18     6695.45      59.81       0.25%
    11      6682.55     38.42     6697.46      54.06       0.22%
    12      6688.63     42.88     6697.46      54.06       0.13%
    13      6688.88     42.58     6693.87      59.44       0.07%
    14      6688.88     42.58     6693.59      59.62       0.07%
    15      6691.59     40.16     6698.10      57.73       0.10%
    16      6695.68     41.14     6698.10      57.73       0.04%
    17      6701.35     39.70     6697.50      57.20      -0.06%
    18      6701.35     39.70     6707.60      46.91       0.09%
    19      6710.87     40.02     6707.98      47.62      -0.04%

Table 5.23: The best run for the mix problem
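The improvement column in tables 5.23–5.26 appears to be the relative difference between the mean utility reached by the DARWIN run and the mean of the supposed utility optimization. The one-liner below reproduces the tabulated values up to rounding; the formula is inferred from the data, not quoted from the implementation.

```scala
// Relative improvement (in percent) of a DARWIN run over the supposed utility
// optimization, as the "improvement" column appears to be computed.
// Inferred from tables 5.23-5.26, not quoted from the implementation.
def improvement(darwinMean: Double, supposedMean: Double): Double =
  (darwinMean - supposedMean) / math.abs(supposedMean) * 100.0
```

For the first row of Table 5.23, improvement(6399.34, 6551.03) gives about -2.32%; dividing by the absolute value makes the sign convention work for the negative DTLZ utilities as well.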
Figure 5.52: The best runs for the mix problem

Figure 5.53: The best runs for the four-criteria DTLZ1 problem
Figure 5.54: The best runs for the ten-criteria DTLZ1 problem

Figure 5.55: The best runs for the four-criteria DTLZ7 problem
             supposed utility      allrules, 15 good
    outer     mean     stddev       mean      stddev      improvement
    1        -18.34     9.82      -15.91       5.86       13.27%
    2        -13.24     8.62      -10.40       4.77       21.43%
    3        -11.94     8.74       -9.07       4.41       24.04%
    4        -10.80     8.53       -8.29       3.94       23.17%
    5         -9.86     8.33       -7.16       3.29       27.38%
    6         -9.56     8.47       -6.10       2.99       36.18%
    7         -9.25     8.55       -5.65       3.23       38.85%
    8         -8.83     8.49       -5.31       3.47       39.88%
    9         -8.74     8.50       -4.81       3.27       44.96%
    10        -8.23     8.38       -4.52       3.05       45.08%
    11        -8.12     8.38       -4.21       3.19       48.16%
    12        -7.76     8.32       -4.18       3.47       46.08%
    13        -7.62     8.40       -3.79       3.65       50.26%
    14        -7.31     8.42       -3.24       3.55       55.63%
    15        -7.13     8.26       -2.97       3.51       58.30%
    16        -7.02     8.20       -2.85       3.49       59.39%
    17        -6.88     8.09       -2.57       3.36       62.60%
    18        -6.83     8.12       -2.64       3.45       61.33%
    19        -6.78     8.09       -2.31       3.14       65.86%

Table 5.24: The best run for the four-criteria DTLZ1

             supposed utility      domlem conf 0.6, 15 good
    outer     mean     stddev       mean      stddev      improvement
    1        -22.07    11.81      -50.99      24.76      -131.05%
    2        -15.26     9.24      -24.22      16.77       -58.72%
    3        -13.29     8.49      -18.70      13.78       -40.68%
    4        -12.03     8.08      -15.25      10.53       -26.76%
    5        -11.41     8.04      -13.13       7.69       -15.07%
    6        -10.60     7.64      -12.11       7.46       -14.22%
    7        -10.18     7.69      -11.78       7.96       -15.73%
    8        -10.01     7.77      -10.44       8.29        -4.26%
    9         -9.78     7.71       -9.72       7.36         0.68%
    10        -9.56     7.65       -9.09       7.39         4.92%
    11        -9.14     7.74       -8.74       7.47         4.33%
    12        -9.00     7.77      -12.46      11.18       -38.41%
    13        -8.69     7.69      -11.64      10.83       -33.90%
    14        -8.56     7.67      -11.51      10.91       -34.43%
    15        -8.47     7.63      -11.06      10.23       -30.56%
    16        -8.17     7.42      -10.53      10.90       -28.81%
    17        -7.91     7.08       -9.89      10.52       -25.05%
    18        -7.80     7.00      -10.00      10.74       -28.17%
    19        -7.66     6.94       -9.92      10.79       -29.55%

Table 5.25: The best run for the ten-criteria DTLZ1
             supposed utility      domlem conf 0.8, 15 good
    outer     mean     stddev       mean      stddev      improvement
    1       -120.28     8.04     -117.67       8.00         2.17%
    2       -115.98     5.90     -116.53       7.36        -0.47%
    3       -114.93     5.79     -109.90      10.01         4.37%
    4       -112.55     4.80     -108.71       7.37         3.41%
    5       -111.33     5.18     -108.78       7.33         2.31%
    6       -110.97     4.92     -108.78       7.33         1.98%
    7       -109.70     3.75     -109.84       8.30        -0.12%
    8       -109.46     3.74     -108.84       7.36         0.57%
    9       -109.37     3.90     -108.53       7.29         0.77%
    10      -109.00     3.85     -108.53       7.29         0.43%
    11      -108.80     4.10     -108.38       7.08         0.39%
    12      -108.72     4.12     -108.38       7.08         0.31%
    13      -108.51     3.90     -108.27       6.94         0.23%
    14      -107.81     3.34     -106.42       8.30         1.29%
    15      -106.40     3.46     -106.42       8.30        -0.02%
    16      -105.89     3.30     -106.42       8.30        -0.50%
    17      -105.89     3.30     -106.42       8.30        -0.50%
    18      -104.88     3.70     -106.42       8.30        -1.46%
    19      -103.57     3.67     -105.54       8.65        -1.91%

Table 5.26: The best run for the four-criteria DTLZ7
5.6 Conclusions

The experiments performed confirmed that the DARWIN method proposed in [13, 15, 9] is a good tool for solving multi-objective optimization problems, even when the problem formulation contains uncertainty in the form of intervals of possible values.

The presented implementation is the first implementation of the DARWIN method which permits performing such large computational experiments. The first task was to analyze the behavior of the algorithm on a simple two-criteria problem without uncertainty. The first evaluations allowed the author to learn the properties and to get to know the behavior of the algorithm. Having gained confidence that DARWIN works, it was possible to advance to the next task — evaluating the performance on a richer set of problems. It was decided to start without allowing uncertainty. The rationale behind this decision was to show that the algorithm can solve a category of problems with good overall performance. By ignoring the uncertainty it was possible to compare the DARWIN results with optimal ones, calculated by a linear programming solver. There are a lot of parameters in the algorithm itself, so the next logical step was to check the importance of the parameters. The experiments performed so far assumed that the decision maker is infallible, consistent and acts according to a well-defined utility function. As this is not the case in real-life situations, the impact of inconsistencies in the DM's decisions was measured. Knowing the characteristics, the performance and the impact of both the parameters and the inconsistent DM's decisions, one can check whether the algorithm retains similar characteristics on problems with interval coefficients. Being unsatisfied with the performance on the DTLZ1 problems, the author checked the influence of the DomLem parameters on the results and included a second rule-generating algorithm — the AllRules algorithm. As a result, a rich set of data containing performance characteristics was gathered.
To conclude, the DARWIN method can handle a large variety of MOO problems very well. A resulting solution is robust with respect to the algorithm parameters, to inconsistencies in the Decision Maker's choices and, finally, to the uncertainty in the problem definition. Nevertheless, the analyst should be careful and investigate the problem instead of using the method blindly. There are some problems which are hard for DARWIN to solve (e.g. DTLZ1). Even for these problems satisfactory results can be achieved; however, one has to fine-tune the parameters and carefully observe their impact on the result. The method is simple to use for the decision maker, even if he or she has no knowledge of the decision support field. However, for some problems the analyst's assistance may be necessary in order to select appropriate parameter values.
Chapter 6

Summary

In this thesis a novel approach to multi-objective optimization was presented. The DARWIN method is an interactive procedure utilizing an evolutionary algorithm to optimize a population of solutions. The idea of DARWIN was first proposed by Salvatore Greco, Benedetto Matarazzo and Roman Słowiński in [13, 15, 9]. However, this thesis provides the first implementation and the first numerical results.

The novelty of the method consists in utilizing an IMO process along with the EMO procedure. DARWIN not only optimizes the population of solutions, but also drives the optimization towards regions preferred by the decision maker. In order to do so, preference information has to be gathered. This is done by asking the DM a series of simple questions. A list of feasible solutions is given and he or she is asked to indicate the "good" ones among them. The procedure uses the dominance-based rough set approach and the DomLem or AllRules algorithms to obtain a set of "if ..., then ..." decision rules. DARWIN is the first MOO technique using such decision rules. The condition part of each rule corresponds to a dominance cone in the objective space, built on a subset of objectives. If a given solution matches the condition part of a rule, it is considered "good" with respect to this rule. The higher the number of rules matched, the higher the fitness score in the evolutionary optimization, and thus the bigger the chance to "survive" and advance to the next generation.

DARWIN allows one to use intervals of possible values in the problem formulation. Multiple scenarios are then tested. The objective space is transformed — it is no longer possible to provide a single value of an objective. One has to reason in terms of meaningful quantiles of each of the objectives. This makes it possible to take into account the decision maker's attitude towards risk.

Two characteristics of the DARWIN method ensure robustness of the generated solutions.
Firstly, each solution is tested on multiple scenarios of uncertainty, so its characteristics are known even under fluctuations of the problem parameters. Secondly, the decision rules generated by the DRSA framework are immune to inconsistencies in the decision maker's choices. Therefore, the algorithm can withstand inconsistencies in his or her decisions and still guide the search towards the preferred regions.

The computational experiments performed confirm the author's intuition that the method can be used to solve a broad class of multi-objective optimization problems. If there is no uncertainty involved — the exact values of all problem coefficients are known — the resulting solutions are not further than 10% from the optimal one. If uncertainty is allowed, a comparison to the optimal solution is impossible, because no optimal solution can be provided for a problem that is not well-defined. However, a comparison can be made between the evolutionary optimization based on the DRSA decision rules and one in which the supposed utility function is used. The behavior of both optimizations is similar, and the conclusion is that the preference information extracted from the DM guides the algorithm to the same regions as the supposed utility optimization does. The results also showed that DARWIN is immune to changes in parameters and, moreover, to inconsistencies in the decision
maker's choices.

The implementation was done in the Scala programming language. The language runs on top of the Java Virtual Machine (JVM). The JVM is a well-tested platform, enormously popular in enterprise-class solutions, and available on all of the popular hardware and software architectures. This ensures portability of the code as well as stability of the developed system. Creating a component for the Java platform also increases the possibility of reusing the component in further projects and solutions.

Future work may include performing more computational experiments. A catalogue of problems, along with a description of DARWIN's behavior and the results that can be achieved on them, could be prepared. One could also modify the evolutionary optimization framework and check if, for example, introducing other crossover operators leads to a different behavior on some of the problems. In the current implementation the Monte Carlo procedure uses a uniform distribution of values in the intervals. An extension adding the possibility of selecting different distributions could be developed.

The goal of the thesis was to describe, implement and test the DARWIN method. The implementation was prepared and the experiments were conducted. Recommendations are given on parameter values, on the problems that can be solved and on the expected performance. To summarize, it can be stated that the goals of the thesis have been achieved.
Bibliography

[1] Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms. No. 35.2. MIT Press and McGraw-Hill, 2001, ch. The traveling-salesman problem, pp. 1027–1033.

[2] Darwin, C. On the origin of species. John Murray, London, 1859.

[3] Deb, K. Introduction to Evolutionary Multiobjective Optimization. 59–96.

[4] Deb, K., Agrawal, S., Pratap, A., and Meyarivan, T. A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimisation: NSGA-II. In PPSN VI: Proceedings of the 6th International Conference on Parallel Problem Solving from Nature (London, UK, 2000), Springer-Verlag, pp. 849–858.

[5] Deb, K., Thiele, L., Laumanns, M., and Zitzler, E. Scalable multi-objective optimization test problems. In Congress on Evolutionary Computation (CEC'2002) (2002), pp. 825–830.

[6] Fogel, L. J. On the organization of intellect. PhD thesis, University of California, Los Angeles, 1964.

[7] Goldberg, D. E. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1989.

[8] Graham, R. L., and Hell, P. On the history of the minimum spanning tree problem. IEEE Ann. Hist. Comput. 7, 1 (1985), 43–57.

[9] Greco, S., Matarazzo, B., and Słowiński, R. DARWIN-PCT: Dominance-based rough set Approach to handling Robust Winning solutions in INteractive multiobjective optimization using Pairwise Comparison Tables. In Proceedings of the 25th Mini-EURO Conference on Uncertainty and Robustness in Planning and Decision Making, URPDM 2010 (Coimbra, Portugal, 15-17 April 2010).

[10] Greco, S., Matarazzo, B., and Słowiński, R. Rough sets theory for multicriteria decision analysis. European Journal of Operational Research 1, 129 (1 2001), 1–47.

[11] Greco, S., Matarazzo, B., and Słowiński, R. Handbook of Data Mining and Knowledge Discovery. Oxford University Press, New York, 2002, ch. Multicriteria classification by dominance-based rough set approach.
[12] Greco, S., Matarazzo, B., and Słowiński, R. Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques. Springer-Verlag, New York, 2005, ch. Rough set based decision support.
[13] Greco, S., Matarazzo, B., and Słowiński, R. Preferences and Decisions: Models and Applications. Springer, Berlin, 2010 (to appear), ch. Dominance-based Rough Set Approach to Interactive Evolutionary Multiobjective Optimization.

[14] Greco, S., Matarazzo, B., and Słowiński, R. DARWIN: Dominance-based rough set Approach to handling Robust Winning solutions in INteractive multiobjective optimization. In Proc. 5th International Workshop on Preferences and Decisions (Trento, April 6-8, 2009), pp. 34–41.

[15] Greco, S., Matarazzo, B., and Słowiński, R. Interactive Evolutionary Multiobjective Optimization using Dominance-based Rough Set Approach. In Proc. WCCI 2010, IEEE World Congress on Computational Intelligence (Barcelona, Spain, July 18-23, 2010), pp. 3026–3033.

[16] Greco, S., Matarazzo, B., Słowiński, R., and Stefanowski, J. An Algorithm for Induction of Decision Rules Consistent with the Dominance Principle. In RSCTC '00: Revised Papers from the Second International Conference on Rough Sets and Current Trends in Computing (London, UK, 2001), Springer-Verlag, pp. 304–313.

[17] Greco, S., Matarazzo, B., Słowiński, R., and Stefanowski, J. Variable consistency model of dominance-based rough sets approach. In RSCTC '00: Revised Papers from the Second International Conference on Rough Sets and Current Trends in Computing (London, UK, 2001), Springer-Verlag, pp. 170–181.

[18] Harrah, M. A build tool for Scala.

[19] Horn, A. On sentences which are true of direct unions of algebras. Journal of Symbolic Logic 16 (1951), 14–21.

[20] Keen, K. J. Graphics for Statistics and Data Analysis with R. Chapman & Hall/CRC, 2010.

[21] King, P., Naughton, P., DeMoney, M., Kanerva, J., Walrath, K., and Hommel, S. Code Conventions for the Java Programming Language.

[22] Lazenby, D. Cygwin: For Windows NT. Linux J., 14.

[23] Lindholm, T., and Yellin, F. Java Virtual Machine Specification. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.

[24] Loeliger, J. Version Control with Git: Powerful Tools and Techniques for Collaborative Software Development. O'Reilly Media, Inc., 2009.

[25] Metropolis, N. The beginning of the Monte Carlo method. Los Alamos Science, Special Issue dedicated to Stanislaw Ulam (1987), 125–130.

[26] Miettinen, K., Ruiz, F., and Wierzbicki, A. P. Introduction to Multiobjective Optimization: Interactive Approaches. 27–57.

[27] Pawlak, Z. Rough sets. International Journal of Computer and Information Sciences 11 (1982), 341–356. doi:10.1007/BF01001956.

[28] Puchinger, J., Raidl, G. R., and Pferschy, U. The multidimensional knapsack problem: Structure and algorithms. INFORMS J. on Computing 22, 2 (2010), 250–265.

[29] Rechenberg, I. Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. PhD thesis, Technical University of Berlin, 1965.

[30] Schwefel, H. P. Kybernetische Evolution als Strategie der experimentellen Forschung in der Strömungstechnik. Master's thesis, Technical University of Berlin, 1965.
[31] Shafranovich, Y. Common format and MIME type for comma-separated values (CSV) files. Tech. Rep. RFC 4180, IETF, October 2005.

[32] Wickham, H. ggplot2: elegant graphics for data analysis. Springer New York, 2009.

[33] Wirth, N. What can we do about the unnecessary diversity of notation for syntactic definitions? Commun. ACM 20, 11 (1977), 822–823.

[34] Wolpert, D. H., and Macready, W. G. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1, 1 (1997), 67–82.

[35] Żurawski, M. Algorithms of induction of decision rules for multicriteria and multiattribute decision problems. Master's thesis, Poznan University of Technology, 2001.
    User’s manual Content ofthe included CD A CD is attached to the thesis. The CD contains this thesis in an electronic form, the DARWIN distribution and the DARWIN source code. A short description of the content follows. • Distribution — The distribution of DARWIN. To run, double click on a file darwin-full.jar. • Source — A source code of the software. • thesis.pdf — An electronic version of the thesis. A graphical user interface for DARWIN Starting the GUI application A runnable jar with all the required dependencies is provided. See the disc attached to the the- sis. The distribution is also available on-line at http://github.com/downloads/puszczyk/DarwinDS/ distribution.zip. The Java Runtime Environment (JRE) 1.5 or later is required. To run the ap- plication, double click on a file darwin-full.jar in a graphical user interface or type following command in a terminal: $ java -jar darwin-full.jar Figure 1 shows the main window of DARWIN. Figure 1: The main screen of DARWIN A set of problems (the ones with uncertain values) is included with the distribution. Solving exemplary problems DARWIN offers an intuitive and easy to use interface for solving MMO problems. The main window is presented in Figure 1. To solve a problem one need to execute following steps: 1. Click the “Open” button in the main window.
    106 User’s manual 2.Choose a problem file using a provided file selector (Figure 2). The problem has to be described in the DARWIN .mod (model) file format. 3. The main window now shows a name of the problem (Figure 3). Click “Solve”. 4. A window presenting generated solutions appears (Figure 4). Mark preferred solutions as “good” using checkboxes in the “Is good” column. Click “Next iteration” in order to start evolutionary optimization procedure. 5. A window with solution details can be invoked by selecting a solution and clicking the “So- lution details” button (Figure 5). 6. After selecting the solutions an evolutionary optimization begins (Figure 6). New solutions are generated and one can mark “good” solutions from a new set again. 7. If generated solutions are satisfactory, then one can stop the algorithm by clicking “End the algorithm”. It there are solutions marked as good, they will be saved. (Figure 7). Figure 2: Selecting a problem to solve Figure 3: Starting the algorithm
Figure 4: Marking the "good" subset of generated solutions
Figure 5: Decision variables for a given solution

Figure 6: An evolutionary optimization is performed
Figure 7: Finishing the problem analysis
    110 User’s manual Advancedoptions The DARWIN GUI offers additional options to customize the process. One can modify the algo- rithm parameters in a configuration dialog (Figure 8). To open it, click the “Options” button in the main window. The options are described below. Tab Option name Description Default value Main The number of solu- tions The number of solutions in a population. 30 The number of scenar- ios The number of scenarios on which the so- lutions will be evaluated. 30 The number of genera- tions The number of generations in the interior loop. 30 Percentiles Which percentiles are meaningful to the decision maker. 1.0, 25.0, 50.0 Use an average in quantiles Whether an average in quantiles should be used instead of the maximum value. false The Algorithm Use All Rules instead of DomLem Should the All Rules algorithm be used in- stead of the default one false The DomLEM confi- dence level A level of confidence which the rules gener- ated by the DomLem algorithm should at least have. 0.6 Compare using the supposed utility func- tion If the evolutionary algorithm should use the supposed utility instead of a rule-based score. false Fine Tuning Delta The decay of the rule weight (see 3.3). 0.1 Gamma The coefficient of an elitism. The higher the gamma, the higher the probability of choosing a solution with a higher rank as a parent. 2.0 Eta The initial mutation probability. 0.5 Omega The decay rate of the mutation probability. 0.1 Reports The reports directory A directory where reports should be saved. ./reports Save the evolutionary report If the evolutionary report should be saved ia a file. false Save the decision maker’s report If the DM’s report should be saved in a file. false The rules directory A directory where rules should be saved. ./rules Save rules If the decision rules generated during a run should be saved on a disk. false GUI Parameters Digits after a dot A number of digits that should be dis- played after a dot. 
2 The window presenting a list of generated solutions offers additional features. The list of solutions can be sorted on a given criterion by clicking the column header (Figure 9). Another useful option is a history of solutions. The user can review solutions generated in earlier iterations, return to them and change his or her selections. This is possible using the history toolbar (see10).
Figure 8: DARWIN configuration options

Figure 9: Sorting solutions on a given criterion
Figure 10: Reviewing the history of generated solutions
The DARWIN .mod file format

MOO problems for DARWIN are specified in external files. These files are called the DARWIN model files and usually have the .mod extension. The application contains a parser for the model files. The parser grammar in the Extended Backus-Naur Form (EBNF, see [33]) is described below.

problem ::= { line }

line ::= statement , ";"

statement ::= utility_function | variable | goal | constraint

variable ::= "var" , "[", [ var_type_constraint ], floating_point_number , ",", floating_point_number , "]", identifier

var_type_constraint ::= "(B)" | "(I)"

goal ::= ("min" | "max"), identifier , ":", expression

constraint ::= identifier , ":", expression , (">=" | "<="), expression

utility_function ::= "!dec:", expression

expression ::= term , ("+" | "-"), expression | term

term ::= factor , ("*" | "/"), term | factor

factor ::= floating_point_number | ("+" | "-" | "ln"), factor | quantile | aggregate | identifier | interval | "(", expression , ")"

quantile ::= "<", identifier , ",", floating_point_number , ">"
interval ::= "[", identifier , ":", floating_point_number , ",", floating_point_number , "]"

aggregate ::= aggregate_operator , "(", expression , [{ ",", expression }], ")"

aggregate_operator ::= "sum" | "min" | "cos"

floating_point_number ::= ? A floating point number with an optional sign ?

identifier ::= ? A valid Java identifier ?

Comments handled by a preprocessor are possible — (1) line comments, starting with #, e.g. #a-line-comment, and (2) block comments, enclosed in /* */. Whitespace is ignored.

A problem file consists of lines; each line contains a statement ended by a semicolon (;). A statement can define the supposed utility function, a variable, a goal or a constraint. The variables define the problem domain. They can be continuous (with double floating point precision), binary (indicated by a (B) flag) or integer ((I) flag). Upper and lower limits of the variable value have to be provided. Each variable has a name and can be referred to by that name in succeeding statements. Goals are the optimization criteria to be minimized or maximized. They are named expressions and can therefore be referred to later by name. Constraints are named inequalities. The inequality is not strict, and both its sides can be expressions. Finally, the supposed utility function is an expression prefixed with !dec:. The function is useful only if one wants to simulate the decision maker. An example file with the mix-problem definition is included below.
var[0, 13] xa;
var[0, 25] xb;
var[0, 13] xc;

max profit: [pa:20, 24] * min(xa, [da:10, 12])
          + [pb:30, 36] * min(xb, [db:20, 24])
          + [pc:25, 30] * min(xc, [dc:10, 12])
          - [p1r:6, 7.2] * ([r1a:1, 1.2]*xa + [r1b:2, 2.4]*xb + [r1c:0.75, 0.9]*xc)
          - [p2r:9, 9.6] * ([r2a:0.5, 0.6]*xa + [r2b:1, 1.2]*xb + [r2c:0.5, 0.6]*xc);
min time: [ta:5, 6]*xa + [tb:8, 9.6]*xb + [tc:10, 12]*xc;

c1: xa <= 12;
c2: xb <= 24;
c3: xc <= 12;
c4: xa >= 0;
c5: xb >= 0;
c6: xc >= 0;

!dec: <profit, 0.01> + 3 * <profit, 0.25> + 2 * <profit, 0.50>
    - 1 * <time, 0.01> - 3 * <time, 0.25> - 2 * <time, 0.50>;

The expressions make use of intervals (e.g. [pa:20, 24]) and the supposed utility function uses meaningful percentiles (e.g. profit25%: <profit, 0.25>).
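To illustrate how the !dec: expression of the example is evaluated, the sketch below computes the supposed utility from already-extracted quantile values of the two goals. The Map-based representation is the author's illustration only, not the data structure used inside DARWIN.

```scala
// The example's supposed utility: a weighted sum of meaningful quantiles.
// Keys mirror the <goal, quantile> terms of the !dec: expression above;
// the Map-based representation is illustrative, not DARWIN's internal format.
def supposedUtility(q: Map[(String, Double), Double]): Double =
  q(("profit", 0.01)) + 3 * q(("profit", 0.25)) + 2 * q(("profit", 0.50)) -
    1 * q(("time", 0.01)) - 3 * q(("time", 0.25)) - 2 * q(("time", 0.50))
```

Profit quantiles enter with positive weights (profit is a gain) and time quantiles with negative weights (time is a loss), so a simulated DM prefers solutions maximizing this value.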
Compiling DARWIN from the source code

It is possible to build a DARWIN binary from the source code. This can be done using the command line on Windows, Linux and Mac OS X. Instructions are given below.

1. Get the source code. The sources are included on a disc attached to the thesis. Alternatively, the most recent version is stored in a git repository (see [24]) and can be cloned by typing the command:

$ git clone git://github.com/puszczyk/DarwinDS.git

2. Prepare a build tool. DARWIN uses sbt (see [18]) — a build tool for Scala. Detailed setup instructions are given on the project's webpage. The easiest way is to install Scala (system-dependent, consult http://scala-lang.org for details) and then prepare a script for running sbt. Download sbt-launch.jar from the sbt website: http://simple-build-tool.googlecode.com/files/sbt-launch-0.7.4.jar

Somewhere in the PATH put a script with the following content:

java -Xmx512M -jar `dirname $0`/sbt-launch.jar "$@"

or (in case of the Windows operating system):

set SCRIPT_DIR=%~dp0
java -Xmx512M -jar "%SCRIPT_DIR%sbt-launch.jar" %*

Make the file executable. It is assumed further that the script is named sbt.

3. Build the DARWIN binaries. The source code is located in the Darwin directory. First, get the required dependencies:

$ cd DarwinDS/Darwin
$ sbt
> download-jars

Then run the set of unit tests to check if everything went fine up to this point:

> test

This should end with a message:

[info] All tests PASSED.
[success] Successful.

Now DARWIN can be compiled:

> compile

And run:

> run

To create a .jar package, type:

> package
This will generate the DARWIN binaries in a subdirectory named target.

The experiment framework

Instead of the graphical user interface, DARWIN can be run in a console-based batch mode. This is useful for running a set of experiments. A batch run can be invoked by the experiment framework located in the Experiment directory. The directory contains a few useful scripts written in Bash and Python. As such, it requires Mac OS X, a Linux distribution, or a Windows operating system with Cygwin installed (see [22]). The utility scripts are described below:

• runTestPlan.py — a script for running a set of tests. As input it requires the location of the DARWIN directory, a problem file, a DARWIN configuration file and a test plan specification. The problem file is a file in the DARWIN .mod file format. The configuration is an .ini file structured as follows:

[section]
option1 = value1
option2 = value2
(...)
optionN = valueN

The required sections and options are described in Table 4.1. The test plan specification is a text file with the following structure:

>test_name_1/number-of-runs
[section-to-override]
option-to-override1 = value1
(...)
>test_name_N/number-of-runs
(...)

Each test is run number-of-runs times. For each test one can specify a list of options to be overridden. This is useful for testing the influence of a parameter. An example test plan is presented below.

>delta005/15
[main]
delta = 0.05
>delta015/15
[main]
delta = 0.15
>delta020/15
[main]
delta = 0.20
>delta040/15
[main]
delta = 0.40

This plan will run DARWIN fifteen times for each delta value. Run reports are saved in a directory named after the current date and time. Test names are the names of subdirectories.

• genCharts.sh — a script for charting run reports. It requires the R environment (see [20]) with the ggplot2 and reshape libraries installed. The charts to be generated are included in
the Chart directory of the DARWIN distribution. The usage is:

$ genCharts.sh --brief/--full criteria-no best-val charts-dir testplan-out-dir

--brief/--full decides whether all charts or only the overview chart should be generated. criteria-no is the number of criteria in the problem. best-val is the optimal value of the problem; if it cannot be given, it should be set to 0. charts-dir is the directory with the R charts to be generated. Finally, testplan-out-dir is the output directory of the runTestPlan.py script. The available types of charts are presented in Figure 4.2. The script will generate distinct charts for each test run. If one wants a comparison between the tests, then ./genSummaryChart.sh should be used instead.

• genSummaryChart.sh — a script for charting a comparison between different test runs. The usage is similar to genCharts.sh; however, the number of criteria can be omitted, because it is irrelevant to this task. The usage is:

$ ./genSummaryChart.sh best-val charts-dir testplan-out-dir problem-name

The available types of charts are presented in Figure 4.3.