Genetic algorithm with dynamic population for solving the simultaneous optimization of multiple queries

Mahyar Teymournezhad
m_teimoornezhad@yahoo.com
Abstract
The goal of multiple-query optimization (MQO) is to find execution plans that minimize the total cost of executing a set of queries. Each query may individually have several plans, and since each plan consists of a series of tasks, the goal of MQO is to find plans that share as many tasks as possible with the plans of other queries. In the general case, this is an NP-complete problem, and various methods have been proposed for it. In this paper, the multiple-query optimization problem is solved using a genetic algorithm with a dynamic population. The results show that the proposed method has lower execution time and faster convergence than existing methods.
Keywords: multi-query optimization, genetic algorithm, join ordering
I. INTRODUCTION
One of the most important and costly parts of a database system is query optimization. When multiple queries are submitted to the database simultaneously, it is the database optimizer's task to obtain an execution plan that minimizes the cost of executing them. Using the formulation of [4], the second phase of the problem has been studied independently of the first phase; this second phase of MQO is usually the most time-consuming part of the problem. To solve multiple queries simultaneously and identify their shared tasks, however, the entire set of execution plans for these queries must be considered together, because a costly plan task may lead to more sharing with other queries and thus yield a better solution to the MQO problem. In Section III, the modeling of the MQO problem with the genetic algorithm is investigated.
II. A review of the genetic algorithm

One of the best-known heuristic techniques for optimizing complex problems is the genetic algorithm. A definition is given in [8], and the genetic algorithm has been used to solve many NP-complete problems. Goldberg [4] demonstrated its practicality by surveying its applications. The genetic algorithm simulates evolutionary concepts from biology; the simulation applies probabilistic methods based on evolutionary principles [1]. In the genetic algorithm, the basic data structure is a vector of genes, called a chromosome. Each chromosome represents one candidate solution to the problem, and its members (the genes) are parts of that solution. The quality of a candidate solution (i.e., a chromosome) is measured by how close it is to the optimal solution, using what is called the fitness function.
The genetic algorithm searches for an optimal solution using evolutionary operators (also called genetic operators). Initially, chromosomes are generated at random, typically of low quality, to represent a variety of solutions. Genetic operators are then applied to these chromosomes to produce new chromosomes for the next stage.
The three operators used in the genetic algorithm are as follows:
• Crossover operator: part of one parent's chromosome is exchanged with part of the other parent's chromosome to produce the child's chromosome.
• Mutation operator: new chromosomes are produced by randomly modifying a small number of genes in a chromosome. The mutation operator is never applied to the best solution in the population.
• Selection operator: this operator determines which chromosomes survive into the next generation.
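As a minimal sketch of the first two operators (the array representation, constants, and function names below are our illustrative assumptions, not the paper's implementation), a chromosome can be stored as an array of plan indices:

```c
#include <stdlib.h>

/* Hypothetical chromosome: genes[i] is the index of the plan chosen
   for query i. NUM_GENES and NUM_PLANS are illustrative values. */
#define NUM_GENES 8
#define NUM_PLANS 4

/* Single-point crossover: copy the first `point` genes from parent a
   and the remaining genes from parent b into the child. */
void crossover(const int *a, const int *b, int *child, int point) {
    for (int i = 0; i < NUM_GENES; i++)
        child[i] = (i < point) ? a[i] : b[i];
}

/* Mutation: replace one randomly chosen gene with a random plan index. */
void mutate(int *c) {
    c[rand() % NUM_GENES] = rand() % NUM_PLANS;
}
```

The selection operator is discussed separately below, since several selection techniques exist.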
The simplest policy is to give better chromosomes a greater chance of surviving into the next generation, which is what actually happens in evolutionary processes. However, applying a crossover or mutation operator to an unpromising chromosome may sometimes produce a good one [3].
The most commonly used selection techniques are:
• Truncation ("cutting") method: all chromosomes are first sorted in descending order (from best to worst) by their fitness values. The top n chromosomes on this list are then passed to the next generation, each with the same probability.
• Tournament ("race") method: a number r is chosen at random. Then r chromosomes are drawn from the population, and the one with the best value (execution time) is passed to the next generation. This continues until enough chromosomes have been selected for the next generation; a chromosome may be selected several times.
• Roulette-wheel ("fortune wheel") method: chromosomes are assigned slices of a wheel according to their fitness; the fitter the chromosome, the larger its slice. A random number is then generated, and the chromosome whose slice contains that number is passed to the next generation. Steinbrunn [9] used this method.
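The last two techniques can be sketched as follows (a hedged illustration: the function names, the fitness array, and the higher-is-better convention are our assumptions):

```c
#include <stdlib.h>

/* fitness[i] holds the fitness of chromosome i (higher is better). */

/* Tournament ("race") selection: draw r random candidates and return
   the index of the fittest among them. */
int tournament_select(const double *fitness, int pop_size, int r) {
    int best = rand() % pop_size;
    for (int i = 1; i < r; i++) {
        int cand = rand() % pop_size;
        if (fitness[cand] > fitness[best]) best = cand;
    }
    return best;
}

/* Roulette-wheel ("fortune wheel") selection: each chromosome gets a
   slice of the wheel proportional to its fitness; spin and return the
   chromosome whose slice contains the spin. */
int roulette_select(const double *fitness, int pop_size) {
    double total = 0.0;
    for (int i = 0; i < pop_size; i++) total += fitness[i];
    double spin = total * ((double)rand() / RAND_MAX);
    double acc = 0.0;
    for (int i = 0; i < pop_size; i++) {
        acc += fitness[i];
        if (spin <= acc) return i;
    }
    return pop_size - 1; /* guard against floating-point rounding */
}
```

Either routine is called repeatedly until the next generation has been filled, which is why a chromosome may be selected several times.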
III. Modeling of the genetic algorithm for MQO

The genetic algorithm can model the MQO problem naturally. Each chromosome is one solution to the problem, and each gene in a chromosome represents the plan chosen for the corresponding query.
• Each chromosome Ci is composed of a number of genes Gj, and each gene is a solution (a plan) for one query. Within a generation, the number of chromosomes varies with the population size Pk of that generation.
• Selection operator (Σ): takes the population of a generation and selects some of its chromosomes to be transferred to the next generation.
• Mutation operator (M): takes a chromosome as input and creates a new chromosome.
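This encoding might look as follows in C (a sketch under our own assumptions: the constants, the struct name, and a uniform number of plans per query are illustrative, not from the paper):

```c
#include <stdlib.h>

#define Q 4                 /* number of queries (illustrative) */
#define PLANS_PER_QUERY 3   /* plans available per query (illustrative) */

/* One chromosome = one MQO solution: plan[j] is the index of the
   execution plan chosen for query j. */
typedef struct {
    int plan[Q];
} Chromosome;

/* Build one random chromosome, i.e. one random candidate solution. */
Chromosome random_chromosome(void) {
    Chromosome c;
    for (int j = 0; j < Q; j++)
        c.plan[j] = rand() % PLANS_PER_QUERY;
    return c;
}
```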
To select chromosomes for the next generation, the quality of each chromosome, as determined by the fitness function, is considered. A simple choice for the fitness function is the inverse of the total execution time of the tasks in the solution.

Crossover and mutation operators can also be defined easily for the MQO problem, and they always create new, valid solutions. Since each gene of a chromosome represents a plan selected for that gene's query, replacing that plan with another plan for the same query yields a new, valid solution; this is what the mutation operator does.

For the crossover operator, several variants can be considered: single-point, multi-point, and segment crossover. In the proposed method, all of these produce valid solutions: if two chromosomes represent two valid solutions to the MQO problem, any crossover applied to them creates new, valid solutions. Regardless of the crossover type and the position at which it is applied, every piece that is exchanged represents valid plans for the corresponding queries, so the resulting solutions are also valid.
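The fitness idea above, with shared tasks paid for only once, can be sketched as follows (a hedged illustration: the table sizes, the fixed plan selection, and all names are our assumptions):

```c
#include <string.h>

#define NUM_TASKS 6
#define NUM_QUERIES 2
#define MAX_PLAN_TASKS 3

/* task_time[t] = execution time of task t (illustrative values) */
static const double task_time[NUM_TASKS] = {4, 2, 7, 1, 3, 5};

/* plan_task[q][k] = k-th task of the plan selected for query q,
   or -1 for an unused slot (a fixed selection, for illustration). */
static const int plan_task[NUM_QUERIES][MAX_PLAN_TASKS] = {
    {0, 2, -1},   /* query 0's plan uses tasks 0 and 2 */
    {2, 4, 5},    /* query 1's plan shares task 2 with query 0 */
};

/* Cost of the solution: total execution time of the DISTINCT tasks
   used by the selected plans, so shared tasks are counted once. */
double solution_cost(void) {
    int used[NUM_TASKS];
    memset(used, 0, sizeof used);
    double cost = 0.0;
    for (int q = 0; q < NUM_QUERIES; q++)
        for (int k = 0; k < MAX_PLAN_TASKS; k++) {
            int t = plan_task[q][k];
            if (t >= 0 && !used[t]) {
                used[t] = 1;
                cost += task_time[t];
            }
        }
    return cost;
}

/* Fitness = inverse of the total execution time of the solution. */
double fitness(void) { return 1.0 / solution_cost(); }
```

Here the solution uses tasks 0, 2, 4, and 5; because task 2 is shared between the two plans, it contributes to the cost only once.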
IV. The proposed method: genetic algorithm with dynamic population size (DP-GA)

This paper presents a method in which the size of the population varies from generation to generation. In the classical methods to date, the population size is held constant. This keeps the algorithm simpler, but it imposes an artificial limitation and does not follow the natural genetic law in biology, where the size of a population changes continuously. One drawback of heuristic methods is that the algorithm can stop at a local minimum while also being computationally costly; crowding is another factor that degrades the quality of the genetic algorithm [3]. In the proposed method, when enough resources are available and fairly good solutions exist, the population size increases; when the number of suitable solutions in a generation is low, the population size decreases. At first glance this may not seem very effective, but experiments show that it improves both the accuracy and the execution speed of the genetic algorithm.
Although in some generations the algorithm may enlarge the population, thereby increasing the amount of computation and slowing itself down, choosing an appropriate threshold for how the population changes reduces the overall run time. One of the most important parameters of this method is how the population is resized. To determine it, the problem is first solved with a greedy method, chosen because of its fast execution, and the value it produces is used as a threshold. In the next stage, the population of the next generation is determined from the difference between the best answer of each generation and the answer given by the greedy algorithm.
In this paper, the population of the new generation is calculated as follows:

P(k+1) = P(k) × (T_greedy / T_best(k))

In this formula, P(k+1) is the next-generation population size, P(k) is the current generation's population size, T_greedy is the execution time of the solution computed by the greedy algorithm, and T_best(k) is the execution time of the best solution in the current generation.
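A minimal sketch of this resizing rule (the exact functional form and the names below are our assumptions based on the description; the paper does not fix an implementation): the population shrinks when the generation's best solution is much slower than the greedy threshold, and grows when it beats it.

```c
/* Population resize sketch: scale the current population size by the
   ratio of the greedy algorithm's time to the best chromosome's time.
   If the best solution is worse (slower) than the greedy threshold,
   the ratio is below 1 and the population shrinks. */
int next_pop_size(int current_pop, double t_greedy, double t_best) {
    double next = current_pop * (t_greedy / t_best);
    if (next < 1.0) next = 1.0;   /* keep at least one chromosome */
    return (int)next;
}
```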
V. The experimental results and analysis

This section presents experimental results comparing the genetic algorithm with a constant population against the proposed genetic algorithm with a dynamic population. Experiments were performed on a computer with a 2.2 GHz processor and 2 GB of main memory. The algorithms were implemented in C.
To generate the input, the parameters of Table I are used, as follows. First, tasks are generated randomly, using two parameters: the number of tasks (T) and the lowest and highest execution times of tasks (MinET and MaxET). Initially, T tasks are produced, and each generated task is assigned a runtime drawn from the interval [MinET, MaxET]. After the tasks are created, they are distributed among the plans using the parameters MinP and MaxP. Although many plans may share tasks, no two exactly identical plans are created. Finally, the queries are created, using the number of queries (Q) and the parameters MinQ and MaxQ.
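The first step of this generator might look as follows (a sketch under our own assumptions; the constant values and names are illustrative, and the plan and query steps proceed analogously with MinP/MaxP and Q, MinQ/MaxQ):

```c
#include <stdlib.h>

#define T 20       /* number of tasks (illustrative) */
#define MinET 1    /* minimum task execution time (illustrative) */
#define MaxET 10   /* maximum task execution time (illustrative) */

/* Step 1 of the input generator: assign each of the T tasks a random
   runtime drawn from the interval [MinET, MaxET]. */
void generate_tasks(int times[T]) {
    for (int i = 0; i < T; i++)
        times[i] = MinET + rand() % (MaxET - MinET + 1);
}
```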
Each query has its own set of plans for its execution; hence, a particular plan does not serve more than one query. At the same time, since each query is executed by one plan and each plan consists of tasks, tasks can be shared between queries.
TABLE I: The values used to simulate the MQO problem with the genetic algorithm

Parameter name                           Value
Primary population size                  100
Iteration count (number of generations)  10
Mutation rate                            1%
To calculate the input size, the averages of (MinP, MaxP) and (MinQ, MaxQ) are considered; the resulting input size equals the size of the MQO problem's search space. This number is not correlated with the value of the optimal solution: although a query can have several executable plans, only one of them appears for each query in the final answer. The optimal solution value therefore depends only on the number of queries, the number of tasks in each plan, and the execution times of the tasks.
Two aspects are usually considered when comparing heuristic algorithms for NP-complete problems: the algorithm's execution time and the gap between the solution it finds and the optimal solution. Bayir [1] describes nine different variants of the genetic algorithm and shows the effectiveness of the truncation selection operator for the multiple-query optimization problem. In the proposed method, the initial population size is used only in the first iteration of the algorithm; in subsequent generations this size shrinks or grows depending on the best answer of each generation. The mutation rate states what percentage of the chromosomes of each generation will mutate. In the following, the phrase "chromosome execution time" refers to the execution time of the tasks given as the input of the MQO problem to the genetic algorithm.
The results are shown in Figures 1 and 2. The DP-GA algorithm shows more precision in finding the optimal answer. The reason is its variable population, which allows the algorithm to escape local minima and search a larger space. Figure 2 also shows that the speed of the proposed algorithm increases: in cases where the execution times of a generation's chromosomes differ significantly from the optimal solution, the population size decreases. This decrease speeds up the algorithm and, as noted in Section IV, increases the convergence rate in these cases.
Figure 1. Response time chart

Figure 2. Runtime graph
VI. CONCLUSION

In this paper, a solution based on a genetic algorithm with a variable population was presented. Two of the main drawbacks of the genetic algorithm are its computational cost and its tendency to get trapped in local extrema. With the proposed method, although the amount of computation in a single generation may sometimes increase, the total amount of computation is reduced, and the algorithm's speed therefore increases. Also, because the population grows under the conditions described in the paper, the proposed algorithm can escape local minima and converges faster than the base genetic algorithm. The disadvantage of this method is the need to set the initial parameters and an appropriate threshold for the rate of population change. A greedy method was used to obtain the threshold value in this paper; randomized methods such as random walk and iterative improvement may give more accurate estimates of this value.
REFERENCES

[1] M. A. Bayir, I. H. Toroslu, and A. Cosar, "Genetic algorithm for the multiple-query optimization problem," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, no. 1, Jan. 2007.
[2] G. Moerkotte, Building Query Compilers, pp. 375-385, 2006.
[3] T. M. Mitchell, Machine Learning, pp. 250-270, 1997.
[4] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[5] T. Sellis, "Multiple query optimization," ACM Trans. Database Syst., vol. 13, no. 1, pp. 23-52, 1988.
[6] A. Cosar, E. P. Lim, and J. Srivastava, "Multiple query optimization with depth-first branch-and-bound and dynamic query ordering," in Proc. CIKM '93, 1993, pp. 433-438.
[7] K. Shim, T. Sellis, and D. Nau, "Improvements on a heuristic algorithm for multiple-query optimization," Data Knowl. Eng., vol. 12, no. 2, pp. 197-222, 1994.
[8] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. Michigan Press, 1975.
[9] M. Steinbrunn, G. Moerkotte, and A. Kemper, "Heuristic and randomized optimization for the join ordering problem," The VLDB Journal, vol. 6, pp. 191-208, 1997.