Francesco Corucci – Bioinformatics course (Prof. R. Barbuti)
Percorso di Eccellenza, Laurea Magistrale in Ingegneria Informatica – Università di Pisa
Based on work by Taishin Yasunobu Nishida (nishida@pu-toyama.ac.jp)
Faculty of Engineering, Toyama Prefectural University
NP-Complete problems
• Problems for which no polynomial-time algorithm is known
• Many such problems arise in practical applications (logistics, computer science, biology, etc.)
• A common approach is to tackle them with approximation algorithms that run in polynomial time and return sub-optimal solutions
• P-systems can be useful in this context
Bioinformatics course - Laurea Magistrale in Ingegneria Informatica, Percorso di Eccellenza
Outline of membrane algorithms
Components of membrane algorithms
A membrane algorithm for approximating
an optimization problem consists of:
1. a certain number of regions, outlined
by nested membranes (labeled Mi)
2. in every region, a subalgorithm (si)
and a few tentative solutions
3. a solution transporting mechanism
between adjacent regions
[Figure: nested membranes M0 (innermost), M1, …, Mm-1; each region Mi contains a subalgorithm si and a few tentative solutions]
Membrane algorithm
A step of the membrane algorithm works as follows:
• In every region, simultaneously, the tentative solutions are updated by the subalgorithm assigned to that region
• Solution transport: in every region, the best solution (with respect to the optimization criterion) is sent to the adjacent inner region, and the worst is sent to the adjacent outer one
[Figure: nested membranes M0, M1, …, Mm-1; each region sends its best solution (B) inward and its worst solution (W) outward, e.g. M1 receives the best solution from M2]
Membrane algorithm
The membrane algorithm repeats the update and transport steps until a termination condition is satisfied.
Possible termination conditions are:
• A maximum number of iterations is reached
• The best solution has not changed for a predetermined number of steps
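The two termination conditions can be combined in a single loop. The following is a minimal sketch, not the original implementation: `step` is a hypothetical callable standing in for one update-and-transport step that returns the value of the current best solution, and `max_iterations` and `patience` are illustrative parameter names.

```python
def should_stop(iteration, max_iterations, steps_since_improvement, patience):
    """True when either termination condition holds."""
    return iteration >= max_iterations or steps_since_improvement >= patience


def run_until_done(step, max_iterations=1000, patience=50):
    """Call step(k) repeatedly; step is assumed to return the current best value."""
    best = float("inf")
    stale = 0          # number of consecutive steps without improvement
    iteration = 0
    while not should_stop(iteration, max_iterations, stale, patience):
        value = step(iteration)
        if value < best:
            best, stale = value, 0
        else:
            stale += 1
        iteration += 1
    return best, iteration
```

With a small `patience` the loop ends soon after the best value stops improving; with a large one, the iteration limit is what ends the run.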
Output definition
• The innermost membrane (M0) is defined as the output membrane of the algorithm
• Its content at the end of the execution is the approximate solution to the optimization problem
Subalgorithms
• A membrane algorithm can use several different subalgorithms
• A subalgorithm can be any approximate algorithm for optimization problems. Examples are:
  ◦ Genetic algorithms
  ◦ Tabu search
  ◦ Simulated annealing
  ◦ …
Escaping from local minima
• The membrane algorithm should be able to escape from local minima (we are searching for a global one!)
• For this reason, subalgorithms placed in the outer regions should favor random search (e.g. with random mutations)
• In the innermost membrane, a subalgorithm favoring local search should be used instead (e.g. neighborhood search), in order to refine the good solutions selected
• Assigning appropriate subalgorithms for a given problem is critical to obtaining good performance from the membrane algorithm
Considerations about parallelism
• At every step, the subalgorithm execution in each region is independent of the others
• Very simple communication occurs at the end of the step, between adjacent regions
• The membrane algorithm could therefore be easily implemented on parallel, distributed, or grid computing systems
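Because each region is updated independently, the update phase maps naturally onto a pool of workers. The sketch below only illustrates that structure and makes several assumptions: `update_region` is a hypothetical stand-in subalgorithm (a random swap), the per-region seeding scheme is invented for reproducibility, and a thread pool is used for simplicity (in CPython a process pool or a distributed framework would be needed for real speed-ups on CPU-bound work).

```python
import random
from concurrent.futures import ThreadPoolExecutor


def update_region(task):
    """Stand-in subalgorithm: swap two random positions in every solution."""
    index, solutions, seed = task
    rng = random.Random(seed)      # private generator, so regions do not interact
    updated = []
    for solution in solutions:
        new_solution = list(solution)
        i, j = rng.sample(range(len(new_solution)), 2)
        new_solution[i], new_solution[j] = new_solution[j], new_solution[i]
        updated.append(new_solution)
    return updated


def make_tasks(regions, step):
    """One task per region; the seed depends on the step and the region index."""
    return [(i, sols, step * len(regions) + i) for i, sols in enumerate(regions)]


def parallel_update(regions, step):
    """Update all regions concurrently; results come back in region order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(update_region, make_tasks(regions, step)))
```

The transport step would then run after `parallel_update` returns, since it is the only point where adjacent regions need to communicate.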
A practical example: The Traveling Salesman Problem
Traveling Salesman Problem
• The problem: given a list of cities and their pairwise distances, find the shortest route that visits each city exactly once and returns to the starting city
• The TSP has several practical applications:
  ◦ planning, logistics
  ◦ microchip manufacturing ("cities" are soldering points, the path is the electronic track)
• The decision version of the TSP has been shown to be NP-complete
Details of the algorithm
• Let m be the number of membranes (M0 is the innermost)
• An instance of the TSP with n nodes consists of n pairs of real numbers $v_i = (x_i, y_i)$, $i = 0, 1, \dots, n-1$ (n points in a two-dimensional space)
• Distance (Euclidean): $d(v_i, v_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$
• A solution $v = (v_0, v_1, \dots, v_{n-1})$ (order of visit) has value (cost) $W(v) = \sum_{i=0}^{n-2} d(v_i, v_{i+1}) + d(v_{n-1}, v_0)$
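The distance and cost definitions translate directly into code. This is a small sketch assuming a solution is represented as a list of (x, y) tuples in visiting order; the function names `distance` and `tour_cost` are illustrative.

```python
import math


def distance(a, b):
    """Euclidean distance d(v_i, v_j) between two points (x, y)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def tour_cost(tour):
    """W(v): sum of consecutive distances plus the closing edge back to v_0."""
    n = len(tour)
    open_path = sum(distance(tour[i], tour[i + 1]) for i in range(n - 1))
    return open_path + distance(tour[n - 1], tour[0])


square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
print(tour_cost(square))  # 4.0 (perimeter of the unit square)
```

Visiting the same four corners in a crossing order gives a longer tour, which is exactly the difference the membrane algorithm tries to eliminate.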
Details of the algorithm
• For two solutions u and v, v is better than u if W(v) < W(u)
• The solution with the minimum value among all possible solutions is called the strict solution of a TSP instance
• Region M0 holds one tentative solution; every other region holds two
Innermost region subalgorithm
• Tabu search is used as the subalgorithm in the innermost region (M0): it looks for a neighbor of the tentative solution by exchanging two adjacent nodes in the solution (local search, for refinement)
• Tabu search replaces the tentative solution if:
  1. The value of the neighboring solution is less than that of the tentative solution (the former becomes the new tentative solution)
  2. The value of the best solution in region M1 is less than that of the tentative solution (the former becomes the new tentative solution)
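One step of the innermost subalgorithm, as described above, might look like the following sketch. It covers only the two replacement rules; no tabu list is kept because the slides do not describe one. The random choice of the swap position, the wrap-around at the end of the tour, and the names `swap_adjacent` and `innermost_step` are assumptions.

```python
import math
import random


def tour_cost(tour):
    """Length of the closed tour through the (x, y) points in the given order."""
    n = len(tour)
    return sum(math.hypot(tour[i][0] - tour[(i + 1) % n][0],
                          tour[i][1] - tour[(i + 1) % n][1]) for i in range(n))


def swap_adjacent(tour, i):
    """Return a neighbor of tour with positions i and i+1 (cyclically) exchanged."""
    j = (i + 1) % len(tour)
    neighbor = list(tour)
    neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
    return neighbor


def innermost_step(current, best_from_m1, rng):
    """Apply replacement rule 1 (better neighbor), then rule 2 (better M1 solution)."""
    candidate = swap_adjacent(current, rng.randrange(len(current)))
    if tour_cost(candidate) < tour_cost(current):
        current = candidate
    if best_from_m1 is not None and tour_cost(best_from_m1) < tour_cost(current):
        current = list(best_from_m1)
    return current
```

Since both rules accept only strictly better solutions, the value of the tentative solution in M0 never increases from one step to the next.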
Outer regions subalgorithm
• The subalgorithm chosen for the outer regions (inspired by genetic algorithms) works as follows:
  1. If the two solutions have the same value, a part of one solution (selected probabilistically) is reversed (→ avoids duplicates);
  2. Recombine the two solutions, producing two new solutions (crossover: several methods are possible, EXX was used);
  3. Modify the two solutions by point mutations (in the i-th region, a mutation occurs with probability $i/m$) → enhances random search, as required
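The three steps can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the EXX crossover is not reproduced here, so `crossover` is an optional caller-supplied function and is skipped when omitted; a "point mutation" is assumed to be a swap of two random positions; the mutation probability i/m is assumed to apply once per solution.

```python
import random


def reverse_segment(tour, rng):
    """Reverse a randomly chosen segment of at least two nodes."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def point_mutation(tour, probability, rng):
    """With the given probability, swap two random positions (assumed mutation)."""
    mutated = list(tour)
    if rng.random() < probability:
        i, j = rng.sample(range(len(mutated)), 2)
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def outer_step(a, b, region_index, num_membranes, cost, rng, crossover=None):
    """One update of the two solutions a and b held by an outer region."""
    if cost(a) == cost(b):               # step 1: break ties by reversing a part
        b = reverse_segment(b, rng)
    if crossover is not None:            # step 2: recombination (e.g. EXX), if supplied
        a, b = crossover(a, b, rng)
    p = region_index / num_membranes     # step 3: mutation probability i/m
    return point_mutation(a, p, rng), point_mutation(b, p, rng)
```

Because the probability grows with the region index, the outermost regions mutate most often, which is what pushes them toward random search.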
Overall algorithm
1. Consider an instance of the TSP
2. Randomly construct one tentative solution for region 0 and two tentative solutions for every region from 1 to m-1
3. For k = 0 … d ('d' is a parameter):
   1. Update: simultaneously update the tentative solutions in each region using the associated subalgorithm
   2. Transport: for each region i, send the best solution to region i-1 (inner) and the worst to region i+1 (outer); regions 0 and m-1 can move only one solution. Then remove all solutions but the best two.
4. Output the tentative solution in region 0 as the result of the algorithm
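The whole loop can be condensed into a short sketch. Several simplifications are assumptions rather than the original implementation: the inner subalgorithm is a plain adjacent-swap improvement step (no tabu list), the outer subalgorithm uses only a swap mutation with probability i/m (no crossover), transported solutions are copied rather than moved, and region 0 keeps a single solution after transport.

```python
import math
import random


def tour_cost(tour):
    """Length of the closed tour through the (x, y) points in the given order."""
    n = len(tour)
    return sum(math.hypot(tour[i][0] - tour[(i + 1) % n][0],
                          tour[i][1] - tour[(i + 1) % n][1]) for i in range(n))


def local_search(tour, rng):
    """Inner region: swap two adjacent nodes and keep the result only if better."""
    i = rng.randrange(len(tour))
    j = (i + 1) % len(tour)
    candidate = list(tour)
    candidate[i], candidate[j] = candidate[j], candidate[i]
    return candidate if tour_cost(candidate) < tour_cost(tour) else tour


def mutate(tour, probability, rng):
    """Outer regions: swap two random nodes with the given probability."""
    mutated = list(tour)
    if rng.random() < probability:
        i, j = rng.sample(range(len(mutated)), 2)
        mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def membrane_tsp(cities, num_membranes, iterations, seed=0):
    rng = random.Random(seed)

    def random_tour():
        tour = list(cities)
        rng.shuffle(tour)
        return tour

    # One tentative solution in region 0, two in every other region.
    regions = [[random_tour()]]
    regions += [[random_tour(), random_tour()] for _ in range(num_membranes - 1)]

    for _ in range(iterations):
        # Update: each region applies its own subalgorithm.
        regions[0] = [local_search(regions[0][0], rng)]
        for i in range(1, num_membranes):
            regions[i] = [mutate(t, i / num_membranes, rng) for t in regions[i]]

        # Transport: best goes inward, worst goes outward (copies, by assumption).
        incoming = [[] for _ in range(num_membranes)]
        for i, solutions in enumerate(regions):
            ranked = sorted(solutions, key=tour_cost)
            if i > 0:
                incoming[i - 1].append(ranked[0])
            if i < num_membranes - 1:
                incoming[i + 1].append(ranked[-1])

        # Keep only the best solutions in each region.
        for i in range(num_membranes):
            keep = 1 if i == 0 else 2
            regions[i] = sorted(regions[i] + incoming[i], key=tour_cost)[:keep]

    return regions[0][0]


cities = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
best = membrane_tsp(cities, num_membranes=3, iterations=50, seed=1)
print(best, tour_cost(best))
```

In this sketch the solution in region 0 can only be replaced by a strictly better one, so its value never gets worse over the iterations.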
Experimental results
• The algorithm was implemented in the Java programming language and tested on a computer
• [Figure: an execution of the algorithm]
Comparison with simulated annealing
Comparing the membrane algorithm with simulated annealing (SA), a probabilistic algorithm often used for solving the TSP:
• eil51: 51 nodes, 40,000 iterations, 20 executions, variable number of membranes (from 2 to 70); strict value: 426
• kroA100: 100 nodes, 100,000 iterations, 20 executions; strict value: 21,282
Comparison with simulated annealing
• The membrane algorithm performed slightly better than SA in the first test and slightly worse in the second
• Since the differences are very small, we may conclude that the membrane algorithm performs about as well as simulated annealing
Saturation effect
• With more membranes we get a better approximation…
• …however, experimental results suggest that in some cases the improvement achieved by adding membranes tends to saturate
• Since the computation time is proportional to the number of membranes, a trade-off is needed
Fast convergence
• The figure shows how the average value of the solutions for the kroA100 problem, solved by a membrane algorithm with 50 membranes, changes as the number of iterations increases
• The algorithm converges rather quickly to good solutions (in about 2000-3000 iterations)
Improved membrane algorithms
Is it possible to improve the performance of the membrane algorithm by incorporating the concepts of a tissue P-system (→ compound approach) or of P-systems with dynamic membrane structure (→ shrink approach)?
Compound membrane algorithm
• Tissue-based, two phases:
  1. First phase: a certain number of membrane algorithms produce good solutions from randomly generated initial tentative solutions
  2. Second phase: the good solutions produced by the first phase are used as the initial solutions of a further membrane algorithm
Compound membrane algorithm
• Setup:
  ◦ 100 membrane algorithms in the first phase
  ◦ Every algorithm uses 50 membranes
  ◦ Each algorithm in the first phase terminates if the best solution does not improve for 500 iterations
  ◦ The membrane algorithm in the second phase terminates if the best solution does not improve for 5000 iterations
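The two-phase structure can be expressed independently of the underlying problem. In the sketch below, `solve` is a hypothetical callable standing in for a complete membrane algorithm (it takes a list of initial solutions and returns its output solution); how the first-phase outputs are distributed among the regions of the second phase is not specified here, so they are simply passed as one list. The toy problem (integers, target value 3) exists only to make the example runnable.

```python
import random


def compound_membrane_algorithm(solve, random_solution, first_phase_runs,
                                solutions_per_run, rng):
    """Phase 1: independent runs from random starts. Phase 2: one run from their outputs."""
    seeds = []
    for _ in range(first_phase_runs):
        initial = [random_solution(rng) for _ in range(solutions_per_run)]
        seeds.append(solve(initial, rng))      # each run yields one good seed
    return solve(seeds, rng)                   # second phase starts from the seeds


# Toy stand-ins, used only to exercise the structure above.
def toy_cost(x):
    return abs(x - 3)


def toy_solve(initial, rng):
    """Pick the best initial solution, then try a few random +/-1 improvements."""
    best = min(initial, key=toy_cost)
    for _ in range(20):
        candidate = best + rng.choice((-1, 1))
        if toy_cost(candidate) < toy_cost(best):
            best = candidate
    return best


result = compound_membrane_algorithm(
    toy_solve, lambda rng: rng.randint(0, 100), 5, 3, random.Random(0))
print(result, toy_cost(result))
```

Because the first-phase runs in the loop do not share any state apart from the random generator, they are the natural unit to distribute across machines.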
Compound membrane algorithm
• [Table: results of the experimental tests; strict values: 426 and 21282]
• The compound membrane algorithm performs significantly better than the previous approaches (it always outputs nearly strict solutions)
• On a common computer, the computation time of the compound membrane algorithm was much longer than that of the simple algorithm
• However, because the executions of the membrane algorithms in the first phase are completely independent, they could be easily parallelized on a distributed architecture, so that the computation time would be only about twice that of the simple algorithm
Compound membrane algorithm
• Possible explanations for the good performance of the compound approach:
  1. Large number of random initial solutions
  2. The first phase selects "good seeds"
  3. The second phase generates very good solutions by recombining the good seeds obtained in the first phase
Shrink membrane algorithm
• Based on a dynamic membrane structure, it consists of three phases:
  1. Phase 1: a certain number of algorithms (with five membranes and genetic-algorithm-based subalgorithms in all regions) are executed. After termination condition…
  2. Phase 2: "shrink" the systems to two membranes and refine with tabu search in region 0 and GA-type subalgorithms in region 1
  3. Phase 3: pass the good seeds selected in the previous phases to a second stage, as in the compound approach
Shrink membrane algorithm
• We can see from the table above that the shrink algorithm outperforms the compound one, while also being significantly faster
Conclusions
• The experimental results support the effectiveness of the membrane approach for approximating NP-complete optimization problems
• We saw how performance can be improved by considering some variants of P-systems (tissue-based and with dynamic membranes)
• There are many possibilities for further research:
  ◦ Using different subalgorithms
  ◦ Using different dynamic structures
  ◦ Using different termination conditions
  ◦ Introducing further P-system ingredients
Thank you! :-)

P-Systems for approximating NP-Complete optimization problems
