This paper presents a study of the fault-tolerant nature of Genetic Algorithms (GAs) on a real-world Desktop Grid System, without implementing any fault-tolerance mechanism. The aim is to extend to parallel GAs previous works that characterized fault tolerance in Genetic Programming. The results show that, despite the system degradation, GAs achieve a result quality similar to that of a failure-free system in three of the six scenarios under study. Additionally, we show that a small increase in the initial population size is a successful method to provide resilience to system failures in five of the scenarios. Such results suggest that parallel GAs are inherently and naturally fault-tolerant.
1. Characterizing Fault Tolerance of Genetic Algorithms in Desktop Grid Systems
Daniel Lombraña González, Juan Luis Jiménez Laredo
Francisco Fernández de Vega, Juan Julián Merelo Guervós
April 8, 2010
D. Lombraña, JJ. Jiménez, F. Fernández, JJ. Merelo, EvoCOP 2010
2. Outline
1 Introduction
2 Motivation
3 Methodology
4 Experiments and Results
5 Conclusions
3. Introduction
4. Introduction
Parallel Genetic Algorithms (PGA)
Evolutionary Algorithms (EAs) sometimes require long execution times.
One solution is to use parallel computing and distributed platforms.
7. Introduction
Failures in distributed platforms
Distributed platforms are prone to errors.
Failures are expected events rather than catastrophic
exceptions.
8. Introduction
Fault Tolerance
Fault tolerance is the ability of a system to behave in a well-defined manner once a failure occurs.
9. Introduction
Fault Tolerance
Different techniques have been developed to cope with failures:
Redundancy,
S. Ghosh. Distributed systems: an algorithmic approach. Chapman & Hall/CRC, 2006.
Checkpointing,
E. Elnozahy, L. Alvisi, Y. Wang, and D. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys (CSUR), 34(3):375–408, 2002.
Rejuvenation frameworks,
A. T. Tai and K. S. Tso. A performability-oriented software rejuvenation framework for distributed applications. In DSN ’05, pages 570–579, Washington, DC, USA, 2005. IEEE Computer Society.
etc.
10. Introduction
Fault Tolerance
Using a fault-tolerance technique requires modifying the application, and sometimes even the parallel algorithm itself.
Such modifications can be a heavy burden for the developer.
11. Motivation
12. Motivation
Parallel EAs and Fault Tolerance
To the best of our knowledge, there has been little research on the fault-tolerance features of PEAs in general and of PGA applications in particular.
13. Motivation
Previous Works
We first studied the fault-tolerant nature of Parallel Genetic Programming (PGP) on real-world Desktop Grid systems, concluding that PGP is fault-tolerant by default.
Daniel Lombraña González, Francisco Fernández de Vega, and Henri Casanova. Characterizing fault tolerance in genetic programming. Future Generation Computer Systems, 2010. DOI: 10.1016/j.future.2010.02.006.
Daniel Lombraña González, Francisco Fernández de Vega, and Henri Casanova. Characterizing fault tolerance in genetic programming. In Workshop on Bio-Inspired Algorithms for Distributed Systems, pages 1–10, Barcelona, Spain, 2009. ISBN 978-1-60558-564-2.
14. Motivation
Proposal
Based on this insight, this work builds on the previous ones and extends the study of fault tolerance in EAs to PGAs, using the same methodology.
15. Methodology
17. Methodology
Desktop Grid platforms (DGs)
DGs exhibit large numbers of failures.
The failure behavior of DGs has been studied in the literature.
DGs are low-cost when compared to clusters of comparable scale.
PGA applications are loosely coupled and thus well suited to DGs.
18. Methodology
Desktop Grid Platforms
DGs are very promising for PGA applications, and their high failure rate makes them a great test case for studying and characterizing the fault-tolerance abilities of PGAs.
19. Methodology
Experiments
To characterize the fault-tolerant nature of PGAs, we run two kinds of experiments:
in a failure-free environment, and
by replaying (simulating) failure traces from real-world DG platforms.
20. Methodology
DG traces
We perform simulations of DG platforms and of host
availability based on three real-world traces:
entrfin,
ucb,
xwtr.
21. Methodology
DG traces
Trace     Hosts   Venue                  Time
Entrfin   275     San Diego              1.0 months
Ucb       85      UC Berkeley            1.5 months
Xwtr      100     Université Paris-Sud   1.0 months
22. Methodology
Using the traces
We consider two cases, each for two different days of each trace:
hosts that become unavailable never become available again (worst-case assumption), and
complete host churn (unavailable hosts can be re-acquired later).
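The two trace-replay cases can be illustrated with a minimal sketch. It is not the authors' simulator: it assumes a trace is stored as one list of booleans per host (True meaning the host is available at that time step), and the names `without_return` and `available_hosts` are hypothetical.

```python
from typing import List

# Hypothetical trace layout (an assumption): trace[h][t] is True when
# host h is available at time step t.
Trace = List[List[bool]]


def without_return(trace: Trace) -> Trace:
    """Worst case: once a host becomes unavailable, it never comes back.

    A host that is unavailable at step 0 is therefore never acquired.
    """
    result = []
    for host in trace:
        alive = True
        row = []
        for up in host:
            alive = alive and up  # stays False after the first failure
            row.append(alive)
        result.append(row)
    return result


def available_hosts(trace: Trace) -> List[int]:
    """Number of available hosts at each time step."""
    steps = len(trace[0]) if trace else 0
    return [sum(host[t] for host in trace) for t in range(steps)]


if __name__ == "__main__":
    churn = [
        [True, False, True, True],   # host 0 leaves and later returns
        [True, True, True, False],   # host 1 leaves at the last step
        [False, True, True, True],   # host 2 joins after step 0
    ]
    print(available_hosts(churn))                  # [2, 2, 3, 2]
    print(available_hosts(without_return(churn)))  # [2, 1, 1, 0]
```

Replaying the original trace corresponds to the complete host-churn case, while `without_return` produces the monotonically decreasing curve of the worst-case assumption.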
23. Methodology
Figure: Host availability for one day of the ucb trace (number of available computers vs. time step; original trace vs. trace without return).
25. Experiments and Results
Problems
We conduct experiments with a 3-trap instance:
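$$
trap(u(\vec{x})) =
\begin{cases}
\frac{a}{z}\,\bigl(z - u(\vec{x})\bigr), & \text{if } u(\vec{x}) \le z \\
\frac{b}{l - z}\,\bigl(u(\vec{x}) - z\bigr), & \text{otherwise}
\end{cases}
\qquad (1)
$$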
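Equation (1) translates directly into code. This sketch assumes, as is usual for trap functions, that u(x) is the unitation (number of ones) of a block of length l = k, and that the 3-trap instance sums the trap over m consecutive, non-overlapping blocks. The values a = 2, b = 3, z = 2 are illustrative assumptions only, since the slides fix k, m and L but not a, b and z; the function names are hypothetical.

```python
from typing import Sequence


def trap(u: int, l: int, a: float, b: float, z: int) -> float:
    """Trap function of Eq. (1) for a block of length l with unitation u."""
    if u <= z:
        return a / z * (z - u)
    return b / (l - z) * (u - z)


def k_trap_fitness(bits: Sequence[int], k: int = 3, a: float = 2.0,
                   b: float = 3.0, z: int = 2) -> float:
    """Sum of trap over consecutive, non-overlapping k-bit blocks.

    The defaults for a, b and z are illustrative assumptions.
    """
    if len(bits) % k != 0:
        raise ValueError("individual length must be a multiple of k")
    total = 0.0
    for start in range(0, len(bits), k):
        block = bits[start:start + k]
        total += trap(sum(block), k, a, b, z)  # u(x) = number of ones
    return total


if __name__ == "__main__":
    L = 30  # m = 10 sub-functions of size k = 3
    print(k_trap_fitness([1] * L))  # 30.0: every block at the global optimum
    print(k_trap_fitness([0] * L))  # 20.0: every block at the deceptive optimum
```

With these assumed parameters each block scores 2, 1, 0 or 3 for u = 0, 1, 2, 3, so the gradient inside a block points toward all zeros while the global optimum is all ones, which is what makes the function deceptive.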
26. Experiments and Results
GA parameters for the 3-trap instance
Trap instance
Size of sub-function (k)       3
Number of sub-functions (m)    10
Individual length (L)          30
GA settings
GA                             GGA (generational GA)
Population size                3000
Selection of parents           Binary tournament
Recombination                  Uniform crossover, pc = 1.0
Mutation                       Bit-flip mutation, pm = 1/L
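One generation of a GA with these settings could look like the sketch below: binary tournament selection of parents, uniform crossover applied with probability pc = 1.0, bit-flip mutation with pm = 1/L, and full generational replacement. It is a minimal illustration, not the authors' code; the function names are hypothetical, and the toy run uses OneMax (`sum`) as a stand-in fitness and 100 individuals instead of 3000 so that it finishes quickly.

```python
import random
from typing import Callable, List

Individual = List[int]


def binary_tournament(pop: List[Individual], fits: List[float],
                      rng: random.Random) -> Individual:
    """Pick two individuals at random and return the fitter one."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[i] if fits[i] >= fits[j] else pop[j]


def uniform_crossover(p1: Individual, p2: Individual,
                      rng: random.Random) -> Individual:
    """Copy each gene from either parent with probability 0.5."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]


def bit_flip(ind: Individual, pm: float, rng: random.Random) -> Individual:
    """Flip each bit independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in ind]


def next_generation(pop: List[Individual],
                    fitness: Callable[[Individual], float],
                    rng: random.Random, pc: float = 1.0) -> List[Individual]:
    """Generational replacement: the offspring fully replace the parents."""
    fits = [fitness(ind) for ind in pop]
    pm = 1.0 / len(pop[0])  # pm = 1/L
    offspring = []
    for _ in range(len(pop)):
        p1 = binary_tournament(pop, fits, rng)
        p2 = binary_tournament(pop, fits, rng)
        child = uniform_crossover(p1, p2, rng) if rng.random() < pc else list(p1)
        offspring.append(bit_flip(child, pm, rng))
    return offspring


if __name__ == "__main__":
    rng = random.Random(1)
    L, size = 30, 100  # toy population; the experiments use 3000
    pop = [[rng.randint(0, 1) for _ in range(L)] for _ in range(size)]
    for _ in range(20):
        pop = next_generation(pop, sum, rng)  # OneMax stand-in fitness
    print(max(sum(ind) for ind in pop))
```

Plugging in a 3-trap fitness function instead of `sum` and a population of 3000 would reproduce the configuration in the table, under the assumptions stated above.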
27. Experiments and Results
Population size vs. generation
Figure: Population size (individuals) and percentage of loss vs. generations (0–50), for days 1 and 2 of the entrfin, ucb and xwtr traces.
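The shrinking population can be approximated with a simple, hypothetical model, sketched below under loudly stated assumptions: individuals are split evenly across the hosts available at generation 0, a lost host takes its share of individuals with it, the number of hosts per generation is given as input (the mapping from trace time steps to generations is not specified here), and the initial population may be enlarged by a percentage. The function names are hypothetical.

```python
from typing import List


def population_sizes(hosts_per_generation: List[int], base_size: int = 3000,
                     increase_pct: int = 0) -> List[int]:
    """Hypothetical model: the initial population (base_size enlarged by
    increase_pct percent) is split evenly over the hosts available at
    generation 0, and each lost host takes its share of individuals."""
    initial = base_size * (100 + increase_pct) // 100  # integer arithmetic
    h0 = hosts_per_generation[0]
    return [initial * h // h0 for h in hosts_per_generation]


def loss_percent(sizes: List[int]) -> List[float]:
    """Percentage of the initial population lost at each generation."""
    return [100.0 * (1 - s / sizes[0]) for s in sizes]


if __name__ == "__main__":
    # Hypothetical host counts per generation under the no-return assumption.
    hosts = [20, 20, 15, 10]
    sizes = population_sizes(hosts, increase_pct=10)
    print(sizes)                # [3300, 3300, 2475, 1650]
    print(loss_percent(sizes))  # [0.0, 0.0, 25.0, 50.0]
```

Integer arithmetic is used for the enlarged size because a floating-point product such as `3000 * 1.1` is not exactly 3300 and could round the wrong way.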
28. Experiments and Results
Obtained fitness for 3-Trap, Day 1
Error-free fitness = 23.56 (rows marked 10%–50% use an initial population increased by that percentage)
Trace Fitness Wilcoxon test Significantly different?
Entrfin 23.30 W = 6093, p-value = 0.002688 yes
Entrfin 10% 23.47 W = 5408.5, p-value = 0.2535 no
Entrfin 20% 23.48 W = 5360, p-value = 0.3137 no
Entrfin 30% 23.49 W = 5283.5, p-value = 0.4271 no
Entrfin 40% 23.57 W = 4923.5, p-value = 0.8286 no
Entrfin 50% 23.59 W = 4910.5, p-value = 0.7994 no
Ucb 23.22 W = 6453, p-value = 6.877e-05 yes
Ucb 10% 23.27 W = 6098.5, p-value = 0.002753 yes
Ucb 20% 23.37 W = 5837.5, p-value = 0.02051 yes
Ucb 30% 23.40 W = 5664, p-value = 0.06588 no
Ucb 40% 23.51 W = 5186.5, p-value = 0.6004 no
Ucb 50% 23.42 W = 5623, p-value = 0.08335 no
Xwtr 23.56 W = 5056, p-value = 0.8748 no
Xwtr 10% 23.57 W = 4923.5, p-value = 0.8286 no
Xwtr 20% 23.68 W = 4474, p-value = 0.1245 no
Xwtr 30% 23.73 W = 4259.5, p-value = 0.02812 yes
Xwtr 40% 23.68 W = 4502, p-value = 0.1466 no
Xwtr 50% 23.71 W = 4356.5, p-value = 0.05817 no
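The W statistics and p-values in these tables resemble the output of R's `wilcox.test`. The sketch below computes a two-sided Wilcoxon rank-sum test: W is the rank sum of the first sample minus n1(n1+1)/2, and the p-value uses the normal approximation with tie and continuity corrections that R applies to large or tied samples. It illustrates the test and is not the authors' analysis script; the function name is hypothetical. The reported W values cluster around 5000, which would match n1·n2/2 for 100 runs per configuration, but that run count is an inference rather than something the slides state.

```python
import math
from typing import List, Tuple


def wilcoxon_rank_sum(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, p_value), with W = rank sum of x minus n1 * (n1 + 1) / 2.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * (n1 + n2)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t  # tie correction for the variance
        i = j + 1
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    diff = w - n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    # Continuity correction of 0.5 toward the mean.
    z = (diff - math.copysign(0.5, diff)) / math.sqrt(var) if diff else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w, min(p, 1.0)
```

The "Significantly different?" column is consistent with a 0.05 significance level: for example, p = 0.06588 is marked "no" and p = 0.02812 is marked "yes".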
29. Experiments and Results
Obtained fitness for 3-Trap, Day 2
Error-free fitness = 23.56
Trace Fitness Wilcoxon Test Significantly different?
Entrfin 23.57 W = 4979.5, p-value = 0.9546 no
Entrfin 10% 23.69 W = 4397.5, p-value = 0.07682 no
Entrfin 20% 23.67 W = 4522.5, p-value = 0.1645 no
Entrfin 30% 23.70 W = 4405, p-value = 0.08086 no
Entrfin 40% 23.69 W = 4453.5, p-value = 0.11 no
Entrfin 50% 23.75 W = 4162.5, p-value = 0.01234 yes
Ucb 23.09 W = 6672.5, p-value = 7.486e-06 yes
Ucb 10% 23.12 W = 6826, p-value = 6.647e-07 yes
Ucb 20% 23.14 W = 6654, p-value = 7.223e-06 yes
Ucb 30% 23.26 W = 6371, p-value = 0.0001507 yes
Ucb 40% 23.37 W = 5893.5, p-value = 0.01316 yes
Ucb 50% 23.32 W = 6108, p-value = 0.002166 yes
Xwtr 23.60 W = 4806, p-value = 0.5791 no
Xwtr 10% 23.62 W = 4765, p-value = 0.5002 no
Xwtr 20% 23.69 W = 4453.5, p-value = 0.11 no
Xwtr 30% 23.60 W = 4806, p-value = 0.5791 no
Xwtr 40% 23.63 W = 4688.5, p-value = 0.3695 no
Xwtr 50% 23.77 W = 4065.5, p-value = 0.004877 yes
30. Experiments and Results
Obtained fitness with host-churn
Table: Day1
Error-free fitness = 23.56
Trace Fitness Wilcoxon test Significantly different?
Entrfin 23.52 W = 5222, p-value = 0.5322 no
Ucb 21.31 W = 9708.5, p-value < 2.2e-16 yes
Xwtr 23.64 W = 4640, p-value = 0.2982 no
Table: Day2
Error-free fitness = 23.56
Trace Fitness Wilcoxon Test Significantly different?
Entrfin 23.58 W = 4931, p-value = 0.8452 no
Ucb 23.03 W = 7038.5, p-value = 4.588e-08 yes
Xwtr 23.7 W = 4405, p-value = 0.08086 no
31. Conclusions
32. Conclusions
Summary of Results
PGA applications are fault-tolerant by nature on DG platforms.
On DG platforms, PGAs exhibit graceful degradation, a well-known fault-tolerance property.
We proposed a new method to mitigate the effect of failures: increasing the initial population size.
33. Conclusions
Conclusions
We have studied and characterized the behavior of PGA
applications running in distributed platforms with high
failure rates.
We have tested the fault tolerance of PGAs using three real-world DG traces.
Our main conclusion is that PGAs inherently provide graceful degradation.
34. Conclusions
Questions
daniellg@unex.es
juanlu@geneura.ugr.es
fcofdez@unex.es
jmerelo@geneura.ugr.es
Icons from Tango Desktop project and Gnome Desktop (Creative Commons & GPL License)