Characterizing Fault Tolerance of Genetic Algorithms in Desktop Grid Systems

This paper presents a study of the fault-tolerant nature of Genetic Algorithms (GAs) on a real-world Desktop Grid System, without implementing any kind of fault-tolerance mechanism. The aim is to extend previous work on fault-tolerance characterization in Genetic Programming to parallel GAs. The results show that GAs achieve results of similar quality to those of a failure-free system in three of the six scenarios under study, despite the system degradation. Additionally, we show that a small increase in the initial population size is a successful method to provide resilience to system failures in five of the scenarios. Such results suggest that Parallel GAs are inherently fault-tolerant.

  1. 1. Characterizing Fault Tolerance of Genetic Algorithms in Desktop Grid Systems. Daniel Lombraña González, Juan Luis Jiménez Laredo, Francisco Fernández de Vega, Juan Julián Merelo Guervós. April 8, 2010. D. Lombraña, JJ. Jiménez, F. Fernández, JJ. Merelo, EvoCOP 2010
  2. 2. Outline: (1) Introduction, (2) Motivation, (3) Methodology, (4) Experiments and Results, (5) Conclusions
  3. 3. Introduction
  4. 4. Introduction: Parallel Genetic Algorithms (PGA). Sometimes Evolutionary Algorithms (EAs) require long execution times. One solution is to use parallel computing and distributed platforms.
  5. 5. Introduction: Parallel algorithms can be run in
  6. 6. Introduction: Parallel algorithms can be run in
  7. 7. Introduction: Failures in distributed platforms. Distributed platforms are prone to errors. Failures are expected events rather than catastrophic exceptions.
  8. 8. Introduction: Fault Tolerance. Fault tolerance is the ability of a system to behave in a well-defined manner once a failure occurs.
  9. 9. Introduction: Fault Tolerance. Different techniques have been developed to cope with failures:
      • Redundancy: S. Ghosh. Distributed Systems: An Algorithmic Approach. Chapman & Hall/CRC, 2006.
      • Checkpointing: E. Elnozahy, L. Alvisi, Y. Wang, and D. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys (CSUR), 34(3):375–408, 2002.
      • Rejuvenation frameworks: A. T. Tai and K. S. Tso. A performability-oriented software rejuvenation framework for distributed applications. In DSN '05, pages 570–579, Washington, DC, USA, 2005. IEEE Computer Society.
      • etc.
  10. 10. Introduction: Fault Tolerance. Using a fault-tolerance technique requires modifying the application, and sometimes even the parallel algorithm. This modification can be a heavy burden for the developer.
  11. 11. Motivation
  12. 12. Motivation: Parallel EAs and Fault Tolerance. To the best of our knowledge, there has been little research on the fault-tolerance features of PEAs in general and of PGA applications in particular.
  13. 13. Motivation: Previous Works. We first studied the fault-tolerant nature of Parallel Genetic Programming (PGP) on real-world Desktop Grid Systems, concluding that PGP is fault-tolerant by default.
      • Daniel Lombraña González, Francisco Fernández de Vega, and Henri Casanova. Characterizing fault tolerance in genetic programming. Future Generation Computer Systems, 2010. DOI: 10.1016/j.future.2010.02.006.
      • Daniel Lombraña González, Francisco Fernández de Vega, and Henri Casanova. Characterizing fault tolerance in genetic programming. In Workshop on Bio-Inspired Algorithms for Distributed Systems, pages 1–10. Barcelona, Spain, 2009. ISBN 978-1-60558-564-2.
  14. 14. Motivation: Proposal. Based on this insight, this work builds on the previous ones and extends the study of fault tolerance in EAs to PGAs, using the same methodology.
  15. 15. Methodology
  16. 16. Methodology: Master-Worker
  17. 17. Methodology: Desktop Grid platforms (DGs). DGs exhibit large numbers of failures. DG failure behavior has been studied in the literature. DGs are low-cost when compared to clusters of comparable scale. PGA applications are loosely coupled and thus well-suited to DGs.
  18. 18. Methodology: Desktop Grid Platforms. DGs are very promising for PGA applications, and their high failure rate makes them a great test case for studying and characterizing the fault-tolerance abilities of PGA.
  19. 19. Methodology: Experiments. In order to characterize the fault-tolerant nature of PGA, we run two kinds of experiments: one in a failure-free environment, and one replaying and simulating failure traces from real-world DG platforms.
  20. 20. Methodology: DG traces. We perform simulations of DG platforms and of host availability based on three real-world traces: entrfin, ucb, xwtr.
  21. 21. Methodology: DG traces
      | Trace | Hosts | Venue | Time |
      |---|---|---|---|
      | Entrfin | 275 | San Diego | 1.0 months |
      | Ucb | 85 | UC Berkeley | 1.5 months |
      | Xwtr | 100 | Université Paris-Sud | 1.0 months |
  22. 22. Methodology: Using the traces. For two different days of each trace, we consider two cases: hosts that become unavailable never become available again (worst-case assumption), and complete host churn (unavailable hosts can be re-acquired afterwards).
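The worst-case variant of a trace can be derived mechanically from the original one. The sketch below is only an illustration: it assumes an availability trace is stored as a list of time steps, each a list of booleans (one per host), which is a representation chosen here and not one described in the slides. The function names are hypothetical. It also assumes that a host unavailable at the first step is never acquired.

```python
from typing import List

Trace = List[List[bool]]  # trace[t][h] is True when host h is up at step t (assumed layout)


def without_return(trace: Trace) -> Trace:
    """Build the worst-case trace: once a host goes down it stays down."""
    if not trace:
        return []
    alive = [True] * len(trace[0])
    result = []
    for step in trace:
        # A host stays alive only if it was alive so far and is up now.
        alive = [was_alive and up for was_alive, up in zip(alive, step)]
        result.append(list(alive))
    return result


def available_hosts(trace: Trace) -> List[int]:
    """Number of available hosts at each time step."""
    return [sum(step) for step in trace]


if __name__ == "__main__":
    original = [
        [True, True, False],
        [True, False, True],
        [True, True, True],
    ]
    print(available_hosts(original))                  # host churn: hosts may come back
    print(available_hosts(without_return(original)))  # worst case: monotonically shrinking
```

With host churn the number of available hosts can rise again; in the derived trace it can only stay flat or decrease.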
  23. 23. Methodology: [Figure: host availability for 1 day of the ucb trace. x-axis: Time Step; y-axis: Computers; series: Original Trace, Trace without return.]
  24. 24. Experiments and Results
  25. 25. Experiments and Results: Problems. We conduct experiments with a 3-trap instance:
      \[
      \mathrm{trap}(u(\vec{x})) =
      \begin{cases}
      \dfrac{a}{z}\,\bigl(z - u(\vec{x})\bigr), & \text{if } u(\vec{x}) \le z \\[6pt]
      \dfrac{b}{l - z}\,\bigl(u(\vec{x}) - z\bigr), & \text{otherwise}
      \end{cases}
      \tag{1}
      \]
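Equation (1) can be written as a few lines of code. The sketch below assumes the usual reading of trap functions, in which u(x) is the number of ones in a block of l bits. The slides do not give values for a, b and z, so the values used in the usage example (a = 2, b = 3, z = 2 for l = 3) are assumptions for illustration only, and the function names are hypothetical.

```python
from typing import Sequence


def trap(u: int, l: int, a: float, b: float, z: int) -> float:
    """Equation (1): deceptive slope up to z, then the slope toward the optimum."""
    if not 0 < z < l:
        raise ValueError("z must satisfy 0 < z < l")
    if u <= z:
        return (a / z) * (z - u)
    return (b / (l - z)) * (u - z)


def fitness(bits: Sequence[int], k: int, a: float, b: float, z: int) -> float:
    """Sum of trap values over consecutive, non-overlapping blocks of k bits."""
    return sum(
        trap(sum(bits[i:i + k]), l=k, a=a, b=b, z=z)
        for i in range(0, len(bits), k)
    )


if __name__ == "__main__":
    # Assumed parameters (not stated in the slides): a = 2, b = 3, z = 2.
    params = dict(k=3, a=2.0, b=3.0, z=2)
    print(fitness([1] * 30, **params))  # every block at the optimum
    print(fitness([0] * 30, **params))  # every block at the deceptive attractor
```

Under these assumed parameters, a block of all zeros scores higher than a block with one or two ones, which is what makes the function deceptive for a GA.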
  26. 26. Experiments and Results: GA parameters for the 3-Trap instance
      | Trap instance | |
      |---|---|
      | Size of sub-function (k) | 3 |
      | Number of sub-functions (m) | 10 |
      | Individual length (L) | 30 |

      | GA settings | |
      |---|---|
      | GA | GGA |
      | Population size | 3000 |
      | Selection of parents | Binary tournament |
      | Recombination | Uniform crossover, pc = 1.0 |
      | Mutation | Bit-flip mutation, pm = 1/L |
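The operators listed in the table can be sketched as follows. This is a minimal illustration, not the authors' implementation: it reads GGA as a generational GA with full replacement, produces one offspring per mating, and takes the fitness function as a parameter (`fitness_fn` is a placeholder). All function names are hypothetical.

```python
import random
from typing import Callable, List

Individual = List[int]


def binary_tournament(pop: List[Individual], fits: List[float], rng: random.Random) -> Individual:
    """Pick two individuals at random and return the fitter one."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[i] if fits[i] >= fits[j] else pop[j]


def uniform_crossover(p1: Individual, p2: Individual, rng: random.Random) -> Individual:
    """Take each gene from either parent with equal probability."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]


def bit_flip(ind: Individual, pm: float, rng: random.Random) -> Individual:
    """Flip each bit independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in ind]


def next_generation(pop: List[Individual], fitness_fn: Callable[[Individual], float],
                    rng: random.Random, pc: float = 1.0) -> List[Individual]:
    """One generational step: selection, crossover (pc), mutation (pm = 1/L)."""
    length = len(pop[0])
    fits = [fitness_fn(ind) for ind in pop]
    offspring = []
    for _ in range(len(pop)):
        p1 = binary_tournament(pop, fits, rng)
        p2 = binary_tournament(pop, fits, rng)
        child = uniform_crossover(p1, p2, rng) if rng.random() < pc else list(p1)
        offspring.append(bit_flip(child, 1.0 / length, rng))
    return offspring


if __name__ == "__main__":
    rng = random.Random(42)
    population = [[rng.randint(0, 1) for _ in range(30)] for _ in range(100)]
    # sum() (count of ones) stands in for the trap fitness in this demo.
    population = next_generation(population, sum, rng)
    print(len(population), len(population[0]))
```

Setting pm = 1/L means that, on average, one bit per individual is flipped in each generation.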
  27. 27. Experiments and Results: [Figure: population size vs. generation. x-axis: Generations; left y-axis: Individuals; right y-axis: % of Loss; series: entrfin 1, ucb 1, xwtr 1, entrfin 2, ucb 2, xwtr 2.]
  28. 28. Experiments and Results: Obtained fitness for 3-Trap, Day 1. Error-free fitness = 23.56
      | Trace | Fitness | Wilcoxon test | Significantly different? |
      |---|---|---|---|
      | Entrfin | 23.30 | W = 6093, p-value = 0.002688 | yes |
      | Entrfin 10% | 23.47 | W = 5408.5, p-value = 0.2535 | no |
      | Entrfin 20% | 23.48 | W = 5360, p-value = 0.3137 | no |
      | Entrfin 30% | 23.49 | W = 5283.5, p-value = 0.4271 | no |
      | Entrfin 40% | 23.57 | W = 4923.5, p-value = 0.8286 | no |
      | Entrfin 50% | 23.59 | W = 4910.5, p-value = 0.7994 | no |
      | Ucb | 23.22 | W = 6453, p-value = 6.877e-05 | yes |
      | Ucb 10% | 23.27 | W = 6098.5, p-value = 0.002753 | yes |
      | Ucb 20% | 23.37 | W = 5837.5, p-value = 0.02051 | yes |
      | Ucb 30% | 23.40 | W = 5664, p-value = 0.06588 | no |
      | Ucb 40% | 23.51 | W = 5186.5, p-value = 0.6004 | no |
      | Ucb 50% | 23.42 | W = 5623, p-value = 0.08335 | no |
      | Xwtr | 23.56 | W = 5056, p-value = 0.8748 | no |
      | Xwtr 10% | 23.57 | W = 4923.5, p-value = 0.8286 | no |
      | Xwtr 20% | 23.68 | W = 4474, p-value = 0.1245 | no |
      | Xwtr 30% | 23.73 | W = 4259.5, p-value = 0.02812 | yes |
      | Xwtr 40% | 23.68 | W = 4502, p-value = 0.1466 | no |
      | Xwtr 50% | 23.71 | W = 4356.5, p-value = 0.05817 | no |
  29. 29. Experiments and Results: Obtained fitness for 3-Trap, Day 2. Error-free fitness = 23.56
      | Trace | Fitness | Wilcoxon test | Significantly different? |
      |---|---|---|---|
      | Entrfin | 23.57 | W = 4979.5, p-value = 0.9546 | no |
      | Entrfin 10% | 23.69 | W = 4397.5, p-value = 0.07682 | no |
      | Entrfin 20% | 23.67 | W = 4522.5, p-value = 0.1645 | no |
      | Entrfin 30% | 23.70 | W = 4405, p-value = 0.08086 | no |
      | Entrfin 40% | 23.69 | W = 4453.5, p-value = 0.11 | no |
      | Entrfin 50% | 23.75 | W = 4162.5, p-value = 0.01234 | yes |
      | Ucb | 23.09 | W = 6672.5, p-value = 7.486e-06 | yes |
      | Ucb 10% | 23.12 | W = 6826, p-value = 6.647e-07 | yes |
      | Ucb 20% | 23.14 | W = 6654, p-value = 7.223e-06 | yes |
      | Ucb 30% | 23.26 | W = 6371, p-value = 0.0001507 | yes |
      | Ucb 40% | 23.37 | W = 5893.5, p-value = 0.01316 | yes |
      | Ucb 50% | 23.32 | W = 6108, p-value = 0.002166 | yes |
      | Xwtr | 23.60 | W = 4806, p-value = 0.5791 | no |
      | Xwtr 10% | 23.62 | W = 4765, p-value = 0.5002 | no |
      | Xwtr 20% | 23.69 | W = 4453.5, p-value = 0.11 | no |
      | Xwtr 30% | 23.60 | W = 4806, p-value = 0.5791 | no |
      | Xwtr 40% | 23.63 | W = 4688.5, p-value = 0.3695 | no |
      | Xwtr 50% | 23.77 | W = 4065.5, p-value = 0.004877 | yes |
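For readers who want to see how a W statistic and p-value of this kind are obtained, the sketch below computes a two-sample Wilcoxon rank-sum statistic with midranks for ties and a two-sided p-value from the normal approximation with continuity correction. The slides do not state which software, sample sizes or test options were used, so this is one common formulation under those assumptions, not a reproduction of the authors' analysis. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W, p): W is the rank sum of x minus its minimum, p is two-sided."""
    n, m = len(x), len(y)
    total = n + m
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < total:
        j = i
        while j < total and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j

    w = rank_sum_x - n * (n + 1) / 2.0
    mean = n * m / 2.0
    var = n * m / 12.0 * ((total + 1) - tie_term / (total * (total - 1)))
    if var == 0:
        return w, 1.0
    diff = w - mean
    correction = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    z_score = (diff - correction) / math.sqrt(var)
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return w, p_value


if __name__ == "__main__":
    error_free = [24.0, 23.0, 25.0, 24.0, 23.0, 24.0]   # made-up illustrative values
    with_failures = [23.0, 22.0, 23.0, 24.0, 22.0, 23.0]  # made-up illustrative values
    print(rank_sum_test(error_free, with_failures))
```

When the two samples come from the same distribution, W is expected to be close to n·m/2; values far from that midpoint yield small p-values. The tie correction matters here because trap fitness values are discrete, so many runs share the same value.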
  30. 30. Experiments and Results: Obtained fitness with host churn
      Table: Day 1. Error-free fitness = 23.56
      | Trace | Fitness | Wilcoxon test | Significantly different? |
      |---|---|---|---|
      | Entrfin | 23.52 | W = 5222, p-value = 0.5322 | no |
      | Ucb | 21.31 | W = 9708.5, p-value < 2.2e-16 | yes |
      | Xwtr | 23.64 | W = 4640, p-value = 0.2982 | no |

      Table: Day 2. Error-free fitness = 23.56
      | Trace | Fitness | Wilcoxon test | Significantly different? |
      |---|---|---|---|
      | Entrfin | 23.58 | W = 4931, p-value = 0.8452 | no |
      | Ucb | 23.03 | W = 7038.5, p-value = 4.588e-08 | yes |
      | Xwtr | 23.7 | W = 4405, p-value = 0.08086 | no |
  31. 31. Conclusions
  32. 32. Conclusions: Summary of Results. PGA applications are fault-tolerant by nature in DG platforms. PGA exhibits the well-known fault-tolerance property of graceful degradation in DG platforms. We provided a new method to mitigate the effect of failures by increasing the initial population.
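The mitigation amounts to starting with more individuals, so that the run still has enough of them after hosts are lost. The sketch below reads the "10%" to "50%" rows of the result tables as percentage increases over the base population of 3000. It then applies a deliberately simple loss model that is an assumption made here, not taken from the slides: individuals are spread evenly over the initial hosts, and those on a lost host are not replaced. Function names are hypothetical.

```python
from typing import List


def enlarged_population(base_size: int, increase_pct: int) -> int:
    """Initial population after adding increase_pct percent to base_size."""
    return base_size + base_size * increase_pct // 100


def population_per_generation(initial_size: int, hosts_per_generation: List[int]) -> List[int]:
    """Assumed model: each initial host holds an equal share; lost hosts never return."""
    initial_hosts = hosts_per_generation[0]
    alive = initial_hosts
    sizes = []
    for hosts in hosts_per_generation:
        alive = min(alive, hosts)  # worst case: no re-acquisition of hosts
        sizes.append(initial_size * alive // initial_hosts)
    return sizes


if __name__ == "__main__":
    hosts = [10, 10, 8, 9, 5]  # made-up availability, one entry per generation
    for pct in (0, 10, 20, 30, 40, 50):
        start = enlarged_population(3000, pct)
        print(pct, population_per_generation(start, hosts))
```

Under this model the population shrinks step by step rather than collapsing at once, which is the behavior the slides describe as graceful degradation.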
  33. 33. Conclusions: We have studied and characterized the behavior of PGA applications running in distributed platforms with high failure rates. We have tested the PGA fault tolerance using three real-world DG traces. Our main conclusion is that PGA inherently provides graceful degradation.
  34. 34. Conclusions: Questions. daniellg@unex.es, juanlu@geneura.ugr.es, fcofdez@unex.es, jmerelo@geneura.ugr.es. Icons from Tango Desktop project and Gnome Desktop (Creative Commons & GPL License).
