Successfully reported this slideshow.

The Art of Evolutionary Algorithms Programming


Published on

Tutorial at on how to be more efficient writing more efficient programs that streamlines experiments and publication of results

Published in: Technology
  • Be the first to comment

  • Be the first to like this

The Art of Evolutionary Algorithms Programming

  1. 1. The art of Evolutionary Algorithms programming, by Dr. Juan-Julián Merelo, Esq. Calling from the University of Granada in the Old Continent of Europe
  2. 2. You suck
  3. 3. Don't worry I suck too
  4. 4. This is about sucking less At programming evolutionary algorithms , no less! Or any other, for that matter!
  5. 5. And about the zen
  6. 6. And about writing publishable papers painlessly through efficient, maintainable, scientific programming
  7. 7. 1. Mind your environment
  8. 8. Tools of the trade <ul><li>Use real operating systems.
  9. 9. Programmer's editor: kate, emacs, geany (your favorite editor here) + Debugger (gdb, language-specific debugger) </li><ul><li>Set up macros, syntax-highlighting and -checking, online debugging. </li></ul><li>IDE: Eclipse, NetBeans </li><ul><li>Steep learning curve, geared for Java, and too heavy on the resource usage side </li><ul><li>But worth the while eventually </li></ul></ul></ul>
  10. 10. 2. Open source your code and data
  11. 11. Aw, maaaan! <ul><li>Open source first, then program </li><ul><li>Scientific code should be born free </li></ul><li>Science must be reproducible
  12. 12. Easier for others to compare with your approach </li><ul><li>Increased H </li><ul><li>Heaven! </li></ul></ul><li>Manifest hidden assumptions.
  13. 13. If you don't share, you don't care! </li></ul>
  14. 14. 3. Minimize bugs via test-driven programming
  15. 15. Tests before code <ul><li>What do you want your code to do? </li><ul><li>Mutate a bit string, for instance. </li></ul><li>Write the test </li><ul><li>Is the result different from the original? </li><ul><li>Of course!
  16. 16. But will it be even if you change an upstream function? Or the representation? </li></ul><li>Does it change all bits in the same proportion?
  17. 17. Doest it follow (roughly) the mutation rate? </li></ul></ul>
  18. 18. Use testing frameworks <ul><li>Unit testing tries to catch bugs in the smallest atom of a program </li><ul><li>Interface, class, function, decision </li></ul><li>Every language has its testing framework </li><ul><li>PHPUnit, jUnit, xUnit, DejaGNU... </li></ul><li>Write tests for failure , not for success </li></ul>
  19. 19. 4. Control the source of your power
  20. 20. Source control systems save the day <ul><li>Source code management systems allow </li><ul><li>Checkpoints
  21. 21. Code repositories
  22. 22. Individual responsability over code changes
  23. 23. Branches </li></ul><li>Distributed are in : git, mercurial, bazaar
  24. 24. Centralized are out : subversion, cvs.
  25. 25. Instant backup! </li></ul>
  26. 26. Code complete <ul><li>Check out code/Update code
  27. 27. Make changes
  28. 28. Commit changes (and push to central repository) </li></ul><ul><li>Check out code/Update code
  29. 29. Make changes
  30. 30. Test your code
  31. 31. Commit changes (and push to central repository) </li></ul>
  32. 32. 5. Be language agnostic
  33. 33. Language shapes thought <ul><li>Don't believe the hype: </li><ul><li>Compiled languages are faster... NOT </li></ul><li>Avoid programming in C in every language you use
  34. 34. Consider DSL: Damn Small Languages
  35. 35. Python, Perl, Lua, Ruby, Javascript... interpreted languages are faster. </li></ul>
  36. 36. Language agnoticism at its best Evolving Regular Expressions for GeneChip Probe Performance Prediction The regular expresions are coded in AWK scripts: Although this may seem complex, gawk (Unix’ free interpreted pattern scanning and processing language) can handle populations of a million individuals.
  37. 37. Programming speed > program speed 6
  38. 38. Scientists, not software engineers <ul><li>Our deadlines are for papers – not for software releases
  39. 39. What should be optimized is speed-to-publish.
  40. 40. Make no sense to spend 90% time programming – 10 % writing paper
  41. 41. Scripting languages rock </li><ul><li>And maximize time-to-publish </li></ul></ul>
  42. 42. Perl faster than Java? Algorithm::Evolutionary, a flexible Perl module for evolutionary computation <ul><li>Class-by-class, Perl library much more compact </li><ul><li>Less code to write </li></ul><li>In pure EC code, Algorithm::Evolutionary was faster than ECJ </li></ul>
  43. 43. 10. Become an adept of the chosen language
  44. 44. Don't teach an old dog new tricks <ul><li>The first language you learn brands you with fire
  45. 45. But no two languages are the same
  46. 46. Every language includes very efficient solutions </li><ul><li>Regular expressions
  47. 47. Symbolic processing
  48. 48. Graphics </li></ul><li>Efficiency/knowledge balance </li></ul>
  49. 49. 11 Don't assume: measure
  50. 50. Performance matters <ul><li>Basic measure: CPU time as measured by time </li></ul><ul>jmerelo@penny:~/proyectos/CPAN/Algorithm-Evolutionary/benchmarks$ time perl 0; time: 0.003274 1; time: 0.005438 [...] 498; time: 1.006539 499; time: 1.00884 500; time: 1.010817 real 0m1.349s user 0m1.140s sys 0m0.050s </ul>
  51. 51. Going a bit deeper
  52. 52. Size matters
  53. 53. There's always a better algorithm/ data structure 12
  54. 54. And differences are huge <ul><li>Sort algorithms are an example </li><ul><li>Plus, do you need to sort the population? </li></ul><li>Cache fitness evaluations </li><ul><li>Cache them permanently in a database? </li><ul><li>Measure how much fitness evaluation takes </li></ul></ul><li>Thousand ways of computing fitness </li><ul><li>How do you compute the MAXONES? </li><ul><li>$fitness_of{$chromosome} = ($copy_of =~ tr/1/0/); </li></ul></ul><li>Algorithms and data structures interact </li></ul>
  55. 55. 13 Learn the tricks of the trade
  56. 56. Two trades <ul><li>Evolutionary algorithms </li><ul><li>Become one with your algorithm </li><ul><li>It does not work, but for a different reason that what you think it does </li></ul></ul><li>Programming languages </li><ul><li>What function is better implemented?
  57. 57. Is there yet another library to do sorting?
  58. 58. Where should you go if there's a problem? </li></ul><li>Even a third trade: programming itself. </li></ul>
  59. 59. Case study: sort <ul><li>Sorting is routinely used in evolutinary algorithms </li><ul><li>Roulette wheel, rank-based algorithms </li></ul><li>Faster sorts (in Perl): </li><ul><li>Sorting implies comparing
  60. 60. Orcish Manoeuver, Schwartzian transform
  61. 61. Sort::Key, fastest ever </li></ul><li>Do you even need sorting? </li><ul><li>Top or bottom n will do nicely sometimes </li></ul></ul>
  62. 62. Make output processing easy 14
  63. 63. Avoid drowning in data <ul><li>Every experiment produces megabytes of data </li><ul><li>Timestamps, vectors, arrays, hashes
  64. 64. Difficult to understand a time from production </li></ul><li>Use serialization languages for storing data
  65. 65. YAML: Yet another markup language
  66. 66. JSON: Javascript Object Notation
  67. 67. XML: eXtensible Markup languajge
  68. 68. Name your own </li></ul>
  69. 69. Case study: Mastermind <ul>Entropy-Driven Evolutionary Approaches to the Mastermind Problem Carlos Cotta et al., <li>Output uses YAML
  70. 70. Includes </li><ul><li>Experiment parameters
  71. 71. Per-run and per-generation data
  72. 72. Final population and run time </li></ul><li>Open source! </li></ul>
  73. 73. When everything fails visualize
  74. 74. 15 to 25 backup your data
  75. 75. Better safe than unpublished <ul><li>Get an old computer, and backup everything there.
  76. 76. In some cases, create virtual machines to reproduce one paper's environment </li><ul><li>Do you think gcc 3.2.3 will work on your new one? </li></ul><li>Use rsync, bacula or simply cp
  77. 77. It's not if your hard disk will fail, it's when
  78. 78. Cloud solutions are OK, but backup that too </li></ul>
  79. 79. 26 Keep stuff together
  80. 80. Where did I left my keys? <ul><li>Paper: program + data + graphics + experiment logs + text + revisions + referee reports + presentations
  81. 81. Experiments have to be rerun, graphics replotted, papers rewritten
  82. 82. Use logs to know which parameters produced which data which produced that graph
  83. 83. And put them all in the same directory tree </li></ul>
  84. 84. Get out and smell the air 27
  85. 85. New is always better <ul><li>Programming paradigms are changing on a daily basis
  86. 86. NoSQL, cloud computing, internet of things, Google Prediction API, map/reduce, GPGPU
  87. 87. Keep a balance between following fashions and finding more efficient ways of doing evolutionary algorithms. </li><ul><li>And by doing I mean publishing </li></ul></ul>
  88. 88. Nurture your code 28
  89. 89. A moment of joy, a lifetime of grief <ul><li>Run tests periodically, or when there is a major upgrade of interpreter or upstream library. </li><ul><li>Can be automated </li></ul><li>Maintain a roadmap of releases </li><ul><li>Remember this is free software </li></ul><li>Get the community involved </li><ul><li>Your research is for the whole wide world. </li></ul></ul>
  90. 90. Publish, don't perish!
  91. 91. (no cats were harmed doing this presentation) Or camels!