Fast, Effective Genetic Algorithms for Large, Hard Problems

Tutorial by David E. Goldberg at 2009 ACM Genetic and Evolutionary Computation Summit in Shanghai, China.


    1. Fast, Effective GAs for Large, Hard Problems
       David E. Goldberg
       Illinois Genetic Algorithms Laboratory
       University of Illinois at Urbana-Champaign
       Urbana, IL 61801 USA
       Email: [email_address]; Web: http://www.illigal.uiuc.edu
    2. GAs Had Their Warhol 15, Right?
       • Evolution timeless, GAs so 90s.
       • First-generation GA results were mixed in practice.
       • Sometimes worked, sometimes not, & first impressions stuck.
       • But GAs had legs.
       • In the 90s, logical continuation of GA thinking has led to
         - completion of theory in a certain sense,
         - & GAs that solve large, hard problems quickly, reliably, and accurately.
       • Consider design theory and designs that have led to reliable solution of difficult problems.
       [Image: Andy Warhol (1928-1987)]
    3. Roadmap
       • One-minute genetic algorithmist.
       • The unreasonableness of GAs.
       • Airplane & toaster design: A lesson from the Wright Brothers.
       • Goals of GA design.
       • Design decomposition step by step.
       • Competent GA design from the fast messy GA to hBOA.
       • From competence to efficiency: When merely fast is not enough.
       • A billion bits or bust.
    4. One-Minute Genetic Algorithmist
       • What is a GA?
       • Solutions as chromosomes.
       • Means of evaluating fitness to purpose.
       • Create initial population.
       • Apply selection and genetic operators:
         - Survival of the fittest.
         - Mutation.
         - Crossover.
       • Repeat until good enough (see the sketch after this slide).
       • Puzzle: operators by themselves uninteresting.
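       A minimal Python sketch of the loop just described (an illustration of the generic GA, not code from the tutorial; all names and parameter values are invented for the example):

           import random

           def run_ga(fitness, length=20, pop_size=50, generations=100,
                      p_cross=0.9, p_mut=0.01):
               # initial population of random binary chromosomes
               pop = [[random.randint(0, 1) for _ in range(length)]
                      for _ in range(pop_size)]
               for _ in range(generations):
                   new_pop = []
                   while len(new_pop) < pop_size:
                       # selection: binary tournament for each parent
                       p1 = max(random.sample(pop, 2), key=fitness)
                       p2 = max(random.sample(pop, 2), key=fitness)
                       # crossover: one-point splice of the two parents
                       if random.random() < p_cross:
                           cut = random.randint(1, length - 1)
                           c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
                       else:
                           c1, c2 = p1[:], p2[:]
                       # mutation: flip each gene with small probability
                       for child in (c1, c2):
                           for i in range(length):
                               if random.random() < p_mut:
                                   child[i] ^= 1
                           new_pop.append(child)
                   pop = new_pop[:pop_size]
               return max(pop, key=fitness)

           best = run_ga(fitness=sum)  # OneMax: fitness = number of ones
           print(best, sum(best))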
    5. Selection
       • Darwinian survival of the fittest.
       • Give more copies to better guys.
       • Ways to do it (sketched below):
         - roulette wheel
         - tournament
         - truncation
       • Gedanken experiment: Run repeatedly without crossover or mutation.
       • By itself, selection picks the best.
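       Hedged sketches of the three schemes named above (helper names invented for illustration):

           import random

           def roulette_wheel(pop, fitness):
               # draw proportional to fitness (assumes nonnegative fitness)
               total = sum(fitness(x) for x in pop)
               r = random.uniform(0, total)
               acc = 0.0
               for x in pop:
                   acc += fitness(x)
                   if acc >= r:
                       return x
               return pop[-1]

           def tournament(pop, fitness, k=2):
               # best of k randomly sampled individuals
               return max(random.sample(pop, k), key=fitness)

           def truncation(pop, fitness, s=2):
               # keep the top 1/s of the population and copy it s times
               top = sorted(pop, key=fitness, reverse=True)[:len(pop) // s]
               return top * s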
    6. Crossover
       • Combine bits and pieces of good parents.
       • Speculate on new, possibly better children.
       • Gedanken experiment: 50 copies of 11111 & 50 copies of 00000.
       • By itself, a random shuffle.
       • Example (one-point crossover; sketched below):
         Before X    After X
         11111       11000
         00000       00111
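       A one-point crossover sketch in Python (illustrative):

           import random

           def one_point_crossover(p1, p2):
               # splice bits and pieces of two parents at a random cut point
               cut = random.randint(1, len(p1) - 1)
               return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

           # With cut = 2 this reproduces the slide's example:
           # 11111 x 00000 -> 11000 and 00111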
    7. Mutation
       • Mutation is a random alteration of a string.
       • Change a gene, a small movement in the neighborhood.
       • Gedanken experiment: 100 copies of 11111.
       • By itself, a random walk.
       • Example (bit-flip mutation; sketched below):
         Before M    After M
         11111       11011
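       A bit-flip mutation sketch in Python (illustrative):

           import random

           def bit_flip_mutation(chrom, p_mut=0.01):
               # independently flip each gene with small probability p_mut
               return [g ^ 1 if random.random() < p_mut else g for g in chrom]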
    8. The Unreasonableness of GAs
       • How do individually uninteresting operators yield interesting behavior?
       • Others talk about emergence.
       • 1983 innovation intuition: Genetic algorithm power is like that of human innovation.
       • Separate:
         - Selection + mutation as hillclimbing or kaizen.
         - Selection + recombination → let's examine.
       • Different modes or facets of innovation or invention.
    9. Selection + Recombination = Innovation
       • Combine notions to form ideas.
       • "It takes two to invent anything. The one makes up combinations; the other chooses, recognizes what he wishes and what is important to him in the mass of the things which the former has imparted to him." - P. Valéry
       [Image: Paul Valéry (1871-1945)]
    10. Airplane & Toaster Design
       • Airport story.
       • Why do the rules change?
       • Legacy of Descartes: Separation of mind and body.
       • Material machines (airplanes, toasters, autos, etc.) vs. conceptual machines (GAs, neural nets, computer programs, algorithms).
       • Design is design is design.
    11. Two Bicycle Mechanics from Ohio
       • Four years, 1899-1903.
       • Three gliders.
       • Orville and Wilbur Wright created powered flight from scratch.
       • Query: Why were the Wright brothers the first to fly?
    12. Hypotheses
       • Wrights flew because they were bicycle mechanics.
       • Wrights flew because it was part of the zeitgeist.
       • Wrights flew because they were bachelors!
       • Maybe the Wrights flew because they were better inventors.
    13. December 17, 1903: The Most Famous Moment in Aviation History
    14. The Wright Brothers' Secret
       • Functional decomposition.
       • Three subproblems:
         - Stability: wing-warping plus elevator in the 1899 glider model. The 1902 glider had three-axis active control.
         - Lift and drag: wing shape improved on Lilienthal's through wind-tunnel experiments.
         - Propulsion: a rotary wing with forward lift is a propeller.
    15. But Decomposition is Old Hat to Moderns
       • Computer science is about one thing: busting big problems up into lots of little problems.
       • Descartes's theory of decomposition (1637): Discourse on the Method of Rightly Conducting the Reason and Seeking Truth in the Sciences.
       • What else distinguishes the Wrights' method of invention? Their method of modeling and integration was different.
    16. Lessons of the Wright Brothers
       • Effective design decomposition of your problem.
       • Facetwise, economic models of subproblem facets.
       • Bounding empirical study and calibration.
       • Scaling laws (dimensional analysis) particularly important.
    17. Goals of GA Design
       • Solve
         - hard problems,
         - quickly,
         - accurately,
         - and reliably.
       • Call a GA that achieves these goals competent.
       • Can we design competent GAs?
    18. Effective Theory in GA Design
       • Many GAs don't scale & much GA theory is inapplicable.
       • Need design theory that works:
         - Understand building blocks (BBs), notions or subideas.
         - Ensure BB supply.
         - Ensure BB growth.
         - Control BB speed.
         - Ensure good BB decisions.
         - Ensure good BB mixing (exchange).
         - Know BB challengers.
       • Can use theory to design scalable & efficient GAs.
    19. Play the GA Game
       • Give you a population of strings S_i.
       • Give a list of associated f_i values (bigger is better).
       • Ask you to create a better string.
       • Blind: no equation relating f_i and S_i.
    20. What Are We Processing?
       • Similarities among strings.
       • Schemata are similarity subsets.
       • Schemata described by similarity templates.
       • Example: *1*** = {strings with a 1 in the second position} (matcher sketched below).
       • A population of n strings contains between 2^ℓ and n·2^ℓ schemata.
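       A one-line schema matcher in Python (illustrative):

           def matches(schema, string):
               # '*' is a wildcard, so '*1***' matches every string whose
               # second position is '1'
               return all(s in ('*', c) for s, c in zip(schema, string))

           print(matches('*1***', '01010'))  # True: member of the similarity subset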
    21. Schema Theorem (Holland 1975)
       M(H, t+1) ≥ M(H, t) · (f(H)/f̄) · [1 − P_c·δ(H)/(ℓ−1) − o(H)·P_m]
       where
         - f: fitness (f̄ the population average)
         - H: schema
         - M: number of schema representatives
         - δ: defining length
         - o: schema order
         - P_c: probability of crossover
         - P_m: probability of mutation
         - ℓ: string length
       • Little schemata grow logistically.
       • A necessary condition for BB growth.
    22. Practical Schema Theorem for Design
       • Fitness multiplier = s.
       • Overall disruption = ε.
       • Net growth per generation: M(H, t+1) ≥ s(1 − ε)·M(H, t), so BBs grow when ε < 1 − 1/s.
       • Goldberg & Sastry, 2001.
    23. Problem Difficulty
       • There are hard problems & easy problems.
       • 3-way decomposition of difficulty.
       • The core:
         - deception (intra-BB difficulty)
         - scaling (inter-BB difficulty)
         - noise (extra-BB difficulty)
    24. Easy Problems & Hard Problems
       • The OneMax problem:
         - Linear, uniformly scaled.
         - Define u, the unitation variable: # of ones in the binary string.
       • Needle-in-a-haystack (NIAH) problem:
         - No regularity to infer where good solutions might lie.
         - Nothing does better than enumeration or random search.
       • (Both sketched in code below.)
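       Sketches of the two fitness functions (names invented for the example):

           def onemax(bits):
               # unitation u: fitness is simply the number of ones
               return sum(bits)

           def niah(bits, needle):
               # needle-in-a-haystack: flat everywhere except one special
               # string, so no search beats enumeration or random guessing
               return 1 if bits == needle else 0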
    25. Designing a Harder Problem
       • Low-order estimates mislead the GA.
       • x* = 111: f_111 > f_i for all i ≠ 111.
       • Require complementary schemata better than competitors.
       • Squashed Hamming-cube representation.
    26. 4-bit Deceptive Trap
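       A sketch of a k-bit trap in Python (a standard form; the slide's exact scaling may differ). Fitness slopes toward all-zeros as ones are added, but the isolated global optimum sits at all-ones, so low-order schema averages deceive the GA:

           def trap(bits, k=4):
               # deceptive trap on unitation u = number of ones:
               # f(u) = k - 1 - u for u < k (slopes toward all-zeros),
               # f(k) = k (isolated global optimum at all-ones)
               u = sum(bits)
               return k if u == k else k - 1 - u

           # Concatenating m such traps gives a large, boundedly
           # difficult test function:
           def m_traps(bits, k=4):
               return sum(trap(bits[i:i + k], k) for i in range(0, len(bits), k))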
    27. Good Decisions: 2- & k-Armed Bandit
       • Competing order-one schemata form a two-armed bandit. Example: 0**** versus 1****.
       • Exponentially increasing trials to the observed best.
       • fff**: a 2^3 = 8-armed bandit.
       • Many bandit problems played in parallel.
    28. Gambler's Ruin Population Size
       • Make P_bb = Q and solve for n; failure probability α = 1 − Q.
       • In terms of signal and noise:
         n = −2^(k−1) · ln(α) · σ_BB·√(π·m′) / d
         with BB size k, m′ competing BBs, signal d, and BB noise σ_BB.
       • Compare with the populationwise pop-sizing equation, which grows linearly in the number of BBs m rather than as √m′. (Calculator below.)
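       A small calculator for that sizing equation (a sketch assuming the signal-and-noise form given above):

           import math

           def gamblers_ruin_pop_size(k, m, d, sigma_bb, alpha=0.01):
               # n = -2^(k-1) * ln(alpha) * sigma_BB * sqrt(pi * m') / d,
               # where m' = m - 1 is the number of competing BBs
               m_prime = m - 1
               return (-2 ** (k - 1) * math.log(alpha)
                       * sigma_bb * math.sqrt(math.pi * m_prime) / d)

           # e.g., 25 four-bit BBs, signal d = 1, BB noise 1.0, 1% failure:
           print(gamblers_ruin_pop_size(k=4, m=25, d=1.0, sigma_bb=1.0))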
    29. 100-bit OneMax
    30. The Complexity Temptation
       • First complexity results for GAs.
       • Calculate total work W = n·t: function evaluations as the product of population size n and run duration t (single epoch).
    31. A Sense of Time
       • Truncation selection: make s copies each of the top 1/s-th of the population.
       • Proportion of best grows as P(t+1) = s·P(t) until P(t) = 1.
       • So P(t) = s^t·P(0), with P(0) = 1/n.
       • Solve for takeover time t*, the time to go from one good guy to all good guys (or all but one):
         t* = ln n / ln s
       • (Computed in the snippet below.)
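       Computing the takeover time in Python (illustrative):

           import math

           def takeover_time(n, s):
               # t* = ln n / ln s: generations for the best individual to
               # take over a population of n under selection pressure s
               return math.log(n) / math.log(s)

           print(takeover_time(n=1000, s=2))  # ~10 generations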
    32. So What?
       • Who cares about selection alone?
       • I want to analyze a "real GA."
       • How can selection-only analysis help me?
       • Answer: Imagine another characteristic time, the innovation or mixing time.
    33. The Innovation Time, t_i
       • Innovation time is the average time to create an individual better than any seen so far.
       • Under crossover, imagine p_i, the probability that a recombination event creates a better individual.
       • Innovation probability analyzed in Goldberg, Deb & Thierens (1993) and Thierens & Goldberg (1993).
    34. Schematic of the Race
       [Figure: the race between takeover time t* and innovation time t_i; steady innovation requires t_i < t*.]
    35. Golf Clubs Have Sweet Spots
       • So do GAs.
       • Easy problems, big sweet spots.
       • A monkey can set GA parameters.
       • Hard problems, vanishing sweet spots.
       [Goldberg, Deb, & Thierens, 1993]
    36. Dr. Jekyll & Mr. Hyde in Practice
       • GA literature full of evidence for this problem.
       • Evolution of the "typical practitioner":
         - First application goes swimmingly.
         - More complex application needs TLC.
         - Big Kahuna needs compute time = length of universe.
       • Why are we fiddling with codings and operators?
       • Aren't GAs robust?
       • No. First-generation GAs are not.
    37. Simple GAs Are Mixing Limited
       • With growing difficulty, the "sweet spot" vanishes.
       • Or populations must grow exponentially.
    38. The Key: Not the Schema Theorem
       • Much theory focuses on Holland's schema theorem.
       • The schema theorem is a piece of cake.
       • Make sure the GA fires on all seven cylinders of the design decomposition.
       • Surprise: Mixing is the key.
       • To mix well, must get building blocks right.
       • Effective GAs identify the structure of the problem.
       • Data mine early samples for the structure of the landscape.
    39. GA Kitty Hawk: 1993
       • 1993 & the fast messy GA.
       • Moveable bits, cutting and splicing, building-block filtering mechanism.
       • Original mGA complexity estimated: O(ℓ^5).
       • Compares favorably to hillclimbing, too (Mühlenbein, 1992).
       [Goldberg, Deb, Kargupta, & Harik, 1993]
    40. Look Ma, No Genetics: hBOA
       • Replace genetics with probabilistic model building → PMBGA or estimation of distribution algorithm (EDA).
       • 3 main elements:
         - Decomposition (structural learning): Learn what to mix and what to keep intact.
         - Representation of BBs (chunking): Means of representing alternative solutions.
         - Diversification of BBs (niching): Preserve alternative chunks of solutions.
       • Test on adversarially designed functions so it works on yours.
    41. Schematic of BOA Structure
       [Figure: current population → selection → Bayesian network model → new population.]
    42. Results on Spin Glasses
       [Figure: number of evaluations vs. problem size (64 to 400); hBOA scales as O(n^3.51). Pelikan et al. (2002).]
    43. From Competence to Efficiency
       • Motivation: Even competent GAs require O(ℓ²) time.
       • 1000 × 1000 = a million function evaluations.
       • In real problems, this can be a problem.
       • How can we systematically achieve speedups?
    44. IlliGAL Decomposition of Efficiency
       1. Space: parallelization.
       2. Time: continuation.
       3. Fitness: evaluation relaxation.
       4. Specialization: hybridization.
    45. Master-Slave Parallel GAs
       • Computation time per generation with n_p processors: n·T_f / n_p.
       • Communications time per generation: n_p·T_c.
       • More processors → less computation, more communication.
    46. Account for Time (and Quality)
       • Use the perspective of the master.
       • Minimize total elapsed time: T(n_p) = n·T_f / n_p + n_p·T_c.
       • Optimum: n_p* = √(n·T_f / T_c).
    47. Master-Slave Example
       • Dummy function: T_f = 4 ms.
       • Communications time: T_c = 19 ms.
       • Pop size: 120; string length: 80. (See the snippet below.)
       Cantú-Paz, E., & Goldberg, D. E. (1999). On the scalability of parallel genetic algorithms. Evolutionary Computation, 7(4), 429-449.
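       Plugging the example's numbers into the time model above (a sketch assuming the elapsed-time model and optimal-slave formula shown on the previous slide):

           import math

           def elapsed_time(n_p, n, t_f, t_c):
               # one generation: computation shared by n_p slaves,
               # plus communication with each slave
               return n * t_f / n_p + n_p * t_c

           def optimal_slaves(n, t_f, t_c):
               # minimize elapsed_time over n_p: n_p* = sqrt(n * t_f / t_c)
               return math.sqrt(n * t_f / t_c)

           n_p = optimal_slaves(n=120, t_f=4, t_c=19)
           print(n_p, elapsed_time(round(n_p), 120, 4, 19))  # ~5 slaves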
    48. Speedups and Efficiency
    49. My Dr. Evil Moment
       • Lunchtime question: do real large problems draw attention to theoretical & design findings?
       • Dr. Evil's mistake: wondered if GAs could go to 10^6 vars.
       • Decided to go for a billion.
       • Use a simple underlying problem (OneMax) with Gaussian noise (variance 0.1 times that of the deterministic problem).
       • Don't try this at home!!!
       [Image: "We get the warhead and then hold the world ransom for... 1 MILLION DOLLARS!"]
    50. Road to Billion Paved with Speedups
       • Naïve implementation: 100 terabytes & 2^72 random number calls.
       • cGA → memory O(ℓ) vs. O(ℓ^1.5) (sketched below).
       • Parallelization → speedup n_p.
       • Vectorize four bits at a time → speedup 4.
       • Other doodads (bitwise ops, limited flops, inline fcns, precomputed evals) → speedup 15.
       • Gens & pop size scale as expected.
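       A toy-scale sketch of the compact GA (cGA) of Harik, Lobo & Goldberg, which replaces the population with a single probability vector and so needs only O(ℓ) memory; the noisy fitness is an invented stand-in for the slide's noisy OneMax:

           import random

           def compact_ga(fitness, length, n=100, max_iters=200000):
               # simulate an order-n population with one probability vector
               p = [0.5] * length
               for _ in range(max_iters):
                   a = [1 if random.random() < q else 0 for q in p]
                   b = [1 if random.random() < q else 0 for q in p]
                   winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
                   for i in range(length):
                       # shift probability toward the winner where they differ
                       if winner[i] != loser[i]:
                           step = 1.0 / n if winner[i] else -1.0 / n
                           p[i] = min(1.0, max(0.0, p[i] + step))
                   if all(q in (0.0, 1.0) for q in p):  # full convergence
                       break
               return [round(q) for q in p]

           # toy noisy OneMax, far below a billion bits:
           noisy = lambda bits: sum(bits) + random.gauss(0.0, 1.0)
           print(sum(compact_ga(noisy, length=64)))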
    51. A Billion Bits or Bust
       • Simple hillclimber solves 1.6 × 10^4 bits (2^14).
       • Souped-up cGA solves 33 million (2^25) to full convergence.
       • Solves 1.1 billion (2^30) with relaxed convergence.
       • Growth rate the same → solvable to convergence.
    52. Design Fast, Effective GAs
       • GA design advanced by taking GA ideas and running with them.
       • Large, difficult problems within grasp.
       • Theory and practice in sync.
       • These direct lessons are crucial.
       • Meta-lessons of this style of thinking as important for complex systems & interdisciplinary work, generally.
       • This style of theory works for all of GEC.
    53. More Information
       • Goldberg, D. E. (2002). The design of innovation: Lessons from and for competent genetic algorithms. Boston, MA: Kluwer Academic Publishers.
       • Lab: www.illigal.org
       • iFoundry: www.ifoundry.illinois.edu
       • Email: [email_address]
