A Billion Bits or Bust

Describes the motivation, research, and results of work at the University of Illinois to solve a billion-bit optimization problem with a genetic algorithm at subquadratic scalability.



1. A Billion Bits or Bust
   David E. Goldberg¹, Kumara Sastry¹,², Xavier Llorà¹,³
   ¹ Illinois Genetic Algorithms Laboratory (IlliGAL)
   ² Materials Computation Center (MCC)
   ³ National Center for Supercomputing Applications (NCSA)
   University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
   [email_address], [email_address], [email_address]
   http://www-illigal.ge.uiuc.edu
   Supported by AFOSR FA9550-06-1-0096 and NSF DMR 03-25939. Computational results were obtained using CSE's Turing cluster.
2. Billion-Bit Optimization?
   - Strides in genetic algorithm (GA) theory and practice in the 1990s:
     - Solving large, hard problems in a principled way.
     - Moving to practice in important problem domains.
   - Still, GA boobirds claim (1) no theory, (2) too slow, and (3) just voodoo.
   - How to demonstrate the results achieved so far in a dramatic way? The "billion bits or bust" whitepaper (10/23/05).
   - DEG lunch questions: A million? Sure. A billion? Maybe.
   - A naïve GA implementation for a billion bits needs:
     - ~100 terabytes of memory for population storage.
     - ~2^72 random number calls.
   - The naïve approach goes nowhere.
3. Roadmap
   - What is a genetic algorithm?
   - 1980s: Trouble in River City.
   - 1990s: IlliGAL solves robustness problems.
   - 2000s: Toward billion-variable optimization.
   - Why this matters in practice: multiscale materials and chemistry modeling.
   - Challenges to using this in industry.
   - Efficiency enhancement in 4-part harmony.
   - Supermultiplicative speedups by mining evolved data.
4. What is a Genetic Algorithm (GA)?
   - A search procedure based on the mechanics of natural selection and genetics.
   - Representation: GAs operate on codes:
     - Binary or Gray
     - Permutation
     - Integer or real
     - Program
   - Fitness function: objective function, subjective function, or ecological co-evolution.
   - Population: candidate solutions (individuals).
   - Genetic operators:
     - Selection: "survival of the fittest."
     - Recombination: combine parental traits to create offspring.
     - Mutation: modify an offspring slightly.
5. 1980s: Trouble in River City
   - 1980s: The promise of robustness was not realized.
   - First-generation performance was uncertain:
     - Sometimes effective, sometimes not.
     - Sometimes fast, sometimes slow.
   - How can we make GAs scalable? Competence.
   - How can we make them practical? Efficiency.
6. 1990s: Competent and Efficient GAs
   - 1990s: IlliGAL solves the robustness puzzle.
   - Competence: solve hard problems quickly, reliably, and accurately (intractable to tractable).
   - Efficiency: develop speedup procedures (tractability to practicality).
   - Principled design [Goldberg, 2002]:
     - Relax rigor; emphasize scalability and quality.
     - Use problem decomposition.
     - Use facetwise models, with patchquilt integration via dimensional analysis.
7. Competent GAs Then: 1993, GA Kitty Hawk
   - In the early 1990s the facetwise theory came together.
   - In the early 1990s the first competent GAs appeared.
   - First competent GA in 1993: the fast messy GA (fmGA).
   - Original mGA complexity estimated at O(ℓ^5).
   - fmGA is subquadratic on a deceptive problem.
   [Goldberg, Deb, Kargupta, & Harik, 1993]
8. Competent GAs Now: hBOA
   - Replace genetics with probabilistic model building: PMBGA or EDA.
   - Three main elements:
     - Decomposition (structural learning): learn what to mix and what to keep intact.
     - Representation of BBs (chunking): a means of representing alternative solutions.
     - Diversification of BBs (niching): preserve alternative chunks of solutions.
   - Combines the selectionist approach of GAs, statistical methods, and machine learning.
   [US utility patent #7,047,169]
9. Hierarchical BOA (hBOA)
   [Figure: the hBOA loop — current population → selection → Bayesian network (facetwise model) → new population.]
10. hBOA on Hard Antenna Designs
    [Yu, Santarelli, & Goldberg, 2006]
11. Aiming for a Billion
    - Theory and algorithms were in place.
    - Needed the guts to try.
    - Focus on key theory, implementation, and efficiency enhancements.
    - Theory keys:
      - Problem difficulty.
      - Parallelism.
    - Implementation key: the compact GA.
    - Efficiency keys:
      - Various speedups.
      - Memory savings.
    - Results on a billion-variable noisy OneMax.
12. Theory Key 1: Master-Slave → Linear Speedup
    - Speedup, its maximum, and the near-linear regime follow from the master-slave model (the slide's equations were graphics; a hedged reconstruction follows below).
    [Cantú-Paz & Goldberg, 1997; Cantú-Paz, 2000]
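The speedup equations on this slide did not survive extraction. What follows is a hedged reconstruction from the cited master-slave parallel GA analysis, in my notation (not necessarily the slide's): with population size n, fitness-evaluation time T_f per individual, per-processor communication time T_c, and n_p processors,

```latex
S(n_p) = \frac{n\,T_f}{\dfrac{n\,T_f}{n_p} + n_p\,T_c},
\qquad
n_p^{*} = \sqrt{\frac{n\,T_f}{T_c}},
\qquad
S(n_p^{*}) = \frac{n_p^{*}}{2}.
```

Speedup stays near-linear while n_p is well below n_p*; past the optimum, communication dominates.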
13. Theory Key 2: Noise Covers Most Problems
    - Adversarial problem design [Goldberg, 2002].
    - Blind noisy OneMax.
    [Figure: dimensions of problem difficulty — deception, scaling, noise, and fluctuation.]
14. Implementation Key: Compact GA
    - The simplest probabilistic model-building GA [Harik, Lobo, & Goldberg, 1997; Baluja, 1994; Mühlenbein & Paaß, 1996].
    - Represent the population by a probability vector:
      - p_i = probability that the i-th bit is 1.
    - Replace recombination with probabilistic sampling.
    - Selectionist scheme.
    - The population evolves through probability updates.
    - Equivalent to a GA with steady-state tournament selection and uniform crossover.
15. Compact Genetic Algorithm (cGA)
    - Random initialization: set all probabilities to 0.5.
    - Model sampling: generate two candidate solutions by sampling the probability vector.
    - Evaluation: evaluate the fitness of the two sampled solutions.
    - Selection: select the better of the two sampled solutions.
    - Probabilistic model update: increase the proportion of winning alleles by 1/n, where n is the emulated population size.
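A minimal serial sketch of this loop on OneMax, assuming an illustrative string length LEN and emulated population size POPSIZE (names and constants are mine, not from the slides):

```c
#include <stdio.h>
#include <stdlib.h>

#define LEN 1000        /* string length (illustrative) */
#define POPSIZE 255.0   /* emulated population size n (illustrative) */

/* Sample one candidate solution from the probability vector. */
static void sample(const double *p, int *x) {
    for (int i = 0; i < LEN; i++)
        x[i] = ((double)rand() / RAND_MAX) < p[i];
}

/* OneMax fitness: the number of 1 bits. */
static int fitness(const int *x) {
    int f = 0;
    for (int i = 0; i < LEN; i++)
        f += x[i];
    return f;
}

int main(void) {
    static double p[LEN];
    static int a[LEN], b[LEN];

    for (int i = 0; i < LEN; i++)
        p[i] = 0.5;                          /* random initialization */

    for (int t = 0; t < 200000; t++) {
        sample(p, a);                        /* model sampling */
        sample(p, b);
        const int *win  = fitness(a) >= fitness(b) ? a : b;  /* selection */
        const int *lose = (win == a) ? b : a;
        for (int i = 0; i < LEN; i++) {      /* update only where bits differ */
            if (win[i] != lose[i])
                p[i] += win[i] ? 1.0 / POPSIZE : -1.0 / POPSIZE;
            if (p[i] > 1.0) p[i] = 1.0;      /* clamp to [0, 1] */
            if (p[i] < 0.0) p[i] = 0.0;
        }
    }

    double ones = 0.0;
    for (int i = 0; i < LEN; i++)
        ones += p[i];
    printf("expected number of ones: %.1f of %d\n", ones, LEN);
    return 0;
}
```

Note that the whole population lives in the ℓ-entry probability vector, which is what makes the memory story on slide 17 possible.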
16. Parallel cGA Architecture
    [Figure: each of the n_p processors samples its slice of bits (1 through ℓ/n_p per processor); the partial sampled solutions are collected and combined; fitness is evaluated in parallel and the fitness values of the sampled solutions are broadcast; each processor then selects the best individual and updates its slice of the probability vector.]
17. cGA is Memory Efficient: Θ(ℓ) vs. Θ(ℓ^1.5)
    - Orders-of-magnitude memory savings via the efficient GA.
    - Example: ~32 MB per processor on a modest 128 processors for billion-bit optimization.
    - Simple GA: stores a population of roughly O(ℓ^0.5) individuals of ℓ bits each, so memory grows as Θ(ℓ^1.5).
    - Compact GA: stores only the ℓ-entry probability vector:
      - Frequencies instead of probabilities (4 bytes each).
      - Parallelization reduces memory per processor by a factor of n_p.
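A quick check of the ~32 MB figure, assuming 4-byte frequency counters, ℓ = 2^30 bits, and n_p = 128 processors:

```latex
\frac{\ell \times 4\ \text{bytes}}{n_p}
= \frac{2^{30} \times 4}{128}\ \text{bytes}
= 2^{25}\ \text{bytes}
= 32\ \text{MB}.
```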
18. Vectorization Yields a Speedup of 4
    - SIMD instruction sets allow vector operations on 128-bit registers.
    - Equivalent to 4 processors per processor.
    - Vectorize costly code segments with AltiVec/SSE2:
      - Generate 4 random numbers at a time.
      - Sample 4 bits at a time.
      - Update 4 probabilities at a time.
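A minimal sketch of the "update 4 probabilities at a time" idea using SSE2 intrinsics (illustrative only; the original code is not shown on the slides). It nudges four single-precision frequencies toward the winner's alleles in one vector operation:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Update 4 probability-vector entries per iteration.
 * p:   probability (frequency) vector, 16-byte aligned, len a multiple of 4
 * dir: precomputed step per bit: +1/n where the winner has a 1, -1/n where
 *      it has a 0, and 0 where winner and loser agree (no update)          */
void update4(float *p, const float *dir, int len) {
    for (int i = 0; i < len; i += 4) {
        __m128 v = _mm_load_ps(p + i);    /* load 4 probabilities */
        __m128 d = _mm_load_ps(dir + i);  /* load 4 update steps  */
        v = _mm_add_ps(v, d);             /* 4 updates in one op  */
        _mm_store_ps(p + i, v);
    }
}
```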
19. Other Efficiencies Yield a Speedup of 15
    - Bitwise operations.
    - Limited floating-point operations.
    - Inline functions.
    - Avoid mod and division operations.
    - Precompute bit sums and indexing (see the sketch below).
    - Parallel, vectorized, and efficient GA:
      - Memory scales as Θ(ℓ/n_p); speedup scales as 60·n_p.
      - ~32 MB of memory and ~10^4 speedup with 128 processors.
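A sketch of the "precompute bit sums" trick, assuming a 16-bit lookup table (an illustration; the slides do not show the implementation). OneMax fitness over a packed bit string then costs one table lookup per 16 bits instead of 16 bit tests:

```c
#include <stdint.h>

/* Table: number of 1 bits in each possible 16-bit value (64 KB). */
static uint8_t popcount16[1 << 16];

void init_popcount_table(void) {
    for (uint32_t v = 1; v < (1u << 16); v++)
        popcount16[v] = (uint8_t)((v & 1) + popcount16[v >> 1]);
}

/* OneMax fitness of a solution packed into 16-bit words. */
uint64_t onemax_fitness(const uint16_t *words, uint64_t nwords) {
    uint64_t sum = 0;
    for (uint64_t i = 0; i < nwords; i++)
        sum += popcount16[words[i]];   /* one lookup replaces 16 bit tests */
    return sum;
}
```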
20. GA Population Sizing
    - Additive Gaussian noise with variance σ²_N.
    - Population sizing scales as O(m^0.5 log m) to O(m log m).
    - Terms in the sizing equation: noise-to-fitness variance ratio, error tolerance, signal-to-noise ratio, number of competing sub-components, and number of components (# BBs); see the reconstruction below.
    [Harik et al., 1997]
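The sizing equation itself was an on-slide graphic. A hedged reconstruction of the gambler's-ruin population-sizing model from the cited line of work, extended for additive Gaussian noise (my notation): with building-block size k, error tolerance α, m building blocks, fitness signal d between competing building blocks, building-block variance σ²_BB, and noise variance σ²_N,

```latex
n \;=\; -\,2^{\,k-1}\,\ln(\alpha)\;
\frac{\sqrt{\pi m}\;\sqrt{\sigma_{BB}^{2} + \sigma_{N}^{2}}}{d}.
```

Setting α ∝ 1/m gives the O(m^0.5 log m) end of the range; noise variance growing with m pushes the sizing toward O(m log m).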
21. GA Convergence Time
    - Convergence time scales as O(m^0.5) to O(m).
    - The GA overall scales as O(m log m) to O(m² log m).
    - Terms: selection intensity I and problem size ℓ = m·k; see the reconstruction below.
    [Miller & Goldberg, 1995; Goldberg, 2002; Sastry & Goldberg, 2002]
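Again the equation was a graphic; a hedged reconstruction of the selection-intensity convergence-time model for (noisy) OneMax from the cited work:

```latex
t_{c} \;=\; \frac{\pi}{2}\,\frac{\sqrt{\ell}}{I}\,
\sqrt{1 + \frac{\sigma_{N}^{2}}{\sigma_{f}^{2}}},
\qquad \ell = m \cdot k.
```

Multiplying population size by convergence time gives the total number of evaluations, which is the Θ(ℓ log ℓ (1 + σ²_N/σ²_f)) scaling quoted on the next slide.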
22. GA Solves a Billion-Variable Optimization Problem
    - Solved a 33 million (2^25) bit problem to optimality.
    - Solved a 1.1 billion (2^30) bit problem with relaxed, but guaranteed, convergence.
    - The GA scales as Θ(ℓ log ℓ (1 + σ²_N/σ²_f)).
23. Do Problems Like This Matter?
    - Yes, for three reasons:
      - Many GAs are no more sophisticated than the cGA.
      - Including noise was important because it covers all difficulty (except deception).
      - We know how to handle deception and other problems through PMBGAs like hBOA.
    - We have experience solving tough problems with ordinary genetic and evolutionary algorithms:
      - Materials science.
      - Chemistry.
    - Complex versions of these kinds of problems need billion-bit optimization.
24. Multiscale Nanomaterials Modeling
    - The accuracy of modeling depends on an accurate representation of the potential energy surface (PES):
      - Both values and shape matter.
    - Ab initio methods:
      - Accurate, but slow (hours to days).
      - Compute the PES from scratch.
    - Faster methods:
      - Fast (seconds to minutes), but accuracy depends on PES accuracy.
      - Need direct or indirect knowledge of the PES.
    - Known and unknown potential function/method:
      - Multiscaling quantum chemistry simulations [Sastry et al., 2006].
      - Multi-timescaling alloy kinetics [Sastry et al., 2004; Sastry et al., 2005].
25. Multi-timescale Modeling of Alloys
    - Molecular dynamics (MD): (~10^-9 s) many realistic processes are inaccessible.
    - Kinetic Monte Carlo (KMC): (~seconds) needs all diffusion barriers a priori (God or compute).
    - Efficient coupling of MD and KMC:
      - Use MD to get some diffusion barriers.
      - Use KMC to span time.
      - Use GP to regress all barriers from partial barrier information.
    - Spans 10^-15 seconds to seconds (15 orders of magnitude).
    - Chosen by the AIP editors as a focused article of frontier research in Virtual Journal of Nanoscale Science & Technology, 12(9), 2005.
    [Figure: real time vs. complexity for table-lookup KMC, on-the-fly KMC, and symbolically regressed KMC (sr-KMC).]
  26. 26. <ul><li> E calculated: » 3% (256) configurations </li></ul><ul><li>Low-energy events: <0.1% prediction error </li></ul><ul><li>Overall events: <1% prediction error </li></ul>GP-Enhanced Kinetics Modeling <ul><li>Total 2 nd n.n. Active configurations: 8192 </li></ul><ul><li>Dramatic scaling over MD (10 9 at 300 K) </li></ul><ul><li>10 2 decrease in CPU time for calculating barriers </li></ul><ul><li>10 3 -10 6 less CPU time than on-they-fly KMC </li></ul>
27. Chemistry: GA-Enhanced Reaction Dynamics
    - Accurate excited-state surfaces with semiempirical methods:
      - Permits dynamics of larger systems: proteins, nanotubes, etc.
    - Energy and shape of the PES matter.
    - Unknown PES functional form: multi-timescaling alloy kinetics [Sastry et al., 2004; Sastry et al., 2005].
    - Ab initio quantum chemistry methods: accurate but slow (hours to days); can calculate excited states.
    - Semiempirical methods: fast (seconds to minutes), accuracy depends on parameters; calculate integrals from fit parameters.
    - Approach: tune the semiempirical parameters.
    [Best paper and Silver "Humies" award, GECCO (ACM SIG conference)]
28. MOGA Finds Physical and Accurate PES
    - MOGA results have significantly lower errors than current results.
    - Globally accurate PES yielding physical reaction dynamics.
    - Each point is a set of 11 parameters.
    - 10^2–10^5 increase in simulation time.
    - 10–10^3 times faster than current methods.
    - Produces transferable semiempirical (SE) parameters.
    - Interpretable semiempirical methods.
29. Do You Make a Million/Billion Decisions?
    - Materials and chemistry are just examples: increased complexity increases the appetite for large optimization.
    - Modern design is increasingly complex.
    - The O's all have many decisions to make:
      - Nano
      - Bio
      - Info
    - Systems generally are increasingly complex:
      - ~10^5 parts in a modern automobile.
      - ~10^7 parts in a commercial jetliner.
    - We will be driven toward routine million/billion-variable problems.
    ["We get the warhead and then hold the world ransom for... 1 MILLION dollars!"]
30. Challenges to Routine Billion-Bit Optimization
    - What if you have a large nonlinear solver (PDE, ODE, FEM, KMC, MD, whatever)?
    - Need efficiency enhancement:
      - Parallelization.
      - Time continuation.
      - Hybridization.
      - Evaluation relaxation.
    - Take a 100-node cluster and a 26% speed increase from each of the other three effects.
    - The combined effect is multiplicative: 100 × 1.26 × 1.26 × 1.26 ≈ 200.
    - Good, but not good enough.
31. Data-Mined Evaluation Relaxation as Key
    - Steps:
      - Collect the evolutionary stream of data: solution sets and function values.
      - Build a structural model of key modules and relationships (e.g., hBOA).
      - Build a fitness surrogate using the structural model (a sketch of the surrogate's form follows below).
      - Substitute surrogate evaluations in place of expensive evaluations.
    - Need to sample only a small fraction of the expensive evaluations.
    - Sounds too good to be true: something for nothing?
    - Not really; it is an analog to human cognition.
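The slides do not show the surrogate's functional form. A hedged sketch in the spirit of the cited work, where the learned model's structure defines which interactions the surrogate captures: for a Bayesian network with parent sets Π_i,

```latex
\hat{f}(x_1,\dots,x_\ell) \;=\; \sum_{i=1}^{\ell}
\bar{f}\!\left(x_i \mid x_{\Pi_i}\right),
```

where each f̄(x_i | x_Π_i) is the average fitness contribution observed among the expensively evaluated individuals with that configuration of bit i and its parents. New individuals can then be scored from these cached statistics rather than the real objective.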
32. Supermultiplicative Speedups
    - Synergistic integration of probabilistic models and efficiency-enhancement techniques.
    - Evaluation relaxation:
      - Learn a structural model.
      - Induce a surrogate fitness from the structural model.
      - Estimate coefficients using standard methods.
    - Only 1–15% of individuals need evaluation.
    - Speedup: 30–53.
    [Pelikan & Sastry, 2004; Sastry, Pelikan & Goldberg, 2004]
33. Summary
    - What is a genetic algorithm?
    - 1980s: the unrealized promise of robustness.
    - 1990s: increasing competence (scalability) and efficiency.
    - 2000s: toward billion-variable optimization.
    - Why this matters in practice: multiscale materials and chemistry modeling as two examples.
    - Challenges to routine billion-bit optimization.
    - Efficiency enhancement in 4-part harmony.
    - Supermultiplicative speedups by mining evolved data and fitness surrogates.
34. Conclusions
    - Big hardware was a new frontier 20 years ago.
    - Big optimization is a frontier today:
      - Take extant cluster computing.
      - Mix in the robustness lessons of the 1990s.
      - Add the efficiency enhancements of the 2000s.
      - Integrate into real problems.
    - This technology can create competitive advantage for industries visionary enough to grab hold:
      - In the O's (bio, nano, info).
      - Large-scale complex systems.
    - Find out more by engaging the UIUC and NCSA work, today.
35. More Information
    - Billion-bit article in Complexity (http://www3.interscience.wiley.com/cgi-bin/abstract/114068026/ABSTRACT?CRETRY=1&SRETRY=0) and tech report (ftp://ftp-illigal.ge.uiuc.edu/pub/papers/IlliGALs/2007007.pdf).
    - Illinois Genetic Algorithms Laboratory: http://www-illigal.ge.uiuc.edu/.
    - The Design of Innovation: Lessons from and for Competent Genetic Algorithms (Kluwer, 2002): http://www.amazon.com/exec/obidos/tg/detail/-/1402070985/ref=pd_sl_aw_alx-jeb-9-1_book_5065571_2
    - Speakers: [email_address], [email_address].
