A Billion Bits or Bust
Describes motivation, research, and results at University of Illinois to solve a billion-bit genetic algorithm with subquadratic scalability.


Transcript

  • 1. A Billion Bits or Bust. David E. Goldberg 1, Kumara Sastry 1,2, Xavier Llorà 1,3. 1 Illinois Genetic Algorithms Laboratory (IlliGAL), 2 Materials Computation Center (MCC), 3 National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA. [email_address], [email_address], [email_address]. http://www-illigal.ge.uiuc.edu. Supported by AFOSR FA9550-06-1-0096 and NSF DMR 03-25939. Computational results were obtained using CSE's Turing cluster.
  • 2. Billion-Bit Optimization?
    • Strides w/ genetic algorithm (GA) theory/practice in 1990s.
      • Solving large, hard problems in principled way.
      • Moving to practice in important problem domains.
    • Still, GA boobirds claim (1) no theory, (2) too slow, and (3) just voodoo.
    • How to demonstrate the results achieved so far in a dramatic way? The billion-bits-or-bust whitepaper (10/23/05).
    • DEG lunch questions: A million? Sure. A billion? Maybe.
    • Naïve GA implementation for a billion bits:
      • ~100 terabytes memory for population storage.
      • ~2^72 random number calls.
    • Naïve approach goes nowhere.
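As a rough, illustrative reading of the memory estimate (an assumption about how the figure was obtained, not the slide's own derivation): at one bit of storage per decision variable, each individual occupies 10^9 bits, so ~100 terabytes corresponds to a population on the order of a million individuals:

$$ \frac{100 \times 10^{12}\ \text{bytes}}{10^{9}\ \text{bits} \times \tfrac{1}{8}\ \text{byte/bit}} \;=\; \frac{10^{14}}{1.25 \times 10^{8}} \;=\; 8 \times 10^{5}\ \text{individuals.} $$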
  • 3. Roadmap
    • What is a genetic algorithm?
    • 1980s: Trouble in River City.
    • 1990s: IlliGAL solves robustness problems.
    • 2000s: Toward billion-variable optimization.
    • Why this matters in practice: example, multiscale materials and chemistry modeling.
    • Challenges to using this in industry.
    • Efficiency enhancement in 4-part harmony.
    • Supermultiplicative speedups by mining evolved data.
  • 4. What is a Genetic Algorithm (GA)?
    • Search procedure based on mechanics of natural selection and genetics .
    • Representation: GAs operate on codes:
      • Binary or Gray
      • Permutation
      • Integer or real
      • Program
    • Fitness function: Objective function, subjective function, ecological co-evolution
    • Population: Candidate solutions (individuals)
    • Genetic operators:
      • Selection: “Survival of the fittest”
      • Recombination: Combine parental traits to create offspring
      • Mutation: Modify an offspring slightly
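To make the vocabulary on this slide concrete, here is a minimal, illustrative simple GA on a binary string with tournament selection, uniform crossover, and bit-flip mutation. All names and parameter values are assumptions for illustration, not the implementation used in this work.

```python
import random

def simple_ga(l, fitness, pop_size=100, generations=50, p_mut=0.01):
    """Minimal illustrative GA on bit strings: selection, recombination, mutation."""
    # Population: candidate solutions represented as binary codes.
    pop = [[random.randint(0, 1) for _ in range(l)] for _ in range(pop_size)]

    def tournament():
        # Selection: "survival of the fittest" via a binary tournament.
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            # Recombination: uniform crossover combines parental traits.
            child = [random.choice(pair) for pair in zip(p1, p2)]
            # Mutation: modify the offspring slightly.
            child = [1 - g if random.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)
```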
  • 5. 1980s: Trouble in River City
    • 1980s: Promise of robustness not realized.
    • First generation performance uncertain:
      • Sometimes effective, sometimes not.
      • Sometimes fast, sometimes slow.
    • How can we make GAs scalable? Competence.
    • How can we make them practical? Efficiency.
  • 6. 1990s: Competent and Efficient GAs
    • 90s: IlliGAL solves robustness puzzle.
    • Competence: Solve hard problems quickly, reliably, and accurately ( Intractable to tractable ).
    • Efficiency: Develop speedup procedures ( tractability to practicality ).
    • Principled design : [ Goldberg, 2002 ]
      • Relax rigor, emphasize scalability/quality.
      • Use problem decomposition.
      • Use facetwise models, and patchquilt integration using dimensional analysis.
  • 7. Competent GAs Then: 1993, GA Kitty Hawk
    • Early 90s the facetwise theory came together.
    • Early 90s first competent GAs appeared.
    • First competent GA in 1993: fast messy GA (fmGA).
    • Original mGA complexity estimated: O(ℓ^5)
    • fmGA subquadratic on deceptive problem.
    [Goldberg, Deb, Kargupta, & Harik, 1993]
  • 8. Competent GAs Now: hBOA
    • Replace genetics with probabilistic model building: PMBGA or EDA.
    • 3 main elements:
      • Decomposition (structural learning)
        • Learn what to mix and what to keep intact
      • Representation of BBs (chunking)
        • Means of representing alternative solutions
      • Diversification of BBs (niching)
        • Preserve alternative chunks of solutions
    • Combines selectionist approach of GAs, statistical methods, and machine learning.
    [US utility patent # 7,047,169]
  • 9. Hierarchical BOA (hBOA) [diagram: current population → selection → Bayesian network (facetwise model) → new population]
  • 10. hBOA on Hard Antenna Designs [Yu, Santarelli, & Goldberg, 2006]
  • 11. Aiming for a Billion
    • Theory & algorithms in place.
    • Needed the guts to try.
    • Focus on key theory, implementation, & efficiency enhancements.
    • Theory keys :
      • Problem difficulty.
      • Parallelism.
    • Implementation key : compact GA.
    • Efficiency keys :
      • Various speedups.
      • Memory savings.
    • Results on a billion-variable noisy OneMax.
  • 12. Theory Key 1: Master-Slave → Linear Speedup
    • Speed-up grows with the number of processors up to a maximum.
    • Maximum speed-up at an optimal processor count.
    • Near-linear speed-up until that optimum is approached.
    [Cantú-Paz & Goldberg, 1997; Cantú-Paz, 2000]
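The speed-up expressions on this slide did not survive the transcript. The following is a reconstruction of the standard master-slave model from the cited work (Cantú-Paz, 2000), with notation assumed here: n is the population size, T_f the time for one fitness evaluation, T_c the per-processor communication time, and n_p the number of processors.

$$ S(n_p) \;=\; \frac{n\,T_f}{\dfrac{n}{n_p}\,T_f + n_p\,T_c}, \qquad n_p^{*} \;=\; \sqrt{\frac{n\,T_f}{T_c}}, $$

so speed-up is nearly linear while $n_p \ll n_p^{*}$ and peaks at $n_p = n_p^{*}$.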
  • 13. Theory Key 2: Noise Covers Most Problems
    • Adversarial problem design [Goldberg, 2002]
    • Blind noisy OneMax
    [figure: problem-difficulty facets, including fluctuation/noise, deception, and scaling]
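A blind noisy OneMax fitness takes only a few lines; this sketch assumes additive Gaussian noise (variance σ²_N = noise_std², matching slide 20) and is illustrative rather than the exact test function used in the experiments.

```python
import random

def noisy_onemax(bits, noise_std=1.0):
    """Number of ones plus additive Gaussian noise of variance noise_std**2."""
    return sum(bits) + random.gauss(0.0, noise_std)
```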
  • 14. Implementation Key: Compact GA
    • Simplest probabilistic model-building GA [Harik, Lobo & Goldberg, 1997; Baluja, 1994; Mühlenbein & Paaß, 1996]
    • Represent population by probability vector
      • Probability that the i-th bit is 1
    • Replace recombination with probabilistic sampling
    • Selectionist scheme
    • New population evolution through probability updates
    • Equivalent to GA with steady-state tournament selection and uniform crossover
  • 15. Compact Genetic Algorithm (cGA)
    • Random initialization: Set probabilities to 0.5
    • Model Sampling: Generate two candidate solutions by sampling the probability vector
    • Evaluation: Evaluate the fitness of two sampled solutions
    • Selection: Select the best among the sampled solutions
    • Probabilistic model update: Increase the proportion of winning alleles by 1/n
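The five steps above translate almost directly into code. This is a minimal serial sketch (names and the convergence test are assumptions), not the parallel, vectorized implementation described on the following slides.

```python
import random

def cga(l, n, fitness, max_iters=1_000_000):
    """Compact GA: evolve a probability vector instead of an explicit population.

    l -- number of bits; n -- virtual population size (update step is 1/n);
    fitness -- callable mapping a bit list to a number.
    """
    p = [0.5] * l                                   # random initialization

    def sample():
        return [1 if random.random() < p[i] else 0 for i in range(l)]

    for _ in range(max_iters):
        a, b = sample(), sample()                   # model sampling: two candidates
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)  # evaluation + selection
        for i in range(l):                          # probabilistic model update
            if winner[i] != loser[i]:
                p[i] += 1.0 / n if winner[i] == 1 else -1.0 / n
                p[i] = min(1.0, max(0.0, p[i]))
        if all(pi in (0.0, 1.0) for pi in p):       # model fully converged
            break
    return [1 if pi > 0.5 else 0 for pi in p]
```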
  • 16. Parallel cGA Architecture [diagram: each processor #1…#n_p samples its share of roughly ℓ/n_p bits; partial sampled solutions are collected and combined; fitness of the sampled solutions is evaluated in parallel and broadcast; each processor then selects the best individual and updates its slice of the probability vector]
  • 17. cGA is Memory Efficient: Θ(ℓ) vs. Θ(ℓ^1.5)
    • Orders of magnitude memory savings via efficient GA
    • Example: ~32 MB per processor on a modest 128 processors for billion-bit optimization
    • Simple GA: memory grows as Θ(ℓ^1.5).
    • Compact GA: memory grows as Θ(ℓ).
      • Frequencies instead of probabilities (4 bytes)
      • Parallelization reduces memory per processor by factor of n p
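Reading the ~32 MB figure as one 4-byte frequency counter per bit position, spread evenly over 128 processors (an assumed but natural interpretation of the slide):

$$ \frac{10^{9}\ \text{positions} \times 4\ \text{bytes}}{128\ \text{processors}} \;=\; \frac{4\ \text{GB}}{128} \;\approx\; 31.25\ \text{MB per processor.} $$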
  • 18. Vectorization Yields Speedup of 4
    • SIMD instruction set allows vector operations on 128-bit registers
    • Equivalent to 4 processors per processor
    • Vectorize costly code segments with AltiVec/SSE2
      • Generate 4 random numbers at a time
      • Sample 4 bits at a time
      • Update 4 probabilities at a time
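The original code used AltiVec/SSE2 intrinsics in C to process four values per instruction. As a language-neutral illustration of the same idea (bulk random-number generation, whole-vector sampling and updates instead of per-bit loops), here is a numpy sketch; the function names and array shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def sample_pair(p):
    """Sample two candidate bit vectors from probability vector p in bulk."""
    r = rng.random((2, p.size))          # all random numbers drawn at once
    return (r < p).astype(np.uint8)      # whole-vector comparison, no per-bit loop

def update(p, winner, loser, n):
    """Shift probabilities toward the winner wherever the candidates differ."""
    diff = winner != loser
    p += np.where(diff & (winner == 1), 1.0 / n, 0.0)
    p -= np.where(diff & (winner == 0), 1.0 / n, 0.0)
    np.clip(p, 0.0, 1.0, out=p)
    return p
```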
  • 19. Other Efficiencies Yield Speedup of 15
    • Bitwise operations
    • Limited floating-point operations
    • Inline functions
    • Avoid using mod and division operations
    • Precomputing bit sums and indexing
    • Parallel, vectorized, and efficient GA:
      • Memory scales as Θ(ℓ/n_p); speedup scales as ~60·n_p
      • ~32 MB memory and ~10^4 speedup with 128 processors
  • 20. GA Population Sizing
    • Additive Gaussian noise with variance σ²_N
    • Population sizing scales: O(m^0.5 log m) – O(m log m)
    [Harik et al., 1997] [equation terms: noise-to-fitness variance ratio, error tolerance, signal-to-noise ratio, number of competing sub-components, number of components (# BBs)]
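The sizing equation itself did not survive the transcript. A commonly cited form of the gambler's-ruin population-sizing model from the cited reference, reconstructed here with assumed notation (k is the building-block size, α the error tolerance, d the fitness signal between competing BBs, σ_BB the BB fitness standard deviation, σ_N the noise standard deviation, and m the number of BBs), is

$$ n \;\approx\; -\,2^{\,k-1}\,\ln(\alpha)\;\frac{\sqrt{\pi\,(m-1)\,\big(\sigma_{BB}^{2} + \sigma_{N}^{2}\big)}}{d}. $$

With α ∝ 1/m this gives the O(m^0.5 log m) end of the range for the deterministic case; when the noise variance itself grows with the problem size it stretches toward O(m log m), matching the range on the slide.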
  • 21. GA Convergence Time
    • Convergence time scales: O(m^0.5) – O(m)
    • GA scales as: O(m log m) – O(m^2 log m)
    [Miller & Goldberg, 1995; Goldberg, 2002; Sastry & Goldberg, 2002] [equation terms: selection intensity, problem size ℓ = m·k]
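Again the equation is missing from the transcript. The standard selection-intensity convergence model from the cited references, reconstructed with assumed notation (I is the selection intensity, ℓ = m·k the problem size, σ²_N/σ²_f the noise-to-fitness variance ratio), is roughly

$$ t_{conv} \;\approx\; \frac{\pi}{2\,I}\,\sqrt{\ell}\;\sqrt{1 + \frac{\sigma_{N}^{2}}{\sigma_{f}^{2}}}, $$

which gives the O(m^0.5) deterministic case and degrades toward O(m) as noise dominates; multiplying by the population size from the previous slide yields the O(m log m) to O(m^2 log m) scaling quoted above.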
  • 22. GA Solves Billion-Variable Optimization Problem
    • Solved 33 million (2^25) bit problem to optimality.
    • Solved 1.1 billion (2^30) bit problem with relaxed, but guaranteed, convergence.
    • GA scales as Θ(ℓ · log ℓ · (1 + σ²_N/σ²_f))
  • 23. Do Problems Like This Matter?
    • Yes, for three reasons:
      • Many GAs no more sophisticated than cGA.
      • Inclusion of noise was important because it covers all difficulty (except deception).
      • Know how to handle deception and other problems through PMBGAs like hBOA.
    • Have experience solving tough problems with ordinary genetic and evolutionary algorithms:
      • Material science.
      • Chemistry.
    • Complex versions of these kinds of problems need billion-bit optimization.
  • 24. Multiscale Nanomaterials Modeling
    • Accuracy of modeling depends on accurate representation of potential energy surface (PES)
      • Both values and shape matter
    • Ab initio methods:
      • Accurate, but slow (hours-days)
      • Compute PES from scratch
    • Faster methods:
      • Fast (seconds-minutes), accuracy depends on PES accuracy
      • Need direct/indirect knowledge of PES
    • Known and unknown potential function/method
      • Multiscaling quantum chemistry simulations [ Sastry et al, 2006 ]
      • Multi-timescaling alloy kinetics [ Sastry et al, 2004 ; Sastry et al, 2005 ]
  • 25. Multi-timescale Modeling of Alloys
    • Molecular dynamics (MD): (~10^–9 s) many realistic processes are inaccessible.
    • Kinetic Monte Carlo (KMC): (~seconds) needs all diffusion barriers a priori. (God or compute)
    • Efficient coupling of MD and KMC:
      • Use MD to get some diffusion barriers.
      • Use KMC to span time.
      • Use GP to regress all barriers from some barrier information.
    • Spans 10^–15 seconds to seconds (15 orders of magnitude).
    • Chosen by the AIP editors as a focused article of frontier research in Virtual Journal of Nanoscale Science & Technology, 12(9), 2005.
    [figure: real time vs. complexity for table-lookup KMC, on-the-fly KMC, and symbolically regressed KMC (sr-KMC)]
  • 26. GP-Enhanced Kinetics Modeling
    • Total 2nd-nearest-neighbor active configurations: 8,192
    • ΔE calculated for only ~3% (256) of configurations
    • Low-energy events: <0.1% prediction error
    • Overall events: <1% prediction error
    • Dramatic scaling over MD (10^9 at 300 K)
    • 10^2 decrease in CPU time for calculating barriers
    • 10^3–10^6 less CPU time than on-the-fly KMC
  • 27. Chemistry: GA-Enhanced Reaction Dynamics
    • Accurate excited-state surfaces with semiempirical methods
      • Permits dynamics of larger systems: proteins, nanotubes, etc.
    • Energy & shape of the PES matter
    • Unknown PES functional form: multi-timescaling alloy kinetics [Sastry et al, 2004; Sastry et al, 2005]
    [diagram: ab initio quantum chemistry methods (accurate but slow, hours-days; can calculate excited states) vs. semiempirical methods (fast, secs.-mins.; accuracy depends on parameters; calculate integrals from fit parameters), with the GA used to tune the semiempirical parameters]
    [Best paper and Silver "Humies" award, GECCO (ACM SIG conference)]
  • 28. MOGA Finds Physical and Accurate PES
    • MOGA results have significantly lower errors than current results.
    • Globally accurate PES yielding physical reaction dynamics
    [figure: each point is a set of 11 parameters]
    • 10^2–10^5 increase in simulation time
    • 10–10^3 times faster than current methods
    • Produces transferable SE parameters
    • Interpretable semiempirical methods
  • 29. Do You Make a Million/Billion Decisions?
    • Materials and chemistry just examples: Increased complexity increases appetite for large optimization.
    • Modern design increasingly complex.
    • The Os all have many decisions to make:
      • Nano
      • Bio
      • Info
    • Generally systems increasingly complex:
      • ~10^5 parts in a modern automobile.
      • ~10^7 parts in a commercial jetliner.
    • Will be driven toward routine million/billion variable problems.
    We get the warhead and then hold the world ransom for... 1 MILLION dollars !
  • 30. Challenges to Routine Billion-Bit Optimization
    • What if you have large nonlinear solver (PDE, ODE, FEM, KMC, MD, whatever)?
    • Need efficiency enhancement:
      • Parallel
      • Time continuation
      • Hybridization
      • Evaluation Relaxation
    • Take a 100-node cluster and a 26% speed increase from each of the other effects:
    • Combined effect is multiplicative: 100 × 1.26 × 1.26 × 1.26 ≈ 200.
    • Good, but not good enough.
  • 31. Data-Mined Evaluation Relaxation as Key
    • Steps:
      • Collect evolutionary stream of data : Solution sets and function values.
      • Build structural model of key modules & relationships (e.g., hBOA).
      • Build fitness surrogate using structural model.
      • Substitute surrogate evaluations in place of expensive evaluations.
    • Need to sample only a small fraction of the expensive evaluations (sketched below).
    • Sounds too good to be true: something for nothing?
    • Not really; it is analogous to human cognition.
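As an illustration of the four steps above (not the hBOA-based surrogate itself), here is a toy sketch that fits a simple linear surrogate to the stream of evaluated solutions and then answers most fitness queries from the surrogate; the model class, the evaluation fraction, and all names are assumptions.

```python
import random

class FitnessSurrogate:
    """Toy evaluation-relaxation loop: learn a cheap model from evaluated
    (solution, fitness) pairs and use it in place of most expensive calls."""

    def __init__(self, l, true_fitness, eval_fraction=0.1):
        self.w = [0.0] * l            # per-bit weights of a linear surrogate
        self.b = 0.0
        self.true_fitness = true_fitness
        self.eval_fraction = eval_fraction

    def fit(self, archive, lr=0.01, epochs=50):
        """Fit the surrogate to the archived (solution, fitness) stream by SGD."""
        for _ in range(epochs):
            for x, f in archive:
                err = self.predict(x) - f
                self.b -= lr * err
                for i, xi in enumerate(x):
                    self.w[i] -= lr * err * xi

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def evaluate(self, x, archive):
        """Call the expensive function for a small fraction; otherwise the surrogate."""
        if random.random() < self.eval_fraction:
            f = self.true_fitness(x)
            archive.append((x, f))    # keep growing the evolutionary data stream
            return f
        return self.predict(x)
```

In use, a GA would call evaluate() for every candidate and periodically refit the surrogate from the growing archive.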
  • 32. Supermultiplicative Speedups
    • Synergistic integration of probabilistic models and efficiency enhancement techniques
    • Evaluation relaxation
    • Learn structural model
    • Induce surrogate fitness from structural model
    • Estimate coefficients using standard methods.
    • Only 1–15% of individuals need evaluation
    • Speed-up: 30–53
    [Pelikan & Sastry, 2004; Sastry, Pelikan & Goldberg, 2004]
  • 33. Summary
    • What is a genetic algorithm?
    • 1980s: Unrealized promise of robustness.
    • 1990s: Increasing competence (scalability) & efficiency.
    • 2000s: Toward billion-variable optimization.
    • Why this matters in practice: Multiscale materials and chemistry modeling as two examples.
    • Challenges to routine billion-bit optimization.
    • Efficiency enhancement in 4-part harmony.
    • Supermultiplicative speedups by mining evolved data & fitness surrogates.
  • 34. Conclusions
    • Big hardware was a new frontier 20 years ago.
    • Big optimization is a frontier today:
      • Take extant cluster computing.
      • Mix in robustness lessons of the 90s.
      • Add the efficiency enhancements of the 2000s.
      • Integrate into problems.
    • This technology can create competitive advantage for industries visionary enough to grab hold:
      • In the O’s (bio, nano, info)
      • Large-scale complex systems.
    • Find out more by engaging UIUC and NCSA work, today.
  • 35. More Information
    • Billion-bit article in Complexity ( http://www3.interscience.wiley.com/cgi-bin/abstract/114068026/ABSTRACT?CRETRY=1&SRETRY=0 ) and tech report ( ftp://ftp-illigal.ge.uiuc.edu/pub/papers/IlliGALs/2007007.pdf ).
    • Illinois Genetic Algorithms Laboratory: http://www-illigal.ge.uiuc.edu/
    • The Design of Innovation: Lessons from and for Competent Genetic Algorithms (Kluwer, 2002) .
    • Speakers: [email_address] , [email_address] .
    http://www.amazon.com/exec/obidos/tg/detail/-/1402070985/ref=pd_sl_aw_alx-jeb-9-1_book_5065571_2