Genetic Algorithms and Genetic Programming for
  Multiscale Modeling: Applications in Materials
Science and Chemistry and ...
Outline
Background and motivation
Objective and proposed approach
Genetic algorithms (GAs) and Genetic programming (GP)
Ap...
Multiscaling is Ubiquitous in Science & Engineering

  Many phenomena are inherently multiscale
     Physics, chemistry, b...
GAs and GP for Multiscale Materials Modeling

   Need accurate and efficient multiscale methods
       Sparsely sample low...
Genetic Programming

                        f = 0.9
                                    f = 0.95




f = 0.5



         ...
Applications: Multiscale Modeling in Materials
           Science and Chemistry
Accuracy of modeling depends on accurate r...
Recap: Multi-timescale Modeling of Alloy Kinetics
 Molecular dynamics (MD): (~10–9 secs) many realistic
 processes are ina...
Results: (001) Surface Vacancy-assisted Migration

    Total 2nd n.n. Active configurations: 8192

                       ...
Multiscaling Photochemical Reactions




Halorhodopsin                                                         Excited sta...
Reaction Dynamics Over Multiple Timescales

                                     Tune
      Ab Initio
                    ...
Fitness: Errors in Energy and Energy Gradient

Choose few ground- and excited-state
                                      ...
Current Reparameterization Methods Fall Short;
       Need Multiobjective Optimization
Current Method: Staged single-objec...
Multiobjective Optimization

Unlike single-objective problems, multiobjective problems
involve a set of optimal solutions ...
MOGA Finds High Quality SE Parameters


                                                    MOGA vs.
                     ...
MOGA Population Quantifies Parameter Stability

  GA population contains useful data that can be mined
     E.g., 80,000 c...
On-Line Parameter Stability Analysis in MOGA




Analysis of quality of solutions around Pareto-optimal set
yields a good ...
MOGA Results Yield Accurate Engergies

                                                 D2d twisted
                      ...
Check potential energy surface with dynamics



                                            ab initio value: 180±50 fs



...
Semiempirical Parameter Interaction Identification




      Multiple high-quality
      parameter sets




 Symbolic regr...
Advances in Scalability: GP Population Sizing,
  Competent GP, and Scalability of MOGAs

Premium on competent and efficien...
Recap: How do we size populations?

Tradeoff: Expense = # runs x popsize x # generations
Valuable tool for GP-theorists an...
Population Sizing for Building Block Supply in GP
    At least one copy of every competing schema in initial
    populatio...
GP Population Sizing: Decision-Making Model

Make correct decision between competing BBs
  Statistical in nature
  Fitness...
Population Sizing in GP for Good Decision Making



Probabilistic safety factor
                              Signal-to-no...
Scalable GP Design

Design a competent GP
  Solve hard problems quickly, reliably, and accurately
  Identify and exchange ...
Extended Compact Genetic Algorithm (eCGA)

 A Probabilistic model building GA [Harik, 1999]
    Builds models of good solu...
PIPE: Model Representation & Sampling




Univariate model: Independent nodes [Salustowicz & Schmidhuber,
1997]
Start with...
Extended Compact Genetic Programming: eCGP




  Complete n-ary tree (similar to PIPE)
  Marginal product model (similar t...
Scalability of eCGP: GP-easy and GP-hard Problems
     GP-Easy problem                      GP-Hard problem




       [Sa...
Scalability of Multiobjective GAs

   Multiobjective estimation of distribution algorithms (EDAs)
   [Bosman & Thierens, 2...
Usual Scalability Game

Boundedly-difficult problem where linkage learning is critical
Scalability as a function of # buil...
Overwhelming The Nicher is Easy!

Building blocks are accurately identified and sampled
Test problem has exponential # Par...
MOEDAs Outperform Simple MOGAs

Practical population sizes can’t yield all optimal solutions
   Capture representative sol...
Controlled Growth of Competing Building Blocks
       Understand the fundamental limit of growth in # Pareto-
       optim...
Summary and Conclusions
Broadly applicable in chemistry and materials science
  Analogous applicability in modeling multis...
Upcoming SlideShare
Loading in …5
×

Genetic Algorithms and Genetic Programming for Multiscale Modeling

3,745 views

Published on

Effective and efficient multiscale modeling is essential to advance both the science and synthesis in a wide array of fields such as physics, chemistry, materials science, biology, biotechnology and pharmacology. This study investigates the efficacy and potential of using genetic algorithms for multiscale materials modeling and
addresses some of the challenges involved in designing competent algorithms that solve hard problems quickly, reliably
and accurately.

Published in: Business, Technology
1 Comment
2 Likes
Statistics
Notes
  • Fioricet is often prescribed for tension headaches caused by contractions of the muscles in the neck and shoulder area. Buy now from http://www.fioricetsupply.com and make a deal for you.
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total views
3,745
On SlideShare
0
From Embeds
0
Number of Embeds
50
Actions
Shares
0
Downloads
335
Comments
1
Likes
2
Embeds 0
No embeds

No notes for slide

Genetic Algorithms and Genetic Programming for Multiscale Modeling

  1. 1. Genetic Algorithms and Genetic Programming for Multiscale Modeling: Applications in Materials Science and Chemistry and Advances in Scalability Kumara Sastry Industrial and Enterprise Systems Engineering University of Illinois at Urbana-Champaign ksastry@uiuc.edu This work is supported by the AFOSR F49620-00-0163 and FA9550-06-1-0096, NSF DMR-99-76550 and DMR 03-25939 (MCC), DOE DEFG02-91ER45439 (MRL), and CSE fellowship, UIUC.
  2. 2. Outline Background and motivation Objective and proposed approach Genetic algorithms (GAs) and Genetic programming (GP) Applications: Multi-timescale modeling of alloy kinetics Multiscaling quantum-chemistry simulation Advances in scalability: Population sizing for GP Scalable GP design Scalability limits of multiobjective GAs Summary & Conclusions
  3. 3. Multiscaling is Ubiquitous in Science & Engineering Many phenomena are inherently multiscale Physics, chemistry, biology, materials science, biotech, etc., Need accurate and fast modeling methods To advance science and expedite synthesis Accommodate different system sizes and time scales Time (picoseconds-minutes) & Space (Nanometers-Meters) Existing modeling methodologies effective on single scale Electronic structure calculations (Angstroms) Finite-element method (micro- to millimeters) Bridging methods for effective multiscaling is non-trivial Current multiscale methods fall short
  4. 4. GAs and GP for Multiscale Materials Modeling Need accurate and efficient multiscale methods Sparsely sample low-level models Develop custom-made constitutive relationship Use GAs & GP for bridging higher- and lower-level models Robust, competent, and efficient Handle multiple objectives Independent of representation [Sastry et al (2004) IJMCE]
  5. 5. Genetic Programming f = 0.9 f = 0.95 f = 0.5 f = 0.1 f = 0.11 f = 0.01
  6. 6. Applications: Multiscale Modeling in Materials Science and Chemistry Accuracy of modeling depends on accurate representation of potential energy surface (PES) Both values and shape matter Ab initio methods: Accurate, but slow (hours-days) Compute PES from scratch Faster methods: Fast (seconds-minutes), accuracy depends on PES accuracy Need direct/indirect knowledge of PES Known and unknown potential function/method Multiscaling quantum chemistry simulations [Sastry et al (2006) GECCO; Sastry et al (2007) MMP] Multi-timescaling alloy kinetics [Sastry et al (2006) Phys Rev B]
  7. 7. Recap: Multi-timescale Modeling of Alloy Kinetics Molecular dynamics (MD): (~10–9 secs) many realistic processes are inaccessible. Kinetic Monte Carlo (KMC): (~secs) need all diffusion barriers a priori. (God or compute) Real time Table Lookup Symbolically KMC Regressed KMC (sr-KMC) Efficient Coupling of MD and KMC On the fly KMC Use MD to get some diffusion barriers. Use KMC to span time. Complexity Use GP to regress all barriers from some barrier info. Span 10–15 seconds to seconds (15 orders of magnitude)
  8. 8. Results: (001) Surface Vacancy-assisted Migration Total 2nd n.n. Active configurations: 8192 Dramatic scaling over MD (109 at 300 K) 102 decrease in CPU time for calculating barriers 103-107 less CPU time than on-they-fly KMC chosen by the AIP ΔE calculated: ∼3% (256) configurations editors as focused article of frontier research in Virtual Low-energy events: <0.1% prediction error Journal of Nanoscale Overall events: <1% prediction error Science & Technology, 12(9), 2005
  9. 9. Multiscaling Photochemical Reactions Halorhodopsin Excited state Ground state Photochemistry is important for photosynthesis, vision, solar energy, pharmacology,… Need methods with accurate description of excited states Dynamics requires many electronic-energy evaluations Silver “Humies” Medal in Human Competitive Results, Best paper award, Real-world applications track, GECCO-2006 (ACM SIGEVO conference)
  10. 10. Reaction Dynamics Over Multiple Timescales Tune Ab Initio Semiempirical Semiempirical Quantum Chemistry Methods Parameters Methods Accurate but slow (hours-days) Fast (secs.-mins.), accuracy Can calculate excited states depends on parameters. Calculate integrals from fit parameters. Accurate excited-state surfaces with semiempirical methods Permits dynamics of larger systems: proteins, nanotubes, etc., Fitting/Tuning semiempirical potentials is non-trivial Energy & shape of the PES matter Especially around ground and excited states
  11. 11. Fitness: Errors in Energy and Energy Gradient Choose few ground- and excited-state ab initio SE method configurations Fitness #1: Errors in energy For each configuration, compute energy difference via ab initio and semiempirical methods ab initio SE method Fitness #2: Errors in energy gradient For each configuration, compute energy gradient via ab initio and SE methods
  12. 12. Current Reparameterization Methods Fall Short; Need Multiobjective Optimization Current Method: Staged single-objective optimization First minimize error in energies Subsequently minimize weighted error in energy and gradient Multiple objectives and highly multimodal Don’t know the weights of different objectives Local search gets stuck in low-quality optima Multiobjective optimization Simultaneously obtain “Pareto-optimal” solutions. O Avoid potentially irrelevant O O * and unphysical pathways.
  13. 13. Multiobjective Optimization Unlike single-objective problems, multiobjective problems involve a set of optimal solutions often termed as Pareto- optimal solutions. Notion of non-domination [Goldberg, 1989] • A dominates C Solution A dominates C if: • A and B are non-dominant A is no worse than C in all objectives • B is more crowded than A A is better than C in at least one objective Two goals: Converge onto the Pareto-optimal solutions (best non-dominated set ) Maintain as diverse a distribution as possible [Deb, 2002; Coello Coello et al, 2002]
  14. 14. MOGA Finds High Quality SE Parameters MOGA vs. published results Multi- vs. single-objective optimization Each point is a set of 11 parameters Not even one Pareto-optimal solution obtained with multiple weighted single-objective optimizations MOGA results have significantly lower errors than current results. Are all MOGA solutions good from chemists perspective?
  15. 15. MOGA Population Quantifies Parameter Stability GA population contains useful data that can be mined E.g., 80,000 candidate solutions sampled in each GA run. Example: stability/sensitivity of parameter sets RMS deviation of the error in energy and energy gradients of solutions within 1% of the Pareto-optimal solutions Stable parameter set has low RMS deviations. How to set the threshold? Perturbation of 1% around PM3 set reveals RMS deviation in error in energy: 0.99 RMS deviation in error in energy gradient: 0.023
  16. 16. On-Line Parameter Stability Analysis in MOGA Analysis of quality of solutions around Pareto-optimal set yields a good measure of the SE parameter stability. Online analysis is efficient and reliable
  17. 17. MOGA Results Yield Accurate Engergies D2d twisted Pyramidalized E + E(Pyr-CI S1) ∗ E(D2d S1) - E(D2d S0) x E(D2d S1) - E(Pyr-CI S1) MOGA-optimized parameters yield accurate energies for untested, critical configurations. Parameter sets with lower error in energy are preferable
  18. 18. Check potential energy surface with dynamics ab initio value: 180±50 fs Population transfer determined using 50 initial conditions for each parameter set Parameter sets with lower error in energy gradient values have lifetimes close to ab initio value [Quenneville, Ben-Nun & Martinez, 2001]
  19. 19. Semiempirical Parameter Interaction Identification Multiple high-quality parameter sets Symbolic regression of semiempirical parameters via GP Interpretable optimized semiempiricial methods
  20. 20. Advances in Scalability: GP Population Sizing, Competent GP, and Scalability of MOGAs Premium on competent and efficient GAs and GP Need to understand their scalability GA and GP designs that solve hard problems quickly, reliably, and accurately Population Sizing in GP [Sastry, et al (2003) GPTP; Sastry, O’Reilly, & Goldberg (2004) GPTP] Building-block supply and decision-making grounds Competent GP design [Sastry & Goldberg (2003). GPTP] Automatically identify and exchange key building blocks Scalability of competent MOGAs [Sastry, Pelikan, & Goldberg (2005) CEC; Pelikan, Sastry & Goldberg (2006) SOPM] Reliably maintain diverse Pareto-optimal solutions
  21. 21. Recap: How do we size populations? Tradeoff: Expense = # runs x popsize x # generations Valuable tool for GP-theorists and practitioners Insight on GP mechanisms Practical guide to set population size Competency: If it’s not a problem with pop size, it must be something else Scalability: How does population size scale with problem size? Efficiency: How to choose the tradeoff? Limited attention paid in GP theory Correct population sizing is critical to GP success [Sastry, et al (2003) GPTP; Sastry, O’Reilly, & Goldberg (2004) GPTP]
  22. 22. Population Sizing for Building Block Supply in GP At least one copy of every competing schema in initial population: # competing schemas Error tolerance Tree size Alphabet cardinality Expression probability Bigger the primitive set, greater the population size Bigger the trees, smaller the population size Expression also matters [Holland, 1975; Goldberg, 1989; Reeves, 1993; Goldberg, Sastry, & Latoza, 2001; Sastry et al, 2004]
  23. 23. GP Population Sizing: Decision-Making Model Make correct decision between competing BBs Statistical in nature Fitness is measure of the entire chromosome [Goldberg (1989); Goldberg & Rudnick (1992); Goldberg, Deb & Clark (1992); Harik, Cantú-Paz, Goldberg, & Miller (1997)] [Goldberg, Deb, and Clark (1992)]
  24. 24. Population Sizing in GP for Good Decision Making Probabilistic safety factor Signal-to-noise ratio Number of Sub-component sub-components complexity Population size increases Signal-noise-ratio decreases Error tolerance decreases Tree size decreases Alphabet size increases Building block size increases Problem size increases Bloat increases [Sastry, O’Reilly, & Goldberg (2004)]
  25. 25. Scalable GP Design Design a competent GP Solve hard problems quickly, reliably, and accurately Identify and exchange substructures effectively Polynomial scale-up on boundedly difficult problem Combine ideas from GAs & GP Extended compact GA (eCGA) [Harik, 1999] Probabilistic incremental program evolution (PIPE) [Salustowicz & Schmidhuber, 1997] Study scale-up of competent GP GP-easy and GP-hard problems [Sastry & Goldberg (2003). GPTP]
  26. 26. Extended Compact Genetic Algorithm (eCGA) A Probabilistic model building GA [Harik, 1999] Builds models of good solutions as linkage groups Key idea: Good probability distribution ≡ Linkage learning Key components: Representation: Marginal product model (MPM) Marginal distribution of a gene partition Quality: Minimum description length (MDL) Occam’s razor principle All things being equal, simpler models are better Search Method: Greedy heuristic search
  27. 27. PIPE: Model Representation & Sampling Univariate model: Independent nodes [Salustowicz & Schmidhuber, 1997] Start with root node, depth first, left-to-right traversal Choose either a function or terminal based on model
  28. 28. Extended Compact Genetic Programming: eCGP Complete n-ary tree (similar to PIPE) Marginal product model (similar to eCGA) Partition tree-nodes into clusters Marginal probability distribution of each cluster
  29. 29. Scalability of eCGP: GP-easy and GP-hard Problems GP-Easy problem GP-Hard problem [Sastry, & Goldberg, 2003] eCGP scales as O(λk3) Advantageous when linkage-learning is critical Do such problems exist in GP domain? Broader issue of problem difficulty in GP Optimization vs. System identification
  30. 30. Scalability of Multiobjective GAs Multiobjective estimation of distribution algorithms (EDAs) [Bosman & Thierens, 2002, Khan et al, 2002, Ocenasek, 2002, Ahn, 2005] Model-building and model-sampling of EDAs Selection and replacement of multiobjective GAs Outperform MOGAs on boundedly-difficult additively and hierarchically decomposable problems Limited scalability analysis Demonstrated scalability is a strong point of EDAs How do MOEDAs scale with problem size on additively separable boundedly-difficult problems? [Sastry, Pelikan, & Goldberg (2005) CEC; Pelikan, Sastry & Goldberg (2006) SOPM]
  31. 31. Usual Scalability Game Boundedly-difficult problem where linkage learning is critical Scalability as a function of # building blocks, m Use similar approach for MOEDAs
  32. 32. Overwhelming The Nicher is Easy! Building blocks are accurately identified and sampled Test problem has exponential # Pareto-optimal solutions Optimal solutions composed by 0000 and 1111 blocks 2m solutions in the Pareto-optimal set m+1 distinct points in the Pareto-optimal set OneMax-ZeroMax At least one copy of all 2m solutions O(2m)
  33. 33. MOEDAs Outperform Simple MOGAs Practical population sizes can’t yield all optimal solutions Capture representative solutions on the Pareto-optimal front EDAs outperform simple MOEAs [Pelikan, Sastry, Goldberg (2005)] [Pelikan, Sastry, Goldberg (2005)]
  34. 34. Controlled Growth of Competing Building Blocks Understand the fundamental limit of growth in # Pareto- optimal solutions that can yield polynomial scale-up Obj #1 Trap Trap Trap Trap Trap # competing BBs = 2 Obj #2 Trap Trap Inv Trap Trap Inv Trap eCGA pop. size: O(m2) Niching pop. size: Controlled Ensuring polynomial growth of competing building blocks scalability: [Sastry, Pelikan, & Goldberg, 2005]
  35. 35. Summary and Conclusions Broadly applicable in chemistry and materials science Analogous applicability in modeling multiscaling phenomena. Facilitates fast and accurate materials modeling Alloys: Kinetics simulations with ab initio accuracy. 102-1015 increase in simulation time. 104-107 times faster than current methods Chemistry: Reaction-dynamics with ab initio accuracy. 102-105 increase in simulation time 10-103 times faster than current methods. Lead potentially to new drugs, new materials, fundamental understanding of complex chemical phenomena Science: Biophysical basis of vision, and photosynthesis Synthesis: Pharmaceuticals, functional materials

×