7-8 November 2005
Website: http://www.mcc.uiuc.edu
©Board of Trustees University of Illinois
Materials Computation Center
University of Illinois Urbana-Champaign
Funded by NSF DMR 03-25939
Multi-Timescale Modeling and Quantum Chemistry
using Machine-Learning Methods via Genetic Programs
MCC Internal Review
University of Illinois
7 November 2005
Faculty:
Duane D. Johnson (MSE), Pascal Bellon (MSE), David Goldberg (GE),
Todd Martinez (Chemistry)
Students:
Kumara Sastry (MSE/GE), Alexis Thompson (Chemistry), Jia Ye (MSE)
See also Poster
Background on Multi-Time-Scale Modeling
• Growing interest in multi-timescale modeling
• Existing methods are restrictive and do not yield the required speed-up:
 – Hyperdynamics, parallel replica (Voter, 1997, 1998)
   – Focus on infrequent events
   – Use transition-state theory
 – MD + KMC (Jacobsen, Cooper, & Sethna, 1998)
   – Elemental metals, tabulated activation barriers
• Hybridize MD & KMC using genetic programming (GP)
 – Calculate some activation barriers using MD
 – Predict the others using GP
Objective for Multi-Timescale Kinetic Modeling
• Can we simulate experimental time scales for dynamics (up to seconds) to design nanostructured functional materials?
 – Realistic processes require atomic-scale information spanning frequent events (picoseconds) to rare events (seconds).
 – Computing all barriers, e.g., a priori or “on-the-fly”, is infeasible.
 – The number of possible configurations becomes potentially innumerable.
 – Relative barrier heights control access and diffusion.
• Propose a novel, effective, and practical method based on genetic programming (GP) for the intelligent machine learning of a vast number of barrier values.
• Offer a proof of concept for long-time atomic diffusion in alloys using a hybrid of MD (nanoseconds, 10^-9 s) and kinetic Monte Carlo (KMC, seconds):
 – Use MD to get some diffusion barriers.
 – Use KMC to span 15 orders of magnitude in time, which requires all barriers.
 – Use GP to regress all barriers from partial barrier information.
 – Savings compared to table look-up are 4-8+ orders of magnitude.
RESULT: Time Multiscaling to Seconds using Genetic Programming
Time evolution of realistic processes requires atomic-scale data, but the relevant time scales are inaccessible to atomic simulations (only nanoseconds!).
Example: vacancy-assisted migration at the Cu50Co50 (001) surface using kinetic Monte Carlo
• Simulate seconds with KMC, but all barriers are needed! (2nd n.n.: 8192 barriers!)
• Machine-learn the barriers as a regressed in-line function E(c0,x) from a few barriers.
• GP needs < 3% of the barriers for < 0.1% error!
• CPU savings (compared to database look-up) are 4-8+ orders of magnitude.
[Figure: 1st and 2nd n.n. jumps compared with the database.]
K. Sastry, D. D. Johnson, D. E. Goldberg, P. Bellon, Phys. Rev. B 72, 085438 (2005).
Chosen by AIP editors as frontier research for the Virtual J. of Nanoscience (Aug. 2005)
Objective for Using Multi-Objective Genetic Algorithms in Quantum Chemistry
• Can we use concepts from multi-objective optimization theory (Pareto fronts) to create semiempirical quantum chemistry potentials with ab initio accuracy, and so dramatically speed up searches over reaction pathways?
• Offer a proof of concept for benzene and ethylene.
• Future: propose using genetic programming (GP) to machine-learn the form of the semiempirical potential, to improve reliability and speed.
RESULT: S1/S2 Conical Intersection from Machine-Learned MNDO vs. ab initio CASSCF
[Plot (benzene): relative energy (eV, 0-10) vs. reaction coordinate (0.0-1.0) from the Franck-Condon point (FC) to the S2/S1 intersection; GA-MNDO-PM3 vs. CASPT2.]
Dashed lines = target CASPT2 results; only values and gradients at x = 0 were included in the GA fitting.
Minimal-energy intersections are expected to play a prominent role in excited-state chemistry (nonadiabatic transitions).
Red and blue are the g/h vectors, the displacements that lift the electronic degeneracy.
Genetic Programming Optimization for
Multi-Timescale Modeling
Kinetic Diffusion.
Cu-Co Vacancy-Assisted Surface Migration
IlliGAL at University of Illinois Urbana-Champaign
http://www-illigal.ge.uiuc.edu/
Studying nature's search algorithm of choice, genetics and evolution, as a
practical approach to solving difficult problems on a computer.
The mechanics of a genetic algorithm (GA) are conceptually simple:
(1) maintain a population of solutions coded as chromosomes,
(2) select the better solutions for recombination (crossover) of mating
chromosomes,
(3) perform mutation and other variation operators on the chromosomes, and
(4) use these offspring to replace poorer solutions or to create a new generation.
Theory and empirical results demonstrate that GAs lead to improved
solutions in many problem domains, and well-designed GAs can be
guaranteed to solve a broad class of provably hard problems, quickly,
reliably, and accurately.
David Goldberg (General Engineering)
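The four GA steps above can be sketched in C as follows. This is a minimal illustration, not IlliGAL code: it assumes a toy binary problem (“OneMax”, maximize the number of 1 bits) as the fitness function, binary tournament selection, one-point crossover, bit-flip mutation, and keeps the best individual each generation (elitism). The names `run_ga` and `fitness`, the population size, and the small linear congruential random-number generator are illustrative choices; the comments (1)-(4) map to the steps listed above.

```c
#include <string.h>

#define N_BITS 20 /* chromosome length (toy value) */
#define POP    30 /* population size (toy value) */

/* Tiny linear congruential generator so runs are reproducible for a given seed. */
static unsigned long rng_state;
static unsigned long rng_next(void) {
    rng_state = rng_state * 1103515245UL + 12345UL;
    return (rng_state >> 16) & 0x7FFFUL;
}
static double rng_unit(void) { return (double)rng_next() / 32768.0; }

/* Toy fitness ("OneMax"): number of 1 bits; higher is better. */
static int fitness(const int *c) {
    int f = 0;
    for (int i = 0; i < N_BITS; i++) f += c[i];
    return f;
}

/* Binary tournament selection: index of the fitter of two random members. */
static int tournament(int pop[POP][N_BITS]) {
    int a = (int)(rng_next() % POP), b = (int)(rng_next() % POP);
    return fitness(pop[a]) >= fitness(pop[b]) ? a : b;
}

/* Run the GA for `gens` generations; return the best fitness in the final population. */
int run_ga(int gens, double p_mut, unsigned long seed) {
    int pop[POP][N_BITS], next[POP][N_BITS];
    rng_state = seed;
    /* (1) random initial population of chromosomes */
    for (int k = 0; k < POP; k++)
        for (int i = 0; i < N_BITS; i++) pop[k][i] = (int)(rng_next() & 1UL);

    for (int g = 0; g < gens; g++) {
        /* elitism: carry the current best over unchanged */
        int best = 0;
        for (int k = 1; k < POP; k++)
            if (fitness(pop[k]) > fitness(pop[best])) best = k;
        memcpy(next[0], pop[best], sizeof next[0]);

        for (int k = 1; k < POP; k++) {
            /* (2) select two parents and recombine them (one-point crossover) */
            const int *p1 = pop[tournament(pop)];
            const int *p2 = pop[tournament(pop)];
            int cut = (int)(rng_next() % N_BITS);
            for (int i = 0; i < N_BITS; i++) next[k][i] = (i < cut) ? p1[i] : p2[i];
            /* (3) bit-flip mutation with probability p_mut per gene */
            for (int i = 0; i < N_BITS; i++)
                if (rng_unit() < p_mut) next[k][i] ^= 1;
        }
        /* (4) offspring replace the old generation */
        memcpy(pop, next, sizeof pop);
    }
    int best_f = 0;
    for (int k = 0; k < POP; k++)
        if (fitness(pop[k]) > best_f) best_f = fitness(pop[k]);
    return best_f;
}
```

Because of elitism, for a fixed seed the value returned by `run_ga(gens, p_mut, seed)` never decreases as `gens` grows; in a real application `fitness` would be replaced by the problem's cost function.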
If the gradient is numerically precise and fast to compute, use a gradient-based algorithm.
If the exact ground state can be obtained directly, do not use a genetic algorithm.
GA Advantages
• No need for knowledge of, or gradient information about, the objective function or energy surface.
• Discontinuities on the surface have little effect on the optimization.
• Resistant to becoming trapped in local minima.
• Work well on large-scale optimization problems.
• Can be used on a wide variety of problems.
GA Disadvantages
• Trouble finding the exact global minimum.
• Require a large number of cost-function evaluations.
• Setting up starting configurations is not straightforward.
• GAs require more evaluations to move uphill.
In general, whether to use a GA depends on the problem.
Recall Concept of Genetic Algorithms
• Search based on principles of natural selection and genetics
• Genes encode a solution: e.g., binary x_n = 0 or 1 for a variable, where possible solutions are “gene” sequences {x_1, …, x_N}.
• Fitness (objective) function: quality measure of a sequence.
 – Need a known function to minimize (or maximize) to evaluate the quality of a gene sequence, e.g., $\min f(\mathbf{x}) = \lvert \text{cost} + \text{constraints} \rvert$, or $\max f^{-1}(\mathbf{x})$.
• Population: a set of candidate gene sequences (solutions).
• Genetic operators:
 – Selection: “Survival of the fittest”
 – Recombination: combine parental traits to create offspring
 – Mutation: modify an offspring slightly (local gene-sequence change)
Example Use of GAs for Regression
Ian Walmsley and Herschel Rabitz, “Quantum Physics Under Control,” Physics Today, August 2003.
Example: controlling laser pulse shapes to increase yields in molecular reactions.
• The controller is based on a GA that shapes the pulses (phase and amplitude).
• The GA evolves to maximize the mass-spectrometer signal of the desired molecular species.
• Thousands of pulses are updated per second.
R. J. Levis and H. Rabitz, J. Phys. Chem. A 106, 6427 (2002).
$\Psi \sim \sum_k c_k \, e^{i(k r - \omega t)}$
Same Concept for Genetic Programming
• Genetic algorithms that evolve computer programs (Koza, 1992)
• Representation: programs are represented by trees
 – Functions: internal nodes (e.g., {+, -, *, sin, cosh, log})
 – Terminals: leaf nodes (e.g., {x, y, 2.3, R})
 – R = ephemeral random constant
• Fitness function: quality measure of the program
• Population: candidate programs (individuals)
• Genetic operators:
 – Selection: “Survival of the fittest”
 – Recombination: combine parental traits to create offspring
 – Mutation: modify an offspring slightly
Example GP Function and Tree
• Define the function and terminal sets: {x, y, z, sin, +, *, ^, ADF1, …}
• ADFs (“automatically defined functions”) can also be learned.
• Function example: Add
double f_add(double arg1, double arg2)
{
    return arg1 + arg2;
}
• Example tree [figure]: internal nodes = functions, leaf nodes = terminals, where x, y, z ∈ U = [a, b]
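A minimal C sketch of how such a program tree can be stored and evaluated. It assumes a reduced function set {+, -, *} (sin, ^, and ADFs would simply be further node kinds) and terminals {x, y, z, constant}; the `Node` type, `eval_tree`, and the example tree (x + 2.3) * (y - z) are hypothetical, chosen only to illustrate internal function nodes and leaf terminal nodes.

```c
#include <stddef.h>

/* Node kinds: internal function nodes and leaf (terminal) nodes. */
typedef enum { OP_ADD, OP_SUB, OP_MUL, LEAF_X, LEAF_Y, LEAF_Z, LEAF_CONST } Kind;

typedef struct Node {
    Kind kind;
    double value;             /* used only by LEAF_CONST, e.g. an ephemeral random constant */
    const struct Node *left;  /* first argument (NULL for leaves) */
    const struct Node *right; /* second argument (NULL for leaves) */
} Node;

/* Function-set entries, in the style of f_add on the slide. */
static double f_add(double arg1, double arg2) { return arg1 + arg2; }
static double f_sub(double arg1, double arg2) { return arg1 - arg2; }
static double f_mul(double arg1, double arg2) { return arg1 * arg2; }

/* Recursively evaluate a program tree at the point (x, y, z). */
double eval_tree(const Node *n, double x, double y, double z) {
    switch (n->kind) {
    case OP_ADD: return f_add(eval_tree(n->left, x, y, z), eval_tree(n->right, x, y, z));
    case OP_SUB: return f_sub(eval_tree(n->left, x, y, z), eval_tree(n->right, x, y, z));
    case OP_MUL: return f_mul(eval_tree(n->left, x, y, z), eval_tree(n->right, x, y, z));
    case LEAF_X: return x;
    case LEAF_Y: return y;
    case LEAF_Z: return z;
    case LEAF_CONST: return n->value;
    }
    return 0.0; /* not reached for well-formed trees */
}

/* Hypothetical example tree: (x + 2.3) * (y - z). */
double example_program(double x, double y, double z) {
    static const Node nx = {LEAF_X, 0.0, NULL, NULL};
    static const Node nc = {LEAF_CONST, 2.3, NULL, NULL};
    static const Node ny = {LEAF_Y, 0.0, NULL, NULL};
    static const Node nz = {LEAF_Z, 0.0, NULL, NULL};
    static const Node sum = {OP_ADD, 0.0, &nx, &nc};
    static const Node diff = {OP_SUB, 0.0, &ny, &nz};
    static const Node root = {OP_MUL, 0.0, &sum, &diff};
    return eval_tree(&root, x, y, z);
}
```

In GP, crossover swaps subtrees between two such trees and mutation replaces a subtree; fitness evaluation is the same recursive walk over every training point.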
Genetic Programming (GP)
Approach for Surface Diffusion
• Efficient coupling of MD and KMC
 – Multi-timescale modeling of alloys
• Predict the entire potential energy (PE) surface using a few exact PE calculations
 – PE surface = $f(\mathbf{c}_0, \mathbf{x})$, with coefficients $\mathbf{c}_0$ and total alloy configuration $\mathbf{x}$
 – We don’t know the functional form of f!
   e.g., it is not a simple basis; it could be a product of plane waves and polynomials
• Use symbolic functional regression via GP
 – Search for the regression function, f
 – Optimize the values of the coefficients, c0
 – A form of intelligent machine learning (like Bayesian methods, neural nets, etc.)
http://gold.cchem.berkeley.edu:8080/research
GP for Predicting PE Surface
• Encode: alloy configuration ⇒ unique number
• Decision variables:
• Function set: {+, -, *, /, exp, sin, ^}
• Terminal set:
• Binary A-B alloy: $x_i \in \{0,1\}$ records whether neighbor site i (e.g., 1st and 2nd n.n.) of the vacancy/atom pair whose exchange costs ΔE is or is not occupied by A.
• Atomistic: compute ΔE for a subset of configurations
• Objective function: minimize the weighted absolute error in prediction,
  $\min \sum_i w_i \, \lvert \Delta E_i - \Delta E_i^{\mathrm{GP}} \rvert$,
  where $\Delta E_i^{\mathrm{GP}}$ is the energy barrier predicted by GP and the weights $w_i$ give more importance to lower energy barriers.
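A minimal C sketch of the two ingredients on this slide, under stated assumptions: (i) an encoding that packs the binary neighbor occupations x_i into a unique integer (n binary sites give 2^n configurations, consistent with 2^7 = 128 and 2^13 = 8192, although the site ordering actually used in this work is not given here), and (ii) a weighted absolute-error objective. The weight w = 1/ΔE is a hypothetical choice that simply gives more importance to lower barriers; `encode_config` and `weighted_abs_error` are illustrative names.

```c
#include <stddef.h>

/* Pack neighbor occupations x[0..n-1] (each 0 or 1) into a unique index in [0, 2^n). */
unsigned encode_config(const int *x, int n) {
    unsigned id = 0;
    for (int i = 0; i < n; i++)
        id = (id << 1) | (unsigned)(x[i] & 1);
    return id;
}

/* Hypothetical weight favoring low barriers: w = 1 / dE (assumes dE > 0). */
static double weight(double dE) { return 1.0 / dE; }

/* Weighted absolute error between computed barriers dE_calc and GP
   predictions dE_gp over the m training configurations. */
double weighted_abs_error(const double *dE_calc, const double *dE_gp, size_t m) {
    double err = 0.0;
    for (size_t k = 0; k < m; k++) {
        double d = dE_calc[k] - dE_gp[k];
        err += weight(dE_calc[k]) * (d < 0 ? -d : d);
    }
    return err;
}
```

A GP run would evaluate each candidate f(c0, x) on the encoded training configurations and use `weighted_abs_error` as the fitness to minimize.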
Quick Overview of the MD, MC, KMC Methods
• Molecular dynamics: F = ma
 – Real dynamics, but for short times (~10^-9 s).
 – Many realistic processes are inaccessible.
 – Must run long, but can calculate anything.
• Monte Carlo: sampling, a “dumb and blind” method.
 – Acceptance probability $p_{i,f} = \min[1, \exp(-\Delta E_{i,f}/k_B T)]$
 – Need to calculate each $\Delta E_{i,f}$, the barrier height to change states.
 – Time evolution: not real time (Monte Carlo steps, MCS), unless MD or experiment has provided the relation of MCS to real time for all events.
• Kinetic Monte Carlo: assumes a Poisson process; all $\Delta E_{i,f}$ must be known.
 – Hence the frequency (rate) of each event is known relative to the most frequent (known) event, which has the smallest time span $\Delta t_{\text{shortest}}$, e.g., from MD or experiment.
 – KMC reaches ~seconds, but needs all jump frequencies a priori.
 – KMC steps in REAL TIME, and ALL EVENTS are accepted!
Generic Time Enhancements of Table KMC
• GP-predicted PES facilitates use of kinetic Monte Carlo
• Real time in KMC (Fichthorn & Weinberg, 1991)
• Speed-up over MD:
 – ~10^9 at 300 K
 – ~10^5 at 550 K
 – ~10^3 at 900 K
• Less CPU time than MD
Potential Energy Surface: Vacancy-assisted Migration for fcc Elemental and Binary Alloy
• Computational method
 – Potentials: empirical (Morse); quantum (tight-binding)
 – System size: 5 layers, >>100 atoms/layer
 – Consider 1st and 2nd nearest-neighbor (n.n.) jumps
 – Local (active) configurations: 1st and 2nd n.n. environments
 – Consider rigid (or fully relaxed) atoms to calculate ΔE
• Energy test: pure Cu, n.n. jumps only
 – Morse potential: ΔE = 0.39 eV (present work)
 – ab initio: ΔE = 0.42±0.08 eV
 – EAM: ΔE = 0.47±0.05 eV
 – TB: ΔE = 0.45±0.05 eV
   (ab initio, EAM, and TB values: Boisvert & Lewis, Phys. Rev. B 56 (1997))
• Complex case: segregating fcc Cu_xCo_1-x
[Figure: (001) slab in x-y and x-z views showing fixed layers, Co and Cu atoms, the vacancy, 1st and 2nd n.n. jumps, and 1st and 2nd n.n. configurations.]

                        Active jumps
                        1st n.n.     2nd n.n.
1st n.n. configs.       128          128
2nd n.n. configs.       2048         8192
Total configs.          >>2^100      >>2^100
GP Optimized Regression for PE Surface:
(001) Vacancy-assisted Migration
The Machine-Learned In-Line Barrier Function
GP Optimized Regression for PE Surface:
(001) Vacancy-assisted Migration
While the non-linear function is complicated, that does not matter: it gives accurate barriers, but otherwise has no physical meaning!
PES Predictions: (001) Vacancy-assisted Migration
• 1st n.n. active configurations only (128 total), for simplicity
• Atoms are either rigid or fully relaxed.
• Simple regression fails: quadratic (cubic) polynomial regression needs 27% (78%) of the configurations.
• GP needs PE calculations for only 20 configurations (16%).
PES Predictions: (001) Vacancy-assisted Migration
• Total 2nd n.n. active configurations: 8192
• GP needs ΔE calculated for < 3% (256) of the configurations
• Low-energy migrations: < 0.1% prediction error
• Overall events: < 1% error
Symbolically-Regressed KMC (sr-KMC)
• From a few (as needed) exact calculations, use symbolic functional regression via GP to predict the entire PE surface from an in-line function f(c0,x).
• Search for the regression function, f, and optimize the values of the coefficients, c0.
• A form of intelligent machine learning (like Bayesian methods, neural nets, etc.)
Time Enhancements from sr-KMC over Standard Table KMC
• GP needs ΔE calculated for:
 – < 3% (256) of the configurations (33 times fewer barriers).
 – Only 0.3% using cluster-expansion techniques (330 times fewer)!
• Low-energy migrations: < 0.1% prediction error.
• GP yields an in-line barrier function: ~100x faster than table look-up.
• Compared to “on-the-fly” calculations, sr-KMC is 10^4-10^7 times faster!
 – In-line function call: ~10^-3 s per barrier.
 – Empirical potential: ~10 s per barrier.
 – Tight-binding potential: ~1800 s per barrier.
 – First-principles potential: even greater.
• How does the gain scale with complexity?
 – For the present problem, the fraction of barriers required decreases with the complexity of the configuration space (16% for 1st n.n., < 3% for 2nd n.n.). PROMISING!
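The savings quoted on this slide follow directly from its own numbers: the per-barrier costs give the speed-up over on-the-fly calculations (first-principles costs, not listed here, push the upper end further), and the fraction of barriers computed gives the reduction factor.

```latex
\frac{t_{\text{empirical}}}{t_{\text{in-line}}} \approx \frac{10\ \text{s}}{10^{-3}\ \text{s}} = 10^{4},
\qquad
\frac{t_{\text{TB}}}{t_{\text{in-line}}} \approx \frac{1800\ \text{s}}{10^{-3}\ \text{s}} = 1.8 \times 10^{6},
\qquad
\frac{1}{0.03} \approx 33,
\qquad
\frac{1}{0.003} \approx 330
```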
Multi-Objective GA Optimization of
Semi-Empirical Quantum Chemistry Potentials
with ab initio accuracy.
BENZENE
Benzene Reparameterization
• Target data: ab initio values via CASSCF(6/6) (Toniolo et al., 2004)
 – Ground-state (S0) and excited-state energies (S1, optically dark; S2, allowed) at planar benzene, Dewar benzene, benzvalene, and prefulvene.
 – Excited-state gradients at these points (Franck-Condon region).
• Semi-empirical potential: MNDO-PM3 (Stewart, 1989)
 – 11 carbon parameters to optimize (hydrogen and core-core repulsion parameters fixed): Uss, Upp (Coulombic); Gss, Gsp, Gpp, Gpp′ (repulsion); Hsp (exchange); βs, βp (resonance); ζs, ζp (Slater orbital exponents). MNDO = Modified Neglect of Diatomic Overlap.
• Objectives: #1: error in excited-state energies and geometries
  #2: error in excited-state gradients
[Figure 1 panels: (a) Benzvalene, (b) Dewar benzene, (c) Prefulvene, (d) S1-S0 CI, (e) S2-S1 CI, with bond lengths and angles labeled on each structure.]
Figure 1. Ground-state optimized geometries and important minimal-energy conical intersections for benzene. Bond lengths (Å) and angles (degrees) from RPM3, CASSCF(6/6)/6-31G* (in parentheses), and CASPT2(6/6)/6-31G* optimizations (in brackets) are shown.
Benzene Reparameterization
• Semi-empirical potential: MNDO-PM3 (Stewart, 1989)
 – 11 carbon parameters to optimize
• Objectives: #1: error in excited-state energies and geometries
  #2: error in excited-state gradients
• It is not obvious how to weight accuracy in energies against accuracy in gradients.
• A multi-objective GA with bias solves this problem!
• Use the non-dominated sorting GA II (NSGA-II) (Deb et al., 1999)
 – Competence and efficiency can be further enhanced by data mining important problem substructures (building blocks).
Reparameterization of Semi-Empirical Potentials: Multiobjective Optimization Approach
• Reparameterization of SE forms involves a multi-objective fit to a limited set of ab initio energies, geometries, and energy gradients.
• Previous approach: sequential weighted local optimization
 – Yields unphysical potentials, results in local optima, and depends entirely on the weights given to the different objectives.
• Multiobjective optimization:
 – Simultaneously obtain a set of non-dominated (Pareto-optimal) solutions in parallel.
 – Avoid potentially irrelevant and unphysical pathways arising from SE forms.
Genetic Algorithm Multiobjective Optimization
• Unlike single-objective problems, multi-objective problems involve a set of Pareto-optimal solutions.
• Notion of non-dominated solutions (figure):
 – A dominates C.
 – A and B are mutually non-dominated.
• Solution X dominates Y if:
 – X is no worse than Y in all objectives, and
 – X is strictly better than Y in at least one objective.
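The dominance test above, and the extraction of the non-dominated (Pareto-optimal) set, can be sketched in C for minimization of two objectives (for example, energy error and gradient error). The names `dominates` and `pareto_front` are illustrative, and the O(n^2) loop is a simple sketch, not the fast non-dominated sorting procedure of NSGA-II.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_OBJ 2 /* e.g. objective 1: energy error, objective 2: gradient error */

/* X dominates Y (minimization) if X is no worse in every objective
   and strictly better in at least one. */
bool dominates(const double x[N_OBJ], const double y[N_OBJ]) {
    bool strictly_better = false;
    for (int j = 0; j < N_OBJ; j++) {
        if (x[j] > y[j]) return false;
        if (x[j] < y[j]) strictly_better = true;
    }
    return strictly_better;
}

/* Mark the non-dominated set: on_front[i] = true if no other solution
   dominates solution i. Returns the number of non-dominated solutions. */
size_t pareto_front(double f[][N_OBJ], size_t n, bool *on_front) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        on_front[i] = true;
        for (size_t k = 0; k < n && on_front[i]; k++)
            if (k != i && dominates(f[k], f[i])) on_front[i] = false;
        if (on_front[i]) count++;
    }
    return count;
}
```

With points (1,4), (2,2), (3,3), (4,1), the point (3,3) is dominated by (2,2), while the other three are mutually non-dominated, mirroring the A, B, C picture on the slide.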
Why use Multiobjective Genetic Algorithms?
• Robust search algorithms that yield good-quality solutions quickly, reliably, and accurately
• Rapidly converge to the Pareto-optimal front
• Maintain as diverse a distribution of solutions as possible
• A population-based approach is well suited to finding multiple solutions
 – Niche-preservation methods can be exploited to find diverse solutions
• Implicit parallelism helps provide a parallel search
 – Multiple applications of classical methods do not constitute a parallel search
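In NSGA-II, niche preservation is done with a crowding distance, which favors solutions in sparsely populated regions of a front so the population stays diverse along it. A minimal C sketch for a two-objective front follows. It passes context to `qsort` through file-scope variables, a simplification that is not thread-safe, and the name `crowding_distance` is illustrative.

```c
#include <math.h>
#include <stdlib.h>

#define N_OBJ 2

static double (*g_f)[N_OBJ]; /* objective values seen by the comparator */
static int g_obj;            /* objective currently used for sorting */

/* qsort comparator: order solution indices by objective g_obj, ascending. */
static int cmp_by_obj(const void *a, const void *b) {
    double fa = g_f[*(const size_t *)a][g_obj];
    double fb = g_f[*(const size_t *)b][g_obj];
    return (fa > fb) - (fa < fb);
}

/* Crowding distance for the n solutions of one front. Boundary solutions get
   INFINITY so they are always preferred; each interior solution accumulates
   the normalized gap between its two neighbors in every objective. */
void crowding_distance(double f[][N_OBJ], size_t n, double *dist) {
    if (n == 0) return;
    size_t *idx = malloc(n * sizeof *idx);
    if (idx == NULL) return;
    for (size_t i = 0; i < n; i++) dist[i] = 0.0;
    g_f = f;
    for (g_obj = 0; g_obj < N_OBJ; g_obj++) {
        for (size_t i = 0; i < n; i++) idx[i] = i;
        qsort(idx, n, sizeof *idx, cmp_by_obj);
        double fmin = f[idx[0]][g_obj];
        double fmax = f[idx[n - 1]][g_obj];
        dist[idx[0]] = INFINITY;
        dist[idx[n - 1]] = INFINITY;
        if (fmax == fmin) continue; /* all equal: no interior spread to measure */
        for (size_t i = 1; i + 1 < n; i++)
            dist[idx[i]] += (f[idx[i + 1]][g_obj] - f[idx[i - 1]][g_obj]) / (fmax - fmin);
    }
    free(idx);
}
```

When NSGA-II must choose among solutions of the same front, it keeps those with the larger crowding distance.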
Multiobjective GA Results: Unbiased vs. Biased
• Bias = weight the error in energy 2x more than the error in the energy gradient.
• (Un)biased solutions are consistently better than the published result (Toniolo et al., 2004).
• Pareto-optimal solutions are physical!
• 37% lower gradient error
• 33% lower energy error
• Biasing:
 – converges 2-3 times faster
 – improves solution quality
 – finds physical solutions.
S1/S2 Conical Intersection
[Plot: relative energy (eV, 0-10) vs. reaction coordinate (0.0-1.0) from the Franck-Condon point (FC) to the S2/S1 intersection; GA biased vs. CASPT2.]
Dashed lines = target CASPT2 results (only values and gradients at x = 0 were included in the GA fitting).
Minimal-energy intersections are expected to play a prominent role in excited-state chemistry (nonadiabatic transitions).
Red and blue are the g/h vectors, the displacements that lift the electronic degeneracy.
[Figure: S1/S2 branching plane, CASPT2 vs. GA biased.]
Potentially creates “transferable” potentials
Benzene parameters compared with MO-GA parameters for ethylene (C2H4)
Mathematical Analysis of GP
• Population size
 – Very important parameter for GP performance
 – Currently no guidance for choosing the population size
• Building-block supply analysis
 – What population size should be used?
For ethylene, ~800 solutions needed for reliable
Summary
• Symbolic regression via GP holds promise for application to numerous areas of science and engineering.
• Mathematical analysis of GP is required to determine adequate population sizes, etc.
• Case: surface vacancy-assisted migration in Cu_xCo_1-x
 – Dramatic scaling in time over table-look-up KMC
 – Requires only a small subset of PE surface information.
• Case: constitutive behavior of aluminum AA7055
 – For AA7055, the strain-rate dependence was found without a priori knowledge.
• Case: reparameterization of semi-empirical potentials
 – Multiobjective GA yields accurate potentials.
 – Can GPs help find better forms of potentials?
 – This is POTENTIALLY the most exciting application area.
Future Work
• Algorithm development:
 – Competent operators to handle complex interactions
• Mathematical analysis of GP:
 – Population-sizing and convergence-time models
• Engineering & scientific application:
 – More complex systems: adatoms, line and planar defects
 – Application in excitation chemistry (forms of potentials?)
• Algorithm efficiency enhancement:
 – Parallelization of GP
 – Hybridize GP with cluster-expansion methods
 – Reduce the configurations that need PE calculations

Products go Green: Worst-Case Energy Consumption in Software Product LinesProducts go Green: Worst-Case Energy Consumption in Software Product Lines
Products go Green: Worst-Case Energy Consumption in Software Product Lines
 
Source resizing and improved power distribution for high available island mic...
Source resizing and improved power distribution for high available island mic...Source resizing and improved power distribution for high available island mic...
Source resizing and improved power distribution for high available island mic...
 
Mpp Rsv 2008 Public
Mpp Rsv 2008 PublicMpp Rsv 2008 Public
Mpp Rsv 2008 Public
 
Overview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing ProjectOverview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing Project
 
A simulation based multi-objective design optimization of electronic packages...
A simulation based multi-objective design optimization of electronic packages...A simulation based multi-objective design optimization of electronic packages...
A simulation based multi-objective design optimization of electronic packages...
 
CLIM: Transition Workshop - Optimization Methods in Remote Sensing - Jessica...
CLIM: Transition Workshop - Optimization Methods in Remote Sensing  - Jessica...CLIM: Transition Workshop - Optimization Methods in Remote Sensing  - Jessica...
CLIM: Transition Workshop - Optimization Methods in Remote Sensing - Jessica...
 
Static Energy Prediction in Software: A Worst-Case Scenario Approach
Static Energy Prediction in Software: A Worst-Case Scenario ApproachStatic Energy Prediction in Software: A Worst-Case Scenario Approach
Static Energy Prediction in Software: A Worst-Case Scenario Approach
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Accurate Quantum Chemistry via Machine-Learning and ...

  • 1. 7-8 November 2005. Website: http://www.mcc.uiuc.edu. ©Board of Trustees University of Illinois. Materials Computation Center, University of Illinois Urbana-Champaign. Funded by NSF DMR 03-25939.
    Multi-Timescale Modeling and Quantum Chemistry using Machine-Learning Methods via Genetic Programs
    MCC Internal Review, University of Illinois, 7 November 2005
    Faculty: Duane D. Johnson (MSE), Pascal Bellon (MSE), David Goldberg (GE), Todd Martinez (Chemistry)
    Students: Kumara Sastry (MSE/GE), Alexis Thompson (Chemistry), Jia Ye (MSE)
    See also Poster
  • 2. Background on Multi-Time-Scale Modeling
    • Growing interest in multi-timescale modeling
    • Existing methods are restrictive and do not yield the required speed-up:
      – Hyperdynamics, parallel replica (Voter, 1997, 1998): focus on infrequent events; use transition state theory
      – MD + KMC (Jacobsen, Cooper, & Sethna, 1998): elemental metals, tabulated activation barriers
    • Hybridize MD & KMC using genetic programming:
      – Calculate some activation barriers using MD
      – Predict the others using GP
  • 3. Objective for Multi-Timescale Kinetic Modeling
    • Can we simulate experimental time scales for dynamics (up to seconds) for designing nanostructured functional materials?
      – The time evolution of realistic processes requires atomic-scale information, spanning frequent events (picoseconds) to rare events (seconds).
      – It is infeasible to compute, e.g., barriers a priori or “on the fly”.
      – The number of possible configurations becomes potentially innumerable.
      – Relative barrier heights control access and diffusion.
    • Propose a novel, effective, and practical method based on Genetic Programming (GP) for the intelligent machine learning of a vast number of barrier values.
    • Offer a proof of concept for long-time atomic diffusion in an alloy: a hybrid of MD (nanoseconds, 10⁻⁹ s) and kinetic Monte Carlo (KMC; seconds).
      – Use MD to get some diffusion barriers.
      – Use KMC to span 15 orders of magnitude in time, but KMC needs all barriers.
      – Use GP to regress all barriers from partial barrier information.
      – Savings compared to table look-up are 4-8+ orders of magnitude.
  • 4. RESULT: Time Multiscaling to Seconds using Genetic Programming
    • Time evolution of realistic processes requires atomic-scale data, but the relevant scales are inaccessible via atomistic simulations (only nanoseconds!).
    • Example: vacancy-assisted migration at the Cu50Co50 (001) surface using kinetic Monte Carlo
      – Simulate seconds with KMC, but all barriers are needed! (2nd n.n.: 8192 barriers!)
      – Machine-learn the barriers as a regressed in-line function E(c_0, x) from a few barriers.
      – GP needs < 3% of the barriers for < 0.1% error!
      – CPU savings (compared to database look-up) are 4-8+ orders of magnitude.
    [Figure: 1st and 2nd n.n. jumps; database]
    K. Sastry, D. D. Johnson, D. E. Goldberg, P. Bellon, Phys. Rev. B 72, 085438 (2005). Chosen by AIP editors as frontier research for the Virtual J. of Nanoscience (Aug. 2005).
  • 5. Objective for Using Multi-Objective Genetic Algorithms in Quantum Chemistry
    • Can we utilize concepts from multi-objective optimization theory (Pareto fronts) to create semiempirical quantum chemistry potentials with ab initio accuracy, to dramatically speed up searches over reaction pathways?
    • Offer a proof of concept for benzene and ethylene.
    • Future: propose using Genetic Programming (GP) to machine-learn the form of the semiempirical potential to improve reliability and speed.
  • 6. RESULT: S1/S2 Conical Intersection from Machine-Learned MNDO vs. ab initio CASSCF (Benzene)
    [Figure: relative energy (eV) vs. reaction coordinate (0 to 1), from the Franck-Condon (FC) point to the S2/S1 intersection; GA-MNDO-PM3 vs. CASPT2]
    • Dashed lines = target CASPT2 results; only values/gradients at x = 0 were included in the GA fitting.
    • Minimal-energy intersections are expected to play a prominent role in excited-state chemistry (nonadiabatic transitions).
    • Red and blue: g/h vectors, i.e., displacements that lift the electronic degeneracy.
  • 7. Genetic Programming Optimization for Multi-Timescale Modeling of Kinetic Diffusion: Cu-Co Vacancy-Assisted Surface Migration
  • 8. IlliGAL at University of Illinois Urbana-Champaign (http://www-illigal.ge.uiuc.edu/)
    Studying nature's search algorithm of choice, genetics and evolution, as a practical approach to solving difficult problems on a computer. The mechanics of a genetic algorithm (GA) are conceptually simple:
      (1) maintain a population of solutions coded as chromosomes;
      (2) select the better solutions for recombination (crossover) of mating chromosomes;
      (3) perform mutation and other variation operators on the chromosomes; and
      (4) use these offspring to replace poorer solutions or to create a new generation.
    Theory and empirical results demonstrate that GAs lead to improved solutions in many problem domains, and well-designed GAs can be guaranteed to solve a broad class of provably hard problems quickly, reliably, and accurately.
    David Goldberg (General Engineering)
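The four GA steps listed on slide 8 can be sketched as a minimal generational GA. This is an illustrative Python sketch, not IlliGAL code. The OneMax fitness (count of 1-bits), binary tournament selection, one-point crossover, per-bit mutation rate 1/n, and keeping one elite are all assumptions chosen for brevity.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]


def onemax(genome: Genome) -> int:
    """Toy fitness (assumption): number of 1-bits, to be maximized."""
    return sum(genome)


def tournament(pop: List[Genome], fit: Callable[[Genome], int],
               rng: random.Random, k: int = 2) -> Genome:
    """Selection: draw k individuals at random and return the fittest."""
    return max(rng.sample(pop, k), key=fit)


def one_point_crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Recombination: prefix of parent a joined to the suffix of parent b."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """Mutation: flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]


def run_ga(n_bits: int = 20, pop_size: int = 40, generations: int = 60,
           p_cross: float = 0.9, seed: int = 0,
           fit: Callable[[Genome], int] = onemax) -> Tuple[Genome, List[int]]:
    rng = random.Random(seed)
    # (1) a population of solutions coded as binary chromosomes
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(pop, key=fit)
        history.append(fit(best))
        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            # (2) select better solutions and recombine them
            p1, p2 = tournament(pop, fit, rng), tournament(pop, fit, rng)
            child = one_point_crossover(p1, p2, rng) if rng.random() < p_cross else p1[:]
            # (3) apply mutation
            children.append(mutate(child, 1.0 / n_bits, rng))
        # (4) the offspring form the new generation
        pop = children
    best = max(pop, key=fit)
    history.append(fit(best))
    return best, history
```

Because the best individual is always copied forward, the recorded best fitness never decreases. Replacing `onemax` with a problem-specific score (for example, a negated fitting error) leaves the loop unchanged.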
  • 9. If the gradient is numerically precise and fast, use a gradient algorithm. If there is an exact ground state, do not use a genetic algorithm.
    • GA advantages:
      – No need for knowledge or gradient information about the objective or energy surface.
      – Discontinuities on the surface have little effect on the optimization.
      – Resistant to becoming trapped in local minima.
      – Work well on large-scale optimization problems.
      – Can be used on a wide variety of problems.
    • GA disadvantages:
      – Trouble finding the exact global minimum.
      – Require a large number of cost-function evaluations.
      – Starting/setting up configurations is not straightforward.
      – GAs require more evaluations to move uphill.
    • In general, whether to use a GA depends on the problem.
  • 10. Recall: Concept of Genetic Algorithms
    • Search based on principles of natural selection and genetics
    • Genes encode a solution: e.g., binary x_n = 0 or 1 for a variable, where possible solutions are “gene” sequences {x_1, …, x_N}.
    • Fitness (objective) function: quality measure of a sequence.
      – Need a known function to minimize (or maximize) to evaluate the quality of a gene sequence, e.g., minimize f(x) = |“cost” + “constraints”|, or maximize f⁻¹(x).
    • Population: a set of candidate gene sequences (solutions).
    • Genetic operators:
      – Selection: “survival of the fittest”
      – Recombination: combine parental traits to create offspring
      – Mutation: modify an offspring slightly (local gene-sequence change)
  • 11. Example Use of GAs for Regression
    Ian Walmsley and Herschel Rabitz, “Quantum Physics Under Control,” Physics Today, August 2003.
    • Example: controlling laser pulse shapes to increase yields in molecular reactions.
      – The controller is based on a GA that shapes the pulses (phase and amplitude).
      – The GA evolves to maximize the mass-spectrometer signal of the desired molecular species.
      – Thousands of pulses are updated per second.
    R. J. Levis and H. Rabitz, J. Phys. Chem. A 106, 6427 (2002).
    $\Psi \sim \sum_k c_k\, e^{i(k r - \omega t)}$
  • 12. Same Concept for Genetic Programming
    • Genetic algorithms that evolve computer programs (Koza, 1992)
    • Representation: programs are represented by trees
      – Functions: internal nodes (e.g., {+, –, *, sin, cosh, log})
      – Terminals: leaf nodes (e.g., {x, y, 2.3, R}), where R = ephemeral random constant
    • Fitness function: quality measure of the program
    • Population: candidate programs (individuals)
    • Genetic operators:
      – Selection: “survival of the fittest”
      – Recombination: combine parental traits to create offspring
      – Mutation: modify an offspring slightly
  • 13. Example GP Function and Tree
    • Define functions: {x, y, z, sin, +, *, ^, ADF1, …}
      – ADFs (“automatically defined functions”) can be learned.
    • Function example (add): double f_add(double arg1, double arg2) { return arg1 + arg2; }
    [Figure: example tree with x, y, z ∈ U = [a, b]; internal nodes = functions, leaf nodes = terminals]
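A GP individual like the tree on slide 13 can be represented and evaluated as below. This is a hypothetical sketch. The `Node` class, the reduced function set {+, *, sin}, and the “grow” initializer that draws ephemeral random constants from [-1, 1] are assumptions, and ADFs are not modeled.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Function set (a subset of the slide's): name -> (arity, implementation)
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "*": (2, lambda a, b: a * b),
    "sin": (1, math.sin),
}
VARIABLES = ["x", "y", "z"]  # variable terminals


@dataclass
class Node:
    value: Union[str, float]  # function name, variable name, or constant
    children: List["Node"] = field(default_factory=list)

    def evaluate(self, env: Dict[str, float]) -> float:
        if self.value in FUNCTIONS:          # internal node = function
            _, fn = FUNCTIONS[self.value]
            return fn(*(c.evaluate(env) for c in self.children))
        if isinstance(self.value, str):      # leaf = variable terminal
            return env[self.value]
        return float(self.value)             # leaf = constant terminal

    def to_infix(self) -> str:
        if self.value in FUNCTIONS:
            args = [c.to_infix() for c in self.children]
            if len(args) == 2:
                return f"({args[0]} {self.value} {args[1]})"
            return f"{self.value}({args[0]})"
        return str(self.value)

    def depth(self) -> int:
        return 1 + max((c.depth() for c in self.children), default=0)


def grow(rng: random.Random, max_depth: int) -> Node:
    """'Grow' initialization: mix functions and terminals up to max_depth."""
    if max_depth <= 1 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return Node(rng.choice(VARIABLES))
        return Node(round(rng.uniform(-1.0, 1.0), 2))  # ephemeral random constant R
    name = rng.choice(list(FUNCTIONS))
    arity, _ = FUNCTIONS[name]
    return Node(name, [grow(rng, max_depth - 1) for _ in range(arity)])
```

For example, the tree for (x * y) + sin(z) evaluates to 6.0 at x = 2, y = 3, z = 0. In a full GP system, subtree crossover and mutation would operate on these `Node` structures.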
  • 14. Genetic Programming (GP)
  • 15. Approach for Surface Diffusion Coefficients
    • Efficient coupling of MD and KMC
      – Multi-timescale modeling of alloys
    • Predict the entire potential energy (PE) surface using a few exact PE calculations
      – PE surface = f(c_0, x), where x is the total alloy configuration
      – We don't know the functional form of f! E.g., it is not a simple basis; it could be a product of plane waves and polynomials.
    • Use symbolic functional regression via GP
      – Search for the regression function, f
      – Optimize the values of the coefficients, c_0
      – A form of intelligent machine learning (like Bayesian methods, neural nets, etc.)
    http://gold.cchem.berkeley.edu:8080/research
  • 16. GP for Predicting PE Surface
    • Encode: alloy configuration ⇒ unique number
    • Decision variables:
      – Function set: {+, -, *, /, exp, sin, ^}
      – Terminal set: for a binary A-B alloy, x_i ∈ {0, 1}, where x_i = 1 (0) if neighbor site i (e.g., 1st and 2nd n.n.) of the vacancy/atom pair is (is not) occupied by A; each such configuration has a jump barrier ∆E
    • Atomistic: compute ∆E for a subset of configurations
    • Objective function: minimize the weighted absolute error in prediction, $\min \sum_i w_i \left| \Delta E_i - \Delta E_i^{\mathrm{GP}} \right|$, where ∆E_i^GP is the energy barrier predicted by GP and the weights w_i give more importance to lower energy barriers
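The encoding and weighted objective on slide 16 might look like the sketch below. Two assumptions are made here. First, neighbor sites are ordered so that the occupation vector reads as a binary number, with x_i = 1 if A occupies site i. Second, the hypothetical weight w(E) = exp(-E/E_scale) stands in for the unspecified weighting that favors lower barriers. A 7-site environment gives 2⁷ = 128 distinct codes.

```python
import math
from typing import Callable, List, Sequence


def encode(occupation: Sequence[int]) -> int:
    """Map a binary occupation vector (x_1..x_n, x_i in {0,1}) to a unique integer."""
    index = 0
    for bit in occupation:
        if bit not in (0, 1):
            raise ValueError("site occupations must be 0 or 1")
        index = (index << 1) | bit
    return index


def decode(index: int, n_sites: int) -> List[int]:
    """Inverse of encode for a fixed number of sites."""
    return [(index >> (n_sites - 1 - i)) & 1 for i in range(n_sites)]


def lower_barrier_weight(barrier_ev: float, scale_ev: float = 0.1) -> float:
    """Hypothetical weight: larger for lower barriers (the scale is an assumption)."""
    return math.exp(-barrier_ev / scale_ev)


def weighted_abs_error(predict: Callable[[Sequence[int]], float],
                       configs: Sequence[Sequence[int]],
                       barriers: Sequence[float],
                       weight: Callable[[float], float] = lower_barrier_weight) -> float:
    """Sum of w(dE_i) * |dE_i - predicted dE_i| over the atomistically computed subset."""
    return sum(weight(e) * abs(e - predict(c)) for c, e in zip(configs, barriers))
```

To score a GP individual, pass its evaluation function as `predict` together with the subset of configurations whose ∆E was computed atomistically. GP then minimizes the returned error.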
  • 17. Quick Overview of the MD, MC, and KMC Methods
    • Molecular dynamics: F = ma
      – Real dynamics, but for short times (~10⁻⁹ s).
      – Many realistic processes are inaccessible.
      – Must run long, but can calculate anything.
    • Monte Carlo: sampling, a “dumb and blind” method.
      – Acceptance probability $p_{i,f} = \min\left[1, \exp(-\Delta E_{i,f}/k_B T)\right]$
      – Need to calculate each ∆E_{i,f}, the barrier height to change states.
      – Time evolution is not in real time (Monte Carlo steps, MCS) unless MD or experiment has provided the relation of MCS to real time for all events.
    • Kinetic Monte Carlo: assumes a Poisson process and knowledge of all ∆E_{i,f}.
      – Hence the frequency (rate) of events is known relative to the most frequent (known) event, with the smallest time span ∆t_shortest, e.g., from MD or experiment.
      – KMC (~seconds), but all jump frequencies are needed a priori.
      – KMC steps in REAL TIME and ALL EVENTS are accepted!
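The MC acceptance rule and the rejection-free KMC step described on slide 17 can be sketched as follows. The sketch assumes Arrhenius rates r_i = ν exp(-∆E_i/k_BT) with an assumed attempt frequency ν = 10¹³ s⁻¹. It also assumes the standard residence-time increment ∆t = -ln(u)/R, where R is the sum of all rates. Neither choice comes from the source.

```python
import math
import random
from typing import Sequence, Tuple

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """MC: accept a move with probability min[1, exp(-dE / kB T)]."""
    p = 1.0 if delta_e <= 0.0 else math.exp(-delta_e / (K_B * temperature))
    return rng.random() < p


def kmc_step(barriers: Sequence[float], temperature: float, rng: random.Random,
             nu: float = 1e13) -> Tuple[int, float]:
    """One rejection-free KMC step (assumed Arrhenius rates, nu in 1/s).

    Picks event i with probability r_i / R and advances the clock by
    -ln(u) / R, the waiting time of a Poisson process with total rate R.
    """
    rates = [nu * math.exp(-e / (K_B * temperature)) for e in barriers]
    total = sum(rates)
    target = rng.random() * total
    acc = 0.0
    for i, r in enumerate(rates):
        acc += r
        if target < acc:
            break  # event i is chosen; every step executes some event
    dt = -math.log(1.0 - rng.random()) / total  # 1 - u lies in (0, 1]
    return i, dt
```

Unlike `metropolis_accept`, `kmc_step` never rejects a move. The cost is that it needs every barrier in `barriers` up front, which is the requirement the GP-regressed barrier function is meant to satisfy.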
  • 18. Generic Time Enhancements of Table KMC
    • A GP-predicted PES facilitates the use of kinetic Monte Carlo
    • Real time in KMC (Fichthorn & Weinberg, 1991)
    • Speed-up over MD:
      – ~10⁹ at 300 K
      – ~10⁵ at 550 K
      – ~10³ at 900 K
    • Less CPU time than MD
  • 19. Potential Energy Surface: Vacancy-assisted Migration for fcc Elemental and Binary Alloy
    • Computational method
      – Potentials: empirical (Morse); quantum (tight-binding)
      – System size: 5 layers, ≫100 atoms/layer
      – Consider 1st and 2nd nearest-neighbor (n.n.) jumps
      – Local (active) configurations: 1st and 2nd n.n. environments
      – Consider rigid or fully relaxed atoms to calculate ∆E
    • Energy test: pure Cu, n.n. jumps only
      – Morse potential: ∆E = 0.39 eV (present work)
      – ab initio: ∆E = 0.42 ± 0.08 eV
      – EAM: ∆E = 0.47 ± 0.05 eV
      – TB: ∆E = 0.45 ± 0.05 eV
        (ab initio, EAM, and TB values: Boisvert & Lewis, Phys. Rev. B 56 (1997))
    • Complex case: segregating fcc Cu_xCo_{1-x}
    [Figure: (001) slab with x, y, z axes, fixed layers, Co and Cu atoms, the vacancy, and 1st and 2nd n.n. jumps]
      Active configs. \ Jumps   1st n.n.   2nd n.n.
      1st n.n. configs.         128        128
      2nd n.n. configs.         2048       8192
      Total configs.            ≫2¹⁰⁰      ≫2¹⁰⁰
  • 20. GP-Optimized Regression for PE Surface: (001) Vacancy-assisted Migration
    The Machine-Learned In-Line Barrier Function
  • 21. GP-Optimized Regression for PE Surface: (001) Vacancy-assisted Migration
    While the non-linear function is complicated, you do not care: it gives accurate barriers, and otherwise it has no meaning!
  • 22. PES Predictions: (001) Vacancy-assisted Migration
    • 1st n.n. active configurations (128 total), for simplicity
    • Atoms are either rigid or fully relaxed.
    • Simple regression fails: quadratic (cubic) polynomial regression needs 27% (78%) of the configurations
    • GP needs PE calculations for only 20 configurations (or 16%)
  • 23. PES Predictions: (001) Vacancy-assisted Migration
    • Total 2nd n.n. active configurations: 8192
    • GP needs ∆E calculated for < 3% (256) of the configurations
    • Low-energy migrations: < 0.1% prediction error
    • Overall events: < 1% error
  • 24. Symbolically-Regressed KMC (sr-KMC)
    • From a few exact calculations (as needed), use symbolic functional regression via GP to predict the entire PE surface from an in-line function f(c_0, x).
    • Search for the regression function, f, and optimize the values of the coefficients, c_0.
    • A form of intelligent machine learning (like Bayesian methods, neural nets, etc.)
  • 25. Time Enhancements from sr-KMC over Standard Table KMC
    • GP needs ∆E calculated for:
      – < 3% (256) of the configurations (33 times fewer barriers)
      – using cluster-expansion techniques, 0.3% (330 times fewer)!
    • Low-energy migrations: < 0.1% prediction error.
    • GP yields an in-line barrier function: ~100x faster than table look-up.
    • Compared to “on-the-fly” calculations, sr-KMC is 10⁴-10⁷ times faster!
      – In-line function call: ~10⁻³ s per barrier
      – Empirical potential: ~10 s per barrier
      – Tight-binding potential: ~1800 s per barrier
      – First-principles potential: even greater
    • How does the gain scale with complexity?
      – For the present problem, the fraction of barriers required decreases with the complexity of the configuration space. PROMISING!
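The per-barrier costs on slide 25 account for the quoted gains. A quick check of the ratios, using only the numbers on the slide:

```latex
\frac{t_{\text{empirical}}}{t_{\text{in-line}}} \approx \frac{10\ \text{s}}{10^{-3}\ \text{s}} = 10^{4},
\qquad
\frac{t_{\text{TB}}}{t_{\text{in-line}}} \approx \frac{1800\ \text{s}}{10^{-3}\ \text{s}} = 1.8\times 10^{6},
\qquad
\frac{1}{0.03} \approx 33,
\qquad
\frac{1}{0.003} \approx 333 .
```

First-principles barriers cost more per evaluation than tight-binding ones, which pushes the ratio toward the upper end of the quoted 10⁴-10⁷ range.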
  • 26. Multi-Objective GA Optimization of Semi-Empirical Quantum Chemistry Potentials with ab initio Accuracy: Benzene
  • 27. Benzene Reparameterization
    • Target data: ab initio values via CASSCF(6/6) (Toniolo et al., 2004)
      – Ground-state (S0) and excited-state energies (S1, optically dark, and S2, allowed) at planar benzene, Dewar benzene, benzvalene, and prefulvene.
      – Excited-state gradients at these points (Franck-Condon region).
    • Semi-empirical potential: MNDO-PM3 (Stewart, 1989); MNDO = Modified Neglect of Diatomic Overlap
      – 11 carbon parameters to optimize (hydrogen and core-core repulsion parameters held fixed): U_ss, U_pp (Coulombic); G_ss, G_sp, G_pp, G_pp′ (repulsion); H_sp (exchange); β_s, β_p (resonance); ζ_s, ζ_p (Slater orbital exponents)
    • Objectives:
      – #1: Error in excited-state energies and geometries
      – #2: Error in excited-state gradients
  • 28. [Figure 1 panels: (a) benzvalene, (b) Dewar benzene, (c) prefulvene, (d) S1-S0 CI, (e) S2-S1 CI, each annotated with bond lengths and angles]
    Figure 1. Ground-state optimized geometries and important minimal-energy conical intersections for benzene. Bond lengths (Å) and angles (degrees) from RPM3, CASSCF(6/6)/6-31* (in parentheses), and CASSCF(6/6)*PT2/6-31* optimizations (in brackets) are shown.
  • 29. Benzene Reparameterization
    • Semi-empirical potential: MNDO-PM3 (Stewart, 1989)
      – 11 carbon parameters to optimize
    • Objectives:
      – #1: Error in excited-state energies and geometries
      – #2: Error in excited-state gradients
    • It is not obvious how to weight accuracy in energies against accuracy in gradients.
    • A multi-objective GA with bias solves the problem!
    • Use the non-dominated sorting GA II (NSGA-II) (Deb et al., 1999)
      – Competence and efficiency can be further enhanced by data mining important problem substructures (building blocks)
  • 30. Reparameterization of Semi-Empirical Potentials: Multiobjective Optimization Approach
    • Reparameterization of SE forms involves a multi-objective fit to a limited set of ab initio energies, geometries, and energy gradients.
    • Previous approach: sequential weighted local optimization
      – Yields unphysical potentials
      – Results in local optima
      – Depends entirely on the weights on the different objectives
    • Multiobjective optimization:
      – Simultaneously obtains a set of non-dominated (Pareto-optimal) solutions in parallel.
      – Avoids potentially irrelevant and unphysical pathways arising from SE forms.
  • 31. Genetic Algorithm Multiobjective Optimization
    • Unlike single-objective problems, multi-objective problems involve a set of Pareto-optimal solutions.
    • Notion of non-dominated solutions: solution X dominates Y if
      – X is no worse than Y in all objectives, and
      – X is strictly better than Y in at least one objective.
    [Figure: example solutions in objective space: A dominates C; A and B are mutually non-dominated.]
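The dominance test on slide 31 and the front ranking used by NSGA-II can be sketched as below. All objectives are assumed to be minimized (for example, energy error and gradient error). This is the simple repeated-scan form of non-dominated sorting, not NSGA-II's “fast non-dominated sort”. The example points are made up to mirror the A, B, C picture.

```python
from typing import List, Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x dominates y under minimization: no worse in every
    objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def non_dominated_fronts(points: List[Sequence[float]]) -> List[List[int]]:
    """Split point indices into successive Pareto fronts.

    Front 0 is the non-dominated (Pareto-optimal) set; front 1 is
    non-dominated once front 0 is removed, and so on.
    """
    remaining = set(range(len(points)))
    fronts: List[List[int]] = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts
```

With A = (1.0, 2.0), B = (2.0, 1.0), and C = (1.5, 2.5), the fronts are [[0, 1], [2]]. A and B trade off one objective against the other, while A beats C in both objectives.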
  • 32. Why Use Multiobjective Genetic Algorithms?
    • Robust search algorithms that yield good-quality solutions quickly, reliably, and accurately
      – Rapidly converge to the Pareto-optimal front
      – Maintain as diverse a distribution of solutions as possible
    • A population-based approach is well suited to finding multiple solutions
      – Niche-preservation methods can be exploited to find diverse solutions
    • Implicit parallelism helps provide a parallel search
      – Multiple applications of classical methods do not constitute a parallel search
  • 33. Multiobjective GA Results: Unbiased vs. Biased
    • Bias = weight the error in energy 2x more than the error in energy gradient.
    • (Un)biased solutions are consistently better than the published result (Toniolo et al., 2004).
    • Pareto-optimal solutions are physical!
    • 37% lower gradient error
    • 33% lower energy error
    • Biasing:
      – converges 2-3 times faster
      – improves solution quality
      – finds physical solutions
  • 34. S1/S2 Conical Intersection: GA (Biased) vs. CASPT2
    [Figure: relative energy (eV) vs. reaction coordinate (0 to 1), from the Franck-Condon (FC) point to the S2/S1 intersection]
    • Dashed lines = target CASPT2 results (only values and gradients at x = 0 were included in the GA fitting).
    • Minimal-energy intersections are expected to play a prominent role in excited-state chemistry (nonadiabatic transitions).
    • Red and blue: g/h vectors, i.e., displacements that lift the electronic degeneracy.
  • 36. Potentially creates “transferable” potentials: benzene parameters compared with MO-GA results for ethylene (C2H4)
  • 37. Mathematical Analysis of GP
    – Population size
      – A very important parameter for GP performance
      – Currently there is no guidance for choosing the population size
    – Building-block supply analysis
      – What population size should be used?
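As a rough illustration of what a building-block supply argument computes, the sketch below finds the smallest random population in which every building block appears at least once with probability at least 1 - α. It assumes a simplified GA-style model (fixed-length strings over an alphabet of cardinality χ, m building blocks of order k, each present in a random individual independently with probability χ^-k), so that P(all present) ≈ (1 - (1 - χ^-k)^n)^m and hence n ≥ ln(1 - (1 - α)^(1/m)) / ln(1 - χ^-k). This is not the GP-specific model referred to on the slide, and `supply_population_size` is a hypothetical name.

```python
import math


def supply_population_size(chi: int, k: int, m: int, alpha: float) -> int:
    """Smallest n such that a uniformly random population of n individuals
    contains every one of m order-k building blocks with probability >= 1 - alpha.

    Simplified model (assumption): each building block appears in a random
    individual independently with probability p = chi**-k.
    """
    p = chi ** (-k)
    # Allowed probability that any single building block is missing,
    # chosen so that (1 - miss)**m == 1 - alpha.
    per_bb_miss = 1.0 - (1.0 - alpha) ** (1.0 / m)
    # Require (1 - p)**n <= per_bb_miss and solve for n.
    return math.ceil(math.log(per_bb_miss) / math.log(1.0 - p))
```

For binary strings (χ = 2) with ten order-4 building blocks and α = 0.01, `supply_population_size(2, 4, 10, 0.01)` returns 107. Because p shrinks geometrically with k, the required population grows roughly as χ^k, which is why building-block order dominates this kind of sizing estimate.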
  • 38. For ethylene, ~800 solutions needed for reliable
  • 39. Summary
    – Symbolic regression via GP holds promise for numerous areas of science and engineering.
    – Mathematical analysis of GP is required to determine adequate population sizes, etc.
    – Case: Surface-vacancy-assisted migration in CuxCo1-x
      – Dramatic scaling in time over table-lookup KMC
      – Requires only a small subset of the potential-energy (PE) surface information
    – Case: Constitutive behavior of aluminum AA7055
      – GP found the strain-rate dependence of AA7055 without a priori knowledge
    – Case: Reparameterization of semi-empirical potentials
      – Multiobjective GA yields accurate potentials
      – Can GP help find better functional forms for the potentials?
      – This is POTENTIALLY the most exciting application area.
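The contrast between table-lookup KMC and KMC driven by predicted barriers can be pictured with a standard residence-time (BKL) KMC step in which the barrier source is just a function argument. A minimal sketch, assuming Arrhenius rates r_i = ν exp(-E_i / k_B T) with an assumed attempt frequency ν = 10^13 s^-1; `kmc_step` and the `barrier` callable are hypothetical names, and the callable could wrap either a lookup table or a fitted (e.g. GP-derived) expression.

```python
import math
import random
from typing import Callable, Hashable, Sequence, Tuple

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def kmc_step(events: Sequence[Hashable],
             barrier: Callable[[Hashable], float],
             temperature: float,
             rng: random.Random,
             attempt_freq: float = 1e13) -> Tuple[int, float]:
    """One residence-time KMC step.

    `barrier(event)` returns an activation energy in eV (from a table or a
    fitted model). Returns (index of the chosen event, time increment in s).
    """
    # Arrhenius rate for every possible event.
    rates = [attempt_freq * math.exp(-barrier(e) / (K_B_EV * temperature)) for e in events]
    total = sum(rates)
    # Choose event i with probability rates[i] / total.
    target = rng.random() * total
    cumulative = 0.0
    for i, r in enumerate(rates):
        cumulative += r
        if target < cumulative:
            chosen = i
            break
    else:
        chosen = len(rates) - 1  # guard against floating-point round-off
    # Exponentially distributed waiting time; 1 - u lies in (0, 1], so log is finite.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt
```

For example, `kmc_step(["hop_a", "hop_b"], {"hop_a": 0.5, "hop_b": 0.8}.__getitem__, 300.0, random.Random(0))` almost always selects `hop_a`, since a 0.3 eV lower barrier makes its rate about five orders of magnitude larger at 300 K. Only the `barrier` callable changes between a table-lookup and a predicted-barrier simulation; the event selection and clock update stay the same.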
  • 40. Future Work
    – Algorithm development:
      – Competent operators to handle complex interactions
    – Mathematical analysis of GP:
      – Population-sizing and convergence-time models
    – Engineering & scientific applications:
      – More complex systems: adatoms, line and planar defects
      – Application in excited-state chemistry (forms of potentials?)
    – Algorithm efficiency enhancement:
      – Parallelization of GP
      – Hybridize GP with cluster-expansion methods
      – Reduce the number of configurations that need PE calculations