Workshop on computational methods for materials science, presented at the Spring 2010 ACS conference. This workshop illustrates how high-throughput computation and automation can be used with quantum chemistry calculations to solve problems in materials discovery. Examples include catalysts, fuel cells, and OLEDs.
Elementary Landscape Decomposition of the Quadratic Assignment Problem (jfrchicanog)
This document discusses the elementary landscape decomposition of the Quadratic Assignment Problem (QAP). It begins with background on landscape theory and definitions. It then shows that the QAP fitness function can be decomposed into three elementary components. It discusses how this decomposition allows estimating autocorrelation parameters to analyze problem structure. Finally, it notes the decomposition provides insights and can inform algorithm design, and discusses applications to related problems like the Traveling Salesman Problem and DNA fragment assembly.
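The QAP objective that the decomposition applies to can be sketched in a few lines of Python. The tiny flow/distance matrices below are made-up data for illustration only, and the elementary-landscape decomposition itself is not reproduced here — the sketch just shows the fitness function and the swap neighborhood over which autocorrelation is defined.

```python
import itertools

def qap_fitness(perm, A, B):
    """QAP objective: sum over i, j of flow A[i][j] times distance B[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(A[i][j] * B[perm[i]][perm[j]] for i in range(n) for j in range(n))

def swap_neighbors(perm):
    """All solutions reachable by exchanging two positions (the usual QAP neighborhood)."""
    for i, j in itertools.combinations(range(len(perm)), 2):
        q = list(perm)
        q[i], q[j] = q[j], q[i]
        yield tuple(q)

# Tiny 3x3 instance (hypothetical data, for illustration only)
A = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]  # flows
B = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]  # distances
best = min(itertools.permutations(range(3)), key=lambda p: qap_fitness(p, A, B))
```

Exhaustive enumeration is only feasible for toy sizes; on real instances the decomposition lets the mean fitness over a neighborhood be computed in closed form instead.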
Lec-16: Subspace/Transform Optimization
Addresses non-linearity in appearance manifolds with a piecewise-linear solution: query-driven local model learning, subspace indexing on the Grassmann manifold, and a direct Newton method for subspace optimization on the Grassmann manifold.
The document discusses subspace indexing on Grassmannian manifolds for large scale visual identification. It proposes using local subspace models built on neighborhoods defined by queries, but notes issues with computational complexity and lack of optimality. It then introduces Grassmannian and Stiefel manifolds to characterize subspace similarity and define distances. A model hierarchical tree is proposed to index subspaces through iterative merging based on distances on the Grassmannian manifold.
How to get the maximum performance from your AEP server. This talk will discuss ways to improve the execution time of short-running jobs and how to properly configure the server depending on the expected number of users as well as the average size and duration of individual jobs. Included will be examples of job pooling, database connection sharing, and parallel subprotocol tuning. Determining when to use cluster, grid, or load-balanced configurations, along with memory and CPU sizing guidelines, will also be discussed.
SharePoint 2007 provides several key collaboration and document management features:
Calendar, discussion boards, email notifications, libraries, and wikis allow for event scheduling, group communication, storing and sharing documents, and collaborative editing. Lists and imported spreadsheets enable tracking contacts and timelines. Sites and workspaces break a SharePoint site into sections while surveys, custom views, and web parts support additional functionality and customization.
Software can be used to speed up R&D into sustainable solutions such as alternative energy (batteries, fuel cells, biomass conversion), catalysts, and eliminating environmental toxins. The presentation gives an overview of the various methods and illustrates their application with case studies.
High-throughput Quantum Chemistry and Virtual Screening for Lithium Ion Batte... (BIOVIA)
The use of virtual structure libraries for computational screening to identify lead systems for further investigation has become a standard approach in drug discovery. Transferring this paradigm to challenges in materials science has only recently become possible due to advances in the speed of computational resources and the efficiency and stability of materials modeling packages. These advances allow individual calculation steps to be executed in sequence as a high-throughput quantum chemistry workflow, in which material systems of varying structure and composition are analyzed in an automated fashion and the results collected in a growing data record. This record can then be sorted and mined to identify lead candidates and establish critical structure-property limits within a given chemical design space. To date, only a small number of studies have been reported in which quantum chemical calculations are used in a high-throughput fashion to compute properties and screen for optimal materials solutions. However, with time, high-throughput computational screening will become central to advanced materials research.
In this presentation, the use of high-throughput quantum chemistry to analyze and screen a materials structure library is demonstrated for Li-Ion battery additives based on ethylene carbonate (EC).
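The screening loop described above — enumerate candidates, compute a property for each, collect results in a growing record, then rank — can be sketched as follows. The `reduction_potential` stub stands in for a real quantum chemistry calculation, and the candidate names and scores are invented for illustration.

```python
def reduction_potential(candidate):
    """Stub standing in for a quantum chemistry calculation
    (in practice: build an input file, run the QM code, parse the output)."""
    return candidate["score"]  # hypothetical precomputed value

# Hypothetical virtual structure library of EC-based additives
candidates = [
    {"name": "EC-derivative-A", "score": 1.2},
    {"name": "EC-derivative-B", "score": 0.7},
    {"name": "EC-derivative-C", "score": 1.9},
]

# Run every candidate, collect results in a growing record, then rank and pick leads.
record = [{"name": c["name"], "potential": reduction_potential(c)} for c in candidates]
ranked = sorted(record, key=lambda r: r["potential"], reverse=True)
leads = ranked[:2]
```

In a real workflow the loop body dispatches jobs to a cluster and the record lives in a database, but the sort-and-mine step at the end is the same.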
Materials science applications of HPC can help address challenges in developing new materials by enabling high-throughput screening of thousands of potential candidates using computational modeling. This reduces the time and cost of research and development compared to experimental methods alone. Examples discussed include identifying catalysts for fuel cells and optimizing electrolytes for lithium ion batteries through automated quantum chemistry calculations on large supercomputers.
This document discusses CT reconstruction artifacts and scatter correction algorithms from RX Solutions. It provides an overview of RX Solutions' company and CT system portfolio. It then describes various types of CT artifacts, scatter artifacts in particular, and different approaches to scatter correction, including air gaps, anti-scatter grids, and a posteriori correction methods. The document focuses on RX Solutions' own method for scatter correction, which operates directly on projections without simulation or prior knowledge of acquisition settings or the sample. Examples demonstrate the effectiveness of RX Solutions' scatter correction at reducing artifacts in CT reconstruction.
This document discusses how data is increasingly dominating high performance computing workloads. It notes that while computing power doubles every two years, data storage and movement capabilities are not keeping pace. This is leading to a "data tsunami" as experiments and simulations generate terabytes of data per day. The document then summarizes Sun Microsystems' end-to-end infrastructure for data-centric HPC workflows, including their Lustre parallel storage system, unified storage, tape archives, high performance computing blades, and InfiniBand switches. It positions Sun as uniquely able to deliver an integrated solution from computation to long-term data retention to help users cope with the challenges posed by rapidly growing datasets.
Overview of the Exascale Additive Manufacturing Project (inside-BigData.com)
The Exascale Additive Manufacturing (ExaAM) project aims to accelerate the adoption of additive manufacturing by enabling the fabrication of qualifiable metal parts with minimal trial and error. ExaAM will couple high-fidelity sub-grid simulations within a continuum process simulation to determine microstructure and properties at each time-step using local conditions. ExaAM involves multiple computational codes, including ALE3D, Diablo, Truchas, MEUMAPPS, and AMPE, which model different additive manufacturing physics across continuum, meso, and micro scales. The goal is to utilize exascale concurrency and locality to dynamically bridge scales through an adaptive, task-based approach.
The document summarizes a conference on revamping the audit approach using XBRL-tagged accounting equation data. It discusses modeling the audit using a "top-cycle" approach, developing a domain-specific language for auditing, and applying XBRL tagging to all phases of a new 5-phase audit process for continuous, real-time auditing and reporting. The conference brings together academics and practitioners to advance this new computational auditing approach using XBRL data processing and modeling.
Using reduced system models for vibration design and validation.
1) Equivalent models are used to reduce complexity while preserving key behaviors through homogenization and model updating.
2) Model reduction techniques like component mode synthesis represent the system with subspace bases to enable coupling of test and finite element models.
3) Energy coupling methods allow assembly of disjoint reduced component models through computation of interface energies.
Atomate: a tool for rapid high-throughput computing and materials discovery (Anubhav Jain)
Atomate is a tool for automating materials simulations and high-throughput computations. It provides predefined workflows for common calculations like band structures, elastic tensors, and Raman spectra. Users can customize workflows and simulation parameters. FireWorks executes workflows on supercomputers and detects/recovers from failures. Data is stored in databases for analysis with tools like pymatgen. The goal is to make simulations easy and scalable by automating tedious steps and leveraging past work.
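The workflow idea described here — named calculation steps with explicit dependencies, executed in order with results collected in a database — can be illustrated with a minimal sketch. This is not the actual atomate/FireWorks API; the `Workflow` class and the task names are invented for illustration.

```python
from collections import deque

class Workflow:
    """Minimal sketch of a FireWorks-style workflow: named tasks with
    explicit dependencies, results collected in a 'database' dict."""

    def __init__(self):
        self.tasks, self.deps = {}, {}

    def add(self, name, func, deps=()):
        self.tasks[name] = func
        self.deps[name] = list(deps)

    def run(self):
        db, done = {}, set()
        pending = deque(self.tasks)
        while pending:
            name = pending.popleft()
            if set(self.deps[name]) <= done:
                db[name] = self.tasks[name](db)  # task sees upstream results
                done.add(name)
            else:
                pending.append(name)             # dependencies not ready; requeue
        return db

# Hypothetical band-structure workflow: relax -> static -> bands
wf = Workflow()
wf.add("relax", lambda db: "relaxed-structure")
wf.add("static", lambda db: f"energy({db['relax']})", deps=["relax"])
wf.add("bands", lambda db: f"bandstructure({db['static']})", deps=["static"])
results = wf.run()
```

The real systems add what this sketch omits: queue submission, failure detection and recovery, and persistent storage of every step's output for later mining.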
This document discusses using hybrid cloud and grid infrastructure for high-throughput computational science. It provides an overview of the Nimrod toolkit, which supports parameter sweeps, optimization, and workflows across distributed resources. A recent experiment used Nimrod to complete jobs faster on grid resources than Amazon EC2. It also outlines a potential strawman project called GEMAP to enable grid-enabled microscopy across the Pacific using remote microscopes, compute clusters, storage, and visualization portals.
Overview of the structured data science domain, OSS machine learning platforms and algorithms. Using the JPMML family of libraries to implement a unified, production-oriented workflow.
Model-Driven Physical-Design for Future Nanoscale Architectures (Ciprian Teodorov)
This document discusses model-driven physical design approaches for future nanoscale architectures. It proposes a generic physical design framework based on a common structural domain model. This model-based approach aims to maximize tool reuse across different nanoscale technologies. It also separates algorithmic and architectural concerns by modeling tools as model transformations. An example nanoscale architecture template called R2D NASIC is developed using this framework and evaluated. Results show improvements in density, performance, and maximum pipeline throughput compared to a baseline. Overall, the model-driven approach seeks to provide a common vocabulary and design flow for tackling challenges in physical design for emerging nanotechnologies.
Colored Petri nets theory and applications (Abu Hussein)
This document discusses colored Petri nets (CP-nets) and their applications. CP-nets combine Petri nets with programming languages to model systems involving concurrency, communication, and resource sharing. They allow for simulation and formal verification. The document provides examples of CP-net applications in various domains including protocols, software, hardware, control systems, and military systems. It also describes how CP-net models can be used to automatically generate code for system implementations.
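The token game at the heart of a CP-net can be illustrated with a toy sketch: places hold multisets of colored tokens, and a transition fires by consuming tokens from its input places and producing tokens on its output places. Arc expressions and guards, which real CP-nets support, are omitted; the `CPNet` class and the marking below are invented for illustration.

```python
from collections import Counter

class CPNet:
    """Toy colored Petri net: places map to multisets (Counters) of colored tokens."""

    def __init__(self, marking):
        self.marking = {place: Counter(tokens) for place, tokens in marking.items()}

    def enabled(self, consume):
        """A transition is enabled if every required token is present in sufficient number."""
        return all(self.marking[p][tok] >= n
                   for p, toks in consume.items() for tok, n in toks.items())

    def fire(self, consume, produce):
        """Fire a transition: remove consumed tokens, add produced tokens."""
        if not self.enabled(consume):
            raise RuntimeError("transition not enabled")
        for p, toks in consume.items():
            self.marking[p] -= Counter(toks)
        for p, toks in produce.items():
            self.marking[p] += Counter(toks)

# Two processes ("red", "blue") competing for one shared resource token "r".
net = CPNet({"idle": ["red", "blue"], "resource": ["r"], "busy": []})
net.fire(consume={"idle": {"red": 1}, "resource": {"r": 1}},
         produce={"busy": {"red": 1}})
```

After the firing, "red" holds the resource and the same transition is no longer enabled for "blue" — exactly the kind of resource-sharing conflict CP-nets are used to model and verify.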
Simulation Data Management using Aras and SharePoint (Aras)
This document describes Advatech Pacific's solution for NASA to manage simulation data and requirements for mission design. The solution uses Aras Innovator to implement a Simulation Bill of Materials (SBOM) for linking analysis models. It also integrates with SharePoint and links requirements to analyses. Tight and loose integration of CAD and analysis tools like SolidWorks, STK, and Thermal Desktop were demonstrated.
Ruleml2012 - A production rule-based framework for causal and epistemic reaso... (RuleML)
The document describes a production rule-based framework for causal and epistemic reasoning. The framework combines event calculus foundations with a discrete event calculus (DECKT) to perform epistemic reasoning about events, knowledge, and time. It uses a rule-based forward-chaining production system to implement the framework and enable online/offline reasoning about dynamic domains.
Chemical Databases and Open Chemistry on the Desktop (Marcus Hanwell)
The modern chemist has access to large databases containing both experimental and calculated data. The power of HPC resources continues to increase, with more practitioners having routine access to powerful computational chemistry tools. This places an increasingly high burden on users to assimilate these resources into their workflow in order to use them effectively. The creation of an open, extensible application framework that puts computational tools, data, and domain-specific knowledge at the fingertips of chemists is increasingly important. A data-centric approach to chemistry, storing all data in a searchable database, will empower users to efficiently collaborate, innovate, and push the frontiers of research. Providing an open, user-friendly, and extensible application will open up new tools to experimental chemists, while giving computational chemists the ability to address greater challenges. Additionally, by distributing experimental and computational data across the research community, incorporating cheminformatics analytics techniques, and providing visual search for chemical structures, the workflow of both groups can be significantly improved. This requires suitable data formats for data exchange, and databases with appropriate APIs for querying and uploading data. This talk will discuss recent progress made in developing a suite of open chemistry applications on the desktop. The applications can query online databases, such as the NIH structure resolver service, download and manipulate structures, and prepare input files for standalone computational chemistry codes. Another application, developed to submit jobs to HPC resources and to monitor and retrieve results, will also be shown, along with a desktop chemistry database browser. The Quixote project aims to establish standards for data exchange in computational chemistry, along with data repositories for organizations.
Establishing these standards is important to promote open, reproducible chemistry, and their integration into user-friendly desktop applications will promote their integration in the standard workflow of researchers.
Discovering new functional materials for clean energy and beyond using high-t... (Anubhav Jain)
The research group develops computational methods and machine learning models to design new functional materials using high-throughput computing. This includes developing databases of materials properties, benchmarking machine learning algorithms, and applying natural language processing to materials design. Recent work also involves automating materials synthesis and characterization. The group maintains several open-source software packages that power their research.
The document discusses lessons learned from 4 years of simulating an innovative network processor, emphasizing the need to leverage computing resources through automated testing, intelligent test generation, and storing large numbers of test executions and results in databases to methodically close issues. It provides recommendations for planning verification efforts and creating balanced testing pipelines to efficiently debug problems and keep verification teams productive.
Insights and Lessons Learned Verifying the QoS Engine of a Network Processor (DVClub)
The document discusses lessons learned from four years of simulating a network processor design. It summarizes that using more CPUs for simulation, rather than more engineers, allows for hundreds of CPU years of simulation and thousands of automated tests. This requires investment in the testing environment to increase productivity and keep the work manageable. The key takeaway is to leverage computing resources to drive chip tape-out with high confidence.
The document describes a project to generate synthetic sky catalogs for the Dark Energy Survey through large cosmological simulations. An automated workflow was developed using Apache Airavata to run the simulations across multiple XSEDE resources. The workflow involves running CAMB to generate power spectra, 2LPTic to generate initial conditions, and LGadget to evolve them. It was able to run four full simulations, consuming over 300,000 service units, faster than manual submission would have allowed. Lessons learned include differences in MPI libraries and job scripts across resources and the time needed to migrate codes and data. Future goals include integrating post-processing and moving to new XSEDE resources like Trestles and Stampede.
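The three-stage chain (CAMB for power spectra, 2LPTic for initial conditions, LGadget for evolution) amounts to a staged pipeline where each stage's output feeds the next, with independent simulation boxes run concurrently. A minimal sketch, with placeholder function bodies standing in for the real codes and hypothetical box names:

```python
from concurrent.futures import ThreadPoolExecutor

# Each function stands in for a real code in the chain; the bodies are placeholders
# (in practice each stage is a batch job submitted to an XSEDE resource).
def camb(cosmology):
    return {"power_spectrum": f"P(k) for {cosmology}"}

def twolpt_ic(spec):
    return {"initial_conditions": f"IC from {spec['power_spectrum']}"}

def lgadget(ic):
    return {"snapshot": f"evolved {ic['initial_conditions']}"}

def run_pipeline(box):
    """Run the three stages in order, passing each output downstream."""
    return lgadget(twolpt_ic(camb(box)))

# Four independent simulation boxes can run concurrently,
# which is where the speedup over manual one-at-a-time submission comes from.
boxes = ["box1", "box2", "box3", "box4"]
with ThreadPoolExecutor() as pool:
    snapshots = list(pool.map(run_pipeline, boxes))
```

The workflow engine's real job is the part the sketch elides: staging data between machines, generating per-resource job scripts, and resuming after failures.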
ScienceCloud: Collaborative Workflows in Biologics R&D (BIOVIA)
The life sciences industry has undergone dramatic changes, and effective global collaboration has become a key success factor in this new age. BIOVIA is providing a hosted and comprehensive solution stack for externalized, collaborative research for pharma/biotech and CROs to address these new challenges. Recently we added support for biologics data management and IP capture. In this talk we will present comprehensive collaborative capabilities in antibody characterization and development: capabilities to analyze, annotate, and predict developability as part of a framework that facilitates secure data sharing and collaboration.
1. The document discusses Discngine's Tibco Spotfire Pipeline Pilot connector, which allows graphs stored in Pipeline Pilot to be accessed and visualized in Spotfire.
2. It describes the architecture of the connector and how it executes Pipeline Pilot protocols to generate HTML pages for visualization in Spotfire.
3. Challenges in integrating the large Spotfire API and synchronizing client and server datasets are also discussed.
Materials science applications of HPC can help address challenges in developing new materials by enabling high-throughput screening of thousands of potential candidates using computational modeling. This reduces the time and cost of research and development compared to experimental methods alone. Examples discussed include identifying catalysts for fuel cells and optimizing electrolytes for lithium ion batteries through automated quantum chemistry calculations on large supercomputers.
This document discusses CT reconstruction artifacts and scatter correction algorithms from RX Solutions. It provides an overview of RX Solutions' company and CT system portfolio. It then describes various types of CT artifacts like scatter artifacts, and different approaches to scatter correction including air gaps, anti-scatter grids, and a posteriori correction methods. The document focuses on RX Solutions' own method for scatter correction, which operates directly on projections without simulation or prior knowledge of acquisition settings or the sample. Examples demonstrate the effectiveness of RX Solutions' scatter correction at reducing artifacts in CT reconstruction.
This document discusses how data is increasingly dominating high performance computing workloads. It notes that while computing power doubles every two years, data storage and movement capabilities are not keeping pace. This is leading to a "data tsunami" as experiments and simulations generate terabytes of data per day. The document then summarizes Sun Microsystems' end-to-end infrastructure for data-centric HPC workflows, including their Lustre parallel storage system, unified storage, tape archives, high performance computing blades, and InfiniBand switches. It positions Sun as uniquely able to deliver an integrated solution from computation to long-term data retention to help users cope with the challenges posed by rapidly growing datasets.
Overview of the Exascale Additive Manufacturing Projectinside-BigData.com
The Exascale Additive Manufacturing (ExaAM) project aims to accelerate the adoption of additive manufacturing by enabling the fabrication of qualifiable metal parts with minimal trial and error. ExaAM will couple high-fidelity sub-grid simulations within a continuum process simulation to determine microstructure and properties at each time-step using local conditions. ExaAM involves multiple computational codes, including ALE3D, Diablo, Truchas, MEUMAPPS, and AMPE, which model different additive manufacturing physics across continuum, meso, and micro scales. The goal is to utilize exascale concurrency and locality to dynamically bridge scales through an adaptive, task-based approach.
The document summarizes a conference on revamping the audit approach using XBRL-tagged accounting equation data. It discusses modeling the audit using a "top-cycle" approach, developing a domain-specific language for auditing, and applying XBRL tagging to all phases of a new 5-phase audit process for continuous, real-time auditing and reporting. The conference brings together academics and practitioners to advance this new computational auditing approach using XBRL data processing and modeling.
Using reduced system models for vibration design and validation.
1) Equivalent models are used to reduce complexity while preserving key behaviors through homogenization and model updating.
2) Model reduction techniques like component mode synthesis represent the system with subspace bases to enable coupling of test and finite element models.
3) Energy coupling methods allow assembly of disjoint reduced component models through computation of interface energies.
Atomate: a tool for rapid high-throughput computing and materials discoveryAnubhav Jain
Atomate is a tool for automating materials simulations and high-throughput computations. It provides predefined workflows for common calculations like band structures, elastic tensors, and Raman spectra. Users can customize workflows and simulation parameters. FireWorks executes workflows on supercomputers and detects/recovers from failures. Data is stored in databases for analysis with tools like pymatgen. The goal is to make simulations easy and scalable by automating tedious steps and leveraging past work.
This document discusses using hybrid cloud and grid infrastructure for high-throughput computational science. It provides an overview of the Nimrod toolkit, which supports parameter sweeps, optimization, and workflows across distributed resources. A recent experiment used Nimrod to complete jobs faster on grid resources than Amazon EC2. It also outlines a potential strawman project called GEMAP to enable grid-enabled microscopy across the Pacific using remote microscopes, compute clusters, storage, and visualization portals.
Overview of the structured data science domain, OSS machine learning platforms and algorithms. Using the JPMML family of libraries to implement a unified, production-oriented workflow.
Model-Driven Physical-Design for Future Nanoscale ArchitecturesCiprian Teodorov
This document discusses model-driven physical design approaches for future nanoscale architectures. It proposes a generic physical design framework based on a common structural domain model. This model-based approach aims to maximize tool reuse across different nanoscale technologies. It also separates algorithmic and architectural concerns by modeling tools as model transformations. An example nanoscale architecture template called R2D NASIC is developed using this framework and evaluated. Results show improvements in density, performance and max throughput pipelines compared to a baseline. Overall, the model-driven approach seeks to provide a common vocabulary and design flow for tackling challenges in physical design for emerging nanotechnologies.
Colored petri nets theory and applicationsAbu Hussein
This document discusses colored Petri nets (CP-nets) and their applications. CP-nets combine Petri nets with programming languages to model systems involving concurrency, communication, and resource sharing. They allow for simulation and formal verification. The document provides examples of CP-net applications in various domains including protocols, software, hardware, control systems, and military systems. It also describes how CP-net models can be used to automatically generate code for system implementations.
Simulation Data Management using Aras and SharePointAras
This document describes Advatech Pacific's solution for NASA to manage simulation data and requirements for mission design. The solution uses Aras Innovator to implement a Simulation Bill of Materials (SBOM) for linking analysis models. It also integrates with SharePoint and links requirements to analyses. Tight and loose integration of CAD and analysis tools like SolidWorks, STK, and Thermal Desktop were demonstrated.
Ruleml2012 - A production rule-based framework for causal and epistemic reaso...RuleML
The document describes a production rule-based framework for causal and epistemic reasoning. The framework combines event calculus foundations with a discrete event calculus (DECKT) to perform epistemic reasoning about events, knowledge, and time. It uses a rule-based forward-chaining production system to implement the framework and enable online/offline reasoning about dynamic domains.
Chemical Databases and Open Chemistry on the DesktopMarcus Hanwell
The modern chemist has access to large databases containing both experimental and calculated data. The power of HPC resources continues to increase, with more practitioners having routine access to powerful computational chemistry tools. This places an increasingly high burden on users to assimilate these resources into their workflow in order to effectively utilize resources. The creation of an open, extensible application framework that puts computational tools, data, and domain specific knowledge at the fingertips of chemists is increasingly important. A data-centric approach to chemistry, storing all data in a searchable database, will empower users to efficiently collaborate, innovate, and push the frontiers of research. Providing an open, user-friendly and extensible application will open up new tools to experimental chemists, while providing computational chemists the ability to address greater challenges. Additionally, by distributing experimental and computational data across the research community, incorporating cheminformatics analytics techniques, and providing visual search for chemical structures, the workflow of both groups can be significantly improved. This requires suitable data formats for data exchange, and databases with appropriate APIs for querying, and uploading data in order to effectively share. This talk will discuss recent progress made in developing a suite of open chemistry applications on the desktop. The applications can query online databases, such as the NIH structure resolver service, download and manipulate structures, and prepare input files for standalone computational chemistry codes. Another application developed to submit jobs, monitor and retrieve results from HPC resources will also be shown, and a desktop chemistry database browser. The Quixote project aims to establish standards for data exchange in computational chemistry, along with data repositories for organizations. 
Establishing these standards is important to promote open, reproducible chemistry, and their integration into user-friendly desktop applications will promote their integration in the standard workflow of researchers.
Data Pipelining and Workflow Management for Materials Science Applications
1. Data Pipelining and Workflow Management for Materials Science Applications
Dr George Fitzgerald
Dr Mathew Halls
Dr Jacob Gavartin
Dr Gerhard Goldbeck-Wood
Accelrys, Inc.