
Data Pipelining and Workflow Management for Materials Science Applications

Workshop in computational methods for materials science, presented at Spring 2010 ACS conference. This workshop illustrates how high-throughput computation and automation can be used with quantum chemistry calculations to solve problems in materials discovery. Examples include catalysts, fuel cells, OLEDs.

  1. Data Pipelining and Workflow Management for Materials Science Applications
     Dr George Fitzgerald, Dr Mathew Halls, Dr Jacob Gavartin, Dr Gerhard Goldbeck-Wood
     Accelrys, Inc.
  2. Overview
     • Modeling overview
     • Workflow automation
     • Examples
        – PEM Fuel Cell Catalysts
        – Lithium Ion Battery Additives
        – OLEDs
        – Metallocenes
     • Evolutionary optimization algorithms
     • Summary
     © 2008 Accelrys, Inc.
  3. The Concept of Modeling: Computational Physics and Chemistry
     • Computational physics and chemistry simulate structures, processes and properties numerically, based fully or in part on fundamental principles of physics
     • Some methods can model not only stable molecules but also short-lived, unstable intermediates and even transition states
     • Computational physics and chemistry are vital adjuncts to experimental studies
     • Roles of modeling today
        – Run through many scenarios quickly and easily
        – Visualize results and share information
        – A common platform for experts and non-experts
     [Figure label: Virtual Experiments]
  4. Issues that simulation can address…
     • Reactions, bond formation and breaking
     • Miscibility, solubility…
     • Diffusion, permeation, membrane transport…
     • Adhesion (i.e., interactions with surfaces)
     • Crystallization and polymorphism
     • Micelle or vesicle formation and properties
     • Emulsions, kinetics and properties
     • Polymeric microspheres, release profiles
     [Figure labels, in order of increasing size and complexity: Quantum Mechanics, Classical, Mesoscale]
  5. High-Throughput Computation
     • Goal:
        – Use computation to assist in the rapid discovery of new materials
     • Why High-Throughput Computation (HTC)?
        – Brute force: screen more materials
        – Make life easier: reduce human effort and human error
        – Be clever: with enough results you can start to see trends and make broad predictions
     • We want to do these calculations as rapidly as possible
     • Available tools
        – Predict properties from first principles (or derived from first principles)
        – Create phenomenological models based on modeling + experiment (QSAR)
        – Statistical analysis of experimental and/or computational results: predictive analytics
  6. Components of an HTC System
     • Good hardware
        – Fast chips = less time per calculation
        – Many cores = more simultaneous calculations
     • Good predictive methods
        – Accurate methods like DFT, molecular mechanics, or mesoscale models
        – Rapid methods like QSAR: GFA, NN, recursive partitioning
     • Workflow automation tools
        – Create complex, multistep calculations
        – Manage job submission and analysis
        – Create summary of results
        – Compare to experiment
  7. Automated Chemical Modeling
     • Workflow management tools capture complex modeling procedures as automated workflows for the calculation and analysis of materials systems
     • Essential tasks include
        – Running simulations (MM, semiempirical, QM, etc.)
        – Manipulation of chemical structures
        – Arithmetical manipulation of results
        – Integration of multiple data sources (analytical instruments, modeling, publications)
        – Statistical analysis of results (QSAR, clustering)
        – Reports & graphs
        – Pipelining, i.e., using output from one component as input to the next
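The pipelining idea on this slide (output from one component becomes input to the next) can be sketched with Python generators. This is only an illustration: the component names (`read_structures`, `run_simulation`, `keep_below`, `to_table`) and the placeholder "energy" are hypothetical and do not correspond to Pipeline Pilot components or to any real simulation code.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def read_structures(names: Iterable[str]) -> Iterator[Record]:
    """Source component: emit one record per structure name."""
    for name in names:
        yield {"name": name}


def run_simulation(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical simulation component.

    Attaches a placeholder 'energy' (minus the name length); a real
    component would call a simulation code here.
    """
    for rec in records:
        yield {**rec, "energy": -1.0 * len(str(rec["name"]))}


def keep_below(records: Iterable[Record], key: str,
               threshold: float) -> Iterator[Record]:
    """Filter component: pass only records whose value is below threshold."""
    for rec in records:
        if rec[key] < threshold:
            yield rec


def to_table(records: Iterable[Record]) -> List[Record]:
    """Reporting component: collect records, lowest energy first."""
    return sorted(records, key=lambda r: r["energy"])


def run_pipeline(names: Iterable[str]) -> List[Record]:
    """Chain the components: each one consumes the previous one's output."""
    simulated = run_simulation(read_structures(names))
    return to_table(keep_below(simulated, "energy", -2.5))


if __name__ == "__main__":
    for row in run_pipeline(["A", "BB", "CCC", "DDDD"]):
        print(row)
```

Because each component only consumes and yields records, components can be reordered or swapped without touching the others, which is the property the drag-and-drop workflow tools rely on.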
  8. Materials Discovery and Optimization using Virtual Screening
     [Workflow diagram with the stages: Chemical Motif Design; Virtual Library Enumeration; Automated QC Calculation; Virtual Materials Library / Database; Analysis; Identification of optimum leads; Experimental screening]
  9. QSAR in the Design of Materials
     • Some properties are easy to calculate, e.g.,
        – Structure
        – Heat of formation
        – HOMO-LUMO gap
     • But the properties that are easy to calculate are not always the ones we want
        – Corrosion resistance
        – Catalyst lifetime
        – Tg
     • QSAR gives us a way to estimate the difficult properties based on the ones that we can calculate easily and quickly
     • QSAR procedure
        – Get experimental results (or accurate computation)
        – Compute “descriptors”
        – Create a statistical model that can predict the target properties
        – Use the model to predict the results for “virtual samples”
     • Examples
        – Cytotoxic activities of platinum complexes, J. Comput. Aided Mol. Des. 23 (2009) 343.
        – Corrosion inhibitors, Progress in Organic Coatings 61 (2008) 11.
        – Metal-organic frameworks for hydrogen storage, Cat. Today 120 (2007) 317.
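The four-step QSAR procedure above can be sketched in its simplest form as a one-descriptor least-squares fit. This is a minimal illustration, not the statistical model used in the talk: the descriptor and target values are made up, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: Tuple[float, float], x: Sequence[float]) -> List[float]:
    """Apply a fitted (slope, intercept) model to new descriptor values."""
    slope, intercept = model
    return [slope * xi + intercept for xi in x]


if __name__ == "__main__":
    # Steps 1 and 2: measured target property and a computed descriptor
    # for the training set (made-up numbers, for illustration only).
    descriptor = [1.0, 2.0, 3.0, 4.0]
    measured = [3.0, 5.0, 7.0, 9.0]
    # Step 3: build the statistical model.
    model = fit_line(descriptor, measured)
    # Step 4: predict the property for "virtual samples" never measured.
    print(model, predict(model, [5.0, 6.0]))
```

A practical QSAR model would use several descriptors and be validated against held-out data; the single-descriptor fit only shows the train-then-predict shape of the procedure.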
  10. Uses of Workflow Automation
     • Programs like Pipeline Pilot provide a drag-and-drop method for building workflows
     • Some calculations require multiple steps
        – IP: ground state optimization + single-point cation energy
        – pKa: vacuum and solvated calculations of protonated and de-protonated species
     • Generation of starting structures
        – Combinatorial libraries
        – Defects
        – Surfaces
     • Summary and reporting
     [Figure: substituted structure templates with sites R1-R4, X and Z; X = F or H]
  11. Simple Workflow Example: Adiabatic & Vertical IP
     • Calculating Vertical IP:
        – Geometry optimize neutral
        – Single-point energy of cation
     • Calculating Adiabatic IP:
        – Geometry optimize neutral
        – Geometry optimize cation
     • Workflow simplifies and automates these 3 calculations and presents results in a table, spreadsheet, database…
     [Figure: energy vs. reaction coordinate; labels: λ+/-, Ma+/- Mb, Ma Mb+/-]
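Once the three calculations above have produced their energies, the two ionization potentials are simple differences. The sketch below assumes the energies are already available in consistent units; the function name and the example numbers are hypothetical.

```python
from typing import Dict


def ionization_potentials(e_neutral_opt: float,
                          e_cation_single_point: float,
                          e_cation_opt: float) -> Dict[str, float]:
    """Combine the three workflow energies into vertical and adiabatic IP.

    e_neutral_opt         -- energy of the geometry-optimized neutral
    e_cation_single_point -- cation energy at the neutral geometry
    e_cation_opt          -- energy of the geometry-optimized cation
    All energies must share the same units; results use those units.
    """
    vertical = e_cation_single_point - e_neutral_opt
    adiabatic = e_cation_opt - e_neutral_opt
    return {
        "vertical_ip": vertical,
        "adiabatic_ip": adiabatic,
        # Energy released when the cation relaxes from the neutral geometry.
        "cation_relaxation": vertical - adiabatic,
    }


if __name__ == "__main__":
    # Made-up energies, for illustration only.
    print(ionization_potentials(-10.0, -9.70, -9.75))
```

Since the optimized cation can never be higher in energy than the cation at the neutral geometry, the adiabatic IP is at most the vertical IP, which makes a cheap sanity check on workflow output.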
  12. PEM Fuel Cells Challenges
     • Overall reaction: O2 + 2 H2 → 2 H2O + electricity
        – Anode: 2 H2 → 4 H+ + 4 e−
        – Cathode: O2 + 4 H+ + 4 e− → 2 H2O
     • The iCatDesign project used combined theory and experiment to find new catalysts for oxygen activation in fuel cells
        – Johnson Matthey
        – CMR Fuel Cells
        – Accelrys
        – Co-funded by the UK Technology Strategy Board's Collaborative Research and Development programme
     • One challenging step is the Oxygen Reduction Reaction (ORR)
     • Pt is an effective catalyst for activating O2 but is too expensive for large-scale application
        – How can we find catalysts that are just as effective but less expensive?
        – High-throughput DFT calculations with CASTEP
     • Recently published:
        – Gavartin, et al., ECS Transactions 25, 1335-1344 (2009)
  13. Adsorption and activation energies: ORR
     [Figure: energy E vs. reaction coordinate, with E0 = E(O2 + *), E1 = E(O2*), ETS = E(O*-O*), E2 = 2E(O*)]
     • ORR activity requires an adsorption energy that is just right
        – Too loose → no activation
        – Too tight → no desorption
     • Activity would improve if Eads were a bit less than in pure Pt
     • Expansion and contraction of the Pt lattice lead to changes in Eads
     (iCatDesign)
  14. Reducing Computational Cost
     • This work examined alloys of the form A3B, e.g., Pt3Co
     • Use a 5-layer model with the lowest layers fixed
        – In a 3-layer model, there are 220 unique structures
        – For 2 × A and 10 × B elements: > 2,000 calculations
     • Need ORR activation for each
        – How can we avoid 2,000 DFT TS searches?
     • We can estimate activity with Eads
     • Observation: d-band center is roughly linear with Eads
     • Reduction in computational cost (each step cheaper than the one before):
        – ORR barrier (TS optimization)
        – Eads (constrained geometry optimization)
        – d-band center
  15. Summary of HTC for CASTEP Calculations
     • Many low-lying structures for each A3B
        – Computation of Eads requires an ensemble average
        – Automation greatly simplifies this process
     • CASTEP Component simplifies and automates setup & analysis of multiple jobs
     • Pt3Co identified as lead alloy
     • Next steps:
        – Submit lead compounds to calculations of Eads
        – Submit best results to TS calculations
        – Submit best results for experimental screening
        – Use computation to validate experimental results
           • E.g., confirm experimental structures via Raman
        – Use experimental results to refine the QSAR model
  16. Lithium Ion Batteries and SEI Film Formation
     • The electrolyte typically consists of one or more lithium salts dissolved in an aprotic solvent with at least one additional functional additive
  17. Lithium Ion Batteries and SEI Film Formation
     • The electrolyte typically consists of one or more lithium salts dissolved in an aprotic solvent with at least one additional functional additive
     • Additives are included in electrolyte formulations to increase the dielectric strength and enhance electrode stability by facilitating the formation of the solid/electrolyte interface (SEI) layer
  18. Lithium Ion Batteries and SEI Film Formation
     [Figure: 1 e− decomposition scheme]
     • The initiation step leading to anode SEI formation is electron transfer to the SEI-forming species
        – Results in decomposition reaction
        – Produces the passivating SEI layer
     • Important requirements for electrolyte additives selected to facilitate good SEI formation are:
        – Higher reduction potential than the base solvent (low LUMO)
        – Maximal reactivity for a given chemical design space (low hardness η)
        – Large dipole moment for interaction with Li (high µ)
  19. Anode SEI Additive Structure Library
     [Figure: EC-based structure templates with substituent sites R1-R4, X and Z; X = F or H]
     • Cyclic carbonates, related to ethylene carbonate (EC), are often used as anode SEI additives for use with graphite anodes
     • To explore the effect of alkylation or fluorination on EC-based additive properties, an R-group based enumeration scheme was used to generate an EC-based additive structure library (7381 stereochemically unique structures)
  20. Anode SEI Additive Results
     • Optimal materials must satisfy a number of objectives
     • Multi-objective solutions represent a trade-off between objectives
     • One approach is to adopt the “Pareto-optimal” solution
        – Set of solutions such that it is not possible to improve one property without making any other property worse
        – This case:
           • Minimize the chemical hardness
           • Maximize the dipole moment and electron affinity
  21. 3D View of Pareto Surface
  22. Anode SEI Additive Pareto Optimal Candidate
     • Optimal materials solutions are systems that simultaneously satisfy a number of target objectives
     • Multiobjective solutions represent a trade-off between objectives, with one class being Pareto-optimal solutions
     • Pareto-optimal solutions are defined as a set of solutions which are non-dominated, such that it is not possible to improve one property without making any other property worse
     • For anode SEI additives, optimal solutions seek to minimize the LUMO energy, maximize the dipole moment and minimize the chemical hardness
     • Screening the EC-based additive library gives structure 1573 as a Pareto-optimal solution (R1=R2=CH3 and R3=R4=c-C3F5)
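The non-dominated definition above translates directly into a small filter. The sketch below uses the three objectives named on this slide (minimize LUMO energy, maximize dipole moment, minimize chemical hardness); the candidate records and their values are made up for illustration, and the brute-force pairwise comparison is simply the most direct reading of the definition, not necessarily how the original screening was implemented.

```python
from typing import Dict, List, Tuple

Candidate = Dict[str, object]


def objective_vector(c: Candidate) -> Tuple[float, float, float]:
    """Express all objectives as 'smaller is better'.

    LUMO and hardness are minimized as-is; the dipole moment is maximized,
    so its sign is flipped.
    """
    return (float(c["lumo"]), -float(c["dipole"]), float(c["hardness"]))


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    va, vb = objective_vector(a), objective_vector(b)
    return (all(x <= y for x, y in zip(va, vb))
            and any(x < y for x, y in zip(va, vb)))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates (O(n^2))."""
    return [c for c in candidates
            if not any(dominates(other, c)
                       for other in candidates if other is not c)]


if __name__ == "__main__":
    # Made-up property values, for illustration only.
    library = [
        {"id": "A", "lumo": -1.0, "dipole": 5.0, "hardness": 3.0},
        {"id": "B", "lumo": -0.5, "dipole": 4.0, "hardness": 3.5},
        {"id": "C", "lumo": -1.2, "dipole": 3.0, "hardness": 2.5},
        {"id": "D", "lumo": -0.8, "dipole": 6.0, "hardness": 4.0},
    ]
    print([c["id"] for c in pareto_front(library)])
```

For a library of a few thousand structures the quadratic comparison is still cheap compared with the quantum chemistry that produced the properties.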
  23. Organic Light Emitting Diode (OLED) Basics
     [Figure: simple 2-layer OLED device structure, with layers labeled Cathode, Electron-Transport Layer (ETL), Hole-Transport Layer (HTL), ITO, Glass Substrate; molecules shown: AlQ3, NPB; electrodes labeled Cathode and Anode]
  24. AlQ3 Electron Transport and Emitting Material
     • Following Tang and VanSlyke’s pioneering work [1], AlQ3 has become the archetypal OLED material
     • Optoelectronic properties can be tuned by derivatizing AlQ3 with electron-withdrawing or electron-donating substituents
     • Experiments on Al(QX)3 derivatives have demonstrated that R1/R2 substituents affect the electronic and optical properties [2]
     Experimental λmax for derivatized AlQ3 materials (Al(QX)3):
        Group     1-CH3     2-CH3     2-F       2-Cl      2-CN
        ∆λmax     -10 nm    +31 nm    +15 nm    +10 nm    -3 nm
     [Figure: HOMO and LUMO of Al(QX)3]
     [1] Tang, C. W.; VanSlyke, S. A. Appl. Phys. Lett. 1987, 51, 913.
     [2] Chen, C. H.; Shi, J. Coord. Chem. Rev. 1998, 171, 161.
  25. Virtual Library Enumeration in SES
     • Virtual library enumeration has played a major role in computational drug design
     • Similar approaches, using RGroup-based or Reaction-based enumeration schemes, can be used to generate virtual libraries of materials which can be analysed, screened and filtered to identify and explore:
        – Lead material candidates
        – Material property trends and SPRs (structure-property relationships)
     • The enumeration components in the ‘Chemistry Component Collection’ on the SES platform enable automated library generation; the library can be stored as a file or directly pipelined into an analysis workflow
     • A virtual library of 8436 Al(QX2)3 structures was generated by combining the 6 substituents studied experimentally over the 2 reaction sites per ligand on the AlQ3 core
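The library size can be checked with a small enumeration sketch. One counting that reproduces 8436 is the following assumption (the slides do not spell out the scheme): 6 substituents on each of 2 distinguishable sites give 36 distinct ligands, and a complex is an unordered selection of 3 ligands with repetition allowed, ignoring any isomerism. The substituent labels `X1`..`X6` are placeholders, not the actual groups.

```python
from itertools import combinations_with_replacement, product
from typing import List, Tuple

# Placeholder labels for the 6 substituents (the real groups are not
# reproduced here).
SUBSTITUENTS = ["X1", "X2", "X3", "X4", "X5", "X6"]

Ligand = Tuple[str, str]
Complex = Tuple[Ligand, Ligand, Ligand]


def enumerate_ligands(substituents: List[str]) -> List[Ligand]:
    """All ordered assignments of a substituent to each of the 2 sites."""
    return list(product(substituents, repeat=2))


def enumerate_complexes(ligands: List[Ligand]) -> List[Complex]:
    """All unordered sets of 3 ligands, repetition allowed.

    Assumption: the three ligand positions are treated as equivalent,
    so permutations of the same three ligands count once.
    """
    return list(combinations_with_replacement(ligands, 3))


if __name__ == "__main__":
    ligands = enumerate_ligands(SUBSTITUENTS)
    library = enumerate_complexes(ligands)
    print(len(ligands), len(library))
```

Under these assumptions the count is C(36 + 3 - 1, 3) = C(38, 3) = 8436, which matches the library size quoted on the slide.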
  26. Al(QX2)3 Library
     8436 Structures
  27. OLED Pipelined QC Workflow
     • A pipeline employing the PM3 Hamiltonian through the VAMP component was constructed to compute:
        – Total Energies
        – HOMO and LUMO Energies
        – Vertical & Adiabatic Ionization Potential (IP)
        – Vertical & Adiabatic Electron Affinity (EA)
     • Charge transport through weakly interacting monomeric materials is outer-sphere electron transfer and is described by Marcus theory
     • Characteristic energies were also computed:
        – Hole Reorganization Energy (λ+)
        – Electron Reorganization Energy (λ-)
     • A random percent filter was used to sample the Al(QX2)3 structure library, and >1000 structures were analyzed through the OLED QC protocol
     [Figure: energy vs. reaction coordinate; labels: λ+/-, Ma+/- Mb, Ma Mb+/-]
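The slides do not state how the reorganization energies were evaluated. A commonly used choice is the four-point scheme, which sums the relaxation energy of the ion and of the neutral molecule; the sketch below assumes that scheme, and the function names and example energies are hypothetical.

```python
def reorganization_energy(e_neutral_at_neutral_geom: float,
                          e_neutral_at_ion_geom: float,
                          e_ion_at_neutral_geom: float,
                          e_ion_at_ion_geom: float) -> float:
    """Four-point internal reorganization energy (assumed scheme).

    Use cation energies for the hole value (lambda+) and anion energies
    for the electron value (lambda-). All inputs share the same units.
    """
    # Ion created at the neutral geometry relaxes to its own geometry.
    relax_ion = e_ion_at_neutral_geom - e_ion_at_ion_geom
    # Neutral molecule formed at the ion geometry relaxes back.
    relax_neutral = e_neutral_at_ion_geom - e_neutral_at_neutral_geom
    return relax_ion + relax_neutral


def reorg_difference(lambda_electron: float, lambda_hole: float) -> float:
    """Elec Reorg E minus Hole Reorg E.

    A negative value means the electron reorganization energy is the
    smaller of the two.
    """
    return lambda_electron - lambda_hole


if __name__ == "__main__":
    # Made-up energies, for illustration only.
    lam_hole = reorganization_energy(-10.0, -9.9, -9.7, -9.8)
    lam_elec = reorganization_energy(-10.0, -9.95, -10.4, -10.45)
    print(lam_hole, lam_elec, reorg_difference(lam_elec, lam_hole))
```

The four inputs are the same kinds of energies the workflow already produces for the vertical and adiabatic IP and EA, plus one single-point energy of the neutral molecule at each ion geometry.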
  28. OLED Pipelined QC Workflow Results
  29. OLED Pipelined QC Workflow Results
     • Al(QX2)3 properties can be tailored through changes in molecular structure
        – LUMO energy and Electron Reorganization Energy vary over ranges of ca. 1.25 and 2.25 eV
     • Analysis of the Reorg E Difference (Elec Reorg E - Hole Reorg E) shows that changes in structure can switch the preferred transport from electron to hole
  30. OLED Pipelined QC Workflow Results
     • The Al(QX2)3 library with QC-computed properties can be screened for optimal candidates
     • Superior ETL OLED materials should be stable and preferentially conduct electrons
     • The library can be Pareto sorted to simultaneously minimize the ‘Heat of Formation’ and ‘Electron Reorg E’ to identify lead structures
  31. Modeling the Activity of Polymerization Catalysts
     • Metallocenes are known as effective catalysts for polymerization
     • Alter ligands for control of
        – Activity
        – Molecular weight of polymer
        – Tacticity of polymer
     • QM can predict reliable reaction rates, but…
        – Time consuming
        – TS difficult to automate
     • How do we make modeling more efficient and more amenable to automation?
        – Develop QSAR models
        – Screen many, many structures with QSAR
        – Perform time-consuming QM on only the most promising leads
        – Perform experiments on only the best QM results
     Metallocene data from Albert J van Reenen, http://academic.sun.ac.za/UNESCO/Conferences/Conference1999/Lectures1999/VanReenen99/VAN%20REENEN.html
  32. Details of QSAR & GFA
     • Choice of descriptors:
        – “Fast descriptors”
           • Topological descriptors
           • Information content descriptors
        – QM descriptors with VAMP (PM6 or AM1-d)
           • Charge on metal atoms
           • Fukui index on metal atoms
        – Structural
           • “Bite angle”
     • Choice of compounds
        – 31 structures with experimental data
     • Model
        – GFA with linear splines
        – 6-term equation
     [Figure: metallocene structure with the bite angle marked]
     Metallocene images and data from Albert J van Reenen, http://academic.sun.ac.za/UNESCO/Conferences/Conference1999/Lectures1999/VanReenen99/VAN%20REENEN.html
  33. Genetic Function Algorithm (GFA)
     • Genetic function algorithm (GFA) yields analytical models
     • GFA finds the best function and fewest descriptors
        – It is possible to identify the importance of each descriptor
        – Produces a family of results, not just a single equation
     • Analytical expression can include:
        – Linear terms: a * xi
        – Quadratic terms: a * xi^2
        – Cross terms: a * xi * xj
        – Splines: <xi - a>
     • Example:
        – Catalyst Activity = -23.4 + 2.04 * [Treatment Time] - 0.016 * [Fe2O3 %] + 0.256 * [PtO %] - 0.0224 * [Al2O3 %] * [Cr2O3 %]
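The example equation on this slide can be evaluated directly once the descriptors are known. The sketch below codes it term by term and adds a helper for the spline notation, assuming that `<x - a>` denotes the usual truncated form (the value of x - a when positive, otherwise zero). The function names and input values are hypothetical.

```python
def spline(x: float, knot: float) -> float:
    """Truncated linear spline <x - knot>: x - knot if positive, else 0.

    Assumed meaning of the angle-bracket notation in GFA equations.
    """
    return max(x - knot, 0.0)


def catalyst_activity(treatment_time: float, fe2o3_pct: float,
                      pto_pct: float, al2o3_pct: float,
                      cr2o3_pct: float) -> float:
    """Evaluate the example GFA equation quoted on the slide."""
    return (-23.4
            + 2.04 * treatment_time               # linear term
            - 0.016 * fe2o3_pct                   # linear term
            + 0.256 * pto_pct                     # linear term
            - 0.0224 * al2o3_pct * cr2o3_pct)     # cross term


if __name__ == "__main__":
    # Made-up descriptor values, for illustration only.
    print(catalyst_activity(10.0, 10.0, 10.0, 10.0, 10.0))
    print(spline(3.0, 2.0), spline(1.0, 2.0))
```

Because the model is a closed-form expression, evaluating it for thousands of virtual samples costs essentially nothing compared with the quantum mechanical calculations it replaces.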
  34. GFA Results
     • Summary of GFA equations
     • Display of predicted vs. actual
  35. Using the GFA for Combinatorial Catalysis
     • Framework: 4 choices
     • Metal: 3 choices
     • R1, R2, R3: 6 choices
     • Approx 1,300 calculations
     • Procedure
        – Generate combinatorial library
        – Compute descriptors (charges, bite angle, etc.)
        – Use GFA model to predict catalyst performance
        – Take best leads and use QM to predict more accurately
     • Advantages
        – Easier than manual approach
        – Faster than doing exact QM TS on everything
        – Find trends in the performance of different R groups
  36. Evolutionary Optimization
     • Genetic Function Algorithm (GFA) produces an analytical expression, but how do we find the extrema?
     • Approach 1: Brute force
        – Generate the combinatorial grid of data and look for maximum and minimum
        – For each molecule compute descriptors, then evaluate activity with GFA
        – Not a bad approach if you have the CPU resources
     • Approach 2: Genetic Algorithm (GA)
        – GA can be compared to the evolution of DNA
        – An initial population is randomly constructed
        – The “best” individuals are allowed to propagate
        – Positive traits are passed to the next generation
  37. Applications of GA to Materials Discovery
     • Metallocene catalysts
        – Located optimum in ~400 calculations (1,300 possible)
     • Battery additives
        – Located optimum in ~500 calculations (7,300 possible)
     • H2 storage nanoclusters
        – Dope Mg13 with Li and B
        – Total 1,590,000 structures
        – Work in progress: predict most stable nanocluster by GA
  38. Metallocene Optimization by GA
     • Framework: 4 choices
     • Metal: 3 choices
     • R1, R2, R3: 6 choices
     • Generate random population of 20 individuals
     • Compute descriptors (charges, bite angle, etc.)
     • Use GFA model to predict catalyst performance
     • Take best results and allow them to evolve
     • Advantages
        – Automated
        – Faster (usually) than exhaustive search
     • Disadvantages
        – In danger of becoming a ‘black box’
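The GA loop described on this slide can be sketched as follows. Each individual is a tuple of choice indices (framework, metal, R1, R2, R3), the population has 20 members, and the best individuals survive and breed. Everything else is an assumption made for illustration: the crossover and mutation operators, the number of parents, the mutation rate, and in particular `toy_fitness`, which stands in for the descriptor calculation plus GFA model and has no chemical meaning.

```python
import random
from typing import Callable, List, Tuple

# Number of options per gene: framework, metal, R1, R2, R3.
CHOICES = (4, 3, 6, 6, 6)
Individual = Tuple[int, ...]


def random_individual(rng: random.Random) -> Individual:
    """Pick one option index per gene at random."""
    return tuple(rng.randrange(n) for n in CHOICES)


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Single-point crossover: head of a joined to tail of b."""
    cut = rng.randrange(1, len(CHOICES))
    return a[:cut] + b[cut:]


def mutate(ind: Individual, rng: random.Random,
           rate: float = 0.1) -> Individual:
    """Re-draw each gene with probability `rate`."""
    return tuple(rng.randrange(n) if rng.random() < rate else g
                 for g, n in zip(ind, CHOICES))


def evolve(fitness: Callable[[Individual], float], generations: int = 30,
           pop_size: int = 20, n_parents: int = 5,
           seed: int = 0) -> Tuple[Individual, List[float]]:
    """Return the best individual and the best fitness seen per generation."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:n_parents]  # the "best" individuals propagate
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   rng)
            for _ in range(pop_size - n_parents)
        ]
        population = parents + children  # parents kept, so best never drops
    return max(population, key=fitness), history


def toy_fitness(ind: Individual) -> float:
    """Hypothetical stand-in for 'compute descriptors, apply GFA model'."""
    return float(sum(ind))


if __name__ == "__main__":
    best, history = evolve(toy_fitness)
    print(best, history[0], history[-1])
```

Keeping the parents in the next generation guarantees that the best score never decreases, but, as with any GA, there is no guarantee that the global optimum is reached in a given number of generations.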
  39. Summary
     • The generation of virtual structure libraries can be used to explore materials design space
     • Automation and data pipelining are key to HTC
        – Eliminate tedium
        – Reduce human error
        – Allow a greater number of samples to be screened
     • A larger number of results brings into play statistical methods for finding trends
     • Approximate methods like QSAR are valuable for reducing the number of expensive calculations
     • Evolutionary algorithms like GA make it possible to automate the discovery process, not just the computational process
  40. Acknowledgements
     • Collaborator for Li additive project: Ken Tasaki,
        – Technology Research Division, Mitsubishi Chemical Inc., Redondo Beach, CA 90277
     • Computational resources for HTC: Hewlett-Packard
     • iCatDesign project sponsored by Technology Strategy Board, Project Number: /5/MAT/6/I/H0379C
