Atomate: a high-level interface to generate, execute, and analyze computational materials science workflows
1. Atomate: A High-level Interface to Generate, Execute, and
Analyze Computational Materials Science Workflows
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Lab
Berkeley, CA
TMS 2018
Slides (already) posted to: https://hackingmaterials.lbl.gov/
2. A schematic of “materials genomics” approaches to materials science
[Figure: schematic connecting data, applications, methods (theory, ML), and software implementation]
3. Our group builds and maintains several open-source software libraries
Data generation and data analysis:
• FireWorks: run and manage millions of computational tasks over large computing resources
• atomate: library of FireWorks-compatible workflows for materials science applications
• matminer: materials data retrieval, featurization, and visualization for machine learning
• pymatgen*: tools for crystal manipulation, data analysis, and simulation software I/O
• rocketsled: tools for inverse optimization / adaptive design – ML chooses what calculations to run
*led by Ong group, UCSD
4. This talk will focus on atomate and FireWorks
5. Today, automated (“high-throughput”) calculations play an
important role in materials data generation
• >4500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
• >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier, Sci. Data, 2017, 4, 170085.
6. Today, automated (“high-throughput”) calculations play an
important role in materials data generation
Atomate’s goal: make it easy to generate comparable data sets on your own
7. A “black-box” view of performing a calculation
[Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results]
8. Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
[Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work → results]
Typical low-level concerns: input file flags, SLURM format, how to fix ZPOTRF?
☐ set up the structure coordinates
☐ write input files, double-check all the flags
☐ copy to supercomputer
☐ submit job to queue
☐ deal with supercomputer headaches
☐ monitor job
☐ fix error jobs, resubmit to queue, wait again
☐ repeat process for subsequent calculations in workflow
☐ parse output files to obtain results
☐ copy and organize results, e.g., into Excel
9. What would be a better way?
[Diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results]
10. What would be a better way?
[Diagram: researcher's request (“What is the GGA-PBE elastic tensor of GaAs?”), a checklist of workflows to run, and results]
Workflows to run:
☐ band structure
☐ surface energies
☑ elastic tensor
☐ Raman spectrum
☐ QH thermal expansion
11. Ideally the method should scale to millions of calculations
[Diagram: researcher's request (“Start with all binary oxides, replace O->S, run several different properties”), a checklist of workflows to run, and results]
Workflows to run:
☑ band structure
☑ surface energies
☑ elastic tensor
☐ Raman spectrum
☐ QH thermal expansion
☐ spin-orbit coupling
12. Atomate tries to make it easy, automatic, and flexible to
generate data with existing simulation packages
[Diagram: researcher's request (“Run many different properties of many different materials”) and results]
13. Atomate contains a library of simulation procedures
VASP-based
• band structure
• spin-orbit coupling
• hybrid functional calcs
• elastic tensor
• piezoelectric tensor
• Raman spectra
• NEB
• GIBBS method
• QH thermal expansion
• AIMD
• ferroelectric
• surface adsorption
• work functions
Other
• BoltzTraP
• FEFF method
• LAMMPS MD
Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze
computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
14. Each simulation procedure translates high-level instructions
into a series of low-level tasks
Quickly and automatically translate PI-style (minimal) specifications, e.g.,
“What is the GGA-PBE elastic tensor of GaAs?”, into well-defined FireWorks workflows
M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al.,
Charting the complete elastic properties of inorganic crystalline compounds,
Sci. Data. 2 (2015).
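To make the idea of translating a high-level request into low-level tasks concrete, here is a minimal plain-Python sketch. It does not call atomate; the `Request` class, the `expand_elastic_request` function, the step names, and the default number of deformations are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A minimal, PI-style specification (hypothetical format)."""
    formula: str
    functional: str
    prop: str


def expand_elastic_request(req, n_deformations=6):
    """Expand one high-level request into an ordered list of low-level task names.

    The step names and the default deformation count are illustrative
    assumptions, not atomate's actual settings.
    """
    if req.prop != "elastic tensor":
        raise ValueError(f"no procedure registered for {req.prop!r}")
    prefix = f"{req.formula} [{req.functional}]"
    tasks = [f"{prefix}: optimize structure"]
    tasks += [
        f"{prefix}: static calculation on deformation {i + 1}"
        for i in range(n_deformations)
    ]
    tasks.append(f"{prefix}: fit elastic tensor from stresses and store in database")
    return tasks


request = Request(formula="GaAs", functional="GGA-PBE", prop="elastic tensor")
tasks = expand_elastic_request(request)
for task in tasks:
    print(task)
```

The point of the sketch is only the shape of the translation: one short question goes in, and an ordered list of concrete calculations comes out.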
15. Atomate thus encodes and standardizes knowledge about
running various kinds of simulations from domain experts
All past and present knowledge, from everyone in the group,
everyone previously in the group, and our collaborators,
about how to run calculations
Contributors: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain
16. Full operation diagram
[Diagram: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database]
17. Full operation diagram
[Diagram repeated from slide 16]
18. Crystal structure generation via pymatgen
• Pymatgen can retrieve crystal structures from the Materials Project database (MPRester class)
• It can also manipulate crystal structures
– substitutions
– supercell creation
– order-disorder (see example)
– interstitial finding
– surface / slab generation
• A visual interface to many of these tools is available in Materials Project’s “Crystal Toolkit” app
Example: Order-disorder
resolve partial or mixed occupancies into a fully ordered crystal structure
(e.g., mixed oxide-fluoride site into separate oxygen/fluorine)
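As a rough illustration of two of the manipulations listed above (substitution and supercell creation), the following toy sketch works on a bare list of sites in plain Python. It deliberately does not use pymatgen, whose own structure objects handle lattices, symmetry, and much more; the two-site cell and the helper names `substitute` and `make_supercell` are assumptions made for this example only.

```python
from itertools import product

# A structure is modeled here as a list of (species, fractional_coords) sites.
# This two-site cell is a made-up toy, not a real crystal structure.
toy_oxide = [("Mg", (0.0, 0.0, 0.0)), ("O", (0.5, 0.5, 0.5))]


def substitute(sites, old, new):
    """Return a copy of the site list with every `old` species replaced by `new`."""
    return [(new if species == old else species, xyz) for species, xyz in sites]


def make_supercell(sites, na, nb, nc):
    """Repeat the cell na x nb x nc times.

    Fractional coordinates are rescaled so they refer to the enlarged cell.
    """
    expanded = []
    for i, j, k in product(range(na), range(nb), range(nc)):
        for species, (x, y, z) in sites:
            expanded.append((species, ((x + i) / na, (y + j) / nb, (z + k) / nc)))
    return expanded


toy_sulfide = substitute(toy_oxide, "O", "S")   # the O -> S replacement idea
supercell = make_supercell(toy_sulfide, 2, 2, 2)
print(len(supercell), "sites;", sorted({species for species, _ in supercell}))
```

In practice one would let pymatgen do this on a full structure object; the sketch only shows that both operations are simple transformations from one structure to another.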
19. Full operation diagram
[Diagram repeated from slide 16]
20. Atomate’s main goal – convert structures to workflows
Workflows consist of a series of jobs (“FireWorks”), each
with multiple tasks. Atomate jobs typically (i) run a
calculation and (ii) store the results in a database
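The workflow / job / task hierarchy described above can be sketched with a few lines of plain Python. This is not the FireWorks API: `Job`, `ToyWorkflow`, `run_calculation`, and `store_results` are hypothetical stand-ins, and the "database" is just a dictionary.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Job:
    """Stand-in for a FireWork: a named, ordered list of tasks (callables)."""
    name: str
    tasks: list


@dataclass
class ToyWorkflow:
    jobs: dict = field(default_factory=dict)     # job name -> Job
    parents: dict = field(default_factory=dict)  # job name -> set of parent names

    def add(self, job, after=()):
        self.jobs[job.name] = job
        self.parents[job.name] = set(after)

    def run(self, database):
        # Execute jobs in an order that respects parent -> child links.
        order = list(TopologicalSorter(self.parents).static_order())
        for name in order:
            for task in self.jobs[name].tasks:
                task(name, database)
        return order


def run_calculation(job_name, database):
    database.setdefault(job_name, {})["status"] = "calculated"  # placeholder for a real run


def store_results(job_name, database):
    database[job_name]["stored"] = True  # placeholder for a database insert


wf = ToyWorkflow()
wf.add(Job("optimize", [run_calculation, store_results]))
wf.add(Job("static", [run_calculation, store_results]), after=["optimize"])
wf.add(Job("band structure", [run_calculation, store_results]), after=["static"])

db = {}
order = wf.run(db)
print(order)
```

Each job here follows the pattern from the slide: one task runs the calculation, the next stores the result.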
21. Workflow parameters can be customized at
multiple levels of detail
1. Workflows have various high-level options
2. Fireworks also have options / flags (not shown)
3. Firetasks have the most detailed set of options / flags (not shown)
Example 1: “VASP input set” controls the rules that set DFT parameters
(pseudopotentials, cutoffs, grid densities, etc.) via pymatgen
Example 2: If “stability_check” is enabled, the later parts of the workflow
are skipped if the structure is determined to be unstable.
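The effect of a workflow-level switch such as “stability_check” can be sketched as follows. This is a toy model, not atomate's implementation: the `run_workflow` function, the `e_above_hull` key, and the 0.1 eV/atom cutoff are assumptions chosen only to show how one high-level option changes which later steps run.

```python
def run_workflow(steps, stability_check=True, stability_threshold=0.1):
    """Run (name, function) steps in order and return a log of outcomes.

    Each function returns a dict of results. If stability_check is on and a
    step reports an energy above hull larger than stability_threshold
    (a hypothetical cutoff in eV/atom), the remaining steps are skipped.
    """
    log = []
    unstable = False
    for name, func in steps:
        if unstable:
            log.append((name, "skipped"))
            continue
        result = func()
        log.append((name, "completed"))
        e_hull = result.get("e_above_hull")
        if stability_check and e_hull is not None and e_hull > stability_threshold:
            unstable = True
    return log


steps = [
    ("optimize structure", lambda: {"e_above_hull": 0.35}),  # made-up value
    ("static", lambda: {}),
    ("band structure", lambda: {}),
]

checked = run_workflow(steps, stability_check=True)
unchecked = run_workflow(steps, stability_check=False)
print(checked)
print(unchecked)
```

With the check enabled, only the first step completes for this made-up unstable structure; with it disabled, all three steps run.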
22. You can build workflows from scratch or reuse components
to assemble workflows
Multiple workflows are built from the same components,
stacked together in different ways like Legos
These two workflows share almost all of their code
23. Full operation diagram
[Diagram repeated from slide 16]
24. Calculation execution with FireWorks
• Once you have the material and the simulation procedure (Workflow),
you need to actually execute the workflow on your computing resource
• This includes tasks like:
– submission to calculation queues
– customization of any computing-specific parameters
• e.g., path to VASP executable, number of CPUs to parallelize over
– recovering from failures / job resubmission
– coordinating jobs across computing centers
– managing location of jobs
– tracking the progress of jobs
• Almost all of this is handled by FireWorks (custodian is used for
encoding fixes to typical errors, e.g., the VASP ZPOTRF error)
• FireWorks is mature software, used by dozens of research groups and
used to run millions of simulations
25. FireWorks allows you to write your workflow once and
execute (almost) anywhere
• Execute workflows locally or at a supercomputing center
• Queue systems supported
– PBS
– SGE
– SLURM
– IBM LoadLeveler
– NEWT (a REST-based API at NERSC)
– Cobalt (Argonne LCF)
27. Job provenance and automatic metadata storage
• what machine
• what time
• what directory
• what was the output
• when was it queued
• when did it start running
• when was it completed
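The kind of metadata listed above is cheap to capture automatically. The following standard-library sketch shows one way it could be recorded around a job; it is not how FireWorks stores launches, and the function name `launch_with_provenance` and the record keys are hypothetical.

```python
import os
import socket
from datetime import datetime, timezone


def now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def launch_with_provenance(job, queued_at):
    """Run job() and return its output together with launch metadata."""
    record = {
        "host": socket.gethostname(),  # what machine
        "launch_dir": os.getcwd(),     # what directory
        "queued_at": queued_at,        # when was it queued
        "started_at": now(),           # when did it start running
    }
    record["output"] = job()           # what was the output
    record["completed_at"] = now()     # when was it completed
    return record


queued = now()
record = launch_with_provenance(lambda: {"energy": -1.0}, queued)  # made-up output
for key, value in record.items():
    print(f"{key}: {value}")
```

Because the record is built by the launcher rather than by the researcher, every job gets the same metadata without any extra effort.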
28. Detect and rerun failures
• All kinds of failures can be detected and rerun
– soft failures (job quits with error code)
– hard failures (computing center goes down)
– human errors
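A bare-bones version of "detect and rerun" is a loop that records each failed launch and tries again. The sketch below is an illustration in plain Python, not FireWorks' mechanism; `run_with_reruns`, the state labels, and the simulated flaky job are all hypothetical.

```python
def run_with_reruns(job, max_launches=3):
    """Launch job() until it succeeds or max_launches is reached.

    Returns (state, history); history records the outcome of every launch.
    Any exception is treated as a detected failure.
    """
    history = []
    for launch in range(1, max_launches + 1):
        try:
            result = job()
        except Exception as exc:
            history.append((launch, "failed", str(exc)))
            continue
        history.append((launch, "completed", result))
        return "COMPLETED", history
    return "FAILED", history


# A made-up flaky job: it fails twice, then succeeds.
attempts = {"count": 0}


def flaky_job():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated job failure")
    return "ok"


state, history = run_with_reruns(flaky_job)
print(state, history)
```

Keeping the full launch history, rather than only the final state, is what makes it possible to see afterwards which jobs needed reruns and why.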
29. “Dynamic workflows” let you program
intelligent, reactive workflows
Xiaohui can replace himself with digital Xiaohui, programmed into FireWorks
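One way to picture a dynamic workflow is a job that inspects its own result and, if needed, adds follow-up jobs while the workflow is running. The toy sketch below does this with a plain queue; it does not use the FireWorks API, and the `relax` step, its halving of a made-up residual force, and the tolerance value are assumptions for illustration.

```python
from collections import deque


def relax(spec):
    """Toy relaxation step: halves a made-up residual force.

    Returns (stored_data, follow_up_jobs). If the force is still above the
    tolerance, it asks for another relaxation; otherwise it adds nothing.
    """
    force = spec["force"] / 2
    stored = {"step": spec["step"], "force": force}
    if force > spec["tol"]:
        next_spec = dict(spec, force=force, step=spec["step"] + 1)
        return stored, [(relax, next_spec)]
    return stored, []


def run_dynamic(first_job):
    """Run jobs from a queue that can grow while it is being processed."""
    queue = deque([first_job])
    results = []
    while queue:
        func, spec = queue.popleft()
        stored, additions = func(spec)
        results.append(stored)
        queue.extend(additions)  # the workflow grows while it runs
    return results


results = run_dynamic((relax, {"force": 0.8, "tol": 0.06, "step": 1}))
print(results)
```

The decision a researcher would make by hand ("not converged yet, run it again") is encoded once in the job itself.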
30. Customize job priorities
• Within workflow, or between workflows
• Completely flexible and can be modified /
updated whenever you want
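Priorities that can be changed at any time can be sketched with a heap and lazy updates. This is a generic illustration rather than FireWorks' own scheduling code; the `PriorityLauncher` class and its method names are hypothetical.

```python
import heapq
import itertools


class PriorityLauncher:
    """Pops the highest-priority job first; ties keep submission order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()
        self._priority = {}  # job name -> current priority

    def submit(self, name, priority=0):
        self._priority[name] = priority
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def reprioritize(self, name, priority):
        # Lazy update: push a fresh entry; stale entries are skipped in pop().
        self.submit(name, priority)

    def pop(self):
        while self._heap:
            neg_priority, _, name = heapq.heappop(self._heap)
            if self._priority.get(name) == -neg_priority:
                del self._priority[name]
                return name
        raise IndexError("no jobs left")


launcher = PriorityLauncher()
launcher.submit("band structure", priority=1)
launcher.submit("elastic tensor", priority=5)
launcher.submit("Raman spectrum", priority=1)
launcher.reprioritize("Raman spectrum", 10)  # priority changed after submission
order = [launcher.pop() for _ in range(3)]
print(order)
```

The lazy-update trick avoids searching the heap when a priority changes: outdated entries simply fail the priority check and are discarded when they surface.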
31. Full operation diagram
[Diagram repeated from slide 16]
33. The atomate database makes it easy to perform various
analyses with pymatgen
[Diagram: atomate output database(s) feeding pymatgen analyses]
• phase diagrams
• Pourbaix diagrams
• diffusivity via MD
• band structure analysis
34. Many research groups have run tens of thousands of
materials science workflows with atomate
atomate now powers the Materials Project (www.materialsproject.org) and will be
used to run hundreds of thousands of simulations in the next year
Also used by:
• Persson research group, UC Berkeley
• Ong research group, UC San Diego
• Neaton research group, UC Berkeley
• Liu research group, Penn State
• Groups not developing on atomate!
– e.g., see “Thermal expansion of quaternary nitride coatings” by Tasnadi et al.
35. Further information on atomate
• Link to code:
– https://www.github.com/hackingmaterials/atomate
• License: BSD
– open-source, can be used with commercial software
– like the MIT license, but with a clause against misusing the Berkeley Lab
name, e.g., for advertising purposes
• Help and support
– https://groups.google.com/forum/#!forum/atomate
• Citation with further information:
– Mathew, K. et al. Atomate: A high-level interface to
generate, execute, and analyze computational materials
science workflows. Comput. Mater. Sci. 139, 140–152
(2017).
36. Thank you!
• Kiran Mathew
• Joey Montoya
• Alireza Faghaninia
• Shyam Dwaraknath
• Murat Aykol
• Hanmei Tang
• Iek-Heng Chu
• Tess Smidt
• Brandon Bocklund
• Matthew Horton
• John Dagdelen
• Brandon Wood
• Zi-Kui Liu
• Jeff Neaton
• Shyue Ping Ong
• Kristin Persson
• all other atomate contributors!
Slides (already) posted to https://hackingmaterials.lbl.gov/