Software tools for calculating materials properties in high-throughput (pymatgen, atomate, FireWorks)

Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Lab
Berkeley, CA
MP workshop 2017
Slides (already) posted to: http://www.slideshare.net/anubhavster
“Civilization advances by extending the number of
important operations which we can perform
without thinking about them.”
- Alfred North Whitehead
A “black-box” view of performing a calculation
researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
researcher: “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work… (input file flags, SLURM format, how to fix ZPOTRF?) → Results!
Calculations are labor intensive!
•  Some of the things to do include:
–  set up the structure coordinates
–  write input files, double-check all the flags
–  copy to supercomputer
–  submit job to queue
–  deal with supercomputer headaches
–  monitor job
–  fix error jobs, resubmit to queue, wait again
–  repeat process for subsequent calculations in workflow
–  parse output files to obtain results
–  copy and organize results, e.g., into Excel
•  This is often repeated for each and every run!
All the low-level steps can lead to errors!
Let’s take a look at two alternate universes:
It is too easy to end up in the wrong timeline!
Timeline 1: you have coffee → copy files from previous simulation → edit 5 lines → run simulation, analyze data
Timeline 2: you forget coffee → copy files from previous simulation → edit 4 lines but forget LHFCALC=F → run simulation, looks fine at first, in a month you discover it was wrong
Today, it is difficult to learn and apply several
computational procedures due to their steep learning curves
Because of the multiple low-level steps that all need to be
done correctly to apply computational methods, there is
often a single group “expert” for each technique
“Alice knows how to do charged defect calculations.”
“Bob is the one who can properly converge GW runs.”
“Olga has all the scripts for phonon calculations.”
What would be a better way?
researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
researcher: “What is the GGA-PBE elastic tensor of GaAs?” → a button → Results!
MPComplete on Materials Project works this way
Custom material → Submit!
Steps handled automatically:
•  Input generation (parameter choice)
•  Workflow mapping
•  Supercomputer submission / monitoring
•  Error handling
•  File transfer
•  File parsing / DB insertion
www.materialsproject.org
“Crystal Toolkit”
Anyone can find, edit,
and submit (suggest)
structures
Currently, this feature is available for:
•  structure optimization
•  band structures
•  elastic tensors
What if you want to do something more complicated?
•  Start with all the known binary oxides
•  Substitute oxygen with sulfur
•  Re-optimize the structure and compute the
band structure
•  Plot the band gap of oxide vs the sulfide
•  Rerun compounds with heavy elements with
spin-orbit coupling enabled
Now, a simple button won’t do it anymore.
What you want is a high-level language for defining,
running, and executing simulations.
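As a rough illustration of what such a high-level description might look like, the screening steps above can be sketched in a few lines of Python. This is a toy sketch, not pymatgen or atomate code: compositions are plain dictionaries, and the `substitute` and `needs_spin_orbit` helpers and the heavy-element set are hypothetical stand-ins chosen for this example.

```python
# Toy stand-ins: compositions are plain dicts, not pymatgen objects.
HEAVY_ELEMENTS = {"Pb", "Bi", "Tl", "Hg"}  # illustrative choice, an assumption


def substitute(composition, old, new):
    """Return a copy of the composition with species `old` replaced by `new`."""
    return {(new if el == old else el): amt for el, amt in composition.items()}


def needs_spin_orbit(composition):
    """Flag compounds that contain any element from the heavy-element set."""
    return any(el in HEAVY_ELEMENTS for el in composition)


# Start with (a few) binary oxides, swap O for S, and pick the SOC reruns.
binary_oxides = [{"Mg": 1, "O": 1}, {"Pb": 1, "O": 1}, {"Ti": 1, "O": 2}]
sulfides = [substitute(c, "O", "S") for c in binary_oxides]
rerun_with_soc = [c for c in sulfides if needs_spin_orbit(c)]
```

In a real study, each resulting sulfide would then be passed to a workflow that re-optimizes the structure and computes the band structure.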
With such a capability, one could do many applications.
Example: The Materials Project database
Jain*, Ong*, Hautier, Chen, Richards, Dacek, Cholia, Gunter, Skinner, Ceder, and Persson, APL Mater., 2013, 1, 011002. *equal contributions
The Materials Project (http://www.materialsproject.org)
free and open
>30,000 registered users
around the world
>65,000 compounds
calculated
Data includes
•  thermodynamic props.
•  electronic band structure
•  aqueous stability (E-pH)
•  elasticity tensors
•  piezoelectric tensors
>75 million CPU-hours
invested = massive scale!
Example: The Electrolyte Genome
data on ~22,000 molecules
(mainly geometry + IP/EA via
full adiabatic calcs)
Also deployed on the
Materials Project web site
L. Cheng, R.S. Assary, X. Qu, A. Jain, S.P. Ong, N.N. Rajput, et al., J. Phys. Chem. Lett. 6 (2015) 283–291.
X. Qu, A. Jain, N.N. Rajput, L. Cheng, Y. Zhang, S.P. Ong, et al., Comput. Mater. Sci. 103 (2015) 56–67.
Example: Crystalium (Ong / Persson)
http://crystalium.materialsvirtuallab.org
surface energies for 142 polymorphs of
72 elements + rotatable Wulff shapes
computed & maintained by the Ong
group (UC San Diego) with support
from Persson Group (UC Berkeley)
R. Tran, Z. Xu, B. Radhakrishnan, D. Winston, W. Sun, K. A. Persson, and S. P. Ong, Sci. Data, 2016, 3, 160080.
Example: Rapid data generation
•  >4500 elastic tensors: M. de Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. van der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
•  >900 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
•  >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, and Hautier, Sci. Data, 2017, 4, 170085.
How can we design a simple yet powerful
tool for performing simulations?
•  First, we’ll go through the four high-level steps in doing
a theory study
•  In each step, we’ll talk about how software tools can
help make you more productive and help you
concentrate on things that require your brainpower, not
mindless labor (e.g., manually resubmitting failed jobs)
Steps for theory & simulation
1.  material generation (crystal, surface/interface, molecule, etc.)
2.  simulation procedure: simulation parameters and sequence (workflow)
3.  calculation & execution: job management, execution, error recovery
4.  data analysis: databases, plotting, analysis, insight
Software technologies for theory / calculation
These are all open-source:
Base packages
•  pymatgen (materials analysis framework)
•  FireWorks (workflow definition & execution)
•  Custodian (calculation error recovery)
Derived packages
•  atomate (automatic materials science workflows)
•  matminer (materials data mining)
Software stack for theory & simulation
[Diagram: the software packages, including Custodian, mapped onto the four steps: material generation, simulation procedure, calculation & execution, data analysis]
Step 1: materials generation
•  In yesterday’s tutorial, we already
saw how pymatgen could help
generate models for:
–  crystal structures
–  molecules
–  systems (surfaces, interfaces, etc.)
•  Tools include:
–  order-disorder (shown at right)
–  interstitial finding
–  surface / slab generation
–  get structures from Materials Project
•  Won’t cover materials generation
in this presentation!
Example: Order-disorder
resolve partial or mixed
occupancies into a fully
ordered crystal structure
(e.g., mixed oxide-fluoride site
into separate oxygen/fluorine)
Step 2: simulation procedure
•  Simulation procedure requires knowing:
i.  what sequence of calculations do I need to perform to compute my
desired materials property?
ii.  for each calculation, what is a good set of parameters to use (e.g.,
functional, energy cutoff, k-point density, etc.)?
•  Item (i) is largely handled by atomate
•  Item (ii) is largely handled by pymatgen, e.g. in the VaspInputSet
classes and won’t be covered here – although atomate also
contains some of this information
•  Overall, we are trying to standardize and automate simulation
procedures as much as possible
Atomate knows the sequence of calculations needed to
compute many kinds of materials properties
quickly and automatically translate PI-style (minimal) specifications into well-defined FireWorks workflows
“What is the GGA-PBE elastic tensor of GaAs?”
Each simulation procedure in atomate is composed of
multiple levels of detail / abstraction
Workflow – complete set of calculations to get a materials property
Firework – one step in the Workflow (typically one DFT calculation)
Firetask – one step in a Firework
Starting with just a crystal
structure, this workflow performs
four calculations to get an
optimized structure, optimized
charge density, and band
structure on two types of grids
(uniform and line)
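The three levels can be pictured as nested containers. The sketch below is a simplified stand-in written with plain dataclasses, not the FireWorks classes themselves; the step names and the three tasks inside each step are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Firetask:
    """One step inside a Firework (toy model)."""
    name: str


@dataclass
class Firework:
    """One step in the Workflow, typically one DFT calculation (toy model)."""
    name: str
    tasks: List[Firetask]
    parents: List[str] = field(default_factory=list)


@dataclass
class Workflow:
    """Complete set of calculations for one materials property (toy model)."""
    name: str
    fireworks: List[Firework]


def dft_step(name, parents=()):
    # Assumed task breakdown for a single calculation.
    tasks = [Firetask("write inputs"), Firetask("run DFT code"), Firetask("store result")]
    return Firework(name, tasks, list(parents))


# Four calculations: optimization, charge density, and two band structure grids.
band_structure_wf = Workflow("band structure", [
    dft_step("structure optimization"),
    dft_step("static", parents=["structure optimization"]),
    dft_step("uniform band structure", parents=["static"]),
    dft_step("line band structure", parents=["static"]),
])
```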
Many types of simulation procedures are already
available!
•  band structure
•  BoltzTraP transport
•  spin-orbit coupling
•  hybrid functional calcs
•  elastic tensor
•  piezoelectric tensor
•  Raman spectra
•  NEB
•  GIBBS method
•  QH thermal expansion
•  AIMD
•  FEFF method
•  LAMMPS MD
Workflows currently available in atomate
atomate allows you to leverage the prior efforts and
knowledge of many researchers
K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain
All past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations
Workflow parameters can be customized at
multiple levels of detail
1.  Workflows have various high-level options
2.  Fireworks also have options / flags (not shown)
3.  Firetasks have the most detailed set of options / flags (not shown)
Example I: “VASP input set” controls the rules that set DFT parameters (pseudopotentials, cutoffs, grid densities, etc.) via pymatgen
Example II: If “stability_check” is enabled, the later parts of the workflow are skipped if the structure is determined to be unstable, saving computer time on uninteresting structures
“Powerups” can also add special features to a
workflow, resulting in customization
•  Atomate’s philosophy: base
workflow implementations
should be clean
–  they only contain the necessary
steps to perform a simulation
•  What about user preferences
like adding tags to certain
compounds or tracking the
progress of output files in the
database?
•  “Powerups” act like decorators
– they take in a workflow and
return a workflow with
modified properties
Example: the “add_tags” powerup stores user-defined tags/metadata in the database along with the normal calculation information by adding parameters to the “store result” Firetask.
Example: the “add_modify_incar” powerup lets you quickly override default parameters in the VASP INCAR file.
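Because powerups take a workflow and return a modified workflow, they compose like ordinary functions. The sketch below mimics that pattern on a plain dictionary. It borrows the names “add_tags” and “add_modify_incar” from the examples above, but it is a toy re-implementation and not atomate's code; the task names and the ENCUT values are assumptions.

```python
import copy


def add_tags(workflow, tags):
    """Return a new workflow whose 'store result' tasks carry user tags."""
    wf = copy.deepcopy(workflow)  # leave the input workflow untouched
    for fw in wf["fireworks"]:
        for task in fw["tasks"]:
            if task["name"] == "store result":
                task.setdefault("tags", []).extend(tags)
    return wf


def add_modify_incar(workflow, incar_update):
    """Return a new workflow whose input-writing tasks override INCAR values."""
    wf = copy.deepcopy(workflow)
    for fw in wf["fireworks"]:
        for task in fw["tasks"]:
            if task["name"] == "write inputs":
                task.setdefault("incar", {}).update(incar_update)
    return wf


# A one-step toy workflow (hypothetical layout and values).
base_wf = {"fireworks": [{"name": "static", "tasks": [
    {"name": "write inputs", "incar": {"ENCUT": 520}},
    {"name": "run VASP"},
    {"name": "store result"},
]}]}

# Powerups stack: each call takes a workflow and returns a workflow.
custom_wf = add_modify_incar(add_tags(base_wf, ["sulfide screen"]), {"ENCUT": 600})
```

The base workflow stays clean, and all user preferences live in the chain of powerup calls.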
You can build workflows from scratch or reuse
components to assemble workflows
Multiple workflows are built with the same components
stacked together in different ways like Legos
These two workflows share almost all of their code!
Atomate simulation procedures: the main ideas
•  Atomate allows you to start with a material (e.g., crystal
structure) and get a full workflow of calculations
needed to compute many types of materials properties
•  Atomate tries to guess sensible defaults in terms of
calculation parameters, but you can always customize
these parameters using:
–  Workflow options
–  Firework options
–  Firetask options
–  powerups
–  going back to the pymatgen library
Step 3: calculation & execution
•  Once you have the material and the simulation procedure
(Workflow), you need to actually execute the workflow on your
computing resource
•  This includes tasks like:
–  submission to calculation queues
–  customization of any computing-specific parameters
•  e.g., path to VASP executable, number of CPUs to parallelize over
–  recovering from failures / job resubmission
–  coordinating jobs across computing centers
–  managing location of jobs
–  tracking the progress of jobs
•  Almost all of this is handled by FireWorks (custodian is used for
encoding job recovery procedures)
FireWorks – scientific workflow software
•  Once you have a Workflow, FireWorks
will execute it for you
•  FireWorks is an open-source scientific
workflow software
•  Materials Project, JCESR, and other
projects manage their runs with
FireWorks
–  >1 million jobs and >100 million CPU-
hours for internal projects alone
–  multiple computing clusters
•  You can write any kind of workflow
–  e.g., FireWorks is used for graphics
processing, machine learning, document
processing, and protein folding
–  #1 Google hit for “Python workflow
software”, top 5 for general scientific
workflow software
•  Detailed tutorials are available that
cover all the details
FireWorks allows you to write your workflow once and
execute (almost) anywhere
•  Execute workflows
locally or at a
supercomputing
center
•  Queue systems
supported
–  PBS
–  SGE
–  SLURM
–  IBM LoadLeveler
–  NEWT (a REST-based
API at NERSC)
–  Cobalt (Argonne LCF)
Job execution
(animation)
LAUNCHPAD (FW 1, FW 2, FW 3, FW 4) → ROCKET LAUNCHER / QUEUE LAUNCHER → Directory 1, Directory 2
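The idea behind the animation can be sketched as a loop that repeatedly picks the Fireworks whose parents have finished and runs each one in its own directory. This is a conceptual toy, not the FireWorks implementation; the `launch_all` function, the dependency layout between FW 1 to FW 4, and the `launch_N` directory names are assumptions.

```python
import os
import tempfile


def launch_all(fireworks, run_dir):
    """Run each firework once all of its parents have completed.

    `fireworks` maps a firework name to the list of its parent names.
    Returns the launch order as (name, launch directory) pairs.
    """
    completed, order = set(), []
    pending = dict(fireworks)
    while pending:
        ready = sorted(n for n, parents in pending.items()
                       if completed.issuperset(parents))
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for name in ready:
            # Every launch gets its own working directory.
            launch_dir = os.path.join(run_dir, f"launch_{len(order) + 1}")
            os.makedirs(launch_dir)
            order.append((name, launch_dir))
            completed.add(name)
            del pending[name]
    return order


# Assumed dependency layout: FW 3 and FW 4 both wait for FW 2.
launchpad = {"FW 1": [], "FW 2": ["FW 1"], "FW 3": ["FW 2"], "FW 4": ["FW 2"]}
with tempfile.TemporaryDirectory() as tmp:
    launches = launch_all(launchpad, tmp)
    names = [name for name, _ in launches]
    dir_names = [os.path.basename(d) for _, d in launches]
```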
Dashboard with status of all jobs
Job provenance and automatic metadata storage
•  what machine
•  what time
•  what directory
•  what was the output
•  when was it queued
•  when did it start running
•  when was it completed
Detect and rerun failures
•  All kinds of failures can be detected and the affected jobs rerun
–  soft failures (job quits with error code)
–  hard failures (computing center goes down)
–  human errors
“Dynamic workflows” let you program
intelligent, reactive workflows
Xiaohui can replace himself with
digital Xiaohui,
programmed into
FireWorks
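One way to picture a dynamic workflow is a task that can return extra tasks to run next, depending on what it found. The following sketch is a toy model under that assumption; the `relax` and `static` task functions and the convergence rule are hypothetical and do not use the FireWorks API.

```python
from collections import deque


def relax(state):
    """Toy relaxation: 'converges' on the third attempt."""
    state["forces_converged"] = state["steps"] >= 2
    state["steps"] += 1
    # Dynamic part: if not converged, request another relaxation step.
    return [] if state["forces_converged"] else [relax]


def static(state):
    """Toy follow-up calculation that adds no further tasks."""
    state["energy_computed"] = True
    return []


def run_dynamic(initial_tasks, state):
    """Run tasks in order, inserting any tasks they request right after them."""
    queue, history = deque(initial_tasks), []
    while queue:
        task = queue.popleft()
        history.append(task.__name__)
        queue.extendleft(reversed(task(state)))
    return history


history = run_dynamic([relax, static], {"steps": 0})
```

The decision a person would normally make by inspecting the output (rerun the relaxation or move on) is encoded once in the task itself.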
Customize job priorities
•  Within workflow, or between workflows
•  Completely flexible and can be modified / updated
whenever you want
Track output files remotely
•  Can bring up the last few lines of each of your output
files – and can be combined with queries / filters
Now seems like a
good time to bring
up the last few lines
of the OUTCAR of
all failed jobs...
FireWorks workflow software: the main ideas
•  FireWorks is used to execute Workflow objects at
supercomputing centers
•  It is well suited to high-throughput applications, is well tested, and
has been used to execute millions of jobs
•  FireWorks offers many features and advantages as a workflow
manager, which has made it one of the most popular scientific
workflow packages in use today
An aside on “custodian”: instead of running VASP,
have custodian run VASP on your behalf
•  Custodian can wrap around an
executable (e.g., VASP)
–  i.e., run custodian instead of
directly running VASP
•  During execution, custodian will
monitor output files and detect
errors / problems
–  e.g., if ZPOTRF error detected,
rerun with ISYM=0
–  ever-expanding library of fixes
(currently over 30 fixes for VASP!)
•  There is more you can do with
custodian (e.g., simple
workflows), but it won’t be
covered here
•  Currently has fixes for: VASP, Q-Chem, NWChem
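The wrap, monitor, fix, and rerun loop described above can be sketched as follows. This is a simplified illustration, not custodian's implementation: `run_with_error_recovery`, the `HANDLERS` table, and the `fake_vasp` stand-in for the executable are hypothetical, and only the rule "ZPOTRF error, rerun with ISYM=0" comes from the slide.

```python
def run_with_error_recovery(run_job, settings, handlers, max_errors=3):
    """Run a job, apply fixes for known errors, and rerun until it succeeds.

    `handlers` maps an error string to look for in the output to a dict of
    settings that should be changed before the next attempt.
    """
    settings = dict(settings)
    for attempt in range(1, max_errors + 1):
        output = run_job(settings)
        fixes = [fix for pattern, fix in handlers.items() if pattern in output]
        if not fixes:
            return settings, attempt  # no known error detected
        for fix in fixes:
            settings.update(fix)
    raise RuntimeError("job still failing after max_errors attempts")


# One rule taken from the slide: on a ZPOTRF error, rerun with ISYM=0.
HANDLERS = {"ZPOTRF": {"ISYM": 0}}


def fake_vasp(settings):
    """Stand-in for the real executable: fails unless symmetry is off."""
    if settings.get("ISYM") == 0:
        return "converged"
    return "LAPACK: Routine ZPOTRF failed"


final_settings, attempts = run_with_error_recovery(fake_vasp, {"ISYM": 2}, HANDLERS)
```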
Step 4: data analysis
•  Data analysis can involve:
i.  Creating databases of materials and their properties for easy
searching and data retrieval
ii.  Conventional scientific analyses and scientific plotting (e.g.,
generating phase diagrams, plotting band structures)
iii.  Data mining methods
•  Item (i) is largely handled by atomate
•  Item (ii) is largely handled by pymatgen
•  Item (iii) is largely handled by matminer
Atomate – builders framework
“Builders” start with base
collections in a database and
create higher-level collections
that summarize information or
add metadata
What kinds of builders are available in atomate?
•  “Materials builder” – combine all calculations (“tasks”) on a
single crystal (as determined automatically by structure
matching algorithms) into a single document
–  e.g., combine data from band structure, dielectric, and elastic
constant workflows performed on the same material into a single
report on that material that contains all information
•  Add new information based on algorithmic analysis to the
database, e.g. structure dimensionality or phase stability
•  Merge external information such as user-defined tags,
experimental data (from a spreadsheet), or data from
Materials Project to the computational report of a material
•  If you are using atomate without using builders, you are
missing out on one of the most useful features!
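A “materials builder” can be pictured as a group-and-merge step over task documents. The sketch below is a toy under simplifying assumptions: tasks are matched by formula rather than by structure matching, the `build_materials` function and the field names are hypothetical, and the property values are placeholders rather than computed results.

```python
from collections import defaultdict


def build_materials(tasks):
    """Group task documents by material and merge their outputs."""
    materials = defaultdict(lambda: {"task_ids": []})
    for task in tasks:
        doc = materials[task["formula"]]  # stand-in for structure matching
        doc["task_ids"].append(task["task_id"])
        doc.update(task["output"])
    return dict(materials)


# Base collection: one document per calculation (placeholder values).
tasks = [
    {"task_id": 1, "formula": "GaAs", "output": {"band_gap": 1.0}},
    {"task_id": 2, "formula": "GaAs", "output": {"bulk_modulus": 2.0}},
    {"task_id": 3, "formula": "Si", "output": {"band_gap": 3.0}},
]

# Higher-level collection: one document per material.
materials = build_materials(tasks)
```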
Data analysis – the role of atomate
•  Atomate helps organize computational data into
databases
•  However, atomate does not do scientific analysis or
plotting of the data – it can just generate and grab the
data (often as pymatgen objects)
•  Pymatgen is used to analyze the data
–  we won’t cover pymatgen capabilities in depth since it is
covered elsewhere in the workshop
pymatgen – examples of analyses
•  phase diagrams
•  Pourbaix diagrams
•  diffusivity from MD
•  band structure analysis
matminer helps enable data mining studies –
currently in (open) pre-release
Data analysis: the main ideas
•  Atomate generates databases that summarize your
calculation results. Additionally, the builders framework
generates document collections that summarize data
on a particular material from multiple calculations or
from additional information.
•  Pymatgen is powerful software for doing conventional
scientific analyses with the computational data
•  Matminer, currently in pre-release, is software for doing
data-mining-style analyses of materials
Doing simulations programmatically can become
second nature – and very fast / automatic / scalable
researcher: “What is the GGA-PBE elastic tensor of GaAs?” → software infrastructure (material generation → simulation procedure → calculation & execution → data analysis) → Results!
Next steps
•  The purpose of this presentation was to explain the
benefits of the software tools available to you
•  You will need to take more steps to actually get value
out of the software
•  There are multiple resources to help you with this!
For general information / overview / citing
•  There are papers on the software tools for general
information (and for citation purposes)
Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
Jain, A. et al. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurr. Comput. Pract. Exp. 27, 5037–5059 (2015).
Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017).
For installing / usage / examples
•  The online documentation is best for practical usage
–  www.pymatgen.org
–  https://materialsproject.github.io/fireworks/
–  https://materialsproject.github.io/custodian/
–  https://hackingmaterials.github.io/atomate/
–  https://hackingmaterials.github.io/matminer/
•  The online documentation includes installation,
examples, tutorials, and descriptions of how to use the
code
•  If you want to do “everything”, we suggest starting with
atomate and going from there
For help / questions
•  Did you try your best to use the software but are having
trouble?
•  The various codebases have official channels for
getting help
–  https://groups.google.com/forum/#!forum/pymatgen
–  https://groups.google.com/forum/#!forum/fireworkflows
–  https://groups.google.com/forum/#!forum/atomate
–  https://groups.google.com/forum/#!forum/matminer
–  for custodian, use Github “issues”
•  Some of these lists have resolved >100 user questions,
all essentially on volunteer time!
Questions? Comments?
Thanks to everyone who contributed to this software and
all those who helped support / justify its development
through citation!
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
Anubhav Jain147 views
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications by aimsnist
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
aimsnist192 views
2015 10-7-11am-reproducible research by Yannick Wurm
2015 10-7-11am-reproducible research2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research
Yannick Wurm707 views
2D/3D Materials screening and genetic algorithm with ML model by aimsnist
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model
aimsnist358 views
Going Smart and Deep on Materials at ALCF by Ian Foster
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
Ian Foster1.7K views
Scaling up genomic analysis with ADAM by fnothaft
Scaling up genomic analysis with ADAMScaling up genomic analysis with ADAM
Scaling up genomic analysis with ADAM
fnothaft1.5K views
Solar System Processing with LSST: A Status Update by Mario Juric
Solar System Processing with LSST: A Status UpdateSolar System Processing with LSST: A Status Update
Solar System Processing with LSST: A Status Update
Mario Juric474 views

More from Anubhav Jain

Applications of Large Language Models in Materials Discovery and Design by
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignAnubhav Jain
4 views17 slides
An AI-driven closed-loop facility for materials synthesis by
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAnubhav Jain
11 views15 slides
Best practices for DuraMat software dissemination by
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software disseminationAnubhav Jain
12 views30 slides
Best practices for DuraMat software dissemination by
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software disseminationAnubhav Jain
8 views21 slides
Efficient methods for accurately calculating thermoelectric properties – elec... by
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Anubhav Jain
84 views31 slides
Natural Language Processing for Data Extraction and Synthesizability Predicti... by
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Anubhav Jain
139 views28 slides

More from Anubhav Jain(20)

Applications of Large Language Models in Materials Discovery and Design by Anubhav Jain
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
Anubhav Jain4 views
An AI-driven closed-loop facility for materials synthesis by Anubhav Jain
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesis
Anubhav Jain11 views
Best practices for DuraMat software dissemination by Anubhav Jain
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
Anubhav Jain12 views
Best practices for DuraMat software dissemination by Anubhav Jain
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
Anubhav Jain8 views
Efficient methods for accurately calculating thermoelectric properties – elec... by Anubhav Jain
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
Anubhav Jain84 views
Natural Language Processing for Data Extraction and Synthesizability Predicti... by Anubhav Jain
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Anubhav Jain139 views
Machine Learning for Catalyst Design by Anubhav Jain
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst Design
Anubhav Jain30 views
Natural language processing for extracting synthesis recipes and applications... by Anubhav Jain
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...
Anubhav Jain95 views
Accelerating New Materials Design with Supercomputing and Machine Learning by Anubhav Jain
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
Anubhav Jain42 views
DuraMat CO1 Central Data Resource: How it started, how it’s going … by Anubhav Jain
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
Anubhav Jain31 views
Evaluating Chemical Composition and Crystal Structure Representations using t... by Anubhav Jain
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
Anubhav Jain37 views
Perspectives on chemical composition and crystal structure representations fr... by Anubhav Jain
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
Anubhav Jain153 views
Discovering and Exploring New Materials through the Materials Project by Anubhav Jain
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
Anubhav Jain133 views
The Materials Project: Applications to energy storage and functional materia... by Anubhav Jain
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
Anubhav Jain107 views
Machine Learning Platform for Catalyst Design by Anubhav Jain
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
Anubhav Jain139 views
Applications of Natural Language Processing to Materials Design by Anubhav Jain
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
Anubhav Jain154 views
Assessing Factors Underpinning PV Degradation through Data Analysis by Anubhav Jain
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
Anubhav Jain102 views
The Status of ML Algorithms for Structure-property Relationships Using Matb... by Anubhav Jain
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
Anubhav Jain87 views
Progress Towards Leveraging Natural Language Processing for Collecting Experi... by Anubhav Jain
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Anubhav Jain142 views

Recently uploaded

application of genetic engineering 2.pptx by
application of genetic engineering 2.pptxapplication of genetic engineering 2.pptx
application of genetic engineering 2.pptxSankSurezz
9 views12 slides
plasmids by
plasmidsplasmids
plasmidsscribddarkened352
9 views2 slides
Artificial Intelligence Helps in Drug Designing and Discovery.pptx by
Artificial Intelligence Helps in Drug Designing and Discovery.pptxArtificial Intelligence Helps in Drug Designing and Discovery.pptx
Artificial Intelligence Helps in Drug Designing and Discovery.pptxabhinashsahoo2001
126 views22 slides
PRINCIPLES-OF ASSESSMENT by
PRINCIPLES-OF ASSESSMENTPRINCIPLES-OF ASSESSMENT
PRINCIPLES-OF ASSESSMENTrbalmagro
12 views12 slides
DATABASE MANAGEMENT SYSTEM by
DATABASE MANAGEMENT SYSTEMDATABASE MANAGEMENT SYSTEM
DATABASE MANAGEMENT SYSTEMDr. GOPINATH D
7 views50 slides
scopus cited journals.pdf by
scopus cited journals.pdfscopus cited journals.pdf
scopus cited journals.pdfKSAravindSrivastava
7 views15 slides

Recently uploaded(20)

application of genetic engineering 2.pptx by SankSurezz
application of genetic engineering 2.pptxapplication of genetic engineering 2.pptx
application of genetic engineering 2.pptx
SankSurezz9 views
Artificial Intelligence Helps in Drug Designing and Discovery.pptx by abhinashsahoo2001
Artificial Intelligence Helps in Drug Designing and Discovery.pptxArtificial Intelligence Helps in Drug Designing and Discovery.pptx
Artificial Intelligence Helps in Drug Designing and Discovery.pptx
abhinashsahoo2001126 views
PRINCIPLES-OF ASSESSMENT by rbalmagro
PRINCIPLES-OF ASSESSMENTPRINCIPLES-OF ASSESSMENT
PRINCIPLES-OF ASSESSMENT
rbalmagro12 views
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx by MN
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptxENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx
MN7 views
A training, certification and marketing scheme for informal dairy vendors in ... by ILRI
A training, certification and marketing scheme for informal dairy vendors in ...A training, certification and marketing scheme for informal dairy vendors in ...
A training, certification and marketing scheme for informal dairy vendors in ...
ILRI13 views
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl... by GIFT KIISI NKIN
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...
GIFT KIISI NKIN22 views
MODULE-9-Biotechnology, Genetically Modified Organisms, and Gene Therapy.pdf by KerryNuez1
MODULE-9-Biotechnology, Genetically Modified Organisms, and Gene Therapy.pdfMODULE-9-Biotechnology, Genetically Modified Organisms, and Gene Therapy.pdf
MODULE-9-Biotechnology, Genetically Modified Organisms, and Gene Therapy.pdf
KerryNuez124 views
"How can I develop my learning path in bioinformatics? by Bioinformy
"How can I develop my learning path in bioinformatics?"How can I develop my learning path in bioinformatics?
"How can I develop my learning path in bioinformatics?
Bioinformy23 views
Nitrosamine & NDSRI.pptx by NileshBonde4
Nitrosamine & NDSRI.pptxNitrosamine & NDSRI.pptx
Nitrosamine & NDSRI.pptx
NileshBonde413 views
Open Access Publishing in Astrophysics by Peter Coles
Open Access Publishing in AstrophysicsOpen Access Publishing in Astrophysics
Open Access Publishing in Astrophysics
Peter Coles808 views
RemeOs science and clinical evidence by PetrusViitanen1
RemeOs science and clinical evidenceRemeOs science and clinical evidence
RemeOs science and clinical evidence
PetrusViitanen136 views
How to be(come) a successful PhD student by Tom Mens
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD student
Tom Mens473 views

Software tools for calculating materials properties in high-throughput (pymatgen, atomate, FireWorks)

  • 1. Software tools for calculating materials properties in high-throughput (pymatgen, atomate, FireWorks) Anubhav Jain Energy Technologies Area Lawrence Berkeley National Lab Berkeley, CA MP workshop 2017 Slides (already) posted to: http://www.slideshare.net/anubhavster
  • 2. “Civilization advances by extending the number of important operations which we can perform without thinking about them.” - Alfred North Whitehead
  • 3. A “black-box” view of performing a calculation [diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results]
  • 4. Unfortunately, the inside of the “black box” is usually tedious and “low-level” [diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work (input file flags, SLURM format, how to fix ZPOTRF?) → results]
  • 5. Calculations are labor intensive!
      • Some of the things to do include:
          – set up the structure coordinates
          – write input files, double-check all the flags
          – copy to supercomputer
          – submit job to queue
          – deal with supercomputer headaches
          – monitor job
          – fix error jobs, resubmit to queue, wait again
          – repeat process for subsequent calculations in workflow
          – parse output files to obtain results
          – copy and organize results, e.g., into Excel
      • This is often repeated for each and every run!
  • 6. All the low-level steps can lead to errors! Let’s take a look at two alternate universes:
      • Timeline 1: you have coffee → copy files from previous simulation → edit 5 lines → run simulation, analyze data
      • Timeline 2: you forget coffee → copy files from previous simulation → edit 4 lines but forget LHFCALC=F → run simulation; it looks fine at first, but in a month you discover it was wrong
      • It is too easy to end up in the wrong timeline!
  • 7. Today, it is difficult to learn and apply several computational procedures due to a steep learning curve. Because of the multiple low-level steps that all need to be done correctly to apply computational methods, there is often a single group “expert” for each technique:
      • “Alice knows how to do charged defect calculations.”
      • “Bob is the one who can properly converge GW runs.”
      • “Olga has all the scripts for phonon calculations.”
  • 8. What would be a better way? [diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → “something” → results]
  • 9. What would be a better way? [diagram: researcher asks “What is the GGA-PBE elastic tensor of GaAs?” → a button! → results]
  • 10. MPComplete on Materials Project works this way
      • Custom material → Submit! → the following steps are handled for you:
          – input generation (parameter choice)
          – workflow mapping
          – supercomputer submission / monitoring
          – error handling
          – file transfer
          – file parsing / DB insertion
      • “Crystal Toolkit” at www.materialsproject.org: anyone can find, edit, and submit (suggest) structures
      • Currently, this feature is available for:
          – structure optimization
          – band structures
          – elastic tensors
  • 11. What if you want to do something more complicated?
      • Start with all the known binary oxides
      • Substitute oxygen with sulfur
      • Re-optimize the structure and compute the band structure
      • Plot the band gap of the oxide vs. the sulfide
      • Rerun compounds with heavy elements with spin-orbit coupling enabled
      Now, a simple button won’t do it anymore. What you want is a high-level language for defining, running, and executing simulations. With such a capability, many applications become possible.
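To make the first two steps of the slide 11 study concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: compositions are represented as toy `{element: amount}` dictionaries, and the helper names (`is_binary_oxide`, `substitute`, `formula`) are hypothetical, not part of pymatgen or atomate. A real study would operate on full crystal structures.

```python
# Toy stand-in for a structure database: compositions as {element: amount}.
# Assumption: a real study would use full crystal structures instead.
compounds = [
    {"Mg": 1, "O": 1},
    {"Ti": 1, "O": 2},
    {"Sr": 1, "Ti": 1, "O": 3},  # ternary, so it is filtered out
    {"Na": 1, "Cl": 1},          # not an oxide, so it is filtered out
]


def is_binary_oxide(comp):
    """True if the composition has exactly two elements, one of them oxygen."""
    return len(comp) == 2 and "O" in comp


def substitute(comp, old, new):
    """Return a new composition with element `old` replaced by `new`."""
    return {(new if el == old else el): amt for el, amt in comp.items()}


def formula(comp):
    """Render a composition as a simple formula string (amount 1 is omitted)."""
    return "".join(f"{el}{amt if amt != 1 else ''}" for el, amt in comp.items())


oxides = [c for c in compounds if is_binary_oxide(c)]
pairs = [(formula(c), formula(substitute(c, "O", "S"))) for c in oxides]

for oxide, sulfide in pairs:
    print(f"{oxide} -> {sulfide}")
```

The remaining steps (re-optimization, band structures, spin-orbit reruns) are where a workflow library takes over from simple scripting like this.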
  • 12. Example: The Materials Project database
      • The Materials Project (http://www.materialsproject.org): free and open
      • >30,000 registered users around the world
      • >65,000 compounds calculated
      • Data includes:
          – thermodynamic properties
          – electronic band structure
          – aqueous stability (E-pH)
          – elasticity tensors
          – piezoelectric tensors
      • >75 million CPU-hours invested = massive scale!
      • Jain*, Ong*, Hautier, Chen, Richards, Dacek, Cholia, Gunter, Skinner, Ceder, and Persson, APL Mater., 2013, 1, 011002. (*equal contributions)
  • 13. Example: The Electrolyte Genome
      • Data on ~22,000 molecules (mainly geometry + IP/EA via full adiabatic calcs)
      • Also deployed on the Materials Project web site
      • L. Cheng, R.S. Assary, X. Qu, A. Jain, S.P. Ong, N.N. Rajput, et al., J. Phys. Chem. Lett. 6 (2015) 283–291.
      • X. Qu, A. Jain, N.N. Rajput, L. Cheng, Y. Zhang, S.P. Ong, et al., Comput. Mater. Sci. 103 (2015) 56–67.
  • 14. Example: Crystalium (Ong / Persson)
      • http://crystalium.materialsvirtuallab.org
      • Surface energies for 142 polymorphs of 72 elements + rotatable Wulff shapes
      • Computed & maintained by the Ong group (UC San Diego) with support from the Persson group (UC Berkeley)
      • R. Tran, Z. Xu, B. Radhakrishnan, D. Winston, W. Sun, K. A. Persson, and S. P. Ong, Sci. Data, 2016, 3, 160080.
  • 15. Example: Rapid data generation
      • >4500 elastic tensors: M. de Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
      • >900 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
      • >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier, Sci. Data, 2017, 4, 170085.
  • 16. How can we design a simple yet powerful tool for performing simulations?
      • First, we’ll go through the four high-level steps in doing a theory study
      • In each step, we’ll talk about how software tools can help make you more productive and help you concentrate on things that require your brainpower, not mindless labor (e.g., manually resubmitting failed jobs)
  • 17. Steps for theory & simulation: (1) material generation (crystal, surface/interface, molecule, etc.); (2) simulation procedure (simulation parameters and sequence, i.e., the workflow); (3) calculation & execution (job management, execution, error recovery); (4) data analysis (databases, plotting, analysis, insight)
  • 18. Software technologies for theory / calculation, split into base packages and derived packages. These are all open-source:
      • pymatgen (materials analysis framework)
      • FireWorks (workflow definition & execution)
      • Custodian (calculation error recovery)
      • atomate (automatic materials science workflows)
      • matminer (materials data mining)
  • 19. Software stack for theory & simulation [diagram: software logos, including Custodian, mapped onto the four steps: material generation, simulation procedure, calculation & execution, data analysis]
  • 20. Steps for theory & simulation: (1) material generation (crystal, surface/interface, molecule, etc.); (2) simulation procedure (simulation parameters and sequence, i.e., the workflow); (3) calculation & execution (job management, execution, error recovery); (4) data analysis (databases, plotting, analysis, insight)
  • 21. Step 1: materials generation
      • In yesterday’s tutorial, we already saw how pymatgen could help generate models for:
          – crystal structures
          – molecules
          – systems (surfaces, interfaces, etc.)
      • Tools include:
          – order-disorder
          – interstitial finding
          – surface / slab generation
          – get structures from Materials Project
      • Won’t cover materials generation in this presentation!
      • Example: order-disorder. Resolve partial or mixed occupancies into a fully ordered crystal structure (e.g., mixed oxide-fluoride site into separate oxygen/fluorine)
  • 22. Steps for theory & simulation: (1) material generation (crystal, surface/interface, molecule, etc.); (2) simulation procedure (simulation parameters and sequence, i.e., the workflow); (3) calculation & execution (job management, execution, error recovery); (4) data analysis (databases, plotting, analysis, insight)
  • 23. Step 2: simulation procedure
      • Simulation procedure requires knowing:
          i. what sequence of calculations do I need to perform to compute my desired materials property?
          ii. for each calculation, what is a good set of parameters to use (e.g., functional, energy cutoff, k-point density, etc.)?
      • Item (i) is largely handled by atomate
      • Item (ii) is largely handled by pymatgen, e.g., in the VaspInputSet classes, and won’t be covered here, although atomate also contains some of this information
      • Overall, we are trying to standardize and automate simulation procedures as much as possible
  • 24. Atomate knows the sequence of calculations needed to compute many kinds of materials properties: quickly and automatically translate PI-style (minimal) specifications, such as “What is the GGA-PBE elastic tensor of GaAs?”, into well-defined FireWorks workflows
  • 25. Each simulation procedure in atomate is composed of multiple levels of detail / abstraction
      • Workflow: complete set of calculations to get a materials property
      • Firework: one step in the Workflow (typically one DFT calculation)
      • Firetask: one step in a Firework
      • Example: starting with just a crystal structure, this workflow performs four calculations to get an optimized structure, optimized charge density, and band structure on two types of grids (uniform and line)
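The three levels on slide 25 can be pictured with a small, self-contained sketch. This is not the FireWorks or atomate API: the `Firetask`, `Firework`, and `Workflow` classes below are hypothetical toy dataclasses that only mirror the nesting and the parent/child ordering, and the four calculation names are labels chosen for the band structure example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Firetask:
    """One step inside a Firework (e.g., write inputs, run, parse)."""
    name: str
    action: Callable[[Dict], None]


@dataclass
class Firework:
    """One step in the Workflow, typically one DFT calculation."""
    name: str
    tasks: List[Firetask]
    parents: List[str] = field(default_factory=list)

    def run(self, spec: Dict) -> None:
        for task in self.tasks:
            task.action(spec)


@dataclass
class Workflow:
    """Complete set of calculations needed for one materials property."""
    fireworks: List[Firework]

    def run(self) -> List[str]:
        """Run Fireworks so that parents always finish first; return the order."""
        done: List[str] = []
        pending = list(self.fireworks)
        spec: Dict = {}
        while pending:
            ready = [fw for fw in pending if all(p in done for p in fw.parents)]
            if not ready:
                raise ValueError("circular or missing dependency")
            for fw in ready:
                fw.run(spec)
                done.append(fw.name)
                pending.remove(fw)
        return done


def log(step: str) -> Firetask:
    """Toy Firetask that only records its own name in the shared spec."""
    return Firetask(step, lambda spec: spec.setdefault("log", []).append(step))


def make_calc(name: str, parents=()) -> Firework:
    steps = [log(f"{name}: write inputs"), log(f"{name}: run"), log(f"{name}: parse")]
    return Firework(name, steps, list(parents))


wf = Workflow([
    make_calc("structure optimization"),
    make_calc("static", ["structure optimization"]),
    make_calc("uniform band structure", ["static"]),
    make_calc("line band structure", ["static"]),
])
order = wf.run()
print(order)
```

The point of the nesting is reuse: the same Firetasks and Fireworks can be recombined into different Workflows without rewriting the individual steps.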
  • 26. Many types of simulation procedures are already available! Workflows currently available in atomate:
      • band structure
      • BoltzTraP transport
      • spin-orbit coupling
      • hybrid functional calcs
      • elastic tensor
      • piezoelectric tensor
      • Raman spectra
      • NEB
      • GIBBS method
      • QH thermal expansion
      • AIMD
      • FEFF method
      • LAMMPS MD
  • 27. atomate allows you to leverage the prior efforts and knowledge of many researchers: all past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations. Contributors shown: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain, and more
  • 28. Workflow parameters can be customized at multiple levels of detail
      1. Workflows have various high-level options
      2. Fireworks also have options / flags (not shown)
      3. Firetasks have the most detailed options / flags (not shown)
      • Example 1: the “VASP input set” controls the rules that set DFT parameters (pseudopotentials, cutoffs, grid densities, etc.) via pymatgen
      • Example 2: if “stability_check” is enabled, the later parts of the workflow are skipped if the structure is determined to be unstable, to save computer time on uninteresting structures
  • 29. “Powerups” can also add special features to a workflow, resulting in customization
      • Atomate’s philosophy: base workflow implementations should be clean
          – they only contain the necessary steps to perform a simulation
      • What about user preferences like adding tags to certain compounds or tracking the progress of output files in the database?
      • “Powerups” act like decorators: they take in a workflow and return a workflow with modified properties
      • Example: the “add_tags” powerup will store user-defined tags / metadata in the database along with the normal calculation information by adding parameters to the “store result” Firetask
      • Example: the “add_modify_incar” powerup will let you quickly override default parameters in the VASP INCAR file
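The “workflow in, workflow out” idea from slide 29 can be sketched with plain dictionaries. The functions below are hypothetical toy versions (`toy_add_tags`, `toy_modify_incar`) written for illustration only; they are not atomate's implementations, and the dictionary layout, the task names, and the ENCUT value are assumptions.

```python
import copy


def toy_add_tags(workflow, tags):
    """Return a copy of the workflow whose 'store result' tasks carry extra tags."""
    wf = copy.deepcopy(workflow)
    for fw in wf["fireworks"]:
        for task in fw["tasks"]:
            if task["name"] == "store result":
                task["params"].setdefault("tags", []).extend(tags)
    return wf


def toy_modify_incar(workflow, incar_update):
    """Return a copy of the workflow whose 'write inputs' tasks override INCAR keys."""
    wf = copy.deepcopy(workflow)
    for fw in wf["fireworks"]:
        for task in fw["tasks"]:
            if task["name"] == "write inputs":
                task["params"].setdefault("incar", {}).update(incar_update)
    return wf


# Assumed toy layout: a workflow is a dict of Fireworks, each a list of tasks.
base_wf = {
    "name": "band structure",
    "fireworks": [
        {
            "name": "static",
            "tasks": [
                {"name": "write inputs", "params": {}},
                {"name": "run VASP", "params": {}},
                {"name": "store result", "params": {}},
            ],
        }
    ],
}

# Powerups compose like decorators; the base workflow is left untouched.
custom_wf = toy_modify_incar(
    toy_add_tags(base_wf, ["sulfide screening"]),
    {"ENCUT": 600},  # arbitrary illustrative value
)
print(custom_wf["fireworks"][0]["tasks"])
```

Keeping the base workflow unmodified is the design choice that lets the same clean implementation serve users with different preferences.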
  • 30. You can build workflows from scratch or reuse components to assemble workflows. Multiple workflows are built with the same components stacked together in different ways, like Legos. [figure: two workflows that reuse almost all the same code]
  • 31. Atomate simulation procedures: the main ideas
      • Atomate allows you to start with a material (e.g., crystal structure) and get a full workflow of calculations needed to compute many types of materials properties
      • Atomate tries to guess sensible defaults in terms of calculation parameters, but you can always customize these parameters using:
          – Workflow options
          – Firework options
          – Firetask options
          – powerups
          – going back to the pymatgen library
  • 32. Steps for theory & simulation: (1) material generation (crystal, surface/interface, molecule, etc.); (2) simulation procedure (simulation parameters and sequence, i.e., the workflow); (3) calculation & execution (job management, execution, error recovery); (4) data analysis (databases, plotting, analysis, insight)
  • 33. Step 3: calculation & execution
      • Once you have the material and the simulation procedure (Workflow), you need to actually execute the workflow on your computing resource
      • This includes tasks like:
          – submission to calculation queues
          – customization of any computing-specific parameters (e.g., path to VASP executable, number of CPUs to parallelize over)
          – recovering from failures / job resubmission
          – coordinating jobs across computing centers
          – managing location of jobs
          – tracking the progress of jobs
      • Almost all of this is handled by FireWorks (custodian is used for encoding job recovery procedures)
  • 34. FireWorks: scientific workflow software
      • Once you have a Workflow, FireWorks will execute it for you
      • FireWorks is an open-source scientific workflow package
      • Materials Project, JCESR, and other projects manage their runs with FireWorks
          – >1 million jobs and >100 million CPU-hours for internal projects alone
          – multiple computing clusters
      • You can write any kind of workflow
          – e.g., FireWorks is used for graphics processing, machine learning, document processing, and protein folding
          – #1 Google hit for “Python workflow software”, top 5 for general scientific workflow software
      • Detailed tutorials are available that cover all the details
  • 35. FireWorks allows you to write your workflow once and execute (almost) anywhere
      • Execute workflows locally or at a supercomputing center
      • Queue systems supported:
          – PBS
          – SGE
          – SLURM
          – IBM LoadLeveler
          – NEWT (a REST-based API at NERSC)
          – Cobalt (Argonne LCF)
  • 36. Job execution (animation) [diagram: LAUNCHPAD holding FW 1, FW 2, FW 3, FW 4; ROCKET LAUNCHER / QUEUE LAUNCHER; Directory 1, Directory 2]
  • 37. Dashboard with status of all jobs
  • 38. Job provenance and automatic metadata storage: what machine, what time, what directory, what was the output, when was it queued, when did it start running, when was it completed
  • 39. Detect and rerun failures
      • All kinds of failures can be detected and rerun:
          – soft failures (job quits with error code)
          – hard failures (computing center goes down)
          – human errors
  • 40. “Dynamic workflows” let you program intelligent, reactive workflows: Xiaohui can replace himself with digital Xiaohui, programmed into FireWorks
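A reactive workflow of the kind slide 40 describes can be sketched in a few lines of plain Python: a job inspects its own result and decides whether to append a follow-up job. Everything here is hypothetical and for illustration only (the `make_relax` and `run_dynamic` names, the doubling of `k`, the tolerance, and the placeholder “energy” formula); it is not how FireWorks implements dynamic workflows.

```python
from collections import deque


def make_relax(k, prev_energy=None, tol=0.1):
    """Build a toy job that reruns itself with a denser setting until converged."""
    def job():
        energy = 1.0 / k  # placeholder "energy", not a real calculation
        if prev_energy is None or abs(energy - prev_energy) > tol:
            # Not converged yet: react by adding a follow-up job.
            return [make_relax(2 * k, energy, tol)]
        return []  # converged: nothing more to add
    job.label = f"relax k={k}"
    return job


def run_dynamic(initial_jobs):
    """Run jobs in order; each job may return new jobs to append to the queue."""
    queue = deque(initial_jobs)
    history = []
    while queue:
        job = queue.popleft()
        history.append(job.label)
        queue.extend(job())
    return history


history = run_dynamic([make_relax(2)])
print(history)
```

The decision a researcher would normally make by hand (“is this converged, or do I rerun?”) lives in the job itself, which is what makes the workflow dynamic.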
  • 41. Customize job priorities
      • Within a workflow, or between workflows
      • Completely flexible and can be modified / updated whenever you want
  • 42. Track output files remotely
      • Can bring up the last few lines of each of your output files, and can be combined with queries / filters
      • “Now seems like a good time to bring up the last few lines of the OUTCAR of all failed jobs...”
  • 43. FireWorks workflow software: the main ideas
      • FireWorks is used to execute Workflow objects at supercomputing centers
      • It is well tested and very well suited to high-throughput applications, having been used to execute millions of jobs
      • There are many features and advantages to using FireWorks as your workflow manager, which has made it one of the most popular scientific workflow packages in use today
  • 44. An aside on “custodian”: instead of running VASP, have custodian run VASP on your behalf
      • Custodian can wrap around an executable (e.g., VASP)
          – i.e., run custodian instead of directly running VASP
      • During execution, custodian will monitor output files and detect errors / problems
          – e.g., if a ZPOTRF error is detected, rerun with ISYM=0
          – ever-expanding library of fixes (currently over 30 fixes for VASP!)
      • There is more you can do with custodian (e.g., simple workflows), but it won’t be covered here
      • Currently has fixes for: VASP, Q-Chem, NWChem
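The detect-fix-rerun loop described on slide 44 can be sketched as follows. This is a toy illustration, not custodian's API: `run_with_fixes`, `fake_vasp`, the handler format, and the error string are all assumptions. The only detail taken from the slide is the rule “ZPOTRF error detected → rerun with ISYM=0”.

```python
def run_with_fixes(run_job, params, handlers, max_errors=5):
    """Run a job, scan its output for known errors, apply fixes, and rerun.

    handlers is a list of (error_pattern, parameter_fix) pairs.
    Returns the final parameters and the list of fixes that were applied.
    """
    params = dict(params)  # do not mutate the caller's parameters
    applied = []
    for _ in range(max_errors + 1):
        output = run_job(params)
        fixes = [fix for pattern, fix in handlers if pattern in output]
        if not fixes:
            return params, applied  # clean run: nothing left to fix
        for fix in fixes:
            params.update(fix)
            applied.append(fix)
    raise RuntimeError("too many errors; giving up")


def fake_vasp(params):
    """Stand-in for a real run: reports a ZPOTRF error unless ISYM is 0."""
    if params.get("ISYM") == 0:
        return "finished"
    return "LAPACK: Routine ZPOTRF failed!"


# The single rule quoted on the slide, expressed as a pattern/fix pair.
handlers = [("ZPOTRF", {"ISYM": 0})]

final_params, applied = run_with_fixes(fake_vasp, {"ISYM": 2}, handlers)
print(final_params, applied)
```

Capping the number of retries matters in practice: a fix that does not resolve the error should end the run rather than loop forever.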
  • 45. Steps for theory & simulation: (1) material generation (crystal, surface/interface, molecule, etc.); (2) simulation procedure (simulation parameters and sequence, i.e., the workflow); (3) calculation & execution (job management, execution, error recovery); (4) data analysis (databases, plotting, analysis, insight)
  • 46. Step 4: data analysis
      • Data analysis can involve:
          i. creating databases of materials and their properties for easy searching and data retrieval
          ii. conventional scientific analyses and scientific plotting (e.g., generating phase diagrams, plotting band structures)
          iii. data mining methods
      • Item (i) is largely handled by atomate
      • Item (ii) is largely handled by pymatgen
      • Item (iii) is largely handled by matminer
  • 47. Atomate: builders framework. “Builders” start with base collections in a database and create higher-level collections that summarize information or add metadata
  • 48. What kinds of builders are available in atomate?
      • “Materials builder”: combine all calculations (“tasks”) on a single crystal (as determined automatically by structure matching algorithms) into a single document
          – e.g., combine data from band structure, dielectric, and elastic constant workflows performed on the same material into a single report on that material that contains all information
      • Add new information based on algorithmic analysis to the database, e.g., structure dimensionality or phase stability
      • Merge external information such as user-defined tags, experimental data (from a spreadsheet), or data from Materials Project into the computational report of a material
      • If you are using atomate without using builders, you are missing out on one of the most useful features!
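The “materials builder” idea from slide 48 (many task documents in, one document per material out) can be sketched with plain dictionaries. This is a simplified illustration, not atomate's builder code: the slide says real grouping uses structure matching algorithms, whereas this sketch assumes a crude `(formula, spacegroup)` key as a stand-in, and all field names and numeric values are placeholders rather than real results.

```python
from collections import defaultdict


def build_materials(tasks):
    """Merge task documents that describe the same material into one document."""
    grouped = defaultdict(list)
    for task in tasks:
        # Assumption: (formula, spacegroup) stands in for real structure matching.
        grouped[(task["formula"], task["spacegroup"])].append(task)

    materials = []
    for (formula, spacegroup), docs in sorted(grouped.items()):
        material = {
            "formula": formula,
            "spacegroup": spacegroup,
            "task_ids": [doc["task_id"] for doc in docs],
        }
        for doc in docs:
            material.update(doc["output"])  # collect every computed property
        materials.append(material)
    return materials


# Placeholder task documents; the numbers are illustrative, not real data.
tasks = [
    {"task_id": 1, "formula": "GaAs", "spacegroup": 216, "output": {"band_gap": 1.0}},
    {"task_id": 2, "formula": "GaAs", "spacegroup": 216, "output": {"bulk_modulus": 2.0}},
    {"task_id": 3, "formula": "MgO", "spacegroup": 225, "output": {"band_gap": 3.0}},
]

materials = build_materials(tasks)
for material in materials:
    print(material)
```

Because the builder only reads the task collection and writes a new one, it can be rerun at any time as more calculations finish.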
  • 49. Data analysis – the role of atomate •  Atomate helps organize computational data into databases •  However, atomate does not do scientific analysis or plotting of the data; it only generates and retrieves the data (often as pymatgen objects) •  Pymatgen is used to analyze the data –  we won’t cover pymatgen capabilities in depth since it is covered elsewhere in the workshop
  • 50. pymatgen – examples of analyses •  phase diagrams •  Pourbaix diagrams •  diffusivity from MD •  band structure analysis
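As a rough idea of what a phase diagram analysis computes, the sketch below builds the lower convex hull of formation energy versus composition for a hypothetical binary A-B system and reports each entry's energy above the hull. It is plain Python, not pymatgen's API, and the entry names and energies (nominally eV/atom) are made-up placeholders rather than real data.

```python
# Hypothetical binary A-B entries: name -> (fraction of B, formation energy).
# All values are placeholders for illustration, not calculated results.
entries = {
    "A": (0.0, 0.0),
    "A3B": (0.25, -0.3),
    "AB": (0.5, -1.0),
    "AB3": (0.75, -0.4),
    "B": (1.0, 0.0),
}

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def lower_hull(points):
    """Lower convex hull of (x, energy) points, returned left to right."""
    lowest = {}
    for x, e in points:
        lowest[x] = min(e, lowest.get(x, e))  # keep the lowest energy per composition
    hull = []
    for p in sorted(lowest.items()):
        # Drop the previous point while it lies on or above the line to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def energy_above_hull(x, e, hull):
    """Vertical distance from (x, e) to the hull, by linear interpolation."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e - (e1 + (e2 - e1) * (x - x1) / (x2 - x1))
    raise ValueError("composition outside the range covered by the hull")

hull = lower_hull(entries.values())
e_above = {name: energy_above_hull(x, e, hull) for name, (x, e) in entries.items()}
print(hull)
print(e_above)
```

Entries with zero energy above the hull are predicted stable in this toy model; a positive value measures how far an entry sits from stability.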
  • 51. matminer helps enable data mining studies – currently in (open) pre-release
  • 52. Data analysis: the main ideas •  Atomate generates databases that summarize your calculation results. Additionally, the builders framework generates document collections that summarize data on a particular material from multiple calculations or from additional information. •  Pymatgen is powerful software for doing conventional scientific analyses with the computational data •  Matminer, currently in pre-release, is software for doing data mining style analyses of materials
  • 53. Doing simulations programmatically can become second nature – and very fast / automatic / scalable •  [Figure: the researcher asks “What is the GGA-PBE elastic tensor of GaAs?”; the software infrastructure (material generation, simulation procedure, calculation & execution with Custodian, data analysis) returns the results]
  • 54. Next steps •  The purpose of this presentation was to explain the benefits of the software tools available to you •  You will need to take more steps to actually get value out of the software •  There are multiple resources to help you with this!
  • 55. For general information / overview / citing •  There are papers on the software tools for general information (and for citation purposes) –  Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013). –  Jain, A. et al. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurr. Comput. Pract. Exp. 27, 5037–5059 (2015). –  Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017).
  • 56. For installing / usage / examples •  The online documentation is best for practical usage –  www.pymatgen.org –  https://materialsproject.github.io/fireworks/ –  https://materialsproject.github.io/custodian/ –  https://hackingmaterials.github.io/atomate/ –  https://hackingmaterials.github.io/matminer/ •  The online documentation includes installation, examples, tutorials, and descriptions of how to use the code •  If you want to do “everything”, we suggest starting with atomate and going from there
  • 57. For help / questions •  Did you try your best to use the software but are having trouble? •  The various codebases have official channels for getting help –  https://groups.google.com/forum/#!forum/pymatgen –  https://groups.google.com/forum/#!forum/fireworkflows –  https://groups.google.com/forum/#!forum/atomate –  https://groups.google.com/forum/#!forum/matminer –  for custodian, use GitHub “issues” •  Some of these lists have resolved >100 user questions, all essentially on volunteer time!
  • 58. Questions? Comments? •  Thanks to everyone who contributed to this software and to all those who helped support / justify its development through citation!