Modern electronic structure codes give relatively consistent equations of state. Challenges remain in fully automating electronic structure calculations: developing robust materials analysis software to integrate calculations, detecting and correcting errors, and managing scientific workflows. Frameworks like pymatgen, ASE, the Materials Project, AiiDA and Custodian provide modular, reusable tools for high-throughput electronic structure computations and extensive materials analysis capabilities. FireWorks serves as a workflow manager to automate calculations over diverse supercomputing resources. With automation come large quantities of materials data that can be leveraged for materials design and discovery.
Creating It from Bit - Designing Materials by Integrating Quantum Mechanics, Informatics and Computer Science
1. Materials Virtual Lab
Creating It from Bit: Designing Materials by Integrating Quantum Mechanics, Informatics and Computer Science
Shyue Ping Ong
February 23, 2017
57th Sanibel Symposium
2. Electronic structure calculations are today reliable and reasonably accurate.
tials in Quantum ESPRESSO). In this case, too, the small Δ values indicate a good agreement between codes. This agreement moreover encompasses varying degrees of numerical convergence, differences in the numerical implementation of the particular potentials, and computational differences beyond the pseudization scheme, most of which are expected to be of the same order of magnitude or smaller than the differences among all-electron codes (1 meV per atom at most).

Conclusions and outlook

Solid-state DFT codes have evolved considerably. The change from small and personalized codes to widespread general-purpose packages has pushed developers to aim for the best possible precision. Whereas past DFT-PBE literature on the lattice parameter of silicon indicated a spread of 0.05 Å, the most recent versions of the implementations discussed here agree on this value within 0.01 Å (Fig. 1 and tables S3 to S42). By comparing codes on a more detailed level using the Δ gauge, we have found the most recent methods to yield nearly indistinguishable EOS, with the associated error bar comparable to that between different high-precision experiments. This underpins the validity of recent DFT EOS results and confirms that correctly converged calculations yield reliable predictions. The implications are moreover relevant throughout the multidisciplinary set of fields that build upon DFT results, ranging from the physical to the biological sciences.

In spite of the absence of one absolute reference code, we were able to improve and demonstrate the reproducibility of DFT results by means of a pairwise comparison of a wide range of codes and methods. It is now possible to verify whether any newly developed methodology can reach the same precision described here, and new DFT applications can be shown to have used a method and/or potentials that were screened in this way. The data generated in this study serve as a crucial enabler for such a reproducibility-driven paradigm shift, and future updates of available Δ values will be presented at http://molmod.ugent.be/deltacodesdft. The reproducibility of reported results also provides a sound basis for further improvement to the accuracy of DFT, particularly in the investigation of new DFT functionals, or for the development of new computational approaches. This work might therefore
Fig. 4. Δ values for comparisons between the most important DFT methods considered (in millielectron volts per atom). Shown are comparisons of all-electron (AE), PAW, ultrasoft (USPP), and norm-conserving pseudopotential (NCPP) results with all-electron results (methods are listed in alphabetical order in each category). The labels for each method stand for code, code/specification (AE), or potential set/code (PAW, USPP, and NCPP) and are explained in full in tables S3 to S42. The color coding
Lejaeghere et al. Science, 2016, 351 (6280), aad3000.
Nitrides are an important class of optoel
ported synthesizability of highly metasta
nitrogen precursors (36, 37) suggests th
spectrum of promising and technologica
trides awaiting discovery.
Although our study focuses on the m
crystals, polymorphism and metastability
is of great technological relevance to pha
tronics, and protein folding (7). Our obs
energy to metastability could address a d
in organic molecular solids: Why do man
numerous polymorphs within a small (~
whereas inorganic solids often see >100°C
morph transition temperatures? The wea
molecular solids yield cohesive energies o
or â1 eV per molecule, about a third of t
class of inorganic solids (iodides; Fig. 2B).
yields a correspondingly small energy scal
(38). When this small energy scale of orga
is coupled with the rich structural diversity a
tional degrees of freedom during molecular
leads to a wide range of accessible polymorp
modynamic conditions.
Influence of composition
The space of metastable compounds hov
scape of equilibrium phases. As chemica
thermodynamic system, the complexity
grows. Figure 2A shows an example ca
for the ternary Fe-Al-O system, plotted a
tion energies referenced to the elemental
S1.2 for discussion). We anticipate the th
of a phase to be different when it is compe
SCIENCE ADVANCES | RESEARCH ARTICLE
HAUTIER, ONG, JAIN, MOORE, AND CEDER PHYSICAL REVIEW B 85, 155208 (2012)
or meV/atom); 10 meV/atom corresponds to about 1 kJ/mol-atom.

III. RESULTS

Figure 2 plots the experimental reaction energies as a function of the computed reaction energies. All reactions involve binary oxides to ternary oxides and have been chosen as presented in Sec. II. The error bars indicate the experimental error on the reaction energy. The data points follow roughly the diagonal and no computed reaction energy deviates from the experimental data by more than 150 meV/atom. Figure 2 does not show any systematic increase in the DFT error with larger reaction energies. This justifies our focus in this study on absolute and not relative errors.

In Fig. 3, we plot a histogram of the difference between the DFT and experimental reaction energies. GGA + U underestimates and overestimates the energy of reaction with the same frequency, and the mean difference between computed and experimental energies is 9.6 meV/atom. The root-mean-square (rms) deviation of the computed energies with respect to experiments is 34.4 meV/atom. Both the mean and rms are very different from the results obtained by Lany on reaction energies from the elements [52]. Using pure GGA, Lany found that elemental formation energies are underestimated by GGA with a much larger rms of 240 meV/atom. Our results are closer to experiments because of the greater accuracy of DFT when comparing chemically similar compounds such as binary and ternary oxides due to errors cancellation [40]. We should note that even using elemental energies that are fitted to minimize the error versus experiment in a large set of reactions, Lany reports that the error is still 70 meV/atom and much larger than what we find for the relevant reaction energies. The rms we found is consistent with the error of 3 kJ/mol-atom (30 meV/atom) for reaction energies from the binaries in the limited set of perovskites reported by Martinez et al. [29]

FIG. 3. (Color online) Histogram of the difference between computed ($E^{\mathrm{comp}}_{0\,\mathrm{K}}$) and experimental ($E^{\mathrm{expt}}_{0\,\mathrm{K}}$) energies of reaction (in meV/atom).

Very often, instead of the exact reaction energy, one is interested in knowing if a ternary compound is stable enough to form with respect to the binaries. This is typically the case when a new ternary oxide phase is proposed and tested for stability versus the competing binary phases [18]. From the 131 compounds for which reaction energies are negative according to experiments, all but two (Al2SiO5 and CeAlO3) are also negative according to computations. This success in predicting stability versus binary oxides of known ternary oxides can be related to the very large magnitude of reaction energies from binary to ternary oxides compared to the typical errors observed (rms of 34 meV/atom). Indeed, for the vast majority of the reactions (109 among 131), the experimental reaction energies are larger than 50 meV/atom. It is unlikely then that the DFT error would be large enough to offset this large reaction energy and make a stable compound unstable versus the binary oxides.
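
The error statistics quoted in this excerpt (mean difference, rms deviation, and the rule of thumb that 10 meV/atom is about 1 kJ/mol-atom) are simple to compute. The sketch below is illustrative only: the reaction energies are made-up placeholder values, not data from Hautier et al., and the function names are hypothetical.

```python
import math

# 1 eV per particle corresponds to about 96.485 kJ/mol.
KJ_PER_MOL_PER_EV = 96.485


def mev_per_atom_to_kj_per_mol_atom(energy_mev):
    """Convert an energy in meV/atom to kJ/mol-atom."""
    return energy_mev / 1000.0 * KJ_PER_MOL_PER_EV


def error_statistics(computed, experimental):
    """Return (mean difference, rms deviation) of computed minus experimental."""
    diffs = [c - e for c, e in zip(computed, experimental)]
    mean = sum(diffs) / len(diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean, rms


# Placeholder reaction energies in meV/atom (hypothetical, for illustration).
computed = [-120.0, -75.0, -210.0, -40.0]
experimental = [-100.0, -90.0, -200.0, -70.0]

mean_diff, rms_dev = error_statistics(computed, experimental)
print(f"mean = {mean_diff:.2f} meV/atom, rms = {rms_dev:.2f} meV/atom")
print(f"10 meV/atom = {mev_per_atom_to_kj_per_mol_atom(10):.3f} kJ/mol-atom")
```

A mean near zero with a much larger rms, as reported in the excerpt, indicates errors that scatter in both directions rather than a systematic offset.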

The histogram in Fig. 3 shows several reaction energies with significant errors. Failures and successes of DFT are often
JSON document in the format of a Crystallographic Information File (cif), which can also be downloaded via the Materials Project website and Crystalium web application. In addition, the weighted surface energy (equation (2)), shape factor (equation (3)), and surface anisotropy (equation (4)) are given. Table 2 provides a full description of all properties available in each entry as well as their corresponding JSON key.

Technical Validation

The data was validated through an extensive comparison with surface energies from experiments and other DFT studies in the literature. Due to limitations in the available literature, only the data on ground state phases were compared.

Comparison to experimental measurements

Experimental determination of surface energy typically involves measuring the liquid surface tension and solid-liquid interfacial energy of the material [20] to estimate the solid surface energy at the melting temperature, which is then extrapolated to 0 K under isotropic approximations. Surface energies for individual crystal facets are rarely available experimentally. Figure 5 compares the weighted surface energies of all crystals (equation (2)) to experimental values in the literature [20,23,26–28]. It should be noted that we have adopted the latest experimental values available for comparison, i.e., values were obtained from the 2016 review by Mills et al. [27], followed by Keene [28], and finally Niessen et al. [26] and Miller and Tyson [20]. A one-factor linear regression line $\gamma_{\mathrm{DFT}} = \gamma_{\mathrm{EXP}} + c$ was fitted for the data points. The choice of the one factor fit is motivated by the fact that standard broken bond models show that there is a direct relationship between surface energies and cohesive energies, and previous studies have found no evidence that DFT errors in the cohesive energy scale with the magnitude of the cohesive energy itself [61].

We find that the DFT weighted surface energies are in excellent agreement with experimental values, with an average underestimation of only 0.01 J m⁻² and a standard error of the estimate (SEE) of 0.27 J m⁻². The Pearson correlation coefficient r is 0.966. Crystals with surfaces that are well-known to undergo significant reconstruction tend to have errors in weighted surface energies that are larger than the SEE.
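
To make the one-factor fit concrete, the sketch below fits only the offset c in γ_DFT = γ_EXP + c (slope fixed at 1) and then computes a standard error of the estimate and a Pearson correlation coefficient. The surface energies are placeholder numbers, not values from Tran et al., and the use of n − 1 degrees of freedom for the SEE is an assumption, since the excerpt does not state the definition used.

```python
import math


def fit_offset(gamma_exp, gamma_dft):
    """Least-squares offset c for gamma_dft = gamma_exp + c (slope fixed at 1)."""
    return sum(d - e for e, d in zip(gamma_exp, gamma_dft)) / len(gamma_exp)


def standard_error_of_estimate(gamma_exp, gamma_dft, c):
    """SEE with n - 1 degrees of freedom (one fitted parameter; an assumption)."""
    residuals = [d - (e + c) for e, d in zip(gamma_exp, gamma_dft)]
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 1))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Placeholder surface energies in J/m^2 (hypothetical, for illustration).
gamma_exp = [1.0, 2.0, 3.0, 4.0]
gamma_dft = [0.9, 2.1, 2.8, 4.0]

c = fit_offset(gamma_exp, gamma_dft)
see = standard_error_of_estimate(gamma_exp, gamma_dft, c)
r = pearson_r(gamma_exp, gamma_dft)
print(f"c = {c:.3f}, SEE = {see:.3f}, r = {r:.3f}")
```

With the slope fixed at 1, the least-squares offset is simply the mean of γ_DFT − γ_EXP, so a negative c corresponds to an average underestimation by DFT.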

The differences between the calculated and experimental surface energies can be attributed to three main factors. First, there are uncertainties in the experimental surface energies. The experimental values derived by Miller and Tyson [20] are extrapolations from extreme temperatures beyond the melting point. The surface energy of Ge, Si [62], Te [63], and Se [64] were determined at 77, 77, 432 and 313 K respectively while

Figure 5. Comparison to experimental surface energies. Plot of experimental versus calculated weighted surface energies for ground-state elemental crystals. Structures known to reconstruct have blue data points while square data points correspond to non-metals. Points that are within the standard error of the estimate
Phase stability Formation energies
Tran, et al. Sci. Data 2016, 3, 160080.
Sun, et al. Sci. Adv. 2016, 2 (11), e1600225.
Figure 2. Distribution of calculated volume per atom, Poisson ratio, bulk modulus and shear modulus. Vector field-plot showing the distribution of the bulk and shear modulus, Poisson ratio and atomic volume for 1,181 metals, compounds and non-metals. Arrows pointing at 12 o'clock correspond to minimum volume-per-atom and move anti-clockwise in the direction of maximum volume-per-atom, which is located at 6 o'clock. Bar plots indicate the distribution of materials in terms of their shear and bulk moduli.
Surface energies Elastic constants
de Jong et al. Sci. Data 2015, 2, 150009.
Hautier et al. Phys. Rev. B 2012, 85, 155208.
Modern electronic structure codes give relatively consistent equations of state.
Of course, challenges remain…
3. Reliability + Reasonable Accuracy
Automation
• What are the tools necessary for automation of electronic structure calculations?
Data
• What is a model for open-vs-private data?
Learning & Design
• How and what can we learn from large quantities of materials data?
• Can we really do in silico design of materials?
6. “User requirements” for electronic structure calculation automation
$\hat{H}\psi = [\hat{T} + \hat{V} + \hat{U}]\psi = E\psi$
???
Need #1: Robust materials analysis software to “talk” between application, computable property and electronic structure calculations
Need #2: Error detection and correction
Need #3: Scientific workflow management
7. Software frameworks for HT electronic structure computations
Atomic Simulation Environment: https://wiki.fysik.dtu.dk/ase
Materials Project [1]: https://www.materialsproject.org
pymatgen [2]
Custodian
FireWorks [3]
http://aflowlib.org
http://www.aiida.net
[1] Jain et al. APL Mater. 2013, 1 (1), 11002.
[2] Ong et al. Comput. Mater. Sci. 2013, 68, 314–319.
[3] Jain et al. Concurr. Comput. Pract. Exp. 2015, 27 (17), 5037–5059.
8. Extensive Materials Analysis Capabilities
(Modular, Reusable, Extendable)
• Input/Output objects
• Defects and Transformations
• Electronic Structure
• XRD Patterns
• Phase and Pourbaix Diagrams
• Functional properties
Comprehensively documented
Continuously tested and integrated
Active dev/user community
Ong et al. Comput. Mater. Sci. 2013, 68, 314–319.
9. Global network of users
Number of visits on pymatgen.org
Very active developer community!
Some recent additions
• Dielectric constants
• Elastic constants and phonons
• X-ray Absorption Spectroscopy
10. Custodian
Simple, robust and flexible just-in-time (JIT) job management framework.
• Wrappers to perform error checking, job management and error recovery.
• Error recovery is an important aspect for HT: O(100,000) jobs × 1% error rate => O(1000) errored jobs.
• Existing sub-packages for error handling for VASP, NWChem and Q-Chem calculations.
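
The run, check, correct cycle described on this slide can be sketched generically as follows. This is not Custodian's actual API: the function names, the handler structure and the toy "mixing" parameter are all hypothetical, chosen only to illustrate why automated recovery matters when roughly 1 in 100 jobs fails.

```python
def run_with_recovery(job, params, handlers, max_errors=3):
    """Rerun job(params) until no handler flags an error.

    job: callable taking a params dict and returning an output dict.
    handlers: list of (check, correct) pairs, where check(output) -> bool
              flags an error and correct(params) -> dict returns new params.
    Returns (final output, number of corrections applied).
    """
    corrections = 0
    while True:
        output = job(params)
        fixes = [correct for check, correct in handlers if check(output)]
        if not fixes:
            return output, corrections
        if corrections >= max_errors:
            raise RuntimeError("too many errors; giving up")
        params = fixes[0](params)  # apply the first matching correction
        corrections += 1


# Toy stand-ins (hypothetical): the "calculation" converges only when a
# mixing parameter is small enough, and the handler halves it on failure.
def toy_job(params):
    return {"converged": params["mixing"] <= 0.2}


def not_converged(output):
    return not output["converged"]


def reduce_mixing(params):
    return {**params, "mixing": params["mixing"] / 2}


output, n_fixes = run_with_recovery(
    toy_job, {"mixing": 0.8}, [(not_converged, reduce_mixing)]
)

# The slide's arithmetic: 100,000 jobs at a 1% error rate.
expected_failures = 100_000 * 1 // 100
print(output, n_fixes, expected_failures)
```

Capping the number of corrections keeps a job that cannot be repaired from looping indefinitely, which matters when thousands of jobs share a queue.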
11. FireWorks is the Workflow Manager
Custom material → A cool material !! → Submit! → Lots of information about cool material !!
• Input generation (parameter choice)
• Workflow mapping
• Supercomputer submission / monitoring
• Error handling
• File Transfer
• File Parsing / DB insertion
12. FireWorks as a platform
Community can write any workflow in FireWorks → We can automate it over most supercomputing resources
Workflows: structure, charge, band structure, DOS, optical, phonons, XAFS spectra, GW
15. With great automation comes great quantities of data…
Jain et al., APL Mater., 2013, 1, 11002.
16. User-friendly web interface (but unfriendly for advanced users requiring large quantities of data!)
• Materials Explorer
• Battery Explorer
• CrystalToolkit
• Structure Predictor
• Phase Diagram App
• Pourbaix Diagram App
• Reaction Calculator

• Structure
• Electronic Structure
• Elastic properties
• XRD
• Energetic properties
Jain et al., APL Mater., 2013, 1, 11002.
17. The Materials API
How do I access MP data?
Materials Project DB → WebApps, RESTful API
• Provides programmatic access to large quantities of data.
• Data can be used for analysis or for learning.
Ong et al. Comput. Mater. Sci. 2015, 97, 209–215.
18. A modern data model for high-throughput computational research groups
Large, open electronic structure databases: Materials Project, AFLOW, OQMD
APIs
Private “small” databases for individual research groups
20. Applications
Screening
• Rapidly explore vast chemical spaces
• Exclude bad candidates using a minimum of computational resources
• Multi-property optimization
• Identify “best” candidate(s)
Learning
• Analyze large data sets
• Identify trends
• Obtain new physics/chemistry insights
21. Next Generation All-solid-state Batteries
Cell schematic: NMC | solid electrolyte | Li metal, with Li+ moving through the solid electrolyte
• Solid electrolyte (SE) is non-flammable and can be stacked for higher system energy densities.
• Can potentially enable high voltage cathodes like NMC
• Can potentially enable Li metal anode, with significantly higher energy densities
Design Requirements for SE
• Extremely high (“super”) ionic conductivity > 0.1 mS/cm
• Low electronic conductivity
• Phase stability
• Good electrochemical stability with electrodes (intrinsic or passivation)
• Mechanical compatibility
22. Predicting non-dilute diffusion properties with first principles
Lithium motion in Li10GeP2S12 (sulfur tetrahedra frozen for clarity)
Quantum mechanics: $E\psi(\mathbf{r}) = -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r}) + V(\mathbf{r})\psi(\mathbf{r})$
Newton's laws of motion: $F = ma$
Ab initio molecular dynamics or AIMD
$D = \frac{1}{2Ndt}\sum_i \left( [\mathbf{r}_i(t+t_0)]^2 - [\mathbf{r}_i(t)]^2 \right)$
• Computational and human-time intensive
• Huge quantities of data need to be stored and analyzed
Typical AIMD simulation: 50,000-100,000 time steps of 2 fs at 4-6 temperatures (400,000 structures, 2-3 weeks on cluster)
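
As an illustration of how a diffusivity is extracted from an AIMD trajectory, the sketch below assumes the standard Einstein-relation form, in which the squared displacement |r_i(t0 + τ) − r_i(t0)|² is averaged over the N ions and over time origins t0, then divided by 2dτ (d is the dimensionality). The function names and the toy trajectory are hypothetical; a real analysis would use unwrapped ion positions from the simulation output.

```python
def mean_squared_displacement(trajectory, lag):
    """MSD at a lag (in frames), averaged over ions and time origins.

    trajectory[t][i] is the unwrapped (x, y, z) position of ion i at frame t.
    """
    n_frames = len(trajectory)
    n_ions = len(trajectory[0])
    total = 0.0
    count = 0
    for t0 in range(n_frames - lag):
        for i in range(n_ions):
            start = trajectory[t0][i]
            end = trajectory[t0 + lag][i]
            total += sum((b - a) ** 2 for a, b in zip(start, end))
            count += 1
    return total / count


def diffusivity(trajectory, lag, time_step, dim=3):
    """Einstein-relation estimate D = MSD(lag) / (2 * dim * lag * time_step)."""
    msd = mean_squared_displacement(trajectory, lag)
    return msd / (2 * dim * lag * time_step)


# Toy trajectory (hypothetical): two ions moving 1 length unit per frame in
# opposite directions along x. This is ballistic, not diffusive, motion and
# serves only to check the arithmetic.
trajectory = [[(float(t), 0.0, 0.0), (-float(t), 0.0, 0.0)] for t in range(5)]

msd_lag2 = mean_squared_displacement(trajectory, lag=2)
d_est = diffusivity(trajectory, lag=2, time_step=2.0)
print(msd_lag2, d_est)
```

In practice the MSD is computed over a range of lags and D is taken from the slope of the linear regime, which is why long trajectories at several temperatures are needed.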
23. Completely automated ab initio molecular dynamics
Workflow: Start → Setup simulation box → Initial relaxation → AIMD simulations (one per temperature) → Converged? (Y: End; N: continue)
• Dynamically add initial AIMD jobs running at different temperatures
• Dynamically add continuation AIMD job starting from previous one
In-house system built entirely on pymatgen, custodian and fireworks!
Deng, Z.; Zhu, Z.; Chu, I.-H.; Ong, S. P. Chem. Mater. 2017, 29 (1), 281–288.
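
The "dynamically add a continuation job until converged" logic of this workflow can be sketched in plain Python. This is not the FireWorks API and not the authors' in-house system: the function names, the chunk dictionaries and the convergence rule are hypothetical, and temperatures are processed one after another for simplicity, whereas a workflow manager would run them concurrently.

```python
def run_until_converged(run_chunk, is_converged, temperatures, max_chunks=10):
    """For each temperature, append continuation chunks until converged.

    run_chunk(temperature, previous) -> result of one AIMD chunk, where
        previous is the last chunk at that temperature (None for the first).
    is_converged(chunks) -> bool, given all chunks so far at one temperature.
    Returns {temperature: [chunk results]}.
    """
    history = {}
    for temperature in temperatures:
        chunks = []
        previous = None
        while len(chunks) < max_chunks:
            previous = run_chunk(temperature, previous)  # continue from last
            chunks.append(previous)
            if is_converged(chunks):
                break
        history[temperature] = chunks
    return history


# Toy stand-ins (hypothetical): each chunk covers 10 ps of simulated time.
def toy_chunk(temperature, previous):
    start = 0 if previous is None else previous["end_ps"]
    return {"temperature": temperature, "start_ps": start, "end_ps": start + 10}


def toy_converged(chunks):
    # Pretend that hotter runs need less simulated time to converge.
    needed_ps = 40 if chunks[0]["temperature"] < 1000 else 20
    return chunks[-1]["end_ps"] >= needed_ps


history = run_until_converged(toy_chunk, toy_converged, [800, 1200])
print({t: len(chunks) for t, chunks in history.items()})
```

The `max_chunks` cap plays the same role as a wall-time or budget limit: a run that never converges is stopped and can be flagged for inspection.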
25. Many Li superionic conductors have Ag analogues
Structures shown: Li7P3S11 and Ag7P3S11 (axes a and c)
Yamane et al., Solid State Ionics 2007, 178 (15–18), 1163–1167.
Hautier et al., Inorg. Chem., 2010, 656–663.
Data-mined ionic substitution probabilities

$$p_n(X, X') \approx \frac{e^{\sum_i \lambda_i f_i^{(n)}(X, X')}}{Z} \qquad (5)$$

The $\lambda_i$ indicate the weight given to the feature $f_i^{(n)}(X, X')$ in the probabilistic model. Z is a partition function ensuring the normalization of the probability function. The exponential form chosen in eq 5 follows a commonly used convention in the machine learning community [25].

2.3. Binary Feature Model. A first assumption we make is to consider that the feature functions do not depend on the number n of ions in the compound. Simply put, we assume that the ionic substitution rules are independent of the compound's number of components (binary, ternary, quaternary, ...).

Therefore, we will omit any reference to n in the probability and feature functions. Equation 5 becomes

$$p(X, X') \approx \frac{e^{\sum_i \lambda_i f_i(X, X')}}{Z} \qquad (6)$$

While the feature functions could be more complex, only simple binary substitutions are considered in this paper. This means that the likelihood for two ions to substitute to each other is independent of the nature of the other ionic species present in the compound. Mathematically, this translates in assuming that the relevant feature functions are simple binary features of the form:

$$f_k^{a,b}(X, X') = \begin{cases} 1 & X_k = a \text{ and } X'_k = b \\ 0 & \text{else} \end{cases} \qquad (7)$$

Each pair of ions a and b present in the domain Ω is assigned a set of feature functions with corresponding weights $\lambda_k^{a,b}$ indicating how likely the ions a and b can substitute in position k. For instance, one of the feature
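
A small numerical sketch of the binary feature model in eqs 6 and 7: with binary features, the exponent reduces to a sum of pair weights over the ion positions. The weights below are invented for illustration, they are assumed not to depend on the position k, and the normalization is taken over a short explicit candidate list as a stand-in for the full partition function Z, which in the model runs over all ion pairs in the domain.

```python
import math


def substitution_score(x, x_prime, weights):
    """Unnormalized exp(sum of pair weights) for substituting x -> x_prime.

    x and x_prime are equal-length tuples of ion labels. Pairs missing from
    weights contribute zero, i.e. their feature weight is taken as 0.
    """
    total = sum(weights.get((a, b), 0.0) for a, b in zip(x, x_prime))
    return math.exp(total)


def substitution_probabilities(x, candidates, weights):
    """Normalize scores over an explicit candidate list (stand-in for Z)."""
    scores = [substitution_score(x, c, weights) for c in candidates]
    z = sum(scores)
    return [s / z for s in scores]


# Hypothetical pair weights (not data-mined values).
weights = {
    ("Ag+", "Li+"): 2.0,
    ("Ag+", "Na+"): 1.0,
    ("P5+", "P5+"): 3.0,
    ("S2-", "S2-"): 3.0,
}

x = ("Ag+", "P5+", "S2-")
candidates = [("Li+", "P5+", "S2-"), ("Na+", "P5+", "S2-")]
probs = substitution_probabilities(x, candidates, weights)
print(probs)
```

Because the weights add in the exponent, the ratio of two candidate probabilities depends only on the pairs in which the candidates differ, here exp(2.0 − 1.0).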
26.
Initial candidates: Ag-P-S and Ag-M-P-S compounds (ICSD)
Substitute Ag for Li
Promising candidates: Li-P-S and Li-M-P-S compounds
• Phase stability: Ehull < 30 meV/atom
• Topological analysis: rc > 1.75 Å
• Diffusivity screening
  - Short AIMD estimation: MSD800K > 5 Å², MSD1200K/MSD800K < 7
  - Long AIMD at multiple temperatures: σ300K > 1 mS/cm
• Dopant and composition optimization
Li3Y(PS4)2 (Parent: Ag3Y(PS4)2, ICSD 417658): Ehull = 2 meV/atom, rc = 1.88 Å, MSD800K = 65.1 Å², MSD ratio = 4.5
Li5PS4Cl2 (Parent: Li5PS4Cl2, ICSD 416587): Ehull = 17 meV/atom, rc = 1.76 Å, MSD800K = 77.9 Å², MSD ratio = 3.1
Schematic: Ehull of A0.5B0.5 relative to A and B
Zhu et al. Chem. Mater. 2017, acs.chemmater.6b04049.
Provisional patent filed
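
The tiered screening on this slide can be sketched as a sequence of filters applied from cheapest to most expensive. The threshold values and the two candidate entries are taken from the slide; the comparison directions (less than or greater than) are an assumption inferred from the fact that both reported candidates pass every criterion, and the third entry is a made-up compound included only to show a rejection.

```python
# Screening criteria in order of increasing cost. Comparison directions are
# assumed, inferred from the values of the two candidates on the slide.
CRITERIA = [
    ("e_hull_mev", lambda v: v < 30),      # phase stability, meV/atom
    ("r_c_angstrom", lambda v: v > 1.75),  # topological analysis, angstrom
    ("msd_800k", lambda v: v > 5),         # short AIMD, angstrom^2
    ("msd_ratio", lambda v: v < 7),        # MSD(1200 K) / MSD(800 K)
]


def first_failed_criterion(candidate):
    """Return the key of the first criterion that fails, or None if all pass."""
    for key, passes in CRITERIA:
        if not passes(candidate[key]):
            return key
    return None


def screen(candidates):
    """Return the formulas of candidates that pass every criterion."""
    return [c["formula"] for c in candidates if first_failed_criterion(c) is None]


candidates = [
    {"formula": "Li3Y(PS4)2", "e_hull_mev": 2, "r_c_angstrom": 1.88,
     "msd_800k": 65.1, "msd_ratio": 4.5},
    {"formula": "Li5PS4Cl2", "e_hull_mev": 17, "r_c_angstrom": 1.76,
     "msd_800k": 77.9, "msd_ratio": 3.1},
    # Hypothetical compound, not from the slide, to illustrate a rejection.
    {"formula": "Hypothetical-X", "e_hull_mev": 45, "r_c_angstrom": 1.90,
     "msd_800k": 50.0, "msd_ratio": 3.0},
]

survivors = screen(candidates)
print(survivors)
```

Ordering the criteria by cost means an unstable compound is rejected on its energy above hull before any molecular dynamics is run for it.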
28. Li3Y(PS4)2 likely to exhibit better electrochemical stability than current state-of-the-art
Zhu, Z.; et al., Chem. Mater., 2016, Accepted.
Kato, et al., Nat. Energy, 2016, 1, 16030.
29. Crystalium: World's Largest Database of Surface Energies and Wulff Shapes
• Input bulk crystal structure
• Generation of OUCs up to max Miller index
• Relaxation calculation of OUC (hkl)
• Surface calculations for (h1k1l1), (h2k2l2), …: Termination 1 calculation, Termination 2 calculation, …
• Calculations completed? No: parameter adjustment. Yes: Surface Database
• Surface energy and Wulff shape calculations
• Materials Project
• Dryad Repository
http://crystalium.materialsvirtuallab.org
Tran, R.; Xu, Z.; Radhakrishnan, B.; Winston, D.; Sun, W.; Persson, K. A.; Ong, S. P., Sci. Data, 2016, 3, 160080.
30. Insights from Large Materials Datasets
DFT does surprisingly well in terms of surface energies, contrary to popular perception (SEE = 0.27 J m⁻²).
Trends in energies between reconstructed and unreconstructed surfaces can be reproduced.
Fcc(110) “missing row”: Expt. known to reconstruct!
Tran, R.; Xu, Z.; Radhakrishnan, B.; Winston, D.; Sun, W.; Persson, K. A.; Ong, S. P., Sci. Data, 2016, 3, 160080.
31. Building software infrastructure enables new capabilities
Re, Os, Ta and W are strengthening dopants for Mo alloys.
Tran, R.; Xu, Z.; Zhou, N.; Radhakrishnan, B.; Luo, J.; Ong, S. P., Acta Mater., 2016, 117, 91–99, doi:10.1016/j.actamat.2016.07.005.
32. Learning new insights into dopant effect on GB strength
or the 29 dopants in the Σ5(310) tilt GB. (a) based on lowest energy dopant site in GB and free surface
region (positive $E^{X}_{\mathrm{seg}}$) prefer to stay in the bulk. For dopants that segregate, those with negative $E^{X}_{\mathrm{SE}}$ (bl
Classic bond-breaking theory gives 0.33
Strain effect due to dopant-host radius mismatch
34. Challenge 1: Software development is typically not viewed as “science”
Most research group software is:
a. Badly coded
b. Poorly documented
c. Not available to the community
d. Unlikely to last beyond the current PhD/postdoc
e. All of the above
https://xkcd.com/292/
35. Challenge 2: Data APIs are not that common
• Many materials data repository projects are still Web 1.0.
• Development of an API for programmatic materials data access is often not part of the distribution strategy.
• APIs need complementary software support.
The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles

Shyue Ping Ong (a,*), Shreyas Cholia (b), Anubhav Jain (b), Miriam Brafman (b), Dan Gunter (b), Gerbrand Ceder (c), Kristin A. Persson (b)
(a) Department of NanoEngineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0448, La Jolla, CA 92093, USA
(b) Lawrence Berkeley National Lab, 1 Cyclotron Rd, Berkeley, CA 94720, USA
(c) Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

Article history: Received 18 August 2014; Accepted 18 October 2014
Keywords: Materials Project; Application Programming Interface; High-throughput; Materials genome; Rest; Representational state transfer

Abstract
In this paper, we describe the Materials Application Programming Interface (API), a simple, flexible and efficient interface to programmatically query and interact with the Materials Project database based on the REpresentational State Transfer (REST) pattern for the web. Since its creation in Aug 2012, the Materials API has been the Materials Project's de facto platform for data access, supporting not only the Materials Project's many collaborative efforts but also enabling new applications and analyses. We will highlight some of these analyses enabled by the Materials API, particularly those requiring consolidation of data on a large number of materials, such as data mining of structural and property trends, and generation of phase diagrams. We will conclude with a discussion of the role of the API in building a community that is developing novel applications and analyses based on Materials Project data.
© 2014 Elsevier B.V. All rights reserved.

1. Introduction
First principles methods are today a critical tool in the study and design of materials. Starting from the fundamental laws of physics with minimal assumptions and approximations, first principles techniques can access a wide range of chemistries in a relatively agnostic manner, making them especially powerful in materials investigations or design problems spanning diverse chemical spaces.

In the past decade, electronic structure calculation codes [1–4] have reached a level of maturity that it is now possible to reliably automate and scale first principles calculations across any number of compounds. Coupled with computing advances, this development has led to the advent of high throughput (HT) first principles calculations as an investigative and design tool in materials science. Even today, there are already several examples of HT first principles computation-guided materials design efforts in hydrogen production [10], topological insulators [11], and organic semiconductors [12], with many of these efforts resulting in the discovery of novel materials that have already been synthesized and verified experimentally. This HT capability has also spurred the development of large databases of computed data on materials, such as the Materials Project [13], the AFLOWLIB library [14] and the Harvard Clean Energy Project [12].

In particular, the Materials Project [13], created by the authors of this paper, has led the charge of combining a large database of materials properties with a diverse and growing set of online analysis and comprehensive open source software tools [15–17]. The Materials Project's database today contains computed energetic properties for over 59,000 crystal structures along with over 25,000 electronic structure properties. More structures and properties (e.g., elastic constants, dielectric constants, etc.) are being added on a daily basis. A series of web applications provide users with the capability to perform advanced searches and common

Ong et al. Comput. Mater. Sci. 2015, 97, 209–215.

A RESTful API for exchanging materials data in the AFLOWLIB.org consortium

Richard H. Taylor (a,b), Frisco Rose (b), Cormac Toher (b), Ohad Levy (b,1), Kesong Yang (c), Marco Buongiorno Nardelli (d,e), Stefano Curtarolo (f,*)
(a) National Institute of Standards and Technology, Gaithersburg, MD 20878, USA
(b) Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, USA
(c) Department of Nanoengineering, University of California, San Diego, La Jolla, CA 92093, USA
(d) Department of Physics, University of North Texas, Denton, TX, USA
(e) Department of Chemistry, University of North Texas, Denton, TX, USA
(f) Materials Science, Electrical Engineering, Physics and Chemistry, Duke University, Durham, NC 27708, USA

Article history: Received 20 March 2014; Received in revised form 5 May 2014; Accepted 10 May 2014; Available online 24 July 2014
Keywords: High-throughput; Combinatorial materials science; Computer simulations; Materials databases; AFLOWLIB

Abstract
The continued advancement of science depends on shared and reproducible data. In the field of computational materials science and rational materials design this entails the construction of large open databases of materials properties. To this end, an Application Program Interface (API) following REST principles is introduced for the AFLOWLIB.org materials data repositories consortium. AUIDs (Aflowlib Unique IDentifier) and AURLs (Aflowlib Uniform Resource Locator) are assigned to the database resources according to a well-defined protocol described herein, which enables the client to access, through appropriate queries, the desired data for post-processing. This introduces a new level of openness into the AFLOWLIB repository, allowing the community to construct high-level work-flows and tools exploiting its rich data set of calculated structural, thermodynamic, and electronic properties. Furthermore, federating these tools will open the door to collaborative investigations of unprecedented scope that will dramatically accelerate the advancement of computational materials design and development.
© 2014 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/).

1. Introduction
Data-driven materials science has gained considerable traction over the last decade or so. This is due to the confluence of three key factors: (1) Improved computational methods and tools; (2) greater computational power; and (3) heightened awareness of the power of extensive databases in science [1]. The recent Materials Genome Initiative (MGI) [1,2] reflects the recognition that many important social and economic challenges of the 21st century could be solved or mitigated by advanced materials. Computational materials science currently presents the most promising

trended toward more public and user-friendly frameworks. The emphasis is increasingly on portability and sharing of tools and data [13–15]. Similar to the effort presented here, the Materials-Project [16] has been providing open access to its database of computed materials properties through a RESTful API and a python library enabling ad hoc applications [17]. Other examples of online material properties databases include that being implemented by the Engineering Virtual Organization for Cyber Design (EVOCD) [18], which contains a repository of experimental data, materials constants and computational tools for use in Integrated Computational Material Engineering (ICME). The future advance of compu-

Taylor et al. Comput. Mater. Sci. 2014, 93, 178–192.
Pymatgen provides high-level tools for users to easily obtain and work with data via the Materials API.
36. Challenge 3: There are still problems too “large” for first principles
Trade-offs: cost, scale, transferability
First principles: transferable, costly, short time/length scales
Empirical: non-transferable, cheap, longer time/length scales
Automation
38. Acknowledgements
Ong group
Iek-Heng Chu, Balachandran Radhakrishnan, Zhuoying Zhu, Yuh-Chieh Lin, Zhenbin Wang, Zihan Xu, Zhi Deng, Hanmei Tang, Weike Ye, Chen Zheng
Collaborators
Prof Shirley Meng, Christopher Kompella, Han Nguyen, Sunny Hy
Prof Joanna McKittrick, Jungmin Ha