In this talk at the CECAM 2015 Workshop on Future Technologies in Automated Atomistic Simulations, I will discuss the Materials Project Ecosystem, an initiative to develop a comprehensive set of open-source software and data tools for materials informatics. The Materials Project is a US Department of Energy-funded initiative to make the computed properties of all known inorganic materials publicly available to all materials researchers to accelerate materials innovation. Today, the Materials Project database boasts more than 58,000 materials, covering a broad range of properties, including energetic properties (e.g., phase and aqueous stability, reaction energies), electronic structure (bandstructures, DOSs) and structural and mechanical properties (e.g., elastic constants).
A linchpin of the Materials Project is its robust data and software infrastructure, built on open-source software development best practices such as continuous testing and integration and comprehensive documentation. I will provide an overview of the open-source software modules that have been developed for materials analysis (Python Materials Genomics), error handling (Custodian) and scientific workflow management (FireWorks), as well as the Materials API, a first-of-its-kind interface for accessing materials data based on REpresentational State Transfer (REST) principles. I will show how a materials researcher can use and build on these software and data tools, both for materials informatics and to accelerate their own research.
5. The Materials Project is an open science
project to make the computed properties of
all known inorganic materials publicly
available to all researchers to accelerate
materials innovation.
June 2011: Materials Genome Initiative, which
aims to “fund computational tools, software, new
methods for material characterization, and the
development of open standards and databases that
will make the process of discovery and development
of advanced materials faster, less expensive, and
more predictable”
https://www.materialsproject.org
6. As of Jun 5 2015
• Over 58,000 unique compounds, and growing
• Diverse set of properties
  • Structural (lattice parameters, atomic positions, etc.)
  • Energetic (formation energies, phase stability, etc.)
  • Electronic structure (DOS, band structures)
  • Elastic constants
• Suite of Web Apps for materials analysis
7. User-friendly Web Apps
Materials Explorer: Search for materials by formula,
elements or properties
Battery Explorer: Search for battery materials by
voltage, capacity and other properties
Crystal Toolkit: Design new materials from existing
materials
Structure Predictor: Predict novel structures
Phase Diagram App: Generate compositional and
grand canonical phase diagrams
Pourbaix Diagram App: Generate Pourbaix
diagrams
Reaction Calculator: Balance reactions and calculate
their enthalpies
8. Materials Project data in User papers
M. Meinert, M.P. Geisler, Phase stability of chromium based
compensated ferrimagnets with inverse Heusler structure, J.
Magn. Magn. Mater. 341 (2013) 72–74.
J. Rustad, Density functional calculations of the enthalpies of
formation of rare-earth orthophosphates, Am. Mineral. 97
(2012) 791–799.
M. Fondell, T.J. Jacobsson, M. Boman, T. Edvinsson, Optical
quantum confinement in low dimensional hematite, J. Mater.
Chem. A. 2 (2014) 3352.
9. Web frontend is only the tip of the iceberg…
pymatgen
FireWorks
REST API
custodian
MPWorks
MPEnv
rubicon
10.
11. Hierarchical design of codebases
keeps infrastructure nimble to changes
WORKFLOW CODE
CHEMISTRY CODE
12. Many types of use cases
Crystal workflows: FireWorks, pymatgen, custodian, MPWorks
Molecule workflows: FireWorks, pymatgen, custodian, rubicon (private)
pymatgen
FireWorks
external
MAST, MaterialsHub
external
Berlin ML, JGI, MoDeNa
13. Sustainable software development
¨ Open-source
¤ Managed via
¤ More eyes => robustness
¤ Contributions from all over the world
¨ Benevolent dictators
¤ Unified vision
¤ Quality control
¨ Clear documentation
¤ Prevent code rot
¤ More users
¨ Continuous integration and testing
¤ Ensure code is always working
14. Python Materials Genomics (pymatgen)
¨ Core materials analysis powering the Materials
Project
¨ Defines core extensible Python objects for materials
data representation.
¨ Provides a robust and well-documented set of
structure and thermodynamic analysis tools relevant to
many applications.
¨ Establishes an open platform for researchers to
collaboratively develop sophisticated analyses of
materials data.
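To give a flavor of what an "extensible Python object for materials data representation" can look like, here is a minimal standalone sketch. It is not pymatgen's API: the `SimpleComposition` class and its methods are hypothetical names invented for illustration, and it only handles flat formulas without parentheses.

```python
import re
from collections import Counter
from dataclasses import dataclass
from functools import reduce
from math import gcd


@dataclass(frozen=True)
class SimpleComposition:
    """Hypothetical toy object; NOT pymatgen's Composition class."""

    amounts: tuple  # sorted tuple of (element symbol, integer count) pairs

    @classmethod
    def from_formula(cls, formula):
        # Match an element symbol (capital letter plus optional lowercase
        # letter) followed by an optional integer count.
        counts = Counter()
        for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
            counts[symbol] += int(number) if number else 1
        return cls(tuple(sorted(counts.items())))

    @property
    def reduced(self):
        # Divide every count by the greatest common divisor of all counts.
        divisor = reduce(gcd, (n for _, n in self.amounts))
        return SimpleComposition(
            tuple((el, n // divisor) for el, n in self.amounts)
        )

    @property
    def num_atoms(self):
        return sum(n for _, n in self.amounts)


comp = SimpleComposition.from_formula("Fe4O6")
```

Because the object is immutable and hashable, it can be used as a dictionary key or extended with further analysis methods, which is the general design idea the slide describes.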
15. Extensive Materials Analysis Capabilities
Input/Output
objects
(Modular, Reusable, Extendable)
Defects and Transformations
Electronic Structure
XRD Patterns
Phase and Pourbaix Diagrams
Functional properties
Comprehensively
documented
Continuously tested
and integrated
Active dev/user community
16. www.pymatgen.org stats
• > 6000 views per month on average
• (~50% increase from previous year)
v2.9.12 → v3.0.13
*Python 2/3 compatible!
Other improvements
• ABINIT support
• Defects (Haranczyk/LBNL)
• Qchem (JCESR)
• Bug fixes & improvements
Very active user community!
81 forks (developers making changes and contributing)
The commit rate has slowed somewhat, as expected for
a maturing and robust code base.
17. Pymatgen-db
¨ Database add-on for pymatgen. Enables the
creation of Materials Project-style MongoDB
(www.mongodb.org) databases for management of
materials data. Key features:
¤ Query engine for easy translation of MongoDB docs to
useful pymatgen objects for analysis purposes.
¤ Includes a clean and intuitive web ui (the Materials
Genomics UI) for exploring Mongo collections.
¤ http://pythonhosted.org//pymatgen-db/
18. Custodian
¨ Simple, robust and flexible just-in-time
(JIT) job management framework.
¤ Wrappers to perform error checking,
job management and error recovery.
¤ Error recovery is an important aspect
for HT: O(100,000) jobs + 1% error
rate => O(1000) errored jobs.
¤ Existing sub-packages for error
handling for VASP, NWChem and
Q-Chem calculations.
¨ Blue: Controlled by subclasses of Job
¨ Red: Defined by ErrorHandlers.
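The run, check, correct and rerun idea behind this slide can be sketched in a few lines of plain Python. This is an illustration only and not Custodian's actual API: `FlakyJob`, `StepSizeHandler` and `run_with_recovery` are hypothetical names, and the "error" is simulated rather than parsed from real output files.

```python
class FlakyJob:
    """Hypothetical job that fails a fixed number of times before succeeding."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.settings = {"step_size": 1.0}
        self.last_ok = False

    def run(self):
        # Stand-in for launching an external code; record whether it finished cleanly.
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            self.last_ok = False
        else:
            self.last_ok = True


class StepSizeHandler:
    """Hypothetical handler: detects a failed run and halves the step size."""

    def check(self, job):
        return not job.last_ok

    def correct(self, job):
        job.settings["step_size"] /= 2


def run_with_recovery(job, handlers, max_errors=5):
    """Run the job, apply corrections on detected errors, rerun until clean."""
    corrections = 0
    while True:
        job.run()
        triggered = [h for h in handlers if h.check(job)]
        if not triggered:
            return corrections  # clean run, nothing left to fix
        if corrections >= max_errors:
            raise RuntimeError("too many errors; giving up")
        for handler in triggered:
            handler.correct(job)
        corrections += 1


# The slide's estimate: 100,000 jobs at a 1% error rate.
expected_failures = 100_000 * 1 // 100

job = FlakyJob(failures_before_success=2)
n_corrections = run_with_recovery(job, [StepSizeHandler()])
```

Capping the number of corrections matters in high-throughput settings: a job that cannot be fixed automatically should fail loudly rather than consume allocation time indefinitely.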
19. Concrete Example for VASP calculations
¨ An extensive set of rules has been codified for running VASP
calculations
¨ Significantly reduces error rate of calculations (< 1%)
20. VaspJob class
¨ auto_npar: automatically modifies NPAR in INCAR to a
relatively optimal number based on detected number of
processors! Enhances vasp calculation efficiency by ~10-30%!!!
¨ auto_gamma: If this is a gamma-only calculation and a
gamma compiled version of vasp exists, use it. Another
10-20% increase in efficiency!
¨ Even without error handling, custodian already significantly
improves resource utilization of running VASP calculations!
VaspJob(vasp_cmd, output_file="vasp.out",
auto_npar=True, auto_gamma=True,
…<other options>...)
21. FireWorks is the Workflow Manager
Custom material
A cool material !!
Lots of information about
cool material !!
Submit!
Input generation (parameter choice)
Workflow mapping
Supercomputer submission / monitoring
Error handling
File Transfer
File Parsing / DB insertion
22. FireWorks as a platform
Community can write any workflow in FireWorks
→ We can automate it over most supercomputing resources
structure
charge
Band
structure
DOS
Optical
phonons
XAFS
spectra
GW
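A workflow of this kind is essentially a dependency graph of calculations. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) rather than FireWorks itself, and the dependency edges are hypothetical: the slide names the calculation types but does not state which depends on which.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each task maps to the set of tasks it needs first.
dependencies = {
    "structure": set(),
    "charge": {"structure"},
    "band structure": {"charge"},
    "DOS": {"charge"},
    "optical": {"charge"},
    "phonons": {"structure"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

A workflow manager adds the parts this sketch leaves out: queue submission, tracking job state in a database, and rerunning failed steps.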
23. Workflows in Development by Internal/External Collaborations
¨ Elastic constants (in production)
¨ Thermal properties (Phonon / GIBBS: in testing)
¨ Surfaces (in testing)
¨ GW / hybrid calculations
¨ ABINIT workflows (Geoffroy Hautier, UCL)
¨ Any code can be added and automated
25. How do I access MP data?
Option 1: Direct access
Most flexible and powerful, but
• User needs to know db language
• Security is an issue
• Fragile – if db tech or schema changes, user’s analysis breaks
[Diagram: Materials Project DB]
26. How do I access MP data?
Option 2: Web Apps
Pros
• Intuitive and user-friendly
• Secure
Cons
• Significant loss in flexibility and power
[Diagram: Materials Project DB, WebApps]
27. How do I access MP data?
Option 3: Web Apps built on RESTful API
Pros
• Intuitive and user-friendly
• Secure
• Programmatic access for developers and researchers
[Diagram: Materials Project DB, RESTfulAPI, WebApps]
28. The Materials API
An open platform for accessing Materials
Project data based on REpresentational State
Transfer (REST) principles.
Flexible and scalable to cater to large
number of users, with different access
privileges.
Simple to use and code agnostic.
29. A REST API maps a URL to a resource.
Example:
GET https://api.dropbox.com/1/account/info
Returns information about a user’s account.
Methods: GET, POST, PUT, DELETE, etc.
Response: Usually JSON or XML or both
32. Secure access
An individual API key provides secure access
with defined privileges.
All https requests must supply the API key as
either an “x-api-key” header or a GET/POST
“API_KEY” parameter.
API key available at
https://www.materialsproject.org/dashboard
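The two ways of supplying the key can be sketched with Python's standard library. Nothing is sent over the network here; the requests are only constructed. The key is a placeholder, and the URL is the query endpoint quoted elsewhere in this talk, used purely as an example target.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://www.materialsproject.org/rest/v2/query"  # URL quoted in this talk
API_KEY = "YOUR_API_KEY"  # placeholder, not a real key

# Way 1: send the key in an "x-api-key" header.
header_request = urllib.request.Request(BASE_URL, headers={"x-api-key": API_KEY})

# Way 2: send the key as an "API_KEY" GET parameter in the query string.
param_request = urllib.request.Request(
    BASE_URL + "?" + urllib.parse.urlencode({"API_KEY": API_KEY})
)

# Actually sending either one would be: urllib.request.urlopen(header_request)
```

The header form keeps the key out of URLs, which tend to end up in logs and browser histories.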
34. Can I really access any piece of data
in the Materials Project?
Github-powered RESTful documentation
http://bit.ly/materialsapi
Via the shockingly powerful
https://www.materialsproject.org/rest/v2/query
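As a rough sketch of what a call to the query endpoint might look like, the snippet below builds (but does not send) a POST request. The payload shape, with JSON-encoded `criteria` and `properties` form fields, and the field names `elements`, `pretty_formula` and `formation_energy_per_atom` are assumptions that should be checked against the API documentation linked above.

```python
import json
import urllib.parse
import urllib.request

QUERY_URL = "https://www.materialsproject.org/rest/v2/query"


def build_query_request(criteria, properties, api_key):
    """Build a POST request; the payload format is an assumption, see lead-in."""
    body = urllib.parse.urlencode(
        {
            "criteria": json.dumps(criteria),      # which materials to match
            "properties": json.dumps(properties),  # which fields to return
        }
    ).encode("utf-8")
    return urllib.request.Request(
        QUERY_URL, data=body, headers={"x-api-key": api_key}, method="POST"
    )


request = build_query_request(
    criteria={"elements": {"$all": ["Li", "Fe", "O"]}},  # assumed field name
    properties=["pretty_formula", "formation_energy_per_atom"],  # assumed names
    api_key="YOUR_API_KEY",  # placeholder, not a real key
)
```

Sending the request with `urllib.request.urlopen(request)` and decoding the body with `json.loads` would, under these assumptions, return the matching records.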
36. The Materials API + pymatgen in Education
– UCSD’s NANO 106
¨ Data mined over the Materials Project’s 49,000+ unique
crystals
http://www.bit.ly/sg_stats
P21/c is the most common
space group, comprising
~9.8% of all compounds
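The counting step behind a statistic like this is simple once the space group symbols have been retrieved. The sketch below uses a small made-up list, not Materials Project data, and the function name `space_group_fractions` is hypothetical.

```python
from collections import Counter


def space_group_fractions(symbols):
    """Return (space group, fraction of entries) pairs, most common first."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return [(sg, n / total) for sg, n in counts.most_common()]


# Made-up sample for illustration only; NOT Materials Project data.
sample = ["P21/c", "Fm-3m", "P21/c", "Pnma", "P21/c", "Fm-3m", "P-1", "C2/c"]
ranking = space_group_fractions(sample)
```

In the classroom setting described on this slide, the input list would come from a query over all crystals in the database rather than a hand-written sample.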
37. The Materials Virtual Lab @ UCSD’s
One-click AIMD
Starting candidates
Topological Screening
(augmented by DFT)
Stability (phase &
EW) screening
Diffusivity
Optimized
candidates
Automated “one-click” MD
workflow based on pymatgen,
custodian and fireworks
AIMD SDSC
Multi-week AIMD simulation
Statistical exclusionary
screening
Y. Mo, S. P. Ong, G. Ceder, “Insights into Diffusion Mechanisms in P2
Layered Oxide Materials by First-Principles Calculations”, submitted
Automated pathway
extraction + NEB
39. Sounds good, where do I learn more?
¨ The Materials Project
¤ https://www.materialsproject.org/open
¨ The Materials API Github Doc
¤ http://bit.ly/materialsapi
¨ The Materials Virtual Lab (MAVRL) @ UCSD
¤ Slides from Workshop on MP infrastructure (http://mavrl.org/software)