The Materials Project Ecosystem
A Complete Software and Data Platform for Materials Informatics
Shyue Ping Ong, University of California, San Diego
“Information wants to be free.”
– Stewart Brand, 1960s
“Information wants to be free and
code wants to be wrong.”
– RSA Conference 2008
“Materials information and code
wants to be free and right.”
The Materials Project is an open science
project to make the computed properties of
all known inorganic materials publicly
available to all researchers to accelerate
materials innovation.
June 2011: Launch of the Materials Genome Initiative, which
aims to “fund computational tools, software, new
methods for material characterization, and the
development of open standards and databases that
will make the process of discovery and development
of advanced materials faster, less expensive, and
more predictable.”
https://www.materialsproject.org
As of June 5, 2015
•  Over 58,000 unique compounds, and growing
•  Diverse set of properties
   •  Structural (lattice parameters, atomic positions, etc.)
   •  Energetic (formation energies, phase stability, etc.)
   •  Electronic structure (DOS, band structures)
   •  Elastic constants
•  Suite of Web Apps for materials analysis
User-friendly Web Apps
Materials Explorer: Search for materials by formula,
elements or properties
Battery Explorer: Search for battery materials by
voltage, capacity and other properties
Crystal Toolkit: Design new materials from existing
materials
Structure Predictor: Predict novel structures
Phase Diagram App: Generate compositional and
grand canonical phase diagrams
Pourbaix Diagram App: Generate Pourbaix
diagrams
Reaction Calculator: Balance reactions and calculate
their enthalpies
Materials Project data in User papers
M. Meinert, M.P. Geisler, Phase stability of chromium based
compensated ferrimagnets with inverse Heusler structure, J.
Magn. Magn. Mater. 341 (2013) 72–74.
J. Rustad, Density functional calculations of the enthalpies of
formation of rare-earth orthophosphates, Am. Mineral. 97
(2012) 791–799.
M. Fondell, T.J. Jacobsson, M. Boman, T. Edvinsson, Optical
quantum confinement in low dimensional hematite, J. Mater.
Chem. A. 2 (2014) 3352.
Web frontend is only the tip of the iceberg…
pymatgen
FireWorks
REST API
custodian
MPWorks
MPEnv
rubicon
Hierarchical design of codebases
keeps infrastructure nimble to changes
WORKFLOW CODE
CHEMISTRY CODE
Many types of use cases
Crystal workflows: FireWorks, pymatgen, custodian, MPWorks
Molecule workflows: FireWorks, pymatgen, custodian, rubicon (private)
pymatgen
FireWorks
external
MAST, MaterialsHub
external
Berlin ML, JGI, MoDeNa
Sustainable software development
¨  Open-source
¤  Managed via
¤  More eyes => robustness
¤  Contributions from all over the world
¨  Benevolent dictators
¤  Unified vision
¤  Quality control
¨  Clear documentation
¤  Prevent code rot
¤  More users
¨  Continuous integration and testing
¤  Ensure code is always working
Python Materials Genomics (pymatgen)
¨  Core materials analysis powering the Materials
Project
¨  Defines core extensible Python objects for materials
data representation.
¨  Provides a robust and well-documented set of
structure and thermodynamic analysis tools relevant to
many applications.
¨  Establishes an open platform for researchers to
collaboratively develop sophisticated analyses of
materials data.
Extensive Materials Analysis Capabilities
Input/Output objects (Modular, Reusable, Extendable)
Defects and Transformations
Electronic Structure
XRD Patterns
Phase and Pourbaix Diagrams
Functional properties
Comprehensively documented
Continuously tested and integrated
Active dev/user community
www.pymatgen.org stats
•  > 6000 views per month on average
•  (~50% increase from previous year)
v2.9.12 → v3.0.13
*Python 2/3 compatible!
Other improvements
•  ABINIT support
•  Defects (Haranczyk/LBNL)
•  Qchem (JCESR)
•  Bug fixes & improvements
Very active user community!
81 forks (developers making changes and contributing)
The commit rate has slowed somewhat, as expected for
a maturing and robust code base.
Pymatgen-db
¨  Database add-on for pymatgen. Enables the
creation of Materials Project-style MongoDB
(www.mongodb.org) databases for management of
materials data. Key features:
¤  Query engine for easy translation of MongoDB docs to
useful pymatgen objects for analysis purposes.
¤  Includes a clean and intuitive web UI (the Materials
Genomics UI) for exploring Mongo collections.
¤  http://pythonhosted.org/pymatgen-db/
Custodian
¨  Simple, robust and flexible just-in-time
(JIT) job management framework.
¤  Wrappers to perform error checking,
job management and error recovery.
¤  Error recovery is an important aspect
for HT: O(100,000) jobs + 1% error
rate => O(1000) errored jobs.
¤  Existing sub-packages for error
handling for VASP, NWChem and
Q-Chem calculations.
¨  Blue: Controlled by subclasses of Job
¨  Red: Defined by ErrorHandlers.
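The split between jobs and error handlers can be sketched in plain Python. This is not custodian's API: `FlakyJob`, `NotConvergedHandler` and `run_with_recovery` are hypothetical names, and the "error" is a made-up string. The sketch only illustrates the run, check, correct, rerun loop described above.

```python
class FlakyJob:
    """Hypothetical job that fails until its settings have been corrected."""

    def __init__(self):
        self.settings = {"algo": "fast"}
        self.attempts = 0

    def run(self):
        self.attempts += 1
        # Pretend the "fast" algorithm never converges.
        if self.settings["algo"] == "stable":
            return "converged"
        return "error: not converged"


class NotConvergedHandler:
    """Hypothetical handler: detects one kind of error and knows a fix."""

    def check(self, output):
        return output.startswith("error: not converged")

    def correct(self, job):
        job.settings["algo"] = "stable"


def run_with_recovery(job, handlers, max_errors=3):
    """Run a job, applying the first matching handler after each failure."""
    for _ in range(max_errors + 1):
        output = job.run()
        matching = [h for h in handlers if h.check(output)]
        if not matching:
            return output  # no handler recognised an error
        matching[0].correct(job)
    raise RuntimeError("too many errors")


job = FlakyJob()
result = run_with_recovery(job, [NotConvergedHandler()])
print(result, "after", job.attempts, "attempts")
```

Keeping the fix inside the handler means new failure modes can be covered by adding handlers, without touching the job itself.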
Concrete Example for VASP
calculations
¨  An extensive set of rules has been codified for running VASP
calculations
¨  Significantly reduces error rate of calculations (< 1%)
VaspJob class
¨  auto_npar: automatically sets NPAR in the INCAR to a
near-optimal value based on the detected number of
processors. Improves VASP calculation efficiency by ~10-30%!
¨  auto_gamma: if this is a Gamma-only calculation and a
Gamma-compiled version of VASP exists, use it. Another
10-20% increase in efficiency!
¨  Even without error handling, custodian already significantly
improves resource utilization of running VASP calculations!
VaspJob(vasp_cmd, output_file="vasp.out",
auto_npar=True, auto_gamma=True,
…<other options>...)
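To make the idea behind auto_npar concrete, here is a minimal sketch of one possible heuristic: pick a divisor of the core count close to its square root. The function name `choose_npar` and the rule itself are assumptions for illustration. The slides do not state which rule custodian actually applies.

```python
import math


def choose_npar(ncores):
    """Assumed heuristic: smallest divisor of ncores >= floor(sqrt(ncores))."""
    for npar in range(max(1, math.isqrt(ncores)), ncores + 1):
        if ncores % npar == 0:
            return npar
    return 1  # only reached when ncores < 1


# Hypothetical INCAR settings held as a plain dict.
incar = {"ENCUT": 520}
incar["NPAR"] = choose_npar(32)
print(incar)
```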
FireWorks is the Workflow Manager
[Figure: mock submission form for a custom material (“A cool material !!”, “Lots of information about cool material !!”, Submit!)]
•  Input generation (parameter choice)
•  Workflow mapping
•  Supercomputer submission / monitoring
•  Error handling
•  File transfer
•  File parsing / DB insertion
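The stages above can be thought of as a dependency graph that a workflow manager executes in order. The sketch below uses only the Python standard library and is not the FireWorks API. The step names, the strictly linear ordering and the `run_workflow` helper are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the set of steps it depends on (assumed linear chain).
dependencies = {
    "input_generation": set(),
    "workflow_mapping": {"input_generation"},
    "submission_monitoring": {"workflow_mapping"},
    "error_handling": {"submission_monitoring"},
    "file_transfer": {"error_handling"},
    "parse_and_insert": {"file_transfer"},
}


def run_workflow(deps, actions):
    """Run each step's action once all of its dependencies have run."""
    order = list(TopologicalSorter(deps).static_order())
    results = {step: actions[step]() for step in order}
    return order, results


# Placeholder actions; a real step would launch calculations or parse files.
actions = {step: (lambda s=step: f"{s} done") for step in dependencies}
order, results = run_workflow(dependencies, actions)
print(" -> ".join(order))
```

Declaring dependencies rather than hard-coding a sequence is what lets branching workflows reuse the same runner.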
FireWorks as a platform
Community can write any workflow in FireWorks
→ We can automate it over most supercomputing resources
Example calculation types: structure, charge, band structure, DOS, optical, phonons, XAFS spectra, GW
Workflows in Development by Internal/External Collaborations
¨  Elastic constants (in production)
¨  Thermal properties (Phonon / GIBBS: in testing)
¨  Surfaces (in testing)
¨  GW / hybrid calculations
¨  ABINIT workflows (Geoffroy Hautier, UCL)
¨  Any code can be added and automated
How do I access MP data in the Materials Project DB?
Option 1: Direct access
Most flexible and powerful, but
•  User needs to know db language
•  Security is an issue
•  Fragile – if db tech or schema
changes, user’s analysis breaks
Option 2: Web Apps
Pros
•  Intuitive and user-friendly
•  Secure
Cons
•  Significant loss in flexibility and power
Option 3: Web Apps built on a RESTful API
Pros
•  Intuitive and user-friendly
•  Secure
•  Programmatic access for developers and researchers
The Materials API
An open platform for accessing Materials
Project data based on REpresentational State
Transfer (REST) principles.
Flexible and scalable to cater to a large
number of users with different access
privileges.
Simple to use and code agnostic.
A REST API maps a URL to a resource.
Example:
GET https://api.dropbox.com/1/account/info
Returns information about a user’s account.
Methods: GET, POST, PUT, DELETE, etc.
Response: Usually JSON or XML or both
Who implements REST APIs?
https://www.materialsproject.org/rest/v2/materials/Fe2O3/vasp/energy
•  Preamble: https://www.materialsproject.org/rest/v2
•  Request type: materials
•  Identifier: Fe2O3 (typically a formula (Fe2O3), id (1234) or chemical system (Li-Fe-O))
•  Data type: vasp (vasp, exp, etc.)
•  Property: energy
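As a small illustration of how the URL is assembled from its parts, the sketch below joins the preamble, request type, identifier, data type and property. The helper name `materials_url` is hypothetical. Only the URL layout comes from the slide.

```python
from urllib.parse import quote

PREAMBLE = "https://www.materialsproject.org/rest/v2"


def materials_url(identifier, data_type, prop, request_type="materials"):
    """Join the URL components in the order shown on the slide."""
    parts = [request_type, identifier, data_type, prop]
    # Percent-encode each component so unusual identifiers stay valid.
    return "/".join([PREAMBLE] + [quote(p, safe="") for p in parts])


print(materials_url("Fe2O3", "vasp", "energy"))
print(materials_url("Li-Fe-O", "vasp", "energy"))
```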
Secure access
An individual API key provides secure access
with defined privileges.
All HTTPS requests must supply the API key as
either an “x-api-key” header or a GET/POST
“API_KEY” parameter.
API key available at
https://www.materialsproject.org/dashboard
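A minimal sketch of attaching the key as a header, using only the Python standard library. The request is built but not sent; `authenticated_request` is a hypothetical helper and `YOUR_API_KEY` is a placeholder, not a working key.

```python
import urllib.request


def authenticated_request(url, api_key):
    """Build (but do not send) a GET request carrying the key in a header."""
    return urllib.request.Request(url, headers={"x-api-key": api_key})


req = authenticated_request(
    "https://www.materialsproject.org/rest/v2/materials/Fe2O3/vasp/energy",
    "YOUR_API_KEY",  # placeholder; use the key from your dashboard
)
# urllib normalises header capitalisation, so compare case-insensitively.
headers = {name.lower(): value for name, value in req.header_items()}
print(req.get_method(), headers["x-api-key"])
```

Sending the request (for example with `urllib.request.urlopen(req)`) requires a valid key and network access.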
Sample output (JSON)
¨  Intuitive response format
¨  Machine-readable (JSON parsers available for most programming languages)
¨  Metadata provides provenance for tracking

{
  "created_at": "2014-07-18T11:23:25.415382",
  "valid_response": true,
  "version": {
    "pymatgen": "2.9.9",
    "db": "2014.04.18",
    "rest": "1.0"
  },
  "response": [
    {
      "energy": -67.16532048,
      "material_id": "mp-24972"
    },
    {
      "energy": -132.33035197,
      "material_id": "mp-542309"
    },
    {…},
    {…},
    {…},
    {…},
    {…},
    {…},
    {…},
    {…}
  ],
  "copyright": "Materials Project, 2012"
}
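Because the response is plain JSON, a few lines of standard-library Python are enough to consume it. The sketch below parses a shortened copy of the sample response, keeping only the two entries shown in full (the elided entries are omitted). The helper name `energies_by_id` is hypothetical.

```python
import json

# Shortened copy of the sample response: only the two visible entries.
raw = """
{
  "created_at": "2014-07-18T11:23:25.415382",
  "valid_response": true,
  "version": {"pymatgen": "2.9.9", "db": "2014.04.18", "rest": "1.0"},
  "response": [
    {"energy": -67.16532048, "material_id": "mp-24972"},
    {"energy": -132.33035197, "material_id": "mp-542309"}
  ],
  "copyright": "Materials Project, 2012"
}
"""


def energies_by_id(doc):
    """Map material_id to energy, refusing responses flagged as invalid."""
    if not doc.get("valid_response"):
        raise ValueError("API reported an invalid response")
    return {entry["material_id"]: entry["energy"] for entry in doc["response"]}


doc = json.loads(raw)
energies = energies_by_id(doc)
print(energies)
print("database version:", doc["version"]["db"])
```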
Can I really access any piece of data
in the Materials Project?
Github-powered RESTful documentation
http://bit.ly/materialsapi
Via the shockingly powerful
https://www.materialsproject.org/rest/v2/query
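A hedged sketch of what a request body for the query endpoint might look like. The form-field names `criteria` and `properties`, the MongoDB-style operators and the property names are assumptions to be checked against the API documentation linked above. The helper `build_query_payload` is hypothetical, and nothing is sent over the network.

```python
import json

QUERY_URL = "https://www.materialsproject.org/rest/v2/query"


def build_query_payload(criteria, properties):
    """Serialise the criteria and the requested properties as JSON strings."""
    return {
        "criteria": json.dumps(criteria),
        "properties": json.dumps(properties),
    }


# Assumed example: binary compounds containing both Fe and O.
payload = build_query_payload(
    {"elements": {"$all": ["Fe", "O"]}, "nelements": 2},
    ["pretty_formula", "energy"],
)
print(QUERY_URL)
print(payload)
```

The payload would be sent as a POST together with the API key.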
Demo (live notebook session)
The Materials API + pymatgen in Education
– UCSD’s NANO 106
¨  Data mined over the Materials Project’s 49,000+ unique
crystals
http://www.bit.ly/sg_stats
P21/c is the most common
space group, comprising
~9.8% of all compounds
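The kind of statistic quoted above reduces to counting labels. The sketch below shows the counting step on a tiny made-up list of space-group symbols. The list is placeholder data, not Materials Project results, and `spacegroup_fractions` is a hypothetical helper.

```python
from collections import Counter


def spacegroup_fractions(symbols):
    """Return (symbol, fraction) pairs, most common first."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return [(sg, n / total) for sg, n in counts.most_common()]


# Placeholder input; in practice one symbol per crystal in the dataset.
toy_symbols = ["P2_1/c", "Fm-3m", "P2_1/c", "Pnma", "P-1"]
for symbol, fraction in spacegroup_fractions(toy_symbols):
    print(f"{symbol}: {fraction:.0%}")
```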
The Materials Virtual Lab @ UCSD’s
One-click AIMD
Starting candidates
Topological screening (augmented by DFT)
Stability (phase & EW) screening
Diffusivity
Optimized candidates
Automated “one-click” MD workflow based on pymatgen, custodian and FireWorks
AIMD SDSC
Multi-week AIMD simulation
Statistical exclusionary screening
Y. Mo, S. P. Ong, G. Ceder, “Insights into Diffusion Mechanisms in P2 Layered Oxide Materials by First-Principles Calculations”, submitted
Automated pathway extraction + NEB
Coming soon (full launch in next few weeks)!!
Sounds good, where do I learn more?
¨  The Materials Project
¤  https://www.materialsproject.org/open
¨  The Materials API Github Doc
¤  http://bit.ly/materialsapi
¨  The Materials Virtual Lab (MAVRL) @ UCSD
¤  Slides from Workshop on MP infrastructure (http://mavrl.org/software)
Thank you.
