Software Tools, Methods and Applications of Machine Learning
in Functional Materials Design
Anubhav Jain, Energy Storage & Distributed Resources Division, Berkeley Lab
Generate large computational data sets with pymatgen, FireWorks, and atomate
[Figure: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database]
Create machine-learning
models with matminer
Together with collaborators, we
have developed several software
packages for high-throughput data
generation, which have been used
to run millions of density functional
theory calculations and power the
Materials Project database. This
software is available open-source
with comprehensive
documentation and support.
Left: the computational infrastructure of
the Materials Project database (Jain et
al., APL Materials 2013) is now powered
by the infrastructure described here.
Right: calculating the electronic
transport properties of >40,000
materials (Ricci et al., Sci Data 2017),
resulting in the experimental discovery
of the YCuTe2 thermoelectric (Aydemir
et al., JMCA 2016).
[Figure labels: experiment, computation]
Atomate is a library of standardized workflows
for VASP, Q-Chem, and FEFF codes. Given as
little information as a crystal structure or
molecule, atomate can perform >15 types of
calculation procedures, including band
structure, elastic tensor, thermal expansion,
and work function. Users can customize
settings or use defaults tuned by our team.
When calculations complete, the output files
are automatically parsed via pymatgen and the
information is organized into a database. A
series of database “builders” in atomate collect
data from individual calculations to generate
further database collections, including
searchable summary reports of materials, data
for constructing plots, and higher-level analyses
like phase diagram generation.
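The builder idea can be sketched in plain Python. This is a minimal illustration, not atomate's builder API: the document fields, the made-up numbers, and the function name `build_materials` are all assumptions chosen for the example. It collapses per-calculation "task" documents into one summary per material.

```python
from collections import defaultdict

# Hypothetical parsed task documents, one per finished calculation.
# Field names and numbers are placeholders, not atomate's schema or real data.
tasks = [
    {"task_id": 1, "formula": "Si", "task_type": "structure optimization",
     "energy_per_atom": -5.40},
    {"task_id": 2, "formula": "Si", "task_type": "static",
     "energy_per_atom": -5.42},
    {"task_id": 3, "formula": "Si", "task_type": "band structure",
     "energy_per_atom": -5.42, "band_gap": 0.61},
    {"task_id": 4, "formula": "NaCl", "task_type": "static",
     "energy_per_atom": -3.39, "band_gap": 5.0},
]


def build_materials(task_docs):
    """Collapse per-calculation documents into one summary per formula."""
    grouped = defaultdict(list)
    for doc in task_docs:
        grouped[doc["formula"]].append(doc)

    materials = {}
    for formula, docs in grouped.items():
        summary = {
            "formula": formula,
            "task_ids": sorted(d["task_id"] for d in docs),
            # Report the lowest energy found across all tasks.
            "energy_per_atom": min(d["energy_per_atom"] for d in docs),
        }
        # Assumption: a higher task_id means a later calculation, so the
        # last reported band gap is taken as the current value.
        ordered = sorted(docs, key=lambda d: d["task_id"])
        gaps = [d["band_gap"] for d in ordered if "band_gap" in d]
        if gaps:
            summary["band_gap"] = gaps[-1]
        materials[formula] = summary
    return materials


materials = build_materials(tasks)
for formula in sorted(materials):
    print(materials[formula])
```

In a real deployment the task documents would live in a database collection and the summaries would be written to a second collection; the grouping-and-reducing step is the part shown here.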
www.pymatgen.org https://atomate.org
https://materialsproject.github.io/fireworks
Example: Order-disorder
resolve partial or mixed
occupancies into a fully
ordered crystal structure
(e.g., mixed oxide-fluoride site
into separate oxygen/fluorine)
The pymatgen software
reads crystal structures from
a variety of file formats or
the Materials Project API. It
can perform many structure
operations such as:
•  surface / slab generation
•  order-disorder
•  interstitial finding
•  chemical substitution
and also create inputs for
many common DFT codes.
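The order-disorder operation can be illustrated without pymatgen. The sketch below is not pymatgen's implementation: it assumes a single mixed site type shared by exactly two species, ignores crystal symmetry, and does not rank the results by energy. The function name `enumerate_orderings` is hypothetical.

```python
from itertools import combinations


def enumerate_orderings(n_sites, occupancy):
    """List every way to fill n_sites disordered sites with whole atoms.

    occupancy maps exactly two species to fractional occupancies that sum
    to 1, e.g. {"O": 0.5, "F": 0.5}. Symmetry-equivalent orderings are
    not removed.
    """
    if len(occupancy) != 2:
        raise ValueError("this sketch handles exactly two species")
    (sp_a, frac_a), (sp_b, frac_b) = sorted(occupancy.items())
    if abs(frac_a + frac_b - 1.0) > 1e-8:
        raise ValueError("occupancies must sum to 1")
    n_a = round(frac_a * n_sites)
    if abs(n_a - frac_a * n_sites) > 1e-8:
        raise ValueError("occupancy does not give whole atoms; use a larger cell")

    orderings = []
    for a_sites in combinations(range(n_sites), n_a):
        chosen = set(a_sites)
        orderings.append(
            tuple(sp_a if i in chosen else sp_b for i in range(n_sites))
        )
    return orderings


# Four anion sites, each half oxygen and half fluorine.
orderings = enumerate_orderings(4, {"O": 0.5, "F": 0.5})
print(len(orderings))  # 6 ways to place 2 F and 2 O on 4 sites
for ordering in orderings:
    print(ordering)
```

A production tool would additionally discard symmetry-equivalent orderings and rank the remainder, for example by electrostatic energy, before passing structures on to DFT.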
FireWorks is workflow software that can
manage, monitor, and execute millions of
computational workflows across multiple
supercomputing centers. FireWorks
supports many features needed for the
materials science domain, including dynamic
(self-modifying) workflows and automatic
failure detection and rerun.
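Dynamic workflows and automatic rerun can be sketched with a small in-memory runner. This is a toy illustration, not the FireWorks API: the function `run_workflow`, the job format, and the job names are assumptions. Each job may return extra jobs to append (the self-modifying part), and a failing job is retried up to `max_attempts` times.

```python
def run_workflow(jobs, max_attempts=2):
    """Run jobs in dependency order with retries.

    jobs maps a job name to (function, list of parent names). A job
    function receives a dict of its parents' results and returns
    (result, new_jobs); new_jobs uses the same format and is added to
    the workflow while it is running.
    """
    jobs = dict(jobs)
    results, failed, log = {}, set(), []
    while True:
        ready = [
            name for name, (_, parents) in jobs.items()
            if name not in results and name not in failed
            and all(p in results for p in parents)
        ]
        if not ready:
            break  # everything finished, or remaining jobs are blocked
        for name in ready:
            func, parents = jobs[name]
            inputs = {p: results[p] for p in parents}
            for attempt in range(1, max_attempts + 1):
                try:
                    result, new_jobs = func(inputs)
                except Exception as exc:
                    log.append((name, attempt, f"failed: {exc}"))
                else:
                    log.append((name, attempt, "ok"))
                    results[name] = result
                    jobs.update(new_jobs)  # dynamic workflow modification
                    break
            else:
                failed.add(name)  # all attempts exhausted
    return results, failed, log


attempts = {"relax": 0}


def relax(inputs):
    attempts["relax"] += 1
    if attempts["relax"] == 1:
        raise RuntimeError("walltime exceeded")  # simulated transient failure
    return {"converged": False}, {}


def check(inputs):
    # Add a follow-up job only if the relaxation did not converge.
    if not inputs["relax"]["converged"]:
        return "needs rerun", {"relax_tight": (relax_tight, ["check"])}
    return "done", {}


def relax_tight(inputs):
    return {"converged": True}, {}


results, failed, log = run_workflow(
    {"relax": (relax, []), "check": (check, ["relax"])}
)
for entry in log:
    print(entry)
```

Jobs whose parents failed permanently are never marked ready, so the loop ends instead of waiting forever.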
A recent plug-in for FireWorks called
rocketsled assists users in performing
machine learning-based adaptive design of
a search space, minimizing the number of
calculations needed to find a solution.
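The adaptive-design loop can be sketched as follows. This is not rocketsled's API or algorithm: the objective `expensive_calculation`, the inverse-distance surrogate in `predict`, and the purely greedy selection rule are stand-ins chosen so the example runs with the standard library alone. A real setup would typically use a trained machine-learning model and balance exploration against exploitation.

```python
import random


def expensive_calculation(x):
    """Stand-in for a costly simulation; hypothetical objective to minimize."""
    return (x - 37) ** 2


def predict(x, observed):
    """Inverse-distance-weighted surrogate: nearby observations dominate."""
    # x is always an unevaluated candidate, so abs(x - xi) is never zero.
    weights = {xi: 1.0 / abs(x - xi) for xi in observed}
    total = sum(weights.values())
    return sum(w * observed[xi] for xi, w in weights.items()) / total


def adaptive_search(candidates, n_initial=3, budget=15, seed=0):
    """Evaluate at most `budget` candidates, guided by the surrogate."""
    rng = random.Random(seed)
    observed = {
        x: expensive_calculation(x) for x in rng.sample(candidates, n_initial)
    }
    while len(observed) < budget:
        remaining = [x for x in candidates if x not in observed]
        if not remaining:
            break
        # Run the calculation the surrogate expects to score best.
        nxt = min(remaining, key=lambda x: predict(x, observed))
        observed[nxt] = expensive_calculation(nxt)
    best = min(observed, key=observed.get)
    return best, observed


candidates = list(range(100))
best, observed = adaptive_search(candidates)
print(f"best candidate {best} after {len(observed)} of {len(candidates)} evaluations")
```

The point of the pattern is the budget: only 15 of the 100 candidates are ever evaluated. The greedy rule here carries no guarantee of finding the global optimum.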
The matminer package lets one load data from
atomate databases, external web databases, or one
of 24 built-in large materials data sets. It can perform
feature extraction using >40 state-of-the-art methods,
and perform visualization or data mining using
common machine learning libraries. Matminer is
available open-source, and comprehensive examples
of performing machine learning are available as
interactive Jupyter notebooks.
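Feature extraction from a chemical formula can be sketched in a few lines. This is not matminer's featurizer API: the function names, the tiny `ELEMENT_DATA` table (atomic numbers and Pauling electronegativities for five elements), and the restriction to simple formulas without parentheses are assumptions made to keep the example self-contained.

```python
import re

# Small lookup table for illustration; a real featurizer covers all elements.
ELEMENT_DATA = {
    "O": {"atomic_number": 8, "electronegativity": 3.44},
    "Na": {"atomic_number": 11, "electronegativity": 0.93},
    "Si": {"atomic_number": 14, "electronegativity": 1.90},
    "Cl": {"atomic_number": 17, "electronegativity": 3.16},
    "Fe": {"atomic_number": 26, "electronegativity": 1.83},
}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into element amounts."""
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


def featurize(formula):
    """Return composition-weighted mean and range of each elemental property."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    features = {}
    for prop in ("atomic_number", "electronegativity"):
        values = [ELEMENT_DATA[el][prop] for el in amounts]
        features[f"mean_{prop}"] = sum(
            ELEMENT_DATA[el][prop] * n / total for el, n in amounts.items()
        )
        features[f"range_{prop}"] = max(values) - min(values)
    return features


# Each row is a fixed-length feature vector usable by any ML library.
table = {formula: featurize(formula) for formula in ("NaCl", "SiO2", "Fe2O3")}
for formula, features in table.items():
    print(formula, features)
```

Because every formula maps to the same set of named features, the resulting table can be handed directly to a standard regression or classification model.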
https://hackingmaterials.github.io/matminer
Funding for this research was
provided by the U.S. Department
of Energy, Basic Energy Sciences,
Materials Science Division through
an Early Career Grant. Computing
resources were provided by the
National Energy Research Scientific
Computing Center.
https://hackingmaterials.lbl.gov
@jainpapers
Over 40 feature extraction routines are implemented.
[Figure labels: atomate output database(s); phase diagrams; Pourbaix diagrams; diffusivity via MD; band structure analysis]
