Discovering new functional materials for clean
energy and beyond using high-throughput
computing and machine learning
Anubhav Jain
Lawrence Berkeley National Laboratory
Presentation given at Intel, Oct 2022
Slides (will be) posted to hackingmaterials.lbl.gov
Outline
• Introduction to group and overview of our projects
• The Materials Project and virtual materials design
• The Matbench protocol: benchmarking ML algorithms
• Natural language processing applied to materials design
• Automating materials synthesis and characterization
Overview of our research group
• Located at Lawrence Berkeley National Laboratory (Berkeley, CA)
• Group composition
• Usually 10 people in size (e.g., 5 postdocs, 5 graduate students)
• Major funding from U.S. Dept. of Energy, some funding from industry (Toyota Research
Institute)
• Areas of emphasis
• Computational design of new functional materials
• Typically semiconductors, ceramics, or alloys
• e.g., past work in Li-ion and multivalent batteries, thermoelectric materials, carbon capture
materials, catalysts for water purification, etc.
• Not really polymers, molecular systems, or organic systems – although some past work here, too
• Machine learning applied to materials science
• Automated laboratories (recent)
We develop software frameworks for performing materials simulations,
including automation at supercomputing centers
Summary
• We develop and maintain
several software packages for
computational design of
materials
• These include “FireWorks”
for automating calculations at
supercomputing centers,
“atomate” for defining
materials science workflows,
and “matminer” for
generating descriptors for
crystal structures
We develop methods to calculate materials properties based on density
functional theory, often adapting methods for high-throughput applications
Summary
• Many materials properties are
either difficult to calculate or
require impractical amounts
of computer time
• We develop methods to
calculate materials properties
both accurately and
efficiently
• Examples include “AMSET”
(electron transport) and
ongoing work on thermal
properties of materials
Old method (BoltzTraP – screening is qualitative w/pitfalls)
New method (AMSET – screening is more quantitative)
Ganose, A. M.; Park, J.; Faghaninia, A.; Woods-Robinson, R.; Persson, K. A.; Jain, A. Efficient Calculation of Carrier Scattering Rates from First
Principles. Nat Commun 2021, 12 (1), 2222.
Scattering mechanisms and the materials parameters each requires:
• acoustic deformation potential (ad): deformation potential, elastic tensor
• ionized impurity (ii): dielectric tensor
• piezoelectric (pi): dielectric tensor, piezoelectric tensor
• polar optical phonon (po): dielectric tensor, polar phonon frequency
[Figure: (a) phonon renormalization at T > 0 K via force constant fitting; (b) panels at T = 0 K, 100 K, and 200 K for cubic SrTiO3 (Tc = 105 K)]
We use a combination of density functional theory calculations and machine
learning to design materials for various functional applications
Summary
• We trained machine learning
models (on open benchmark
data sets) to determine
catalytic performance of
materials in removing nitrate
from drinking water
• The models were used to pre-
screen ~60,000 materials to
only 23 materials that were
subjected to expensive
physics calculations for
verification
“Funnel” diagram illustrating how an initial list of
~60,000 compounds was passed through a
workflow to identify 23 interesting compounds.
ML was used in the workflow to pre-screen on
high activity and selectivity of N2/NH3.
The ML models show good correspondence with
significantly more expensive physical simulations
(“DFT”), demonstrating that they can be swapped
into the screening workflow reliably while extending
the search to ~500 times more compounds than
would be possible without ML augmentation.
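The funnel idea above can be sketched in a few lines: a cheap surrogate ranks every candidate, and only a short list is passed on to the expensive calculation. This is a minimal sketch under stated assumptions, not the workflow from the paper; both scoring functions (`expensive_physics_score`, `cheap_ml_score`) are hypothetical stand-ins invented for illustration.

```python
import random


def expensive_physics_score(candidate_id: int) -> float:
    """Hypothetical stand-in for an expensive physics calculation (e.g., DFT)."""
    return (candidate_id * 37 % 1000) / 1000.0


def cheap_ml_score(candidate_id: int) -> float:
    """Hypothetical stand-in for a fast ML surrogate: the 'true' score plus noise."""
    noise = random.Random(candidate_id).gauss(0.0, 0.05)
    return expensive_physics_score(candidate_id) + noise


def funnel(candidate_ids, n_keep):
    """Rank all candidates with the cheap model, then verify only the top n_keep."""
    ranked = sorted(candidate_ids, key=cheap_ml_score, reverse=True)
    shortlist = ranked[:n_keep]
    # Only the shortlist ever reaches the expensive step.
    return {cid: expensive_physics_score(cid) for cid in shortlist}


verified = funnel(range(60_000), n_keep=23)
print(f"{len(verified)} candidates passed to the expensive step")
```

The design point is that the expensive function is called 23 times rather than 60,000 times; the quality of the final list then depends on how well the surrogate tracks the expensive calculation.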
“Screening of bimetallic electrocatalysts for water purification with machine learning”
Tran et al., J. Chem Phys 2022
We help develop and maintain a comprehensive database of materials
properties, with a user community of >250,000 registered users
Summary
• In general, only a small
fraction of materials have
available experimental
property measurements
• The Materials Project uses
massive supercomputing
resources to calculate the
properties of materials using
first principles calculations
• The data is disseminated to
large user community
Past year: average of
≈200 new regs/day
We develop and maintain “matbench”, a machine learning benchmark for
materials science, uncovering what works and what’s needed
Summary
• We created a comprehensive
set of benchmark tests for ML
algorithms that aim to predict
materials properties
• The benchmarks clearly
reveal which community
algorithms work
• They also helped show the
field that more research was
needed into “small data set”
algorithms, motivating
external works
The Matbench benchmark contains 13 data sets
that vary in size and application. Community
algorithms compete for best performance on
each data set.
The full “leaderboard” of all algorithms to date
tested against all 13 data sets, organized by data
set size. Deep learning approaches typically excel
at large data problems but typically struggle with
small data; some hybrid approaches were
subsequently developed to address this.
https://doi.org/10.1038/s41524-020-00406-3
[Leaderboard figure annotations: bigger datasets; better relative performance]
We use natural language processing to parse scientific abstracts and articles
and generate data sets and hypotheses
Summary
• We used natural language
processing (NLP) to analyze
the text of several million
article abstracts
• With no domain-specific
training, the ML system
internalized a representation
of the periodic table
• More impressively, it could
predict what materials
researchers would study for
“thermoelectrics” in the future
A representation of the periodic table generated
automatically by analyzing >3 million abstracts
Materials compositions for thermoelectrics
applications as predicted by NLP ~3 years ago.
Since then, approximately 1/3 of the predictions
have been reported by researchers.
https://doi.org/10.1038/s41586-019-1335-8
Sponsor: SPP, Toyota Research Institute
Summary
• We are collaborating with
other groups at LBNL (G.
Ceder, H. Kim) to develop an
automated laboratory for
automated inorganic
materials synthesis
• In contrast to other similar
efforts, we work primarily
with powder-based synthesis
procedures
• Several aspects already
completed, but still a work in
progress
A-lab timeline:
• April 2022 (hardware development): box furnace, XRD, and robots ready
• July 2022 (platform integration): tube furnaces and SEM ready
• November 2022 (automated synthesis): powder dosing system; first automated syntheses
• Summer 2023: AI-guided synthesis
• Summer 2024: closed-loop materials discovery
Moving from the virtual world to the physical world:
A-lab for automated synthesis of inorganic materials
Miscellaneous projects – analysis of large solar PV data sets,
data extraction from figures
Summary
• We also have various other
miscellaneous projects at any
given time
• For example, we recently
trained an ML algorithm to
classify electroluminescence
images from solar power
plants and use this to assess
fire damage
• We also developed software
to help parse data from
figures
Pipeline developed to process raw EL images
(bottom-left), extract modules, segment
individual cells, and classify cells into various
defect categories using deep learning models.
This open-source pipeline can replace tedious
human annotation of module EL images at a
large scale.
Examples of using machine learning to identify
portions of chart images and extracting data
curves based on color
Outline
• Introduction to group and overview of our projects
• The Materials Project and virtual materials design
• The Matbench protocol: benchmarking ML algorithms
• Natural language processing applied to materials design
• Automating materials synthesis and characterization
The core of Materials Project is a free database of
calculated materials properties and crystal structures
Free, public resource
• www.materialsproject.org
Data on ~150,000 materials,
including information on:
• electronic structure
• phonon and thermal
properties
• elastic / mechanical properties
• magnetic properties
• ferroelectric properties
• piezoelectric properties
• dielectric properties
Powered by hundreds of millions
of CPU-hours invested into high-
quality calculations
The core data set keeps growing with time …
Apps give insight into data
Materials Explorer
Phase Stability Diagrams
Pourbaix Diagrams
(Aqueous Stability)
Battery Explorer
The code powering the Materials Project is
available open source (BSD/MIT licenses)
just-in-time error correction, fixing your
calculations so you don’t have to
‘recipes’ for common materials
science simulation tasks
making materials science web apps easy
workflow management software for
high-throughput computing
materials science analysis code:
make, transform and analyze crystals,
phase diagrams and more
& more … MP team members also contribute to
several other non-MP codes, e.g. matminer for
machine learning featurization
The Materials Project is used heavily by the research
community
> 180,000 registered
users
> 40,000 new users last year
~100 new registrations/day
~10,000 users log on every day
> 2M records downloaded through API each day; 1.8 TB of data served per
month
Today, the Materials Project has led to
many examples of “computer to lab”
success stories
MP for p-type transparent conductors
References
✦ Hautier, G., Miglio, A., Ceder, G., Rignanese, G.-M. & Gonze, X. Identification and
design principles of low hole effective mass p-type transparent conducting oxides.
Nature Communications 4, (2013)
✦ Bhatia, A. et al. High-Mobility Bismuth-based Transparent p-Type Oxide from High-
Throughput Material Screening. Chemistry of Materials 28, 30–34 (2015)
✦ Ricci, F. et al. An ab initio electronic transport database for inorganic materials.
Scientific Data 4, (2017)
Prediction
Screening based on band
gap, transport properties
and band alignments.
Experiment
Predictions revealed
material with s–p
hybridized valence band
(thought to correlate
well with dopability).
When synthesized, the
material has excellent
transparency and is readily
dopable with K.
Ba2BiTaO6
MP for thermoelectrics
References
✦ Aydemir, U. et al. YCuTe2: a member of a new class of thermoelectric materials with
CuTe4-based layered structure. Journal of Materials Chemistry A 4, 2461–2472 (2016)
✦ Zhu, H. et al. Computational and experimental investigation of TmAgTe2 and
XYZ2 compounds, a new group of thermoelectric materials identified by first-principles
high-throughput screening. Journal of Materials Chemistry C 3, 10554–10565 (2015).
✦ Pöhls, J.-H. et al. Metal phosphides as potential thermoelectric materials. Journal of
Materials Chemistry C 5, 12441–12456 (2017).
Prediction
Screening of tens of
thousands of materials
with predicted electron
transport properties
revealed a family of
promising XYZ2
candidates
Experiment
Several materials made:
YCuTe2 (zT = 0.75),
TmAgTe2 (zT = 0.47, 1.8
theoretical), novel NiP2
phosphide
TmAgTe2
MP for phosphors
References
✦ Wang, Z. et al. Mining Unexplored Chemistries for Phosphors for High-Color-
Quality White-Light-Emitting Diodes. Joule 2, 914–926 (2018)
✦ Li, S. et al. Data-Driven Discovery of Full-Visible-Spectrum Phosphor. Chemistry of
Materials 31, 6286–6294 (2019)
✦ Ha, J. et al. Color tunable single-phase Eu2+ and Ce3+ co-activated Sr2LiAlO4
phosphors. Journal of Materials Chemistry C 7, 7734–7744 (2019)
Prediction
Statistical analysis of existing
materials that co-occur with
word ‘phosphor’ followed
by structure prediction for
new materials
Experiment
Predicted first known Sr-Li-
Al-N quaternary, showed
green-yellow/blue emission
with quantum efficiency of
25% (Eu), 40% (Ce), 55%
(co-activated Eu, Ce)
Sr2LiAlN4
One of the applications we looked into was
thermoelectric materials
• A thermoelectric material
generates a voltage in response
to a thermal gradient
• Applications
• Heat to electricity
• Refrigeration
• Advantages include:
• Reliability
• Easy to scale to different sizes
(including compact)
www.alphabetenergy.com
It is difficult to balance trade-offs in
thermoelectrics properties, so use screening
ZT = α²σT/κ
power factor (α²σ)
> 2 mW/(m·K²)
(PbTe = 10 mW/(m·K²))
Seebeck coefficient (α)
> 100 µV/K
Band structure + BoltzTraP
electrical conductivity (σ)
> 10³ /(ohm·cm)
Band structure + BoltzTraP
thermal conductivity (κ)
< 1 W/(m·K)
• κe from BoltzTraP
• κl difficult (phonon-phonon scattering)
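As a worked example of the figure-of-merit formula, the sketch below evaluates the power factor and ZT in SI units. The input values are illustrative choices near the screening thresholds on this slide, not data for any real material, and the function names are our own.

```python
def power_factor(seebeck_v_per_k: float, sigma_s_per_m: float) -> float:
    """Power factor alpha^2 * sigma, in W/(m*K^2)."""
    return seebeck_v_per_k ** 2 * sigma_s_per_m


def figure_of_merit(seebeck_v_per_k, sigma_s_per_m, kappa_w_per_mk, temperature_k):
    """Dimensionless ZT = alpha^2 * sigma * T / kappa."""
    return power_factor(seebeck_v_per_k, sigma_s_per_m) * temperature_k / kappa_w_per_mk


# Illustrative inputs (assumed values, not measurements):
seebeck = 200e-6       # 200 microvolts per kelvin, expressed in V/K
sigma = 1e3 * 100.0    # 10^3 S/cm converted to S/m
kappa = 1.0            # W/(m*K)
temperature = 300.0    # K

pf = power_factor(seebeck, sigma)
zt = figure_of_merit(seebeck, sigma, kappa, temperature)
print(f"power factor = {pf * 1e3:.1f} mW/(m*K^2)")  # prints 4.0
print(f"ZT = {zt:.2f}")                              # prints 1.20
```

The unit conversion is the usual pitfall: conductivity is often quoted per centimetre while thermal conductivity is quoted per metre, so one of them must be converted before forming the ratio.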
Heavy band:
✓ Large DOS
(higher Seebeck and more carriers)
✗ Large effective mass
(poor mobility)
Light band:
✓ Small effective mass
(improved mobility)
✗ Small DOS
(lower Seebeck, fewer carriers)
Multiple bands, off symmetry:
✓ Large DOS with small effective
mass
✗ Difficult to design!
E
k
~50,000 crystal
structures and
band structures
from Materials
Project are used
as a source
F. Ricci, et al., An ab initio electronic transport
database for inorganic materials, Sci. Data. 4
(2017) 170085.
We compute electronic
transport properties
with BoltzTraP and
minimum thermal
conductivity (Cahill-
Pohl) for some
compounds
About 300GB of
electronic transport
data is generated. All
data is available free
for download.
We found several compounds with promising
figure-of-merit, but no breakthroughs
• Calculations:
trigonal p-
TmAgTe2 could
have power
factor up to 8
mW/(m·K²)
• requires 10²⁰/cm³
carriers
experiment
computation
• Calculations: p-YCuTe2 could
only reach PF of 0.4
mW/(m·K²)
• SOC inhibits PF
• if thermal conductivity is low
(e.g., 0.4 W/(m·K)), we get zT ~1
• Expt: zT ~0.75 – not too far
from calculation limit
• carrier concentration of 10¹⁹/cm³
• Decent performance, but
unlikely to be improved with
further optimization
• Expt: p-zT only 0.35 despite
very low thermal
conductivity (~0.25 W/(m·K))
• Limitation: carrier
concentration (~10¹⁷/cm³)
• likely limited by TmAg
defects, as determined by
follow-up calculations
• Later, we achieved zT ~ 0.47
using Zn-doping
TmAgTe2
YCuTe2
Outline
• Introduction to group and overview of our projects
• The Materials Project and virtual materials design
• The Matbench protocol: benchmarking ML algorithms
• Natural language processing applied to materials design
• Automating materials synthesis and characterization
There are many new algorithms being published
for ML in materials –
New ones constantly reported!
But it is very difficult to compare
algorithms
Data set used
in study A
Data set used
in study B
Data set used
in study C
• Different data sets
• Source (e.g., OQMD vs MP vs JARVIS)
• Quantity (e.g., MP 2019 vs MP 2022)
• Subset / data filtering (e.g., ehull<X)
• Different evaluation metrics
• Test set vs. cross validation?
• Different test set fraction?
• Can be difficult to install and retrain
many of these algorithms
MAE (5-fold CV) = 0.102 eV vs. RMSE (test set) = 0.098 eV: which is better?
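To see why the two numbers above cannot be compared directly, the short sketch below computes MAE and RMSE on the same set of predictions. The values are made up for illustration; the point is only that the two metrics differ even when the model and data are identical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error; penalizes large individual errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up values: three small errors and one larger one.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.9, 2.1, 3.5]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(f"MAE  = {mae_value:.3f}")
print(f"RMSE = {rmse_value:.3f}")
```

RMSE is never smaller than MAE on the same predictions, so a reported RMSE of 0.098 eV on one split and an MAE of 0.102 eV on another says little about which algorithm is better.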
Can we design a standard test set for ML
algorithms for materials science?
• There is no single type of problem that materials scientists are trying
to solve
• For now, focus on materials property prediction (from structure or
composition)
• We want a test set that contains a diverse array of problems
• Smaller data versus larger data
• Different applications (electronic, mechanical, etc.)
• Composition-only or structure information available
• Experimental vs. Ab-initio
• Classification or regression
Matbench includes 13 different ML tasks
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference
Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
Models tested by Matbench to date
Model (representation type): representation summary
• Magpie + Sine Coulomb Matrix + Random Forest (composition or structure): hand-created chemical features coupled with random forest ML algorithm
• Automatminer (composition or structure): hand-created chemical features with genetic-algorithm-based ML algorithm and hyperparameter selection
• MODNet (composition or structure): hand-created chemical features with various neural network layers
• CGCNN (structure only): graph-convolution-based neural networks with basic initial atom/bond features
• ALIGNN (structure only): graph-based convolutional networks based on bonds/angles in addition to atoms/bonds
• CRABNet (composition only): transformer-based self-attention for composition; initialized using NLP-based embeddings
How to read the Matbench leaderboard
Bigger datasets
Better
relative
performance
• A scaled error of 0.0 means all
predictions are correct
• A scaled error of 1.0 is equal
to always predicting the
average value
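One way to compute a scaled error with these two properties is to divide the model's error by the error of a dummy model that always predicts the training-set mean. The sketch below assumes mean absolute error as the underlying metric; the slide does not state the exact definition used on the leaderboard, so treat this as an illustration rather than the official formula.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def scaled_error(y_train, y_test, y_pred):
    """Model error divided by the error of always predicting the training mean."""
    baseline = sum(y_train) / len(y_train)
    dummy_error = mean_absolute_error(y_test, [baseline] * len(y_test))
    return mean_absolute_error(y_test, y_pred) / dummy_error


# Made-up values for illustration.
y_train = [1.0, 2.0, 3.0]
y_test = [0.0, 4.0]

perfect = scaled_error(y_train, y_test, y_pred=[0.0, 4.0])
dummy = scaled_error(y_train, y_test, y_pred=[2.0, 2.0])
print(perfect, dummy)
```

A value above 1.0 is possible under this definition and would mean the model does worse than the constant baseline.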
Magpie + Sine Coulomb Matrix Model
• Composition features using
chemical descriptors such as
averages/stdevs of elemental
properties such as melting
point, electronegativity
• Structure features using sine
Coulomb matrix
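The composition features described above can be illustrated with a small sketch that computes the fraction-weighted mean and standard deviation of one elemental property. This is a simplified illustration, not the Magpie implementation: the lookup table covers only four elements (Pauling electronegativities), the formula parser handles only simple formulas without parentheses, and the function names are hypothetical.

```python
import math
import re

# Pauling electronegativities for a few elements (illustrative subset only).
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16, "Mg": 1.31, "O": 3.44}


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'MgO' into {element: amount}."""
    amounts = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        amounts[element] = amounts.get(element, 0.0) + (float(count) if count else 1.0)
    return amounts


def composition_stats(formula: str, table: dict):
    """Return the fraction-weighted mean and standard deviation of a property."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    fractions = {el: n / total for el, n in amounts.items()}
    mean = sum(frac * table[el] for el, frac in fractions.items())
    variance = sum(frac * (table[el] - mean) ** 2 for el, frac in fractions.items())
    return mean, math.sqrt(variance)


mean_en, std_en = composition_stats("NaCl", ELECTRONEGATIVITY)
print(f"NaCl electronegativity: mean = {mean_en:.3f}, std = {std_en:.3f}")
```

Repeating this over many elemental properties (melting point, electronegativity, and so on) yields a fixed-length feature vector for any composition, which is what makes a standard model such as a random forest applicable.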
Ward, L., Agrawal, A., Choudhary, A. et al. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput Mater 2, 16028 (2016).
Faber, Felix, et al. "Crystal structure representations for machine learning models of formation energies." International Journal of Quantum Chemistry 115.16 (2015): 1094-1101.
https://matbench.materialsproject.org
Automatminer Model
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput
Mater 2020, 6 (1), 138.
MODNet Model
De Breuck, P.-P.; Evans, M. L.; Rignanese, G.-M. Robust Model Benchmarking and Bias-Imbalance in Data-Driven Materials Science: A Case Study on MODNet. Journal of Physics:
Condensed Matter, Volume 33, Number 40, 2021
CGCNN Model
Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120 (14), 145301.
ALIGNN Model
Choudhary, Kamal, and Brian DeCost. "Atomistic Line Graph Neural Network for improved materials property predictions." npj Computational Materials 7.1 (2021): 1-8.
How much have we
improved overall?
• In some cases (e.g., Ef DFT) we
have made a lot of
improvement
• In contrast, for others (e.g., σy
steel alloys) we have barely
improved
• Possible reasons
• Amount of attention paid to
certain problems
• Small vs large data emphasis –
there is a lot more room for
improvement for small data
How could we improve Matbench?
• Additional tasks – but how to keep it manageable?
• Adding external conditions (temperature, reducing gas presence,
microstructural characterizations)
• Other materials classes (polymers, metal alloys, multi-material composites)
• Other types of properties (e.g., predicting spectra)
• More dynamic tests, e.g. update the test periodically and re-evaluate
• Other scoring metrics
• e.g., active learning searches
• cross-validation by leaving out chemical systems rather than random splits
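The last idea, cross-validation that leaves out whole chemical systems, can be sketched as follows. Each fold holds out every compound that shares a set of elements, so the model is tested on chemistry it has not seen. The helper names and the toy formula list are our own, and the element extraction assumes simple formulas.

```python
import re
from collections import defaultdict


def chemical_system(formula: str) -> str:
    """Map a formula to its chemical system, e.g. 'Fe2O3' -> 'Fe-O'."""
    return "-".join(sorted(set(re.findall(r"[A-Z][a-z]?", formula))))


def leave_one_system_out(formulas):
    """Yield (system, train_indices, test_indices), one split per chemical system."""
    groups = defaultdict(list)
    for index, formula in enumerate(formulas):
        groups[chemical_system(formula)].append(index)
    for system, test_indices in groups.items():
        held_out = set(test_indices)
        train_indices = [i for i in range(len(formulas)) if i not in held_out]
        yield system, train_indices, test_indices


formulas = ["Fe2O3", "FeO", "Fe3O4", "NaCl", "TiO2", "Ti2O3"]
splits = list(leave_one_system_out(formulas))
for system, train_indices, test_indices in splits:
    print(system, "train:", train_indices, "test:", test_indices)
```

With a random split, FeO could land in the training set while Fe2O3 is in the test set, which tends to make the task easier than predicting a genuinely new chemical system.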
Outline
• Introduction to group and overview of our projects
• The Materials Project and virtual materials design
• The Matbench protocol: benchmarking ML algorithms
• Natural language processing applied to materials design
• Automating materials synthesis and characterization
Literature data can be a key source of materials learning
Plan
Synthesize
Characterize
Analyze
local db +
ML
Automated Lab A
Plan
Synthesize
Characterize
Analyze
Conventional Lab B
Plan
Synthesize
Characterize
Analyze
local db +
ML
Automated Lab C
Literature data
+ broad coverage
– difficult to parse
– lack negative examples
Other A-lab data
+ structured data formats
+ negative examples
– not much out there …
Theory data
+ readily available
– difficult to establish
relevance to synthesis
The NLP Solution to Literature Data
• A lot of prior experimental data already exists in the literature that would take
untold costs and labor to replicate again
• Advantages to this data set are broad coverage of materials and techniques
• Disadvantages include:
• getting access to the data
• lack of negative examples in the data
• missing / unreliable information
• difficulty of obtaining structured data from unstructured text
• Natural language processing can help with the last part, although considerable
difficulties are still involved
• Named entity recognition
• Identify precursors, amounts, characteristics, etc.
• Relationship modeling
• Relate the extracted entities to one another
Previous approach for extracting data from
text
Weston, L. et al. Named Entity Recognition
and Normalization Applied to Large-Scale
Information Extraction from the Materials
Science Literature. J. Chem. Inf. Model.
(2019)
Recently, we also tried BERT variants
Trewartha, A.; Walker, N.; Huo, H.; Lee, S.;
Cruse, K.; Dagdelen, J.; Dunn, A.; Persson,
K. A.; Ceder, G.; Jain, A. Quantifying the
Advantage of Domain-Specific Pre-Training
on Named Entity Recognition Tasks in
Materials Science. Patterns 2022, 3 (4),
100488.
Models were good for labeling entities, but
didn’t understand relationships
Named Entity Recognition
• Custom machine learning models to
extract the most valuable materials-related
information.
• Utilizes a long short-term memory (LSTM)
network trained on ~1000 hand-annotated
abstracts.
Trewartha, A.; Walker, N.; Huo, H.; Lee, S.;
Cruse, K.; Dagdelen, J.; Dunn, A.; Persson,
K. A.; Ceder, G.; Jain, A. Quantifying the
Advantage of Domain-Specific Pre-Training
on Named Entity Recognition Tasks in
Materials Science. Patterns 2022, 3 (4),
100488.
A Sequence-to-Sequence Approach
• Language model takes a sequence of tokens as input and
outputs a sequence of tokens
• Maximizes the likelihood of the output conditioned on the input
• Additionally includes task conditioning
• Capacity for “understanding” language as well as “world
knowledge”
• Task conditioning with arbitrary Seq2Seq provides extremely
flexible framework
• Large seq2seq models can generate text that naturally
completes a paragraph
How a sequence-to-sequence approach works
Seq2Seq model
(GPT3)
Text in (“prompt”) Text out (“completion”)
Another example
Seq2Seq model
(GPT3)
Text in (“prompt”) Text out (“completion”)
Structured data
Seq2Seq model
(GPT3)
Text in (“prompt”) Text out (“completion”)
But it’s not perfect for technical data
Seq2Seq model
(GPT3)
Text in (“prompt”) Text out (“completion”)
A workflow for fine-tuning GPT-3
1. Initial training set of templates
filled mostly manually, as zero-
shot GPT is often poor for
technical tasks
2. Fine-tune model to fill
templates, use the model to
assist in annotation
3. Repeat as necessary until
desired inference accuracy is
achieved
Templated extraction of synthesis recipes
• Annotate paragraphs to output
structured recipe templates
• JSON-format
• Designed using domain knowledge from
experimentalists
• Template is relation graph to be filled in
by model
• Note: we are still formally evaluating
performance
• various issues in getting an accurate
evaluation, e.g., predictions that are
functionally correct but written differently
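The templated approach and its evaluation problem can be sketched as below: a paragraph is paired with a JSON template to form one prompt/completion training record, and predictions are compared to references after a light normalization so that harmless differences in key case, whitespace, or list order do not count as errors. The template fields (`host`, `dopants`), the record layout, and the normalization rules are all hypothetical choices for illustration; they are not the schema or the evaluation procedure used in this work.

```python
import json


def make_training_record(paragraph: str, template: dict) -> dict:
    """Pair a text paragraph with its filled template as one training example."""
    return {"prompt": paragraph, "completion": json.dumps(template, sort_keys=True)}


def normalize(value):
    """Crude normalization: lowercase keys, trim strings, ignore list order."""
    if isinstance(value, dict):
        return {key.strip().lower(): normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [normalize(item) for item in value]
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    if isinstance(value, str):
        return " ".join(value.split())
    return value


def functionally_equal(reference: dict, prediction: dict) -> bool:
    return normalize(reference) == normalize(prediction)


# Hypothetical template filled in for a doping paragraph.
reference = {"host": "ZnO", "dopants": ["Mn", "Fe"]}
prediction = {"Host": " ZnO", "dopants": ["Fe", "Mn"]}

record = make_training_record("Mn- and Fe-doped ZnO nanotubes were studied.", reference)
print(record["completion"])
print(functionally_equal(reference, prediction))
```

Ignoring list order is reasonable for an unordered set such as dopants but would be wrong for ordered synthesis steps, which is one example of why an accurate automatic evaluation is hard to define.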
Example Prediction
Applied to solid state synthesis / doping
We have performed the first-principles calculations onto the structural,
electronic and magnetic properties of seven 3d transition-metal (TM=V, Cr,
Mn, Fe, Co, Ni and Cu) atom substituting cation Zn in both zigzag (10,0) and
armchair (6,6) zinc oxide nanotubes (ZnONTs). The results show that there
exists a structural distortion around 3d TM impurities with respect to the
pristine ZnONTs. The magnetic moment increases for V-, Cr-doped ZnONTs
and reaches maximum for Mn-doped ZnONTs, and then decreases for Fe-, Co-
, Ni- and Cu-doped ZnONTs successively, which is consistent with the
predicted trend of Hund’s rule for maximizing the magnetic moments of the
doped TM ions. However, the values of the magnetic moments are smaller than
the predicted values of Hund’s rule due to strong hybridization between p
orbitals of the nearest neighbor O atoms of ZnONTs and d orbitals of the TM
atoms. Furthermore, the Mn-, Fe-, Co-, Cu-doped (10,0) and (6,6) ZnONTs
with half-metal and thus 100% spin polarization characters seem to be good
candidates for spintronic applications.
Use in initial hypothesis generation
classifying AuNP
morphologies based
on precursors used
Predicting new
materials for
functional
applications
predicting doping – if
a material can be
doped with A, can it
be doped with B?
Investigated as thermoelectrics
(independently of our study)
Investigated by our own collaborators
(as a result of our study)
(done using an older
method)
Outline
• Introduction to group and overview of our projects
• The Materials Project and virtual materials design
• The Matbench protocol: benchmarking ML algorithms
• Natural language processing applied to materials design
• Automating materials synthesis and characterization
Developing an automated lab (“A-lab”) that makes use
of literature data is in progress
Plan
Synthesize
Characterize
Analyze
local db +
ML
Automated Lab A
Plan
Synthesize
Characterize
Analyze
Conventional Lab B
Plan
Synthesize
Characterize
Analyze
local db +
ML
Automated Lab C
Literature data
+ broad coverage
– difficult to parse
– lack negative examples
Other A-lab data
+ structured data formats
+ negative examples
– not much out there …
Theory data
+ readily available
– difficult to establish
relevance to synthesis
The A-lab facility is designed to handle inorganic
powders
In operation:
XRD
Robot
Box furnaces
Setting up:
Tube
furnace x 4
LBNL bldg. 30
Dosing and mixing
Facility will handle powder-
based synthesis of inorganic
materials, with automated
characterization and
experimental planning
Collaboration w/ G. Ceder & H. Kim
A-lab timeline:
• April 2022 (hardware development): box furnace, XRD, and robots ready
• July 2022 (platform integration): tube furnaces and SEM ready
• November 2022 (automated synthesis): powder dosing system; first automated syntheses
• Summer 2023: AI-guided synthesis
• Summer 2024: closed-loop materials discovery
Lab starting to take shape …
Courtesy Y. Fei,
Ceder Group
The embedded video
shows a robotic arm
performing various
synthesis tasks, such
as loading a box
furnace and
performing multiple
steps needed to
prepare and load an
XRD sample.
Other videos (not
shown here) show ball
milling, interaction
with tube furnaces.
A powder doser is
expected to arrive in
1-2 months.
The continuing challenge – putting it all together!
Currently we are still working on various components
Historical-data
Initial hypotheses
data-api
NLP and literature data
ML algorithms High-throughput DFT data
Acknowledgements
NLP
• Nick Walker
• John Dagdelen
• Alex Dunn
• Sanghoon Lee
• Amalie Trewartha
A-lab
• Rishi Kumar
• Yuxing Fei
• Haegyum Kim
• Gerbrand Ceder
Funding provided by:
• U.S. Department of Energy, Basic Energy Science, “D2S2” program
• Toyota Research Institute, Accelerated Materials Design program
• Lawrence Berkeley National Laboratory “LDRD” program
Materials Project
• Kristin Persson
• Matthew Horton
• All MP collaborators,
too many to name
…

More Related Content

What's hot

The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...Anubhav Jain
 
High entropy alloys
High entropy alloysHigh entropy alloys
High entropy alloysSounak Guha
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsAnubhav Jain
 
Methods, tools, and examples (Part II): High-throughput computation and machi...
Methods, tools, and examples (Part II): High-throughput computation and machi...Methods, tools, and examples (Part II): High-throughput computation and machi...
Methods, tools, and examples (Part II): High-throughput computation and machi...Anubhav Jain
 
Machine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsMachine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsAnubhav Jain
 
A DFT & TDDFT Study of Hybrid Halide Perovskite Quantum Dots
A DFT & TDDFT Study of Hybrid Halide Perovskite Quantum DotsA DFT & TDDFT Study of Hybrid Halide Perovskite Quantum Dots
A DFT & TDDFT Study of Hybrid Halide Perovskite Quantum DotsAthanasiosKoliogiorg
 
Graphene Syntheis and Characterization for Raman Spetroscopy At High Pressure
Graphene Syntheis and Characterization for Raman Spetroscopy At High PressureGraphene Syntheis and Characterization for Raman Spetroscopy At High Pressure
Graphene Syntheis and Characterization for Raman Spetroscopy At High PressureNicolasMORAL
 
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...aimsnist
 
Graphene : the futuristic element.....
Graphene : the futuristic element..... Graphene : the futuristic element.....
Graphene : the futuristic element..... MD NAZRE IMAM
 
BIOS 203 Lecture 4: Ab initio molecular dynamics
BIOS 203 Lecture 4: Ab initio molecular dynamicsBIOS 203 Lecture 4: Ab initio molecular dynamics
BIOS 203 Lecture 4: Ab initio molecular dynamicsbios203
 
Graphene presentation 11 March 2014
Graphene presentation 11 March 2014Graphene presentation 11 March 2014
Graphene presentation 11 March 2014Jonathan Fosdick
 
Statistical Mechanics & Thermodynamics 2: Physical Kinetics
Statistical Mechanics & Thermodynamics 2: Physical KineticsStatistical Mechanics & Thermodynamics 2: Physical Kinetics
Statistical Mechanics & Thermodynamics 2: Physical KineticsInon Sharony
 

Bacterial Identification and Classifications
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 

Discovering new functional materials for clean energy and beyond using high-throughput computing and machine learning

  • 1. Discovering new functional materials for clean energy and beyond using high-throughput computing and machine learning Anubhav Jain Lawrence Berkeley National Laboratory Presentation given at Intel, Oct 2022 Slides (will be) posted to hackingmaterials.lbl.gov
  • 2. Outline • Introduction to group and overview of our projects • The Materials Project and virtual materials design • The Matbench protocol: benchmarking ML algorithms • Natural language processing applied to materials design • Automating materials synthesis and characterization 2
  • 3. Overview of our research group • Located at Lawrence Berkeley National Laboratory (Berkeley, CA) • Group composition • Usually 10 people in size (e.g., 5 postdocs, 5 graduate students) • Major funding from U.S. Dept. of Energy, some funding from industry (Toyota Research Institute) • Areas of emphasis • Computational design of new functional materials • Typically semiconductors, ceramics, or alloys • e.g., past work in Li-ion and multivalent batteries, thermoelectric materials, carbon capture materials, catalysts for water purification, etc. • Not really polymers, molecular systems, or organic systems, although there is some past work here, too • Machine learning applied to materials science • Automated laboratories (recent)
  • 4. We develop software frameworks for performing materials simulations, including automation at supercomputing centers Summary • We develop and maintain several software packages for computational design of materials • These include “FireWorks” for automating calculations at supercomputing centers, “atomate” for defining materials science workflows, and “matminer” for generating descriptors for crystal structures 4
  • 5. We develop methods to calculate materials properties based on density functional theory, often adapting methods for high-throughput applications Summary • Many materials properties are either difficult to calculate or require impractical amounts of computer time • We develop methods to calculate materials properties both accurately and efficiently • Examples include “AMSET” (electron transport) and ongoing work on thermal properties of materials • Old method (BoltzTraP): screening is qualitative, with pitfalls • New method (AMSET): screening is more quantitative • Scattering mechanisms and the material inputs they require • acoustic deformation potential (ad): deformation potential, elastic tensor • ionized impurity (ii): dielectric tensor • piezoelectric (pi): dielectric tensor, piezoelectric tensor • polar optical phonon (po): dielectric tensor, polar phonon frequency • Figure: (a) phonon renormalization at T > 0 K via force constant fitting; (b) panels at T = 0 K, 100 K, and 200 K for cubic SrTiO3 (Tc = 105 K) • Reference: Ganose, A. M.; Park, J.; Faghaninia, A.; Woods-Robinson, R.; Persson, K. A.; Jain, A. Efficient Calculation of Carrier Scattering Rates from First Principles. Nat Commun 2021, 12 (1), 2222.
  • 6. We use a combination of density functional theory calculations and machine learning to design materials for various functional applications Summary • We trained machine learning models (on open benchmark data sets) to determine catalytic performance of materials in removing nitrate from drinking water • The models were used to pre-screen ~60,000 materials down to only 23 materials, which were then subjected to expensive physics calculations for verification • Figure: “funnel” diagram illustrating how an initial list of ~60,000 compounds was passed through a workflow to identify 23 interesting compounds. ML was used in the workflow to pre-screen on high activity and selectivity of N2/NH3. The ML models show good correspondence with significantly more expensive physical simulations (“DFT”), demonstrating that they can be swapped into the screening workflow reliably while extending the search to ~500 times more compounds than would be possible without ML augmentation. • Reference: “Screening of bimetallic electrocatalysts for water purification with machine learning”, Tran et al., J. Chem. Phys. 2022
  • 7. We help develop and maintain a comprehensive database of materials properties, with a user community of >250,000 registered users Summary • In general, only a small fraction of materials have available experimental property measurements • The Materials Project uses massive supercomputing resources to calculate the properties of materials using first-principles calculations • The data is disseminated to a large user community • Past year: average of ≈200 new registrations/day
  • 8. We develop and maintain “matbench”, a machine learning benchmark for materials science, uncovering what works and what’s needed Summary • We created a comprehensive set of benchmark tests for ML algorithms that aim to predict materials properties • The benchmarks clearly reveal which community algorithms work • They also helped show the field that more research was needed into “small data set” algorithms, motivating external works • The Matbench benchmark contains 13 data sets that vary in size and application. Community algorithms compete for best performance on each data set. • Figure: the full “leaderboard” of all algorithms tested to date against all 13 data sets, organized by data set size (axes: bigger datasets, better relative performance). Deep learning approaches typically excel at large data problems but typically struggle with small data; some hybrid approaches were subsequently developed to address this. https://doi.org/10.1038/s41524-020-00406-3
  • 9. We use natural language processing to parse scientific abstracts and articles and generate data sets and hypotheses Summary • We used natural language processing (NLP) to analyze the text of several million article abstracts • With no domain-specific training, the ML system internalized a representation of the periodic table • More impressively, it could predict what materials researchers would study for “thermoelectrics” in the future • Figure: a representation of the periodic table generated automatically by analyzing >3 million abstracts • Figure: materials compositions for thermoelectrics applications as predicted by NLP ~3 years ago. Since then, approximately 1/3 of the predictions have been reported by researchers. https://doi.org/10.1038/s41586-019-1335-8 Sponsor: SPP, Toyota Research Institute
  • 10. Moving from the virtual world to the physical world: A-lab for automated synthesis of inorganic materials Summary • We are collaborating with other groups at LBNL (G. Ceder, H. Kim) to develop an automated laboratory for inorganic materials synthesis • In contrast to other similar efforts, it works primarily with powder-based synthesis procedures • Several aspects already completed, but still a work in progress • Stages: hardware development, platform integration, automated synthesis, AI-guided synthesis, closed-loop materials discovery • Timeline • April 2022: box furnace, XRD, and robots ready • July 2022: tube furnaces and SEM ready • November 2022: powder dosing system, first automated syntheses • Summer 2023: AI-guided synthesis • Summer 2024: closed-loop materials discovery
  • 11. Miscellaneous projects – analysis of large solar PV data sets, data extraction from figures Summary • We also have various other miscellaneous projects at any given time • For example, we recently trained an ML algorithm to classify electroluminescence images from solar power plants and use this to assess fire damage • We also developed software to help parse data from figures Pipeline developed to process raw EL images (bottom-left), extract modules, segment individual cells, and classify cells into various defect categories using deep learning models. This open-source pipeline can replace tedious human annotation of module EL images at a large scale. 11 Examples of using machine learning to identify portions of chart images and extracting data curves based on color
  • 12. Outline • Introduction to group and overview of our projects • The Materials Project and virtual materials design • The Matbench protocol: benchmarking ML algorithms • Natural language processing applied to materials design • Automating materials synthesis and characterization 12
  • 13. The core of Materials Project is a free database of calculated materials properties and crystal structures Free, public resource • www.materialsproject.org Data on ~150,000 materials, including information on: • electronic structure • phonon and thermal properties • elastic / mechanical properties • magnetic properties • ferroelectric properties • piezoelectric properties • dielectric properties Powered by hundreds of millions of CPU-hours invested into high-quality calculations
  • 14. The core data set keeps growing with time … 14
  • 15. Apps give insight into data Materials Explorer Phase Stability Diagrams Pourbaix Diagrams (Aqueous Stability) Battery Explorer 15
  • 16. The code powering the Materials Project is available open source (BSD/MIT licenses) • just-in-time error correction, fixing your calculations so you don’t have to • “recipes” for common materials science simulation tasks • making materials science web apps easy • workflow management software for high-throughput computing • materials science analysis code: make, transform and analyze crystals, phase diagrams and more • and more … • MP team members also contribute to several other non-MP codes, e.g. matminer for machine learning featurization
  • 17. The Materials Project is used heavily by the research community • > 180,000 registered users • > 40,000 new users last year • ~100 new registrations/day • ~10,000 users log on every day • > 2M records downloaded through API each day; 1.8 TB of data served per month
  • 18. Today, the Materials Project has led to many examples of “computer to lab” success stories • MP for p-type transparent conductors (Ba2BiTaO6) • Prediction: Screening based on band gap, transport properties and band alignments. • Experiment: Predictions revealed a material with s–p hybridized valence band (thought to correlate well with dopability). When synthesized, the material has excellent transparency and is readily dopable with K. • References: Hautier, G., Miglio, A., Ceder, G., Rignanese, G.-M. & Gonze, X. Identification and design principles of low hole effective mass p-type transparent conducting oxides. Nature Communications 4, (2013); Bhatia, A. et al. High-Mobility Bismuth-based Transparent p-Type Oxide from High-Throughput Material Screening. Chemistry of Materials 28, 30–34 (2015); Ricci, F. et al. An ab initio electronic transport database for inorganic materials. Scientific Data 4, (2017) • MP for thermoelectrics (TmAgTe2) • Prediction: Screening of tens of thousands of materials with predicted electron transport properties revealed a family of promising XYZ2 candidates. • Experiment: Several materials made: YCuTe2 (zT = 0.75), TmAgTe2 (zT = 0.47, 1.8 theoretical), novel NiP2 phosphide. • References: Aydemir, U. et al. YCuTe2: a member of a new class of thermoelectric materials with CuTe4-based layered structure. Journal of Materials Chemistry A 4, 2461–2472 (2016); Zhu, H. et al. Computational and experimental investigation of TmAgTe2 and XYZ2 compounds, a new group of thermoelectric materials identified by first-principles high-throughput screening. Journal of Materials Chemistry C 3, 10554–10565 (2015); Pöhls, J.-H. et al. Metal phosphides as potential thermoelectric materials. Journal of Materials Chemistry C 5, 12441–12456 (2017) • MP for phosphors (Sr2LiAlN4) • Prediction: Statistical analysis of existing materials that co-occur with the word ‘phosphor’, followed by structure prediction for new materials. • Experiment: Predicted first known Sr-Li-Al-N quaternary, showed green-yellow/blue emission with quantum efficiency of 25% (Eu), 40% (Ce), 55% (co-activated Eu, Ce). • References: Wang, Z. et al. Mining Unexplored Chemistries for Phosphors for High-Color-Quality White-Light-Emitting Diodes. Joule 2, 914–926 (2018); Li, S. et al. Data-Driven Discovery of Full-Visible-Spectrum Phosphor. Chemistry of Materials 31, 6286–6294 (2019); Ha, J. et al. Color tunable single-phase Eu2+ and Ce3+ co-activated Sr2LiAlO4 phosphors. Journal of Materials Chemistry C 7, 7734–7744 (2019)
  • 19. One of the applications we looked into was thermoelectric materials • A thermoelectric material generates a voltage from a thermal gradient • Applications • Heat to electricity • Refrigeration • Advantages include: • Reliability • Easy to scale to different sizes (including compact) • Image source: www.alphabetenergy.com
  • 20. It is difficult to balance trade-offs in thermoelectric properties, so use screening • Figure of merit: ZT = α²σT/κ • Power factor (α²σ): > 2 mW/(m·K²) (PbTe = 10 mW/(m·K²)) • Seebeck coefficient (α): > 100 µV/K (from band structure + BoltzTraP) • Electrical conductivity (σ): > 10³ /(ohm·cm) (from band structure + BoltzTraP) • Thermal conductivity (κ): < 1 W/(m·K) • κe from BoltzTraP • κl difficult (phonon-phonon scattering) • Heavy band: ✓ large DOS (higher Seebeck and more carriers) ✗ large effective mass (poor mobility) • Light band: ✓ small effective mass (improved mobility) ✗ small DOS (lower Seebeck, fewer carriers) • Multiple bands, off symmetry: ✓ large DOS with small effective mass ✗ difficult to design! • ~50,000 crystal structures and band structures from the Materials Project are used as a source • We compute electronic transport properties with BoltzTraP and minimum thermal conductivity (Cahill-Pohl) for some compounds • About 300 GB of electronic transport data is generated. All data is available free for download. • Reference: F. Ricci, et al., An ab initio electronic transport database for inorganic materials, Sci. Data. 4 (2017) 170085.
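As a worked illustration of the figure of merit on this slide, the sketch below evaluates ZT = α²σT/κ for a material sitting exactly at the slide's screening thresholds (Seebeck coefficient of 100 µV/K, conductivity of 10³ /(ohm·cm), thermal conductivity of 1 W/(m·K)). The function name and the temperature of 300 K are assumptions chosen for illustration; they are not taken from the talk.

```python
def figure_of_merit(seebeck_v_per_k, conductivity_s_per_m, kappa_w_per_mk, temperature_k):
    """Return (power factor in W/(m*K^2), dimensionless ZT) from SI inputs."""
    power_factor = seebeck_v_per_k ** 2 * conductivity_s_per_m  # alpha^2 * sigma
    zt = power_factor * temperature_k / kappa_w_per_mk          # alpha^2 * sigma * T / kappa
    return power_factor, zt


# Unit conversions to SI:
#   100 uV/K        -> 100e-6 V/K
#   1e3 /(ohm*cm)   -> 1e5 S/m
pf, zt = figure_of_merit(
    seebeck_v_per_k=100e-6,
    conductivity_s_per_m=1e5,
    kappa_w_per_mk=1.0,
    temperature_k=300.0,  # assumed temperature, for illustration only
)

print(f"power factor = {pf * 1e3:.2f} mW/(m*K^2)")  # 1.00
print(f"ZT = {zt:.2f}")                              # 0.30
```

Under these assumed inputs the power factor comes out at 1 mW/(m·K²), which shows that meeting the Seebeck and conductivity thresholds individually is not sufficient to reach the power-factor target of 2 mW/(m·K²).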
  • 21. We found several compounds with promising figure-of-merit, but no breakthroughs • TmAgTe2 • Computation: trigonal p-TmAgTe2 could have power factor up to 8 mW/(m·K²) • requires 10²⁰/cm³ carriers • Experiment: p-type zT only 0.35 despite very low thermal conductivity (~0.25 W/(m·K)) • Limitation: carrier concentration (~10¹⁷/cm³) • likely limited by TmAg defects, as determined by follow-up calculations • Later, we achieved zT ~0.47 using Zn-doping • YCuTe2 • Computation: p-YCuTe2 could only reach PF of 0.4 mW/(m·K²) • SOC inhibits PF • if thermal conductivity is low (e.g., 0.4 W/(m·K)), we get zT ~1 • Experiment: zT ~0.75, not too far from the calculation limit • carrier concentration of 10¹⁹/cm³ • Decent performance, but unlikely to be improved with further optimization
  • 22. Outline • Introduction to group and overview of our projects • The Materials Project and virtual materials design • The Matbench protocol: benchmarking ML algorithms • Natural language processing applied to materials design • Automating materials synthesis and characterization 22
  • 23. There are many new algorithms being published for ML in materials – New ones constantly reported! 23
  • 24. But it is very difficult to compare algorithms • Different data sets (data set used in study A vs. study B vs. study C) • Source (e.g., OQMD vs MP vs JARVIS) • Quantity (e.g., MP 2019 vs MP 2022) • Subset / data filtering (e.g., ehull<X) • Different evaluation metrics • Test set vs. cross validation? • Different test set fraction? • Can be difficult to install and retrain many of these algorithms • Example: which is better, “MAE, 5-fold CV = 0.102 eV” or “RMSE, test set = 0.098 eV”?
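To make the metric mismatch on this slide concrete, the sketch below computes MAE and RMSE for the same set of predictions. The numbers are invented for illustration and are not taken from any study; the point is only that RMSE is never smaller than MAE on identical predictions, so an RMSE reported by one paper cannot be ranked directly against an MAE reported by another.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented values: three small errors and one larger one.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.9, 2.1, 3.5]

model_mae = mae(y_true, y_pred)
model_rmse = rmse(y_true, y_pred)

print(f"MAE  = {model_mae:.3f}")
print(f"RMSE = {model_rmse:.3f}")  # larger, because the single big error is weighted more
```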
  • 25. Can we design a standard test set for ML algorithms for materials science? 25 • There is no single type of problem that materials scientists are trying to solve • For now, focus on materials property prediction (from structure or composition) • We want a test set that contains a diverse array of problems • Smaller data versus larger data • Different applications (electronic, mechanical, etc.) • Composition-only or structure information available • Experimental vs. Ab-initio • Classification or regression
  • 26. Matbench includes 13 different ML tasks 26 Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
  • 27. Models tested by Matbench to date (model; representation type; representation summary) • Magpie + Sine Coulomb Matrix + Random Forest; composition or structure; hand-created chemical features coupled with random forest ML algorithm • Automatminer; composition or structure; hand-created chemical features with genetic algorithm based ML algorithm and hyperparameter selection • MODNet; composition or structure; hand-created chemical features with various neural network layers • CGCNN; structure only; graph convolution based neural networks with basic initial atom/bond features • ALIGNN; structure only; graph based convolutional networks based on bonds/angles in addition to atoms/bonds • CRABNet; composition only; transformer-based self-attention for composition, initialized using NLP-based embeddings
  • 28. How to read the Matbench leaderboard • A scaled error of 0.0 means all predictions are correct • A scaled error of 1.0 is equal to always predicting the average value • Figure axes: bigger datasets, better relative performance
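The two reference points above (0.0 for perfect predictions, 1.0 for always predicting the average) are consistent with dividing a model's error by the error of a mean-predicting baseline. The sketch below works under that assumption; the exact normalization used on the leaderboard is not stated on the slide, and the function names and data values are illustrative only.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def scaled_error(y_train, y_test, y_pred):
    """Model MAE divided by the MAE of a dummy that always predicts the training mean.

    Assumed definition: chosen so that 0.0 means perfect predictions and
    1.0 means "no better than predicting the average".
    """
    baseline = sum(y_train) / len(y_train)
    dummy_mae = mean_absolute_error(y_test, [baseline] * len(y_test))
    return mean_absolute_error(y_test, y_pred) / dummy_mae


# Invented toy data.
y_train = [1.0, 2.0, 3.0, 4.0]   # mean is 2.5
y_test = [1.5, 3.5]

perfect = scaled_error(y_train, y_test, [1.5, 3.5])
dummy = scaled_error(y_train, y_test, [2.5, 2.5])
halfway = scaled_error(y_train, y_test, [2.0, 3.0])

print(perfect, dummy, halfway)  # 0.0 1.0 0.5
```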
  • 29. Magpie + SCM (sine Coulomb matrix) Model • Composition features using chemical descriptors such as averages/stdevs of elemental properties (e.g., melting point, electronegativity) • Structure features using sine Coulomb matrix • References: Ward, L., Agrawal, A., Choudhary, A. et al. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput Mater 2, 16028 (2016); Faber, Felix, et al. "Crystal structure representations for machine learning models of formation energies." International Journal of Quantum Chemistry 115.16 (2015): 1094-1101. https://matbench.materialsproject.org
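The composition features described on this slide can be sketched as fraction-weighted statistics over a table of elemental properties. The example below is a simplified stand-in, not the Magpie or matminer implementation: the function name is hypothetical, only Pauling electronegativity is included, and the composition is passed as a plain dictionary of element counts.

```python
import math

# Pauling electronegativities for a handful of elements (small illustrative table).
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16, "Fe": 1.83, "O": 3.44}


def composition_stats(composition, prop_table):
    """Fraction-weighted mean and standard deviation of an elemental property.

    `composition` maps element symbol -> count, e.g. {"Fe": 2, "O": 3}.
    """
    total = sum(composition.values())
    fractions = {el: n / total for el, n in composition.items()}
    mean = sum(frac * prop_table[el] for el, frac in fractions.items())
    variance = sum(frac * (prop_table[el] - mean) ** 2 for el, frac in fractions.items())
    return mean, math.sqrt(variance)


mean_en, std_en = composition_stats({"Fe": 2, "O": 3}, ELECTRONEGATIVITY)
print(f"Fe2O3 electronegativity: mean = {mean_en:.3f}, std = {std_en:.3f}")
```

Repeating this for many elemental properties yields a fixed-length feature vector per composition, which is what allows a standard model such as a random forest to be trained on it.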
  • 30. Automatminer Model 30 Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://matbench.materialsproject.org
  • 31. MODNet Model 31 De Breuck, P.-P.; Evans, M. L.; Rignanese, G.-M. Robust Model Benchmarking and Bias-Imbalance in Data-Driven Materials Science: A Case Study on MODNet. Journal of Physics: Condensed Matter, Volume 33, Number 40, 2021 https://matbench.materialsproject.org
  • 32. CGCNN Model 32 Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120 (14), 145301. https://matbench.materialsproject.org
  • 33. ALIGNN Model 33 Choudhary, Kamal, and Brian DeCost. "Atomistic Line Graph Neural Network for improved materials property predictions." npj Computational Materials 7.1 (2021): 1-8. https://matbench.materialsproject.org
  • 34. How much have we improved overall? 34 • In some cases (e.g., Ef DFT) we have made a lot of improvement • In contrast, for others (e.g., σy steel alloys) we have barely improved • Possible reasons • Amount of attention paid to certain problems • Small vs large data emphasis – there is a lot more room for improvement for small data
  • 35. How could we improve Matbench? • Additional tasks – but how to keep it manageable? • Adding external conditions (temperature, reducing gas presence, microstructural characterizations) • Other materials classes (polymers, metal alloys, multi-material composites) • Other types of properties (e.g., predicting spectra) • More dynamic tests, e.g. update the test periodically and re-evaluate • Other scoring metrics • e.g., active learning searches • cross-validation by leaving out chemical systems rather than random splits 35
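The last scoring idea above, cross-validation that leaves out whole chemical systems instead of random rows, can be sketched with the standard library alone. This is a minimal illustration and not part of Matbench: the function names are hypothetical, and the regular expression assumes simple formulas in which each element symbol is one capital letter optionally followed by one lowercase letter.

```python
import re
from collections import defaultdict


def chemical_system(formula):
    """Return a canonical system label such as 'Fe-O' for 'Fe2O3'."""
    elements = re.findall(r"[A-Z][a-z]?", formula)
    return "-".join(sorted(set(elements)))


def leave_one_system_out(formulas):
    """Yield (system, train_indices, test_indices), holding out one chemical system at a time."""
    groups = defaultdict(list)
    for index, formula in enumerate(formulas):
        groups[chemical_system(formula)].append(index)
    for system, test_idx in sorted(groups.items()):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(formulas)) if i not in held_out]
        yield system, train_idx, test_idx


formulas = ["Fe2O3", "FeO", "NaCl", "TiO2", "Fe3O4"]
splits = list(leave_one_system_out(formulas))
for system, train_idx, test_idx in splits:
    print(system, "train:", train_idx, "test:", test_idx)
```

With a random split, Fe2O3 could land in the training set while Fe3O4 is in the test set, which lets a model score well by interpolating within a system it has already seen. Grouping by system removes that shortcut.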
  • 36. Outline • Introduction to group and overview of our projects • The Materials Project and virtual materials design • The Matbench protocol: benchmarking ML algorithms • Natural language processing applied to materials design • Automating materials synthesis and characterization 36
  • 37. Literature data can be a key source of materials learning • Diagram: Automated Lab A (plan, synthesize, characterize, analyze; local db + ML), Conventional Lab B (plan, synthesize, characterize, analyze), Automated Lab C (plan, synthesize, characterize, analyze; local db + ML) • Literature data: + broad coverage; – difficult to parse; – lack negative examples • Other A-lab data: + structured data formats; + negative examples; – not much out there … • Theory data: + readily available; – difficult to establish relevance to synthesis
  • 38. The NLP Solution to Literature Data • A lot of prior experimental data already exists in the literature that would take untold costs and labor to replicate • Advantages of this data set are broad coverage of materials and techniques • Disadvantages include: • getting access to the data • lack of negative examples in the data • missing / unreliable information • difficulty of obtaining structured data from unstructured text • Natural language processing can help with the last part, although considerable difficulties are still involved • Named entity recognition • Identify precursors, amounts, characteristics, etc. • Relationship modeling • Relate the extracted entities to one another
  • 39. Previous approach for extracting data from text • Weston, L. et al. Named Entity Recognition and Normalization Applied to Large-Scale Information Extraction from the Materials Science Literature. J. Chem. Inf. Model. (2019) • Recently, we also tried BERT variants: Trewartha, A.; Walker, N.; Huo, H.; Lee, S.; Cruse, K.; Dagdelen, J.; Dunn, A.; Persson, K. A.; Ceder, G.; Jain, A. Quantifying the Advantage of Domain-Specific Pre-Training on Named Entity Recognition Tasks in Materials Science. Patterns 2022, 3 (4), 100488.
  • 40. Models were good for labeling entities, but didn’t understand relationships 40 Named Entity Recognition • Custom machine learning models to extract the most valuable materials-related information. • Utilizes a long short-term memory (LSTM) network trained on ~1000 hand-annotated abstracts. Trewartha, A.; Walker, N.; Huo, H.; Lee, S.; Cruse, K.; Dagdelen, J.; Dunn, A.; Persson, K. A.; Ceder, G.; Jain, A. Quantifying the Advantage of Domain-Specific Pre-Training on Named Entity Recognition Tasks in Materials Science. Patterns 2022, 3 (4), 100488.
  • 41. A Sequence-to-Sequence Approach • Language model takes a sequence of tokens as input and outputs a sequence of tokens • Maximizes the likelihood of the output conditioned on the input • Additionally includes task conditioning • Capacity for “understanding” language as well as “world knowledge” • Task conditioning with arbitrary Seq2Seq provides an extremely flexible framework • Large seq2seq models can generate text that naturally completes a paragraph
  • 42. How a sequence-to-sequence approach works • Diagram: text in (“prompt”) → Seq2Seq model (GPT-3) → text out (“completion”)
  • 43. Another example • Diagram: text in (“prompt”) → Seq2Seq model (GPT-3) → text out (“completion”)
  • 44. Structured data • Diagram: text in (“prompt”) → Seq2Seq model (GPT-3) → text out (“completion”)
  • 45. But it’s not perfect for technical data • Diagram: text in (“prompt”) → Seq2Seq model (GPT-3) → text out (“completion”)
  • 46. A workflow for fine-tuning GPT-3 1. Initial training set of templates filled mostly manually, as zero-shot GPT is often poor for technical tasks 2. Fine-tune model to fill templates, use the model to assist in annotation 3. Repeat as necessary until desired inference accuracy is achieved
  • 47. Templated extraction of synthesis recipes • Annotate paragraphs to output structured recipe templates • JSON-format • Designed using domain knowledge from experimentalists • Template is relation graph to be filled in by model • Note: we are still formally evaluating performance • various issues in getting an accurate evaluation, e.g., predictions that are functionally correct but written differently
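One of the evaluation issues named above is that a prediction can be functionally correct but written differently from the reference annotation. The sketch below shows one possible way to compare two filled JSON templates after light normalization. Everything in it is hypothetical: the field names (`target`, `precursors`, `heating`), the recipe values, and the normalization rules are invented for illustration and are not the template or metric used in the talk.

```python
import json


def normalize(value):
    """Canonicalize a parsed template so superficial differences do not count as errors."""
    if isinstance(value, dict):
        # Key case and surrounding whitespace are treated as irrelevant.
        return {key.strip().lower(): normalize(val) for key, val in value.items()}
    if isinstance(value, list):
        # Assumption: list order carries no meaning (true for a precursor list,
        # but NOT for an ordered sequence of synthesis steps).
        items = [normalize(item) for item in value]
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    if isinstance(value, str):
        # Collapse whitespace but keep case, since case matters in formulas (Co vs CO).
        return " ".join(value.split())
    return value


def functionally_equal(reference_json, predicted_json):
    return normalize(json.loads(reference_json)) == normalize(json.loads(predicted_json))


reference = '{"target": "LiCoO2", "precursors": ["Li2CO3", "Co3O4"], "heating": {"temperature_c": 900, "time_h": 12}}'
reordered = '{"Target": " LiCoO2", "precursors": ["Co3O4", "Li2CO3"], "heating": {"time_h": 12, "temperature_c": 900}}'
wrong_temp = '{"target": "LiCoO2", "precursors": ["Li2CO3", "Co3O4"], "heating": {"temperature_c": 800, "time_h": 12}}'

same = functionally_equal(reference, reordered)
different = functionally_equal(reference, wrong_temp)
print(same, different)  # True False
```

An exact string match would score the reordered prediction as wrong, which is the kind of evaluation distortion the slide describes. Normalization of this sort handles formatting differences only; paraphrased values would still need a more tolerant comparison.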
  • 49. Applied to solid state synthesis / doping We have performed the first-principles calculations onto the structural, electronic and magnetic properties of seven 3d transition-metal (TM=V, Cr, Mn, Fe, Co, Ni and Cu) atom substituting cation Zn in both zigzag (10,0) and armchair (6,6) zinc oxide nanotubes (ZnONTs). The results show that there exists a structural distortion around 3d TM impurities with respect to the pristine ZnONTs. The magnetic moment increases for V-, Cr-doped ZnONTs and reaches maximum for Mn-doped ZnONTs, and then decreases for Fe-, Co-, Ni- and Cu-doped ZnONTs successively, which is consistent with the predicted trend of Hund’s rule for maximizing the magnetic moments of the doped TM ions. However, the values of the magnetic moments are smaller than the predicted values of Hund’s rule due to strong hybridization between p orbitals of the nearest neighbor O atoms of ZnONTs and d orbitals of the TM atoms. Furthermore, the Mn-, Fe-, Co-, Cu-doped (10,0) and (6,6) ZnONTs with half-metal and thus 100% spin polarization characters seem to be good candidates for spintronic applications.
  • 50. Use in initial hypothesis generation • Classifying AuNP morphologies based on precursors used • Predicting doping: if a material can be doped with A, can it be doped with B? • Predicting new materials for functional applications (done using an older method) • Figure labels: investigated as thermoelectrics (independently of our study); investigated by our own collaborators (as a result of our study)
  • 51. Outline • Introduction to group and overview of our projects • The Materials Project and virtual materials design • The Matbench protocol: benchmarking ML algorithms • Natural language processing applied to materials design • Automating materials synthesis and characterization 51
  • 52. Developing an automated lab (“A-lab”) that makes use of literature data is in progress • Diagram: Automated Lab A (plan, synthesize, characterize, analyze; local db + ML), Conventional Lab B (plan, synthesize, characterize, analyze), Automated Lab C (plan, synthesize, characterize, analyze; local db + ML) • Literature data: + broad coverage; – difficult to parse; – lack negative examples • Other A-lab data: + structured data formats; + negative examples; – not much out there … • Theory data: + readily available; – difficult to establish relevance to synthesis
  • 53. The A-lab facility is designed to handle inorganic powders • Location: LBNL bldg. 30 • In operation: XRD, robot, box furnaces • Setting up: tube furnace x 4, dosing and mixing • Facility will handle powder-based synthesis of inorganic materials, with automated characterization and experimental planning • Collaboration w/ G. Ceder & H. Kim • Stages: hardware development, platform integration, automated synthesis, AI-guided synthesis, closed-loop materials discovery • Timeline • April 2022: box furnace, XRD, and robots ready • July 2022: tube furnaces and SEM ready • November 2022: powder dosing system, first automated syntheses • Summer 2023: AI-guided synthesis • Summer 2024: closed-loop materials discovery
  • 54. Lab starting to take shape … 54 Courtesy Y. Fei, Ceder Group The embedded video shows a robotic arm performing various synthesis tasks, such as loading a box furnace and performing multiple steps needed to prepare and load an XRD sample. Other videos (not shown here) show ball milling, interaction with tube furnaces. A powder doser is expected to arrive in 1-2 months.
  • 55. The continuing challenge: putting it all together! Currently we are still working on various components • Diagram components: historical data, initial hypotheses, data-api, NLP and literature data, ML algorithms, high-throughput DFT data
  • 56. Acknowledgements • NLP: Nick Walker, John Dagdelen, Alex Dunn, Sanghoon Lee, Amalie Trewartha • A-lab: Rishi Kumar, Yuxing Fei, Haegyum Kim, Gerbrand Ceder • Materials Project: Kristin Persson, Matthew Horton, and all MP collaborators, too many to name • Funding provided by: U.S. Department of Energy, Basic Energy Science, “D2S2” program; Toyota Research Institute, Accelerated Materials Design program; Lawrence Berkeley National Laboratory “LDRD” program