Available methods for predicting materials
synthesizability using computational and
machine learning approaches
Anubhav Jain
Lawrence Berkeley National Laboratory
TMS Spring Meeting, Mar 2023
Slides (already) posted to hackingmaterials.lbl.gov
Joining Gerd’s group …
Gerd teaching thermodynamics, ~2006
Congratulations Gerd!
2008 ECS
2011
The OG Materials
Genome server at
MIT
2008 – BURP!
(“Bosch-Umicore
Research Project”)
Congratulations to G. Ceder!
2008
2011
The OG Materials
Project server
Outline of talk
• Congratulations to Gerd Ceder
• Old pictures
• The dreaded puppet
• Ceder group Jeopardy
GETTING WITH
THE PROGRAM
FONT
SIZES!
MEDITATING IN
QUIET SPACES
IS THIS A
GOVERNMENT
LAB?
POSTDOC OR
COMPUTER?
IS IT A HOLIDAY
TODAY?
Outline
• The explosion of new materials predictions, and dilemma of what to
test
• Can we trust ML algorithms that predict hull stability?
• Beyond 0K “e above hull”: Efficient phonons
• Integrating literature knowledge for efficient experimentation
The pace of experimental materials discovery
is about 10K-20K entries per year
Entries in the Powder Diffraction File (PDF)
Collaboration with ICSD
Collaboration with MPDS
~20,000 entries per year
over last decade
Gates-Rector, S. & Blanton, T. The Powder Diffraction File: a quality
materials characterization database. Powder Diffr. 34, 352–360 (2019).
Inorganic Crystal Structure Database
~10,000 entries per year
over last decade
Zagorac, D., Müller, H., Ruehl, S., Zagorac, J. & Rehme, S. Recent
developments in the Inorganic Crystal Structure Database:
theoretical crystal structure data and related features. J Appl
Crystallogr 52, 918–925 (2019).
However, machine learning is predicting very
large numbers of new stable compounds
With multiple experiments likely needed per
compound, even automated labs won’t be able
to keep up
“A-lab” – Ceder & colleagues
[Bar chart: number of entries (axis from 0 to 2,000,000) for MP stable, ICSD, PDF, and M3GNet stable]
In a short period of time, ML algorithms can
generate millions of potentially
stable compounds
M3GNet data: Chen, C., Ong, S.P. A universal graph deep learning interatomic
potential for the periodic table. Nat Comput Sci 2, 718–728 (2022).
How do we prioritize?
We need to be able to accurately assess likelihood of synthesis
success to avoid wasting resources
Likelihood of success
Potential Novelty,
Functionality
candidate predictions * millions
Large error bars in the
process make this difficult
– how can we start working
towards more confidence
in likelihood of success?
Outline
• The explosion of new materials predictions, and dilemma of what to
test
• Can we trust ML algorithms that predict hull stability?
• Beyond 0K “e above hull”: Efficient phonons
• Integrating literature knowledge for efficient experimentation
Do ML algorithms work for new materials?
Bartel et al., npj Comp. Mats., 6, 97 (2020)
Structure relaxing algorithms: partially fix this by
solving an optimization problem
Zuo et al., MaterialsToday, 51 (2021)
ML algorithms should be tested on a discovery-oriented
task similar to how they’d be deployed
Matbench-discovery asks algorithms to rank a new chemical space’s candidates by predicted
hull stability
[Diagram: MP stable structures → substituted, unrelaxed candidate structures (257k) → model: Wrenformer, BOSWR, M3GNet, etc.]
Wang, H.-C., Botti, S. & Marques, M.
A. L. Predicting stable crystalline
compounds using chemical similarity.
npj Comput Mater 7, 12 (2021).
How well do current algorithms do? Good,
but still room for improvement
Precision (fraction of predicted-stable materials that are actually stable) peaks at about 0.5
Discovery acceleration factor (how much better than a random guess) peaks at almost 3 (maximum = 6)
Surprisingly, MEGNet does better on these metrics than other models that relax the structure (although worse on MAE),
but this is likely an artifact of the test (next slide).
M3GNet is likely the best model of those tested (next slide)
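The two metrics on this slide can be made concrete with a small sketch. It assumes that the discovery acceleration factor (DAF) is precision divided by the fraction of truly stable materials in the test set, so that a stable fraction of about 1/6 would give the stated maximum of 6. The function name and the toy labels below are made up for illustration and are not benchmark data.

```python
def precision_and_daf(predicted_stable, actually_stable):
    """Return (precision, DAF) for two equal-length lists of booleans.

    Assumption: DAF = precision / prevalence, where prevalence is the
    fraction of truly stable materials among all candidates.
    """
    if len(predicted_stable) != len(actually_stable):
        raise ValueError("inputs must have the same length")
    n_predicted = sum(predicted_stable)
    true_positives = sum(
        1 for p, a in zip(predicted_stable, actually_stable) if p and a
    )
    prevalence = sum(actually_stable) / len(actually_stable)
    precision = true_positives / n_predicted if n_predicted else 0.0
    daf = precision / prevalence if prevalence else float("nan")
    return precision, daf


# Toy example: 12 candidates, 2 of which are truly stable (prevalence 1/6).
actual = [True, True] + [False] * 10
# The hypothetical model flags 4 candidates and gets 2 of them right.
predicted = [True, True, True, True] + [False] * 8

precision, daf = precision_and_daf(predicted, actual)
print(f"precision = {precision:.2f}, DAF = {daf:.2f}")
```

Under this definition a perfect ranker would reach precision 1 and DAF 1/prevalence, which is why the ceiling depends on how many stable materials the test set contains.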
On further analysis of the results and performance
curves, M3GNet is clearly the best model so far
[Figure quadrant labels: likely true negatives, likely true positives, likely false negatives, likely false positives]
Matbench-discovery will be posted to Materials Project,
and we will track evolution of algorithms over time!
Data and granular metrics
[Figure legend: 2022, 2021, 2020, <2019]
Outline
• The explosion of new materials predictions, and dilemma of what to
test
• Can we trust ML algorithms that predict hull stability?
• Beyond 0K “e above hull”: Efficient phonons
• Integrating literature knowledge for efficient experimentation
“e above hull” is not particularly selective for
observed vs unobserved materials
Aykol, M., Dwaraknath, S. S., Sun, W. & Persson, K. A. Thermodynamic limit for
synthesis of metastable inorganic materials. Sci. Adv. 4, eaaq0148 (2018).
Sun, W. et al. The thermodynamic scale of inorganic crystalline metastability. Sci. Adv. 2,
e1600225 (2016).
Observed (blue) and unobserved (red) phases have significant
overlap in hull energies
The “best” hull energy cutoff to use is system-dependent, and
may not work well at all for some chemistries (e.g., nitrides)
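For readers less familiar with the quantity being discussed, here is a minimal sketch of how an energy above hull can be computed for a binary A-B system: build the lower convex hull of formation energy versus composition, then measure each phase's distance above it. The phase names and energies are invented toy values (in eV/atom), and the helper names are hypothetical; in practice a phase-diagram tool would normally do this.

```python
def lower_hull(points):
    """Lower convex hull of (x_B, formation energy per atom) points."""
    best = {}
    for x, e in points:
        best[x] = min(e, best.get(x, e))  # lowest-energy phase per composition
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            # Keep hull[-1] only if it lies strictly below the line hull[-2] -> p.
            cross = (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Linearly interpolate the hull energy at composition x."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return e1 + (e2 - e1) * (x - x1) / (x2 - x1)
    raise ValueError("composition outside the range covered by the hull")


# Toy data: name -> (fraction of B, formation energy in eV/atom). Made up.
entries = {
    "A": (0.0, 0.0),
    "A3B": (0.25, -0.15),
    "AB": (0.5, -0.40),
    "AB_polymorph": (0.5, -0.35),
    "AB3": (0.75, -0.30),
    "B": (1.0, 0.0),
}

hull = lower_hull(entries.values())
e_above_hull = {name: e - hull_energy(hull, x) for name, (x, e) in entries.items()}
for name, value in e_above_hull.items():
    print(f"{name}: {value:.3f} eV/atom above hull")
```

In this toy system A3B and the AB polymorph both sit 0.05 eV/atom above the hull, which illustrates why a single cutoff cannot by itself say which of the two is more likely to be made.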
We know what’s needed to go beyond the
hull, but it’s usually a pain and/or expensive
“Synthesizability workflow” (diagram labels):
• Starting structure
• 0K hull stability
• Dynamical stability + finite T Gibbs energy
• Oxidation & moisture resistance / passivation
• Aqueous / environment stability
• Amorphous hull limit / ensure hull is complete
• Defect & distortion tolerance
Dynamical stability is not too hard to get, but
routinely ignored
Hull energies of some hypothetical MAX phases
Dynamically unstable phases are marked with a *
Khaledialidusti, R., Khazaei, M., Khazaei, S. & Ohno, K. Nanoscale
13, 7294–7307 (2021).
In trickier cases, dynamic stability can be T-dependent
0K dynamic stability may not be enough
The vibrational contribution to free energy is
another thing we usually ignore
Bartel, C. J. et al. Nat Commun 9, 4168 (2018)
Calculating thermal properties of materials
• The vibrational thermal properties of materials are determined by
phonon behavior
• In lattice dynamics, we typically Taylor-expand the phonon interactions in powers of the
atomic displacements, and differentiate to solve for the interatomic force constants (IFCs)
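The equation on this slide did not survive extraction. For reference, the expansion is usually written in the following textbook form; the index convention (atoms i, j, k and Cartesian directions α, β, γ) is an assumption here and may differ from the notation on the original slide.

```latex
V = V_0
  + \sum_{i,\alpha} \Phi_{i}^{\alpha}\, u_i^{\alpha}
  + \frac{1}{2!} \sum_{ij,\alpha\beta} \Phi_{ij}^{\alpha\beta}\, u_i^{\alpha} u_j^{\beta}
  + \frac{1}{3!} \sum_{ijk,\alpha\beta\gamma} \Phi_{ijk}^{\alpha\beta\gamma}\, u_i^{\alpha} u_j^{\beta} u_k^{\gamma}
  + \cdots

\Phi_{ij}^{\alpha\beta}
  = \left. \frac{\partial^2 V}{\partial u_i^{\alpha}\, \partial u_j^{\beta}} \right|_{u=0},
\qquad
\Phi_{ijk}^{\alpha\beta\gamma}
  = \left. \frac{\partial^3 V}{\partial u_i^{\alpha}\, \partial u_j^{\beta}\, \partial u_k^{\gamma}} \right|_{u=0},
\qquad
F_i^{\alpha} = -\frac{\partial V}{\partial u_i^{\alpha}}
```

Here u is the displacement of an atom from its equilibrium position and F is the force on it. The first-order term vanishes at equilibrium, the second-order constants give the harmonic phonons, and the third- and higher-order constants carry the anharmonic behavior.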
The problem – obtaining force constants can
require many DFT calculations
Finite-displacement method
• To obtain 2nd order IFCs: 1 displacement in a supercell (usually <5 supercells needed)
• To obtain 3rd order IFCs: 2 displacements in a supercell (# of supercells needed: 1000s-10000s)
IFCs extracted from HiPhive
• To obtain any order of IFCs (2nd, 3rd, …) in one shot: displace each atom in a supercell (only need 5~10 supercells in total!)
• Traditionally, one performs systematic
displacements, each of which only has
a few atom movements and solves
only a small portion of the IFC matrix
• Primitive cells with reduced symmetry
and many atoms can easily require
1000 or more calculations
• The scaling goes something like
O(N^n), where N is the number of sites
and n is the order of IFC you want. Not
scalable!
The solution – perform non-systematic
displacements
• Instead of performing systematic displacements, perform non-systematic displacements in which many IFC terms are “mixed up”
• Then, perform a best-fit procedure to fit the IFC matrix elements to the observed data
• The fit is typically underdetermined, so regularization is important
• This method has been suggested by several groups; for now we focus on the implementation in the HiPhive code (Erhart group, Chalmers University of Technology)
• Disadvantage: this method requires careful selection of fit parameters to get correct results
IFCs extracted from HiPhive: to obtain any order of IFCs (2nd, 3rd, …) in one shot, displace each atom in a supercell (only need 5~10 supercells in total!)
Monte Carlo rattle penalizes displacements that lead to very small interatomic distances
Fransson, E.; Eriksson, F.; Erhart, P. Efficient Construction of Linear Models in
Materials Modeling and Applications to Force Constant Expansions. npj Comput
Mater 2020, 6 (1), 135.
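The idea of rattling every atom and then fitting the force constants can be illustrated on a toy model. The sketch below is not HiPhive and does not use its API: it is a periodic 1D chain with two made-up spring constants, where exact harmonic forces stand in for DFT forces and a tiny ridge-regularized least-squares fit recovers the constants. All names and numbers are assumptions for illustration.

```python
import random


def chain_forces(u, k1, k2):
    """Harmonic forces on a periodic 1D chain with 1st- and 2nd-neighbor springs."""
    n = len(u)
    forces = []
    for i in range(n):
        f = 0.0
        for shift, k in ((1, k1), (2, k2)):
            f += k * (u[(i + shift) % n] - u[i])
            f += k * (u[(i - shift) % n] - u[i])
        forces.append(f)
    return forces


def design_rows(u):
    """One row per atom: the force on that atom per unit k1 and per unit k2."""
    n = len(u)
    return [
        [u[(i + s) % n] + u[(i - s) % n] - 2.0 * u[i] for s in (1, 2)]
        for i in range(n)
    ]


def ridge_fit_2(rows, targets, lam=1e-8):
    """Solve (A^T A + lam * I) x = A^T f for two unknowns with Cramer's rule."""
    a11 = sum(r[0] * r[0] for r in rows) + lam
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows) + lam
    b1 = sum(r[0] * t for r, t in zip(rows, targets))
    b2 = sum(r[1] * t for r, t in zip(rows, targets))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


random.seed(0)
TRUE_K1, TRUE_K2 = 5.0, 0.8  # made-up spring constants for the toy model

rows, targets = [], []
for _ in range(3):  # a handful of rattled "supercells"
    u = [random.gauss(0.0, 0.02) for _ in range(8)]  # every atom is displaced
    rows.extend(design_rows(u))
    targets.extend(chain_forces(u, TRUE_K1, TRUE_K2))  # stand-in for DFT forces

k1_fit, k2_fit = ridge_fit_2(rows, targets)
print(f"fitted k1 = {k1_fit:.4f}, k2 = {k2_fit:.4f}")
```

Because every atom moves in every structure, each force carries information about both constants at once, so three small structures suffice here. Unlike this toy case, a real force-constant fit has far more unknowns than two and is often underdetermined, which is where the choice of regularization and cutoffs starts to matter.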
We’ve been working to get the parameter
selection problem sorted out …
Effect of supercell size
Effect of cutoff
Effect of fitting method
Other parameters like rattling amount, etc. also tested
We’ve wrapped these and other considerations
into a fully automatic workflow
[Workflow diagram, recoverable labels]
• INPUT → VASP: DFT relaxation of primitive cell
• VASP: SCF on supercells (u = 0.01-0.05 Å) → HiPhive: fit harmonic Φ2
• VASP: SCF on supercells (u = 0.1-0.5 Å) → HiPhive: fit anharmonic Φ3, Φ4, etc. → complete Φ
• Decision: imaginary modes? If no: stable phonon
• Renormalization at T ≥ 0 K: inner loop (quantum covariance, renormalize Φ2, check for imaginary modes and converged free energy) and outer loop (expand lattice at T, check for converged free energy) → renormalized Φ, corrected free energy
• Outputs: free energy, entropy, heat capacity, Gruneisen, thermal expansion; lattice thermal conductivity (ShengBTE/FourPhonon, Boltzmann transport); phase transition; thermoelectric zT
• Other label: bulk modulus
Non-analytical corrections for ionic compounds
Phonon renormalization for imaginary modes at finite T via Xia & Chan
Examples shown: Li (BCC, Im-3m), ZrO2 (cubic, Fm-3m), GeTe (cubic, Fm-3m), BaTiO3 (cubic, Pm-3m)
Xia, Y. & Chan, M. K. Y. Anharmonic stabilization and lattice heat
transport in rocksalt β-GeTe. Appl. Phys. Lett. 113, 193902 (2018).
We see 100 – 1000X speedup compared to
finite displacement method
[Figure labels: 100x speedup; 1000x speedup]
• 2nd order of IFCs, harmonic terms (Φ2): non-analytic correction (NAC), phonon dispersion/DOS, quasi-harmonic thermal properties (free energy, heat capacity, entropy)
• 3rd order of IFCs, anharmonic terms (Φ3): lattice thermal conductivity, Gruneisen parameter, coefficient of thermal expansion
• 4th order of IFCs, anharmonic terms (Φ4): finite-temperature phonon (renormalization), corrected free energy
[Figure axis labels: more thermal properties; higher physical accuracy; computational feasibility]
Dynamic stability, finite temperature
Gibbs free energy, and other parameters
are now accessible!
Hopefully such calculations can become more
routine in the future
Diagram labels:
• Starting structure
• 0K hull stability
• Dynamical stability + finite T Gibbs energy
• Oxidation & moisture resistance / passivation
• Aqueous / environment stability
• Amorphous hull limit / ensure hull is complete
• Defect & distortion tolerance
Outline
• The explosion of new materials predictions, and dilemma of what to
test
• Can we trust ML algorithms that predict hull stability?
• Beyond 0K “e above hull”: Efficient phonons
• Integrating literature knowledge for efficient experimentation
Data from the literature can be used not only to
assess synthesizability, but also to make more efficient
use of experiments
This means not only identifying synthesizable compounds, but reducing
the number of experiments it takes to make them
Huo, H. et al. Machine-Learning Rationalization and Prediction of Solid-State Synthesis
Conditions. Chem. Mater. 34, 7323–7336 (2022).
A main issue is getting clean data sets
There is “loss” at
each step of the
process
Ideally, we have
fewer steps that
can do more, and
retain overall
accuracy
Wang et al., https://arxiv.org/abs/2111.10874
We have found that a fine-tuned GPT-3 model can
be used to extract synthesis recipes or other data
1. Initial training set of templates filled mostly manually, as zero-shot GPT is often poor for complex technical tasks
2. Fine-tune model to fill templates, use the model to assist in annotation
3. Repeat as necessary until desired inference accuracy is achieved
Templated extraction of synthesis recipes
• Annotate paragraphs to output
structured recipe templates
• JSON-format
• Designed using domain knowledge
from experimentalists
• Template is relation graph to be
filled in by model
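To make the "JSON-format template" idea concrete, here is a small sketch of checking that a model completion fills a template. The template fields, the function name, and the example completion are all hypothetical placeholders; they are not the schema or output from the talk.

```python
import json

# Hypothetical template; these field names are illustrative assumptions,
# not the schema used in the talk.
TEMPLATE = {"target": "", "precursors": [], "steps": []}


def parse_completion(text):
    """Parse a model completion and check that it fills every template field."""
    record = json.loads(text)
    for key, default in TEMPLATE.items():
        if key not in record:
            raise ValueError(f"missing field: {key}")
        if not isinstance(record[key], type(default)):
            raise TypeError(f"field {key!r} should be a {type(default).__name__}")
    return record


# Placeholder completion standing in for the output of a fine-tuned model.
completion = (
    '{"target": "material X", '
    '"precursors": ["precursor A", "precursor B"], '
    '"steps": [{"action": "mix"}, {"action": "heat"}]}'
)

recipe = parse_completion(completion)
print(recipe["precursors"])
```

A check like this is one plausible way to catch malformed completions before they enter the annotation loop, since a completion that fails to parse or omits a field can be sent back for manual correction.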
Example Extraction for Au nanorod synthesis
Training a decision tree to predict AuNR
shape shows similar conclusions as literature
[Decision tree figure; leaf labels include Rod, Cube, Bipyramid, Star, and None]
• Decision tree shows seed capping
agent type as first decision
boundary for shape determination
• “Citrate-capped gold seeds form
penta-twinned structure, while
CTAB-capped seeds are single
crystalline, hence former leads to
bipyramids and latter leads to
rods” [1,2]
[1] Liu and Guyot-Sionnest, J. Phys. Chem. B 109 (47), 22192-22200 (2005).
[2] Grzelczak et al., Chem. Soc. Rev. 37, 1783-1791 (2008).
We see similar results in a parallel project
about BiFeO3 synthesis via sol–gel
We are also extending to applications like
doping
Currently:
~357,000
processed abstracts
~373,000 dopants in ~312,000 host materials
This allows us to get doping statistics for
common materials
We can see which materials might have
similar patterns of dopants …
[Figure: hosts vs. dopants, with occurrences (48k abstracts)]
And use ML (collaborative filtering) to find
unknown dopants
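As a rough illustration of collaborative filtering on host-dopant data, the sketch below scores unreported dopants for a host by looking at hosts with similar known dopants. This is a simple neighborhood-based variant chosen for brevity, not necessarily the method used in the talk, and the host and dopant names are placeholders rather than mined data.

```python
import math

# Toy host -> set of reported dopants. Placeholder names, not mined data.
known_dopants = {
    "host_A": {"d1", "d2", "d3"},
    "host_B": {"d1", "d2"},
    "host_C": {"d3", "d4"},
}


def cosine(a, b):
    """Cosine similarity between two sets viewed as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def suggest_dopants(host, known):
    """Rank dopants not yet reported for `host` by similarity-weighted votes."""
    scores = {}
    for other, dopants in known.items():
        if other == host:
            continue
        similarity = cosine(known[host], dopants)
        for dopant in dopants - known[host]:
            scores[dopant] = scores.get(dopant, 0.0) + similarity
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


suggestions = suggest_dopants("host_B", known_dopants)
print(suggestions)
```

In this toy case host_B shares two dopants with host_A and none with host_C, so d3 (known for host_A) is ranked ahead of d4.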
Conclusions
• Even automated labs won’t be able to keep up with the deluge of ML
predictions of interesting / novel / functional compounds
• ML does seem to do a relatively good job at finding 0K hull stable
compounds
• We need more informed criteria on how to rank the candidates coming out
of the ML pipeline, and these criteria need to be easy to deploy
• More efficient calculation strategies and NLP-based ML can play a role to
help prioritize some compounds over others.
Acknowledgements
NLP
• Alex Dunn
• John Dagdelen
• Nick Walker
• Sanghoon Lee
• Kevin Cruse
• Viktoriia Baibakova
• Amalie Trewartha
Funding provided by:
• U.S. Department of Energy, Basic Energy Sciences, “Materials Project” program
• U.S. Department of Energy, Basic Energy Sciences, “D2S2” program
• Toyota Research Institute, Accelerated Materials Design program
Matbench-discovery
• Janosh Riebesell
• Alex Dunn
• Rhys Goodall
Phonon workflow
• Zhuoying Zhu
• Hrushikesh Sahasrabuddhe
… and of course Gerd for setting me on this
trajectory and continuing me on it …
