Perspectives on chemical composition and crystal structure representations from the Matbench test set

Anubhav Jain
Lawrence Berkeley National Laboratory
TRI Spring Meeting, May 2022
Slides (already) posted to hackingmaterials.lbl.gov

Outline
1. Introduction to the Matbench testing protocol
2. What does Matbench tell us about current ML models for materials property prediction?
3. How should testing protocols like Matbench be further improved?

ML is quickly becoming a standard tool for materials screening
[Figure: materials screening funnel with stages labeled machine learning, high-throughput DFT, expensive calculation, and experiment, starting from millions of candidates]

There are many new algorithms being published for ML in materials, and new ones are constantly reported!

But it is very difficult to compare algorithms
[Figure: three separate data sets, used in study A, study B, and study C]
• Different data sets
  • Source (e.g., OQMD vs. MP vs. JARVIS)
  • Quantity (e.g., MP 2019 vs. MP 2022)
  • Subset / data filtering (e.g., ehull<X)
• Different evaluation metrics
  • Test set vs. cross validation?
  • Different test set fraction?
• Can be difficult to install and retrain many of these algorithms
Example: how does "MAE 5-Fold CV = 0.102 eV" compare with "RMSE Test set = 0.098 eV"?
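To illustrate why an MAE from one study and an RMSE from another cannot be compared directly, here is a minimal sketch that computes both metrics on the same predictions. The target and prediction values are made up for illustration and do not come from any study. On identical predictions RMSE is never smaller than MAE, so a lower RMSE reported on a different split says little about which model is better.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical targets and predictions (eV), for illustration only.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 0.9, 2.1, 3.5]

print(f"MAE  = {mae(y_true, y_pred):.3f} eV")
print(f"RMSE = {rmse(y_true, y_pred):.3f} eV")
```

The single large residual (0.5 eV) inflates RMSE much more than MAE, which is one reason the choice of metric has to be fixed before algorithms are ranked.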

What’s needed: an “ImageNet” for materials science
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/

What does a standard data set do for a field?
One reason computer science and machine learning seem to advance so quickly is that these fields decouple data generation from algorithm development.
This allows groups to focus on algorithm development without all the data generation, data cleaning, etc. that often make up the majority of an end-to-end data science project.

How to design good data sets for materials science?
• There is no single type of problem that materials scientists are trying to solve
• For now, focus on materials property prediction (from structure or composition)
• We want a test set that contains a diverse array of problems
  • Smaller data versus larger data
  • Different applications (electronic, mechanical, etc.)
  • Composition-only or structure information available
  • Experimental vs. ab initio
  • Classification or regression

Matbench includes 13 different ML tasks
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference
Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.

The tasks encompass a variety of problems
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference
Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.

The Matbench web site is live
https://matbench.materialsproject.org

How to read the Matbench leaderboard
[Figure: leaderboard plot annotated with "Bigger datasets" and "Better relative performance"]
• A scaled error of 0.0 means all predictions are correct
• A scaled error of 1.0 is equal to always predicting the average value
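The two reference points for the scaled error can be reproduced with a short sketch. It assumes a regression task and assumes the scaled error is the model's MAE divided by the MAE of a baseline that always predicts the mean; for simplicity the mean is taken over the evaluated targets themselves, which may differ from how the leaderboard computes its baseline. The function names and numbers are illustrative.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def scaled_error(y_true, y_pred):
    """Model MAE divided by the MAE of always predicting the mean target.

    Assumption: the baseline mean is computed from y_true itself.
    """
    mean_value = sum(y_true) / len(y_true)
    baseline_mae = mean_absolute_error(y_true, [mean_value] * len(y_true))
    return mean_absolute_error(y_true, y_pred) / baseline_mae


# Hypothetical targets, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = scaled_error(y_true, y_true)                    # all predictions correct
mean_only = scaled_error(y_true, [2.5, 2.5, 2.5, 2.5])    # always predict the average
halfway = scaled_error(y_true, [1.5, 2.5, 3.5, 4.5])      # every prediction off by 0.5
print(perfect, mean_only, halfway)
```

Because the error is normalized by a task-specific baseline, values from tasks with different units and value ranges can be shown on one chart.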

Access to Datasets/ML tasks
• Interactively, via Materials Project: ml.materialsproject.org
• Programmatically via matbench in Python (2 lines; loads all 13 tasks)
• Programmatically via matminer in Python (2 lines)
• Direct download: matbench.materialsproject.org
Preferred/easiest method!
https://github.com/hackingmaterials/matminer
Programmatic Access and Analysis of Submissions
• Run a benchmark on your own algorithm in ~10 lines of code
• Run on any combination or all of the 13 existing tasks
• If your entry outperforms an existing entry, submit your algorithm in a pull request!
• Existing notebooks/code and software requirements for reproducing any benchmark, e.g.:
  {'python': [['crabnet==1.2.1', 'scikit_learn==1.0.2', 'matbench==0.5']]}
• Comprehensive raw data (accessible via the matbench Python package or any JSON-capable language) on all benchmarks. Publicly available to anyone!
• In-depth performance metrics for individual ML tasks for all submissions, both visually on the website and programmatically
matbench.materialsproject.org
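As a rough sketch of what running a benchmark in about ten lines can look like, the loop below follows the usage pattern documented for the `matbench` package. `MeanModel` is a hypothetical stand-in for your own algorithm, `run_benchmark` is a wrapper introduced here, and the output file name is arbitrary. The `matbench` import is deferred so the example model can be tried without the package installed.

```python
class MeanModel:
    """Hypothetical baseline: always predicts the mean of the training targets."""

    def __init__(self):
        self.mean_ = None

    def train_and_validate(self, inputs, outputs):
        values = list(outputs)
        self.mean_ = sum(values) / len(values)

    def predict(self, inputs):
        return [self.mean_] * len(inputs)


def run_benchmark(model, subset, out_file="my_models_benchmark.json.gz"):
    """Train and test `model` on every fold of the chosen Matbench tasks."""
    # Deferred import: needs `pip install matbench`; downloads data on first use.
    from matbench.bench import MatbenchBenchmark

    mb = MatbenchBenchmark(autoload=False, subset=subset)
    for task in mb.tasks:
        task.load()
        for fold in task.folds:
            # Training data for this fold (inputs are compositions or structures).
            train_inputs, train_outputs = task.get_train_and_val_data(fold)
            model.train_and_validate(train_inputs, train_outputs)
            # Test inputs only; the targets stay hidden from the model.
            test_inputs = task.get_test_data(fold, include_target=False)
            task.record(fold, model.predict(test_inputs))
    mb.to_file(out_file)  # results file that can accompany a submission
    return mb
```

A call such as `run_benchmark(MeanModel(), subset=["matbench_steels"])` would exercise the loop on a single regression task; the mean baseline is not meaningful for the classification tasks.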

Outline
2. What does Matbench tell us about current ML models for materials property prediction?

Models tested by Matbench to date
Model | Representation type | Representation summary
Magpie + Sine Coulomb Matrix + Random Forest | Composition or Structure | Hand-created chemical features coupled with random forest ML algorithm
Automatminer | Composition or Structure | Hand-created chemical features with genetic algorithm based ML algorithm and hyperparameter selection
MODNet | Composition or Structure | Hand-created chemical features with various neural network layers
CGCNN | Structure only | Graph convolution based neural networks with basic initial atom/bond features
ALIGNN | Structure only | Graph based convolutional networks based on bonds/angles in addition to atoms/bonds
CRABNet | Composition only | Transformer-based self-attention for composition; initialized using NLP-based embeddings

Magpie + Sine Coulomb Matrix (SCM) Model
• Composition features using chemical descriptors, e.g., averages/standard deviations of elemental properties such as melting point and electronegativity
• Structure features using the sine Coulomb matrix
Ward, L., Agrawal, A., Choudhary, A. et al. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput Mater 2, 16028 (2016).
Faber, Felix, et al. "Crystal structure representations for machine learning models of formation energies." International Journal of Quantum Chemistry 115.16 (2015): 1094-1101.
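The composition features described on this slide can be sketched as fraction-weighted statistics of an elemental property. This is an illustrative simplification, not the Magpie or matminer implementation, which computes a larger set of statistics over many properties. The small lookup table holds Pauling electronegativities for four elements only, and `composition_stats` is a hypothetical helper name.

```python
import math

# Pauling electronegativities for a few elements (illustrative lookup table).
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16, "Fe": 1.83, "O": 3.44}


def composition_stats(composition, prop):
    """Fraction-weighted mean and standard deviation of an elemental property.

    composition: dict mapping element symbol to amount, e.g. {"Fe": 2, "O": 3}
    prop: dict mapping element symbol to the property value
    """
    total = sum(composition.values())
    fractions = {el: amount / total for el, amount in composition.items()}
    mean = sum(frac * prop[el] for el, frac in fractions.items())
    variance = sum(frac * (prop[el] - mean) ** 2 for el, frac in fractions.items())
    return mean, math.sqrt(variance)


mean_en, std_en = composition_stats({"Fe": 2, "O": 3}, ELECTRONEGATIVITY)
print(f"Fe2O3 electronegativity: mean={mean_en:.3f}, std={std_en:.3f}")
```

Repeating this for several elemental properties gives a fixed-length feature vector for any composition, which is what lets a random forest consume formulas of different lengths.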

Automatminer Model
Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput
Mater 2020, 6 (1), 138.

MODNet Model
De Breuck, P.-P.; Evans, M. L.; Rignanese, G.-M. Robust Model Benchmarking and Bias-Imbalance in Data-Driven Materials Science: A Case Study on MODNet. Journal of Physics:
Condensed Matter, Volume 33, Number 40, 2021

CGCNN Model
Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120 (14), 145301.

ALIGNN Model
Choudhary, Kamal, and Brian DeCost. "Atomistic Line Graph Neural Network for improved materials property predictions." npj Computational Materials 7.1 (2021): 1-8.

CRABNet Model
Wang, A.; Kauwe, S.; Murdock, R.; Sparks, T. Compositionally-Restricted Attention-Based Network for Materials Property Prediction; npj Computational Materials, 7, 77, 2021

How much have we improved overall?

Outline
3. How should testing protocols like Matbench be further improved?

An even greater diversity of tasks
• Despite already having 13 tasks, many common materials tasks are largely untested:
  • data sets with additional conditions such as:
    • temperature
    • doping concentration
    • geometry (e.g., nanoparticle size)
  • polymer materials
  • alloys
  • multi-material composites
  • spectral properties
• Unfortunately, more scores can be confusing. Even the current 13 tasks feel like a bit too much, so we can’t just keep adding a ton of tests.
• More community discussion here is needed …

Data splitting
• Matbench uses a nested cross-validation procedure with random (but consistent) data splits. This makes results reproducible.
• Bad news:
  • Randomized splitting means we are often not effectively testing extrapolation
    • e.g., the same composition may be in both the training and test sets (as different polymorphs)
    • may want to move to a GroupKFold where groups are separated by formula, chemical system, or unsupervised clustering (the last is proposed as part of LOCO-CV [1])
  • Data “memorization”: with unchanging data and splits, it’s possible to design algorithms engineered to beat the Matbench tests
    • Leads to benchmark “saturation” (the benchmarks are solved)
    • Need a “Dynabench” (Facebook) type strategy where new adversarial examples are frequently rotated into the test set
1. Meredig, B.; Antono, E.; Church, C.; Hutchinson, M.; Ling, J.; Paradiso, S.; Blaiszik, B.; Foster, I.; Gibbons, B.; Hattrick-Simpers, J.; Mehta, A.; Ward, L. Can Machine Learning Identify
the next High-Temperature Superconductor? Examining Extrapolation Performance for Materials Discovery. Molecular Systems Design & Engineering 2018, 3 (5), 819–825.
https://doi.org/10.1039/C8ME00012C.
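The grouped splitting idea from this slide can be sketched without any dependencies. The function below is a simplified, hypothetical stand-in for a grouped k-fold splitter (scikit-learn's `GroupKFold` serves the same purpose): every sample sharing a group label, here the chemical formula, lands in the same fold, so polymorphs of one composition never straddle the train/test boundary. The formulas are illustrative.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs such that no group is in both sets."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    if n_splits > len(by_group):
        raise ValueError("n_splits cannot exceed the number of distinct groups")

    # Greedy balancing: place the largest groups first into the smallest fold.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        min(folds, key=len).extend(by_group[group])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx


# Two TiO2 and two SiO2 entries stand in for different polymorphs.
formulas = ["TiO2", "TiO2", "SiO2", "SiO2", "Fe2O3", "NaCl"]
splits = list(group_k_fold(formulas, n_splits=3))
for train_idx, test_idx in splits:
    print("test formulas:", sorted({formulas[i] for i in test_idx}))
```

Grouping by chemical system or by cluster label instead of formula only requires passing a different `groups` list.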

Active learning scoring
• Machine learning algorithms are not always deployed as “one-shot” procedures (given a data set, produce the best model)
• Rather, many industrial and research deployments of ML allow for “active learning”, i.e., the algorithm decides which data points to collect next
  • It can select points aimed at reducing its own prediction uncertainty, or it can select points that maximize the probability of finding a good material
• This opens up many new research avenues such as generating uncertainty estimates and making optimal use of them in real-world problems
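The two selection strategies mentioned on this slide can be sketched as simple acquisition rules. The sketch assumes the model returns a predicted mean and standard deviation for each candidate and treats the prediction as Gaussian; the function names and candidate values are hypothetical, and probability of improvement is just one of several possible ways to score how promising a candidate is.

```python
import math


def select_most_uncertain(stds):
    """Uncertainty sampling: index of the candidate with the largest predicted std."""
    return max(range(len(stds)), key=lambda i: stds[i])


def probability_of_improvement(mean, std, best_so_far):
    """P(value > best_so_far), assuming a Gaussian predictive distribution."""
    if std == 0:
        return 1.0 if mean > best_so_far else 0.0
    z = (mean - best_so_far) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def select_most_promising(means, stds, best_so_far):
    """Index of the candidate most likely to beat the best value found so far."""
    return max(
        range(len(means)),
        key=lambda i: probability_of_improvement(means[i], stds[i], best_so_far),
    )


# Hypothetical predictions for three candidate materials.
means = [1.0, 1.8, 1.2]
stds = [0.1, 0.2, 0.9]

explore_choice = select_most_uncertain(stds)
exploit_choice = select_most_promising(means, stds, best_so_far=1.5)
print(explore_choice, exploit_choice)
```

The two rules pick different candidates here, which is why an active learning benchmark would need to score the whole acquisition strategy rather than only one-shot prediction error.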

Conclusions and future
• As the community increasingly develops new algorithms for machine learning of materials properties, a standard way to test these algorithms is needed
• Matbench represents such a standard
• Matbench also allows us to measure overall progress in the field
• We hope to see you on the leaderboard!

Acknowledgements
Alex Dunn (lead developer)
Qi Wang
Alex Ganose
Daniel Dopp
Patrick Huck
