Materials Data in Action


ONE DOES NOT SIMPLY... APPLY OFF THE SHELF ML TOOLS TO MATERIALS DISCOVERY
ARTIFICIAL INTELLIGENCE FOR MAT. SCI.
8 AUGUST 2018, NIST
Max Hutchinson, Scientific Software Eng.
Bryce Meredig, Chief Science Officer

What is materials informatics?

What makes it particularly challenging?

Can we do anything about it?

LET'S TRY TO MACHINE LEARN A NOBEL PRIZE

CASE STUDY: HIGH-Tc SUPERCONDUCTORS
Pia Jensen Ray. Figure 2.4 in Master's thesis, "Structural investigation of La(2-x)Sr(x)CuO(4+y) - Following staging as a function of temperature". Niels Bohr Institute, Faculty of Science,
University of Copenhagen. Copenhagen, Denmark, November 2015. DOI:10.6084/m9.figshare.2075680.v2

Cross-validated RMSE for Tc ≈ 10 K

CAN WE PREDICT HIGH-Tc SUPERCONDUCTIVITY?!?
(spoiler alert): no

LEAVE ONE CLUSTER OUT (LOCO) CV
Nominal k-fold cross-validation assumes that samples are drawn independently from the input space.
This is almost never true in materials informatics: individual data sources have
selection biases, and different data sources draw from different distributions.
LOCO CV groups the data before computing train/test splits.
The groups are inferred via clustering rather than being dictated by a domain expert.
"Can machine learning identify the next high-temperature superconductor? Examining
extrapolation performance for materials discovery."
B. Meredig, ..., M. Hutchinson, ..., B. Gibbons, J. Hattrick-Simpers, A. Mehta, L. Ward
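
The LOCO procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the tiny k-means, the 1-nearest-neighbor model, and the two-class toy data are all assumptions chosen so the example runs with the Python standard library alone.

```python
import math
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_labels(points, k, iterations=20, seed=0):
    """Tiny k-means; returns one cluster label per point."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def loco_splits(labels):
    """Yield (train, test) index lists, holding out one whole cluster at a time."""
    for held_out in sorted(set(labels)):
        test = [i for i, label in enumerate(labels) if label == held_out]
        train = [i for i, label in enumerate(labels) if label != held_out]
        yield train, test

def kfold_splits(n, k, seed=0):
    """Yield (train, test) index lists for ordinary shuffled k-fold CV."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    for fold in range(k):
        test = indices[fold::k]
        held = set(test)
        yield [i for i in indices if i not in held], test

def nearest_neighbor_rmse(points, targets, splits):
    """RMSE of a 1-nearest-neighbor model over the given splits."""
    squared_errors = []
    for train, test in splits:
        for i in test:
            j = min(train, key=lambda t: squared_distance(points[t], points[i]))
            squared_errors.append((targets[j] - targets[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy data (assumption): two well-separated "material classes".
xs = [i / 10 for i in range(10)] + [10 + i / 10 for i in range(10)]
points = [[x] for x in xs]
targets = [x if x < 5 else 50 + x for x in xs]

labels = kmeans_labels(points, k=2)
kfold_rmse = nearest_neighbor_rmse(points, targets, kfold_splits(len(points), 5))
loco_rmse = nearest_neighbor_rmse(points, targets, loco_splits(labels))
print(f"k-fold RMSE: {kfold_rmse:.2f}, LOCO RMSE: {loco_rmse:.2f}")
```

On this toy set the shuffled k-fold error stays small because every test point has a same-class neighbor in the training set, while the LOCO error is large because the held-out class is never seen during training.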

The model can't "extrapolate" across material classes (clusters).

LOW CROSS-VALIDATION ERROR IS INSUFFICIENT

Possible Materials Informatics Research Program
1. Collect data
2. Train an approximate ML model
3. Validate the ML model
If insufficiently accurate, back to (1)
4. Optimize or screen over materials using the ML model
5. ...
6. Profit
A large portion of the literature focuses on collection, training,
and validation in support of screening.

CAN WE DISCOVER NEW MATERIALS?
(spoiler alert): yes

DESIGN OF EXPERIMENTS, SEQUENTIAL LEARNING, AND "FUELS"
1. Collect data
2. Train an approximate ML model
3. Design an experiment
4. Conduct the experiment
If quality is insufficient, append and back to (2)
5. ...
6. Profit
[Diagram: Modeling designs Experiment; Experiment informs Modeling]
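
The sequential learning loop above can be sketched as follows. Everything specific here is a hypothetical stand-in: `run_experiment` replaces a real measurement, the 1-nearest-neighbor `predict` replaces a trained ML model, and the candidate list, target, and round budget are arbitrary.

```python
def run_experiment(x):
    """Hypothetical measurement; the optimum at x = 14 is an assumption."""
    return -(x - 14) ** 2

def predict(measured, x):
    """1-nearest-neighbor surrogate: value of the closest measured candidate."""
    nearest = min(measured, key=lambda m: abs(m - x))
    return measured[nearest]

candidates = list(range(21))   # explicit list of possible experiments
target = -0.5                  # stop once a measurement reaches this quality
max_rounds = 10                # experimental budget

# 1. Collect data: a few initial measurements.
measured = {x: run_experiment(x) for x in (0, 5, 20)}

for _ in range(max_rounds):
    if max(measured.values()) >= target:
        break                  # quality is sufficient
    untested = [x for x in candidates if x not in measured]
    # 2. Train: the surrogate here simply reads from `measured`.
    # 3. Design: pick the untested candidate with the best prediction.
    next_x = max(untested, key=lambda x: predict(measured, x))
    # 4. Conduct the experiment and append the result, then loop back to (2).
    measured[next_x] = run_experiment(next_x)

best_x = max(measured, key=measured.get)
n_experiments = len(measured) - 3
print(best_x, measured[best_x], n_experiments)
```

The structural difference from the screening program is that each new measurement is fed back into the model before the next experiment is chosen.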

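DESIGNING THE NEXT EXPERIMENT
Maximum Expected Improvement (MEI):
\left[ \int_{-\infty}^{\infty} x \, p(x; \theta) \, dx \right]

Maximum Likelihood of Improvement (MLI):
\left[ \int_{\alpha}^{\infty} p(x; \theta) \, dx \right]

Maximum Uncertainty (MU):
\left[ \int_{-\infty}^{\infty} (x - \bar{x})^2 \, p(x; \theta) \, dx \right]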
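
As a worked illustration of the three selection criteria on this slide, the sketch below scores a handful of candidates. It assumes each candidate's predictive distribution p(x; θ) is Gaussian with a known mean and standard deviation, which the slide does not state; the candidate names and numbers are made up. Under that assumption the three integrals reduce to the mean, the Gaussian tail probability above the threshold α, and the variance.

```python
import math

def expected_value(mean, std):
    """Integral of x * p(x) over all x: the predictive mean."""
    return mean

def likelihood_of_improvement(mean, std, alpha):
    """Integral of p(x) from alpha to infinity for a Gaussian p."""
    return 0.5 * math.erfc((alpha - mean) / (std * math.sqrt(2.0)))

def uncertainty(mean, std):
    """Integral of (x - mean)^2 * p(x): the predictive variance."""
    return std ** 2

# Hypothetical candidates: name -> (predicted mean, predicted std).
candidates = {"A": (1.0, 0.1), "B": (0.9, 0.5), "C": (0.0, 1.0)}
alpha = 1.2  # best value measured so far (assumed)

mei_choice = max(candidates, key=lambda n: expected_value(*candidates[n]))
mli_choice = max(candidates,
                 key=lambda n: likelihood_of_improvement(*candidates[n], alpha))
mu_choice = max(candidates, key=lambda n: uncertainty(*candidates[n]))
print(mei_choice, mli_choice, mu_choice)
```

The numbers are chosen so that the three criteria disagree: MEI favors the highest predicted mean, MU favors the least certain prediction, and MLI trades the two off.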

BENCHMARK: DESIGN ON EXPLICIT LIST
9x
2x

REAL WORLD EXAMPLE: ADAPT @ MINES
https://www.additivemanufacturing.media/articles/how-machine-learning-is-moving-am-beyond-trial-and-error/

DATA DRIVEN MODELING DELIVERS DISCOVERY FASTER

WHAT ABOUT THE MACHINE LEARNING?

“Simply downloading and ‘applying’ open-source software to your data
won’t work. AI needs to be customized to your business context and data.”

Andrew Ng in Harvard Business Review
(Stanford, Google Brain, Coursera, Baidu)

MATERIALS INFORMATICS CONTEXT
Labels are scarce and expensive
Typical dataset sizes are 100-1000 labels
Preparing a sample is often more difficult than measuring it
Measuring additional labels on an existing sample has a low marginal cost
We've been doing physics, chemistry, and materials science for hundreds of years
There are (not always accurate) sources of computational data
We have some priors for which labels are related
We have some priors for what some relationships look like

PHYSICAL RELATIONSHIPS
Materials science has Process-Structure-Property (PSP) relationships
Process -> Structure -> Property
[Figure labels: Structure, Properties, Performance, Processing, Characterization]

PHYSICAL RELATIONSHIPS
Physics, mathematics, and engineering think about multi-scale modeling
Micro -> Meso -> Macro
https://www.nas.nasa.gov/SC14/demos/demo26.html

GRAPHICAL MODELS: DOMAIN-AWARE MODELING
Inputs & Features
Featurization
Empirical Relation
Computational Data
Machine Learning
Quantity of Interest

GRAPHICAL MODELS: TRANSFER LEARNING
M. Hutchinson, E. Antono, B. Gibbons, S. Paradiso, J. Ling, B. Meredig
Overcoming data scarcity with transfer learning, https://arxiv.org/pdf/1711.05099.pdf
"B" is a plentiful latent variable
DFT band gap
Hydrogen splitting reaction rate
Indentation hardness

"A" is a scarce or expensive label
Color
NO splitting reaction rate
Ultimate tensile strength

GRAPHICAL MODELS: TRANSFER LEARNING
Simple example: adding yield strength information to a fatigue strength design
increases experimental efficiency
M. Hutchinson, E. Antono, B. Gibbons, S. Paradiso, J. Ling, B. Meredig
Overcoming data scarcity with transfer learning, https://arxiv.org/pdf/1711.05099.pdf
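
The idea of routing a scarce label "A" through a plentiful latent variable "B" can be sketched on toy data. This is only an illustration of the graph structure (inputs -> B -> A), not the method of the cited paper: the functions `true_b` and `true_a`, the nearest-neighbor model for B, and the straight-line model from B to A are all assumptions.

```python
import math

def true_b(x):
    """Plentiful, cheap property (toy assumption)."""
    return math.sin(2 * math.pi * x)

def true_a(x):
    """Scarce, expensive property that depends on B (toy assumption)."""
    return 3.0 * true_b(x) + 1.0

def nearest_value(train_x, train_y, x):
    """1-nearest-neighbor prediction."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]

def fit_line(u, v):
    """Least-squares slope and intercept for v as a function of u."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
             / sum((a - mean_u) ** 2 for a in u))
    return slope, mean_v - slope * mean_u

def rmse(predictions, truth):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truth))
                     / len(truth))

# 51 labels for B, but only 4 labels for A.
b_x = [i / 50 for i in range(51)]
b_y = [true_b(x) for x in b_x]
a_x = [b_x[i] for i in (5, 15, 30, 45)]
a_y = [true_a(x) for x in a_x]

test_x = [(i + 0.5) / 50 for i in range(50)]
test_a = [true_a(x) for x in test_x]

# Baseline: model A directly from the inputs using the 4 scarce labels.
direct_pred = [nearest_value(a_x, a_y, x) for x in test_x]

# Transfer: predict B from the inputs, then map predicted B to A.
b_hat_at_a = [nearest_value(b_x, b_y, x) for x in a_x]
slope, intercept = fit_line(b_hat_at_a, a_y)
transfer_pred = [slope * nearest_value(b_x, b_y, x) + intercept for x in test_x]

rmse_direct = rmse(direct_pred, test_a)
rmse_transfer = rmse(transfer_pred, test_a)
print(f"direct: {rmse_direct:.3f}, via B: {rmse_transfer:.3f}")
```

With only four "A" labels, the direct model has little to interpolate from, while the two-stage model inherits the resolution of the plentiful "B" data.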

WHERE DOES THE UNCERTAINTY COME FROM?
Jackknife methods capture uncertainty with respect to finite sample size.
Computational cost is independent of the size of the feature space.
We add an explicit bias term trained on the out-of-bag errors.
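
To make the jackknife idea concrete, here is a minimal sketch of the classic delete-one jackknife applied to a simple statistic. The slide does not spell out its estimator, so this is not the variant used in the talk and it omits the bias term; the sample values are made up. It only shows how recomputing an estimate with one sample left out at a time yields an uncertainty tied to finite sample size.

```python
import math
import statistics

def jackknife_standard_error(sample, estimator):
    """Delete-one jackknife standard error of estimator(sample)."""
    n = len(sample)
    # Recompute the estimate n times, each time leaving one point out.
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((t - center) ** 2 for t in leave_one_out)
    return math.sqrt(variance)

# Hypothetical measurements of one property.
sample = [2.1, 2.5, 1.9, 2.4, 2.6]

jackknife_se = jackknife_standard_error(sample, statistics.mean)
textbook_se = statistics.stdev(sample) / math.sqrt(len(sample))
print(jackknife_se, textbook_se)
```

For the sample mean the jackknife reproduces the textbook standard error s / sqrt(n), which is a useful sanity check before applying the same recipe to a statistic with no closed-form error.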

WHERE DOES THE UNCERTAINTY COME FROM?

(PROBABILISTIC) GRAPHICAL MODELS
Inputs & Features
Featurization
Empirical Relation
Computational Data
Machine Learning
Quantity of Interest

THANK YOU!
Job listings: citrine.io/jobs
Newsletter: citrination.org/publications_talks/ddms-newsletter/
Literature review: citrination.org/learn/citrines-literature-review/
