Hattrick Simpers TMS Machine Learning Workshop Slides

The MGI & Data-driven High-Throughput Synthesis and Characterization
Brian DeCost, Zachary Trautt, Martin Green, Gilad Kusne,
Jason Hattrick-Simpers
NIST Gaithersburg
Jason.Hattrick-Simpers@nist.gov
@jae3goals
Any mention of commercial products within this talk is for information only; it does not imply recommendation or
endorsement by NIST.

Outline
• The Materials Genome Initiative (MGI) and NIST’s Role
• The High-Throughput Experimental Materials Collaboratory (HTE-MC)
• Accelerated Discovery of (High-Hardness & Corrosion-Resistant) Metallic Glasses
• Iterative HTE and AI
• Vision for the Future
• Look Ma No Hands (Experimentation)!!
• Conclusions

Decrease time-to-market by 50% at a fraction of the cost
• Develop a Materials Innovation
Infrastructure
• Achieve National goals in energy,
security, and human welfare with
advanced materials
• Equip the next generation of
materials workforce
Materials Genome Initiative for
Global Competitiveness

The Materials Genome Initiative

Apple Watch – Announced September
2014

Examples of Cultural Implementation and
Successes of the MGI
• Argonne Collaboration – phase identification at aluminum interfaces
• Lund Boats – MGI on the plant floor
• Casting Simulation (MAGMA) – MGI in R&D, tool shop, & plant floor
• Timken Steel – Premium Air Melt Practice, putting premium quality,
cost conscious steel into the hands of our customers
• BASF – Foaming simulations based on first principles
• ERCo – Laser Induced Breakdown Spectroscopy for real-time melt
composition (ARPA-E)

Standards Are Important
• The NIST MGI Program is taking a very careful approach to consensus
standards for data representation
• There is a long track record of failure for most of the space
• Exception for highly structured data (e.g. ICSD)
• This should be done top-down, not bottom-up

MGI Directions to Date
Materials by Design
projects:
DOE EFRCs, EMNs
NSF DMREFs
HT computational
databases:
Need: High-throughput
experimental data

Workshop: “Fulfilling the Promise of the Materials
Genome Initiative via High-Throughput
Experimentation” – 2014

Workshop Conclusions
A large portion of the MGI program thus far has been devoted to modeling
and simulation. Prodigious amounts of experimental data will be required to
inform and validate modeling and simulation, to “power the MGI
computational engine.”
• HTE can rapidly establish relationships between composition, structure,
and properties for a wide variety of materials classes, and therefore is:
a) uniquely suited to rapidly generate high-quality, consistent data
sets
b) the key enabling counterpart to modeling and simulation for
bringing the MGI to fruition
• “Enable broad access to HTE methodologies and data”

High Throughput Experimental Materials
Collaboratory (HTE-MC)
• Necessary because even one “brick and mortar” HTE facility would be
very costly, and multiple facilities dedicated to different materials
classes (e.g. catalysts, photovoltaics, lightweight structural materials,
etc.) are needed
• Enable researchers at national laboratories, universities, and industry
to have access to HTE facilities
• The HTE-MC would facilitate MGI-driven research while leveraging
investment
• Complement new science investments (EMNs, NNMI, MURI, etc.)

How?
• Collaboratory: a 1989 neologism (William A. Wulf, Computer Scientist
at University of Virginia):
“defined by… a center without walls, ‘in which the nation’s
researchers can perform their research without regard to physical
locations, interacting with colleagues, accessing instrumentation,
sharing data and computational resources, … accessing information in
digital libraries’”
• A HTE-MC would consist of:
• An integrated, delocalized network of high-throughput synthesis and
characterization tools
• A best-in-class materials data management platform, consisting of NIST (and
other) software

HTE-MC 1st Steps: NIST – NREL Round Robin
Sample synthesis and measurements:
• Synthesize: Zn-Sn-Ti-O composition spread
sample libraries using combinatorial PLD
(@NIST) or sputtering (@NREL)
• Measure: Chemical composition, Crystal
structure, Electrical conductivity, Optical
transmittance, Band gap
• Exchange: Sample libraries and associated
data, repeat measurements
Zn-Sn-Ti-O:
• Chemical composition
• Crystal structure
• Electrical conductivity
• Optical transmittance
• Work function
Goal: test and improve the standards for exchange of data and samples among participant labs
[Photographs: NREL samples and NIST sample]

Addressing FAIR Principles
To be Findable:
• (meta)data are assigned a globally unique and
persistent identifier
• data are described with rich metadata
• metadata clearly and explicitly include the identifier
of the data it describes
• (meta)data are registered or indexed in a searchable
resource
To be Accessible:
• (meta)data are retrievable by their identifier using a
standardized communications protocol
– the protocol is open, free, and universally
implementable
– the protocol allows for an authentication and
authorization procedure, where necessary
• metadata are accessible, even when the data are no
longer available
To be Interoperable:
• (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation.
• (meta)data use vocabularies that follow FAIR
principles
• (meta)data include qualified references to other
(meta)data
To be Reusable:
• (meta)data are richly described with a plurality of
accurate and relevant attributes
– (meta)data are released with a clear and accessible data
usage license
– (meta)data are associated with detailed provenance
– (meta)data meet domain-relevant community standards
Wilkinson, Mark D., et al. "The FAIR Guiding Principles for scientific data
management and stewardship." Scientific Data 3 (2016). DOI: 10.1038/sdata.2016.18
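One way to make the FAIR checklist above concrete is to test a metadata record for a few fields per principle. The Python sketch below is only an illustration: the field names (`identifier`, `license`, `provenance`, and so on), the `fair_gaps` helper, and the example record are all hypothetical and do not represent a NIST or HTE-MC schema.

```python
# Hypothetical field names; not an official FAIR, NIST, or HTE-MC schema.
REQUIRED_FIELDS = {
    "findable": ["identifier", "description", "indexed_in"],
    "accessible": ["access_protocol"],
    "interoperable": ["vocabulary", "related_identifiers"],
    "reusable": ["license", "provenance"],
}


def fair_gaps(record):
    """Return {principle: [missing fields]} for fields that are absent or empty."""
    gaps = {}
    for principle, fields in REQUIRED_FIELDS.items():
        missing = [name for name in fields if not record.get(name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Illustrative record for a composition-spread library (all values made up).
example_record = {
    "identifier": "hdl:00000/example-library-001",
    "description": "Zn-Sn-Ti-O composition spread, XRD and conductivity maps",
    "indexed_in": "example-registry",
    "access_protocol": "https",
    "vocabulary": "example-materials-vocabulary",
    "related_identifiers": [],  # empty: no qualified references yet
    "license": "",              # empty: no usage license yet
    "provenance": {"synthesis": "combinatorial PLD", "lab": "example-lab"},
}

if __name__ == "__main__":
    print(fair_gaps(example_record))
```

A check like this only covers the presence of fields; whether an identifier is truly persistent or a vocabulary truly shared still needs human or registry-level review.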

HTE-MC
[Diagram: stakeholder groups and the flows between them]
GOVERNMENT AGENCIES
MEMBERS
• Academia
• National Labs
• Industry
• Small Business
USERS
• Industry
• Small Business
• Academia
• National Labs
• Manufacturing USA Institutes
• Energy Materials Networks
CONTRIBUTORS
• Academia
• National Labs
• HTE-MC Users (after embargo period)
VISITORS / PUBLIC
• Industry
• Small Business
• Academia
• Educators
• National Labs
• Manufacturing USA Institutes
• Energy Materials Networks
Flow labels in the diagram:
• Provide Students/Staff
• Receive Funding ($)
• Provide Structural Funding
• Provide Science Infrastructure
• Provide Data Infrastructure
• Pay Tiered Access Fees ($)
• Generate New Data
• Receive Benefits
• Publish Open-Access Data
• Access AI-ready Public Data
Outcomes:
• Next Generation Workforce
• New Knowledge
• Materials Solutions

HTE Materials Collaboratory
Problems
• Experimental databases are not keeping pace with computational databases
• HTE is out of reach to most due to high startup and operating costs
• Materials are diverse; no single institution can have all the necessary equipment
Solution
• Integrate HTE laboratories with materials cyberinfrastructure
• HTE as a shared resource; operate on demand by access fees and core funding
• HTE as a federated resource; enable connectivity via cyberinfrastructure

Technical Stakeholder Types and Population
(defines action, not access)
• Member
– Provides infrastructure
• User
– Utilizes infrastructure
– Creates new data
– May choose to publish data
• Contributor
– Publishes data
• Visitor
– Consumes public data
[Figure: population pyramid with tiers labeled Visitors, Contributors, Users, Members]
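Because the stakeholder types define actions rather than access, they can be modeled as roles mapped to sets of actions. The Python sketch below is an assumption-laden illustration: the `Role` enum, the action names, and the idea that one institution can hold several roles at once (suggested by the overlapping membership lists) are all introduced here, not taken from an HTE-MC specification.

```python
from enum import Enum


class Role(Enum):
    MEMBER = "member"
    USER = "user"
    CONTRIBUTOR = "contributor"
    VISITOR = "visitor"


# Actions per role, paraphrasing the slide's bullet list (names are hypothetical).
ROLE_ACTIONS = {
    Role.MEMBER: {"provide_infrastructure"},
    Role.USER: {"utilize_infrastructure", "create_data", "may_publish_data"},
    Role.CONTRIBUTOR: {"publish_data"},
    Role.VISITOR: {"consume_public_data"},
}


def actions_for(roles):
    """Union of actions for an institution holding several roles at once."""
    allowed = set()
    for role in roles:
        allowed |= ROLE_ACTIONS[role]
    return allowed


if __name__ == "__main__":
    # Example: a user who later publishes data also acts as a contributor.
    print(sorted(actions_for([Role.USER, Role.CONTRIBUTOR])))
```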

The Collaborative Economy
HTE-MC

HTE-MC
[Architecture diagram. Components shown:]
• Member Institutes and a User Institute
• Laboratory Information Management System
• Instruments/Computing
• Database / Structured Data / Metadata
• File/Collection Repository
• Data Transfer Grid
• Data Dissemination
• Registries: Materials Resource Registry, High-Throughput Experiment Resource Registry

High-Throughput Experimental Materials
Collaboratory (HTE-MC) Workshop
• Held: February 2018
• Workshop Goals:
• Socialize the HTE-MC concept among government, academic and industry stakeholders
• Expand HTE-MC membership
• Define technical, operational and business models for the HTE-MC
• Facilitated Breakout Sessions:
• Define the Vision of HTE-MC
• Define the value proposition for participation
• Identify major barriers to successful participation
• Identify and prioritize pilot use cases
• Identify and describe modes of interaction of users
• Define governance and business models for HTE-MC
• Workshop Report: In preparation

A Multi-Agency, Multi-Year Program Plan in
Advanced Energy Materials Discovery,
Development, and Process Design
• Held July 2018
• Workshop Goals
• Determine how best to coordinate next steps within the Federal Government
• Efficiently leverage the ongoing research in advanced materials conducted in
academia, industry, and government research laboratories
• Facilitated Breakout Sessions:
• Priorities in Energy Materials R&D: Barriers, Timeline, and Metrics
• Database infrastructure needs in AI and Energy Materials R&D: Moving Materials
Discovery through Materials Processes
• Expansion of the Collaboratory Network for Energy Materials Discovery and Process
Design
• Integration of AI, ML, and Experimentation for Energy Materials Design and
Processing
• Workshop Report: In preparation

Iterative Machine Learning – High
Throughput Experimental Approach to
Discovering Novel Amorphous Alloys
Fang Ren (1), Logan Ward (2), Travis Williams (3), Kevin J. Laws (4),
Christopher M. Wolverton (2), Jason Hattrick-Simpers (5), Apurva Mehta (1)
(1) SLAC National Accelerator Laboratory, (2) Northwestern University, (3) University of South Carolina,
(4) UNSW Australia, (5) National Institute of Standards and Technology, (6) University of Chicago
Science Advances, Vol 4 No. 4 (2018)

Lightweight Structural Materials
http://corporate.exxonmobil.com/en/energy/research-and-development/innovating-energy-solutions/research-and-development-highlights
Wall Street Journal via Google Images

Metallic Glasses Are Interesting
Metallic glass (MG) is a solid metallic material, usually an alloy, with a disordered atomic-scale structure (amorphous).
http://vitreloy.caltech.edu/development.htm
West US 7998286 B2
E Ma. Nature Materials. 14, 2015.

The Palette of Potential Metallic Glasses
Usually contain 3 or more elements
30 non-toxic, earth-friendly elements: > 4000 ternaries, > 4 million compositions
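The ternary count follows from choosing 3 of 30 elements: C(30, 3) = 4060, consistent with "> 4000 ternaries". The number of compositions depends on the composition grid, which the slide does not state. The Python sketch below assumes, purely for illustration, a 2 at.% grid with all three elements present, which lands in the millions; other grid choices give different totals.

```python
from math import comb


def ternary_systems(n_elements):
    """Number of distinct three-element systems drawn from n_elements."""
    return comb(n_elements, 3)


def interior_compositions(steps):
    """Compositions per ternary on a grid of 1/steps with all three elements present.

    Counts positive integer solutions of a + b + c = steps.
    """
    return comb(steps - 1, 2)


if __name__ == "__main__":
    systems = ternary_systems(30)
    per_system = interior_compositions(50)  # assumed 2 at.% grid, not from the slide
    print(systems, per_system, systems * per_system)
```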

Building the Machine Learning Model
Ref: Ward et al. npj Comp. Mater. (2016), 28.
Experimental
Data
Machine Learning
Algorithm
Composition-based
Representation
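Example attributes: $\mu_Z$, $\Delta\chi$, $\sigma_{T_m}$, $\max r_{cov}$, $\lVert (x_H, x_{He}, \ldots) \rVert_2$
Example decision-tree split: $\sigma_r < 1.1$ Å → MG / Not MG
$\mathrm{GFA} = f(x_H, x_{He}, \ldots)$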
24 Million Ternary Alloys
74520 potential MGs
5739 measurements
145 Attributes
Random Forest
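A composition-based representation turns element fractions into statistics of elemental properties, which then serve as inputs to a classifier such as a random forest. The Python sketch below is a minimal illustration, not the code of Ward et al.: it uses only two properties (atomic number and Pauling electronegativity) for three elements and three statistics, whereas the slide's model uses 145 attributes. The `featurize` helper and the feature names are hypothetical.

```python
# Atomic number (Z) and Pauling electronegativity (chi) for three elements.
ELEMENT_DATA = {
    "Al": {"Z": 13, "chi": 1.61},
    "Cu": {"Z": 29, "chi": 1.90},
    "Zr": {"Z": 40, "chi": 1.33},
}


def featurize(composition):
    """Map {element: amount} to fraction-weighted statistics of elemental properties."""
    total = sum(composition.values())
    fractions = {el: amount / total for el, amount in composition.items()}
    features = {}
    for prop in ("Z", "chi"):
        values = {el: ELEMENT_DATA[el][prop] for el in fractions}
        mean = sum(fractions[el] * values[el] for el in fractions)
        features[f"mean_{prop}"] = mean
        # Fraction-weighted mean absolute deviation from the mean.
        features[f"avg_dev_{prop}"] = sum(
            fractions[el] * abs(values[el] - mean) for el in fractions
        )
        features[f"range_{prop}"] = max(values.values()) - min(values.values())
    return features


if __name__ == "__main__":
    # Illustrative composition in atomic percent.
    print(featurize({"Zr": 50, "Cu": 40, "Al": 10}))
```

Each composition becomes a fixed-length feature vector regardless of which elements it contains, which is what lets one model cover many alloy systems.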

Select Experiments that Involve Contradiction
Selection Criteria
1.) None of the models 100% disagree
2.) Some experimental data existed
3.) Inexpensive, low vapor pressure materials
[Figure panels: Yang Model | Efficient Packing Model | ML Predictions]
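The selection criteria can be sketched as a filter over candidate systems. The Python example below is one possible reading, not the authors' procedure: it treats criterion 1 as "the models disagree at some, but not all, composition points", and represents criteria 2 and 3 as boolean flags. The system names, labels, and the `disagreement` and `select_systems` helpers are hypothetical.

```python
def disagreement(predictions):
    """Fraction of composition points where the models do not all agree."""
    per_point = list(zip(*predictions.values()))
    differing = sum(1 for labels in per_point if len(set(labels)) > 1)
    return differing / len(per_point)


def select_systems(candidates):
    """Keep systems with partial disagreement, prior data, and practical elements."""
    chosen = []
    for name, info in candidates.items():
        d = disagreement(info["predictions"])
        if 0.0 < d < 1.0 and info["has_prior_data"] and info["practical"]:
            chosen.append((name, d))
    # Most contradictory systems first.
    return sorted(chosen, key=lambda item: item[1], reverse=True)


# Hypothetical systems with glass / no-glass labels on a 4-point grid (made up).
CANDIDATES = {
    "A-B-C": {
        "predictions": {
            "yang": [True, True, False, False],
            "packing": [True, False, False, False],
            "ml": [True, True, True, False],
        },
        "has_prior_data": True,
        "practical": True,
    },
    "D-E-F": {  # models agree everywhere: nothing to resolve
        "predictions": {
            "yang": [True] * 4,
            "packing": [True] * 4,
            "ml": [True] * 4,
        },
        "has_prior_data": True,
        "practical": True,
    },
    "G-H-I": {  # partial disagreement, but no prior experimental data
        "predictions": {
            "yang": [True, False, False, False],
            "packing": [False, False, False, False],
            "ml": [False, False, False, False],
        },
        "has_prior_data": False,
        "practical": True,
    },
}

if __name__ == "__main__":
    print(select_systems(CANDIDATES))
```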

(Split) Model Predictions
[Figure panels: Melt Spun Predictions | Sputtered Predictions]

“Fail Fast” via HTE
[Schematic labels: Gun 1, Gun 2, Binary Deposition, Deposited Sample, Deposition Ratio vs. Sample Position, 2D XRD Detector, Fluorescence Detector]
Temperature > 1200K, ~5000 patterns/day

Negative Results >> Positive Results

We Can Rebuild It, We Have the Technology

Are There Any Interesting Generalizations?

Case Example X-Y-Al: Breaking from
Convention AND Property Prediction
No “deep” eutectics necessary!
Massalski “Binary Alloy Phase Diagrams” (1990)

But How to Create Property Models?
• There is no Landolt-Börnstein (L-B)-type data set for properties of MG
• NLP/data extraction from figures is in its infancy
• Manually scrape the literature
– 2000+ entries
– Errant measurements
– Many different groups
– Inconsistent definition of “amorphous”
Feature Importance
Average Ground State Volume: 0.37
Minimum Ground State Volume: 0.24
Minimum Covalent Radius: 0.12
Mean Melting Temperature: 0.036
Highest Melting Temperature: 0.017

Ternary Modulus Predictions
[Figure: parity plot of Predicted Modulus (GPa) vs. Measured Elastic Modulus (GPa), both axes 0 to 300 GPa]

Experimental Validation of Prediction
[Figure: axis label Er]

Can A.I./M.L. Lead to Autonomous Materials
Discovery?

“In the next 5 years, AI-driven, autonomous
materials research is going to fundamentally
change how we do materials science.”
-Jim Warren, Technical Program Director for
Materials Genomics, NIST

Autonomous Research is Already Here
AFRL - ARES | Hein - UBC | NIST - UMD

Autonomous REsearch Systems (ARES)

Gilad’s Automated Experimentation Platform
[Ternary diagram vertices: Fe, Fe0.4Pd0.6, Fe0.4Ga0.6]
Kusne et al., to be submitted

Active clustering for autonomous XRD phase
mapping
Think carefully about modeling to remove researcher degrees of freedom
DeCost et al., to be submitted

Conclusions
• AI & ML are already prevalent in the design of new materials, materials
synthesis, data capture/cleaning and knowledge extraction
• Neither AI nor ML is a panacea that will replace human intuition and
creativity; they are enablers
• In some cases an order of magnitude increase in materials
exploration/discovery is possible
• A fairer metric of AI’s influence may be the rate of hypothesis
generation and (in)validation
• AI needs FAIR data, including negative results, to be effective
• Not part of the solution = consigned to obscurity
• Full materials research autonomy (for specific problems) has already been
demonstrated

Acknowledgements
USC
Travis Williams
SLAC
Dr. Apurva Mehta
Dr. Fang Ren
Dr. Suchismita
Northwestern
Prof. Wolverton
Dr. Logan Ward
UNSW
Prof. Kevin Laws
NIST
Dr. James Warren
Dr. Martin Green
Dr. Zachary Trautt
Dr. Gilad Kusne
Dr. Brian DeCost
Mr. Ryan Smith
NREL
Dr. Andriy Zakutayev
CSM
Prof. Packard
Dr. Schoeppner

Bootcamp: Machine Learning for Materials Research &
Workshop: Machine Learning Quantum Materials
• Dates: July 30 – Aug 3, 2018
• Location: IBBR (Gaithersburg, Maryland)
MLMR introduces researchers from industry, national labs, and academia to machine learning theory and tools for rapid data analysis.
https://nanocenter.umd.edu/events/mlmr/
Bootcamp
Three days of lectures and hands-on exercises covering a range of
data analysis topics, from data pre-processing through advanced
machine learning analysis techniques. Example topics include:
• Identifying important features in complex/high dimensional data
• Visualizing high dimensional data to facilitate user analysis
• Identifying the fabrication ‘descriptors’ that best predict
variance in functional properties
• Quantifying similarities between materials using complex/high
dimensional data
The hands-on exercises will demonstrate practical use of machine
learning tools on real materials data (scalar values, spectra,
micrographs, etc.).
Demonstrations and Talks by (confirmed speakers):
• Theory
• Computational Approaches
• Experimental Approaches
Andrew Millis (Columbia)
Antoine Georges (CCQ)
Karin Rabe (Rutgers)
Sasha Balatsky (LANL)
Roger Melko (Waterloo)
Shoucheng Zhang (Stanford)
Stefano Curtarolo (Duke)
Gus Hart (BYU)
Ichiro Takeuchi (UMD)
Sergei Kalinin (ORNL)
Benji Maruyama (AFRL)
Jiun-Haw Chu (Univ. Washington)
Giuseppe Carleo (Flatiron)
Miles Stoudenmire (Flatiron)
