Machine learning for materials design:
opportunities, challenges, and methods
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Energy Probe workshop, May 13, 2019
• Batteries
– stable and high-energy electrodes, solid state
electrolytes
• Thermal energy storage & conversion
– High zT thermoelectrics, high heat capacity liquids
• Photovoltaics
– Improved efficiency of absorber, reduced degradation
in coatings, controlling ion migration in front glass,
lifetime of organic / hybrid materials
Almost every technology could be improved with better
materials!
• Often, materials are known for several decades before their
functional applications are known
– MgB2 sitting on lab shelves for 50 years before its
identification as a superconductor in 2001
• Even after discovery, optimization and commercialization
still take decades
Typically, both new materials discovery and optimization
take decades
Materials data from: Eagar T., King M. Technology
Review 1995
Some opportunities for accelerating materials design using
machine learning techniques
Accelerated materials design:
• ML surrogates for expt / comp.
• “Self-driving laboratories”
• Opportunities in natural language processing
• Experiments are generally time-consuming and
labor-intensive
– Days to months to get measurements with large
investment of researcher time
– Not too long ago, one essentially needed to do
everything experimentally
ML surrogates for experiments and computation:
background
• Computations can be faster and require less
researcher time
– Today, some materials design problems can be
modeled in the computer[1]
– But, CPU-time is still a major issue
[1] Jain, A., Shin, Y. & Persson, K. A. Computational predictions of energy materials using density functional
theory. Nature Reviews Materials 1, 15004 (2016).
• Machine learning can be the fastest of all and
could play a major role in supporting experiments
and computation, e.g. to identify the most
promising regions of chemical space prior to
even computation / theory
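A minimal sketch of the surrogate idea, assuming a hypothetical one-dimensional `expensive_property` function standing in for a slow experiment or calculation, and a simple nearest-neighbor average standing in for the ML model. Neither is taken from the talk; the point is only the workflow of paying for a few expensive evaluations, screening many candidates cheaply, and then verifying a short list.

```python
import math

def expensive_property(x):
    """Hypothetical stand-in for a slow experiment or DFT calculation."""
    return math.sin(3.0 * x) + 0.5 * x

def knn_predict(train, x, k=3):
    """Cheap surrogate: average the k nearest training observations."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Step 1: pay for a small number of expensive evaluations (31 grid points).
train = [(i * 0.1, expensive_property(i * 0.1)) for i in range(31)]

# Step 2: score a much larger candidate pool with the cheap surrogate only.
candidates = [i * 3.0 / 999 for i in range(1000)]
ranked = sorted(candidates, key=lambda x: knn_predict(train, x), reverse=True)

# Step 3: spend expensive evaluations only on the surrogate's top picks.
shortlist = ranked[:5]
best_x = max(shortlist, key=expensive_property)
print(f"best candidate x={best_x:.3f}, property={expensive_property(best_x):.3f}")
```

Here 1000 candidates are screened at the cost of 31 + 5 expensive evaluations; a real surrogate would be a trained regression model over materials descriptors rather than a nearest-neighbor average.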
Example application: machine learning as a surrogate for
DFT computations
1. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field
computational cost. Chemical Science 8, 3192–3203 (2017).
2. Aspuru-Guzik, A., & Persson, K. Materials Acceleration Platform—Accelerating Advanced Energy Materials Discovery by
Integrating High-Throughput Methods with Artificial Intelligence.
The ML model can be 5-6 orders of magnitude faster!
Potential to run ~1 million tests for the price of 1
Example from our group: developing and testing surrogate
models over diverse materials data problems
(paper in preparation)
• Typically, the researcher chooses which materials to
perform experiments on (or to compute)
• Advantage: draws on the domain expertise
of the researcher (potentially decades of knowledge)
• Potential issues:
– Bias (exploring near already known systems)
– Time (takes time to think of what to study)
“Self-driving” laboratories: background
• In a “self-driving” laboratory,
an algorithm chooses the
next
experiment/computation
and performs it
automatically
• “Active learning” ML
• At each stage, the algorithm
balances exploration and
exploitation
Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and
suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
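The loop described above can be sketched as follows. This is an illustrative heuristic, not the method of the cited paper: `run_experiment` is a hypothetical stand-in for an automated measurement, and the acquisition score simply adds an exploitation term (the value of the nearest observation) to an exploration term (the distance to that observation, weighted by an assumed parameter `kappa`).

```python
def run_experiment(x):
    """Hypothetical stand-in for an automated measurement or calculation."""
    return -(x - 0.7) ** 2

def acquisition(x, observed, kappa=0.5):
    """Exploitation (nearest observed value) plus exploration (distance to it)."""
    nearest_x, nearest_y = min(observed, key=lambda pair: abs(pair[0] - x))
    return nearest_y + kappa * abs(nearest_x - x)

candidates = [i / 100 for i in range(101)]
# Start from two initial measurements at the ends of the search range.
observed = [(0.0, run_experiment(0.0)), (1.0, run_experiment(1.0))]

for step in range(10):
    tried = {x for x, _ in observed}
    untried = [x for x in candidates if x not in tried]
    # The algorithm, not the researcher, picks the next experiment...
    next_x = max(untried, key=lambda x: acquisition(x, observed))
    # ...and "performs" it automatically.
    observed.append((next_x, run_experiment(next_x)))

best_x, best_y = max(observed, key=lambda pair: pair[1])
print(f"best after {len(observed)} measurements: x={best_x}, y={best_y:.4f}")
```

A larger `kappa` favors unexplored regions; a smaller one concentrates measurements near the best result so far. Practical systems usually replace the nearest-neighbor score with a model that provides calibrated uncertainties.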
Example application: shape-memory alloys with low
transition temperature and hysteresis
Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and
suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
Using an adaptive design strategy, one can reduce
the number of measurements needed to find all
Pareto-optimal shape memory alloys
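For readers unfamiliar with the term, a candidate is Pareto-optimal when no other candidate is at least as good on every objective and strictly better on one. A minimal sketch for two objectives that are both minimized, using invented (transition temperature, hysteresis) pairs rather than data from the cited paper:

```python
def pareto_front(points):
    """Return the points not dominated by any other, minimizing both objectives."""
    front = []
    for p in points:
        # q dominates p if q is no worse on both objectives and differs from p.
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (transition temperature, hysteresis) values for five alloys.
alloys = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 9.0), (11.0, 6.0)]
front = pareto_front(alloys)
print(front)
```

An adaptive design strategy aims to locate the members of this front while measuring as few of the other candidates as possible.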
Example from our group: Rocketsled for automated
computational searches
Rocketsled can help find optimal solutions using far
fewer computations overall (less CPU), parallelized
over supercomputers (less time)
Dunn, A., Brenneck, J. & Jain, A. Rocketsled: a software library for optimizing high-throughput computational
searches. J. Phys. Mater. 2, 034002 (2019).
• Most materials science data and knowledge only
exists in unstructured format (e.g., as text in
journal publications)
• Can we make use of knowledge in text format?
Natural language processing: background
Example: synthesis planning based on text mining
1. Kim, E. et al. Data Descriptor: Machine-learned and codified synthesis parameters of oxide materials. Scientific Data 1–9 (2017).
2. Kim, E. et al. Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning. Chemistry of
Materials, doi:10.1021/acs.chemmater.7b03500 (2017).
Example from our group: using NN to predict “gaps” in
materials discoveries
Using word2vec on a database of 3 million materials science
abstracts, we can predict which words should co-occur with
one another.
This can be used to predict materials that should be studied
for functional applications (“gaps” in the research literature)
Tshitoyan V., Dagdelen J., Weston L., Dunn A., Rong Z., Kononova O., Persson K., Ceder G., Jain A. Unsupervised word
embeddings capture latent knowledge from materials science literature. Accepted / in press, Nature
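One way such a "gap" search could work, sketched under stated assumptions: each word has an embedding vector, and materials not yet reported with an application word are ranked by cosine similarity to that word. The three-dimensional vectors, the placeholder names `material_A` to `material_C`, and the `already_studied` set below are all invented for illustration; real vectors would come from a word2vec model trained on the abstracts.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors invented for illustration (not output of a trained model).
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "material_A": [0.8, 0.2, 0.1],
    "material_B": [0.1, 0.9, 0.3],
    "material_C": [0.7, 0.0, 0.4],
}
# Hypothetical: materials that already co-occur with the application word.
already_studied = {"material_A"}

target = embeddings["thermoelectric"]
candidates = [
    word for word in embeddings
    if word.startswith("material_") and word not in already_studied
]
# Rank unstudied materials by similarity to the application word.
ranked = sorted(
    candidates,
    key=lambda word: cosine_similarity(embeddings[word], target),
    reverse=True,
)
print(ranked)
```

The top-ranked unstudied material is the suggested "gap": a word the embedding places near the application even though the literature has not yet linked them.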
• Data availability
– Typical materials data sets range from about a dozen
examples to a few thousand; it is rare to have 100,000
data points
– No standard data sets to build models on (no analogue
of, e.g., ImageNet)
Challenges
• Data Heterogeneity
– There is no single data type (e.g., image data, spectral
data, graph data)
– Different materials problems have their own data
types and often ones unknown in computer science
(e.g., periodic crystal structures)
• ML model Extrapolation
– Almost all industry ML focuses on interpolation-type
problems (data on almost all representative examples
is in place)
– Materials science requires extrapolation of very
complex physics
– Standard cross-validation likely insufficient (e.g.,
cluster-based cross-validation better?)
– ML interpretability would build confidence in
extrapolation
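A minimal sketch of what cluster-based cross-validation could look like, assuming each material has already been assigned a cluster label (here, hypothetical chemistry families). Each fold holds out one entire cluster, so the model is always tested on a group it has never seen, which probes extrapolation more directly than random splits.

```python
from collections import defaultdict

def leave_one_cluster_out(cluster_labels):
    """Yield (train_indices, test_indices), holding out one whole cluster per fold."""
    by_cluster = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        by_cluster[label].append(index)
    for held_out in sorted(by_cluster):
        test = by_cluster[held_out]
        train = [
            i
            for label, indices in by_cluster.items()
            if label != held_out
            for i in indices
        ]
        yield sorted(train), test

# Hypothetical cluster assignments for six materials.
labels = ["oxide", "oxide", "sulfide", "nitride", "sulfide", "oxide"]
folds = list(leave_one_cluster_out(labels))
for train, test in folds:
    print(f"held out: {labels[test[0]]:8s} train={train} test={test}")
```

In practice the labels might come from a clustering algorithm run on the feature vectors rather than from hand-assigned families.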
• Kristin Persson (ESDR) – materials databases, ML
• Shyam Dwaraknath (ESDR) – ML for characterization
• Juli Mueller (CRD) – active learning
• Dani Ushizima (CRD) – classifying materials image data
• Tess Smidt (CRD) – crystal structure models for ML
• Emory Chan (MSD) – automated experiments
• Colin Ophus (MSD) – TEM image labeling
• Gerbrand Ceder (MSD) – text mining / NLP of synthesis
Some relevant groups at LBNL

More Related Content

What's hot

Two-Dimensional Layered Materials for Battery Application--Yifei Li
Two-Dimensional Layered Materials for Battery Application--Yifei LiTwo-Dimensional Layered Materials for Battery Application--Yifei Li
Two-Dimensional Layered Materials for Battery Application--Yifei Li
Yifei Li
 
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
KAMAL CHOUDHARY
 
Applications of Machine Learning for Materials Discovery at NREL
Applications of Machine Learning for Materials Discovery at NRELApplications of Machine Learning for Materials Discovery at NREL
Applications of Machine Learning for Materials Discovery at NREL
aimsnist
 

What's hot (20)

Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methods
 
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
 
Two-Dimensional Layered Materials for Battery Application--Yifei Li
Two-Dimensional Layered Materials for Battery Application--Yifei LiTwo-Dimensional Layered Materials for Battery Application--Yifei Li
Two-Dimensional Layered Materials for Battery Application--Yifei Li
 
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
Computational methods applied to materials modeling
Computational methods applied to materials modelingComputational methods applied to materials modeling
Computational methods applied to materials modeling
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
 
Applications of Machine Learning for Materials Discovery at NREL
Applications of Machine Learning for Materials Discovery at NRELApplications of Machine Learning for Materials Discovery at NREL
Applications of Machine Learning for Materials Discovery at NREL
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...
 
Engineering Applications of Machine Learning
Engineering Applications of Machine LearningEngineering Applications of Machine Learning
Engineering Applications of Machine Learning
 
Materials for hydrogen storage
Materials for hydrogen storageMaterials for hydrogen storage
Materials for hydrogen storage
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
2 d materials
2 d materials2 d materials
2 d materials
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Graphene ppt
Graphene pptGraphene ppt
Graphene ppt
 
A Machine Learning Framework for Materials Knowledge Systems
A Machine Learning Framework for Materials Knowledge SystemsA Machine Learning Framework for Materials Knowledge Systems
A Machine Learning Framework for Materials Knowledge Systems
 

Similar to Machine learning for materials design: opportunities, challenges, and methods

Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
aimsnist
 

Similar to Machine learning for materials design: opportunities, challenges, and methods (20)

Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliers
 
Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
 
Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...
 
The interplay between data-driven and theory-driven methods for chemical scie...
The interplay between data-driven and theory-driven methods for chemical scie...The interplay between data-driven and theory-driven methods for chemical scie...
The interplay between data-driven and theory-driven methods for chemical scie...
 
Advanced Intelligent Systems - 2020 - Sha - Artificial Intelligence to Power ...
Advanced Intelligent Systems - 2020 - Sha - Artificial Intelligence to Power ...Advanced Intelligent Systems - 2020 - Sha - Artificial Intelligence to Power ...
Advanced Intelligent Systems - 2020 - Sha - Artificial Intelligence to Power ...
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
 
SamNola_July2016-s
SamNola_July2016-sSamNola_July2016-s
SamNola_July2016-s
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 

More from Anubhav Jain

More from Anubhav Jain (20)

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesis
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
 
Machine Learning for Catalyst Design
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst Design
 
Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 

Recently uploaded

Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...
Sérgio Sacani
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditions
muralinath2
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Sérgio Sacani
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
Michel Dumontier
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
AADYARAJPANDEY1
 

Recently uploaded (20)

SAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniquesSAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniques
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...Climate extremes likely to drive land mammal extinction during next supercont...
Climate extremes likely to drive land mammal extinction during next supercont...
 
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
 
electrochemical gas sensors and their uses.pptx
electrochemical gas sensors and their uses.pptxelectrochemical gas sensors and their uses.pptx
electrochemical gas sensors and their uses.pptx
 
NuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent UniversityNuGOweek 2024 full programme - hosted by Ghent University
NuGOweek 2024 full programme - hosted by Ghent University
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
FAIRSpectra - Towards a common data file format for SIMS images
FAIRSpectra - Towards a common data file format for SIMS imagesFAIRSpectra - Towards a common data file format for SIMS images
FAIRSpectra - Towards a common data file format for SIMS images
 
A Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on EarthA Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on Earth
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 
Microbial Type Culture Collection (MTCC)
Microbial Type Culture Collection (MTCC)Microbial Type Culture Collection (MTCC)
Microbial Type Culture Collection (MTCC)
 
Topography and sediments of the floor of the Bay of Bengal
Topography and sediments of the floor of the Bay of BengalTopography and sediments of the floor of the Bay of Bengal
Topography and sediments of the floor of the Bay of Bengal
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditions
 
Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...Multi-source connectivity as the driver of solar wind variability in the heli...
Multi-source connectivity as the driver of solar wind variability in the heli...
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
 
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings o...
 
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
Gliese 12 b: A Temperate Earth-sized Planet at 12 pc Ideal for Atmospheric Tr...
 
FAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable PredictionsFAIR & AI Ready KGs for Explainable Predictions
FAIR & AI Ready KGs for Explainable Predictions
 
Transport in plants G1.pptx Cambridge IGCSE
Transport in plants G1.pptx Cambridge IGCSETransport in plants G1.pptx Cambridge IGCSE
Transport in plants G1.pptx Cambridge IGCSE
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 

Machine learning for materials design: opportunities, challenges, and methods

  • 1. Machine learning for materials design: opportunities, challenges, and methods Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA Energy Probe workshop, May 13, 2019
  • 2. • Batteries – stable and high-energy electrodes, solid state electrolytes • Thermal energy storage & conversion – High zT thermoelectrics, high heat capacity liquids • Photovoltaics – Improved efficiency of absorber, reduced degradation in coatings, controlling ion migration in front glass, lifetime of organic / hybrid materials 2 Almost every technology could be improved with better materials!
  • 3. • Often, materials are known for several decades before their functional applications are known – MgB2 sitting on lab shelves for 50 years before its identification as a superconductor in 2001 • Even after discovery, optimization and commercialization still take decades 3 Typically, both new materials discovery and optimization take decades Materials data from: Eagar T., King M. Technology Review 1995
  • 4. 4 Some opportunities for accelerating materials design using machine learning techniques Accelerated materials design ML surrogates for expt / comp. “Self-driving laboratories” Opportunities in natural language processing
  • 5. • Experiments are generally time-consuming and labor-intensive – Days to months to get measurements with large investment of researcher time – Not too long ago, one essentially needed to do everything experimentally 5 ML surrogates for experiments and computation: background
  • 6. ML surrogates for experiments and computation: background
      • Computations can be faster and require less researcher time
          – Today, some materials design problems can be modeled on the computer [1]
          – But CPU time is still a major issue
      [1] Jain, A., Shin, Y. & Persson, K. A. Computational predictions of energy materials using density functional theory. Nature Reviews Materials 1, 15004 (2016).
  • 7. ML surrogates for experiments and computation: background
      • Machine learning can be the fastest of all and could play a major role in supporting experiments and computation, e.g., by identifying the most promising regions of chemical space before any computation / theory is done
  • 8. Example application: machine learning as a surrogate for DFT computations
      • The ML model can be 5-6 orders of magnitude faster!
      • Potential to run ~1 million tests for the price of 1
      1. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science 8, 3192–3203 (2017).
      2. Aspuru-Guzik, A. & Persson, K. Materials Acceleration Platform: Accelerating Advanced Energy Materials Discovery by Integrating High-Throughput Methods with Artificial Intelligence.
  • 9. Example from our group: developing and testing surrogate models over diverse materials data problems (paper in preparation)
  • 10. Some opportunities for accelerating materials design using machine learning techniques
      Accelerated materials design:
      • ML surrogates for expt / comp.
      • “Self-driving laboratories”
      • Opportunities in natural language processing
  • 11. “Self-driving” laboratories: background
      • Typically, the researcher chooses which materials to perform experiments on (or to compute)
      • Advantage: draws on the researcher's domain expertise (potentially decades of knowledge)
      • Potential issues:
          – Bias (exploring near already-known systems)
          – Time (it takes time to think of what to study)
  • 12. “Self-driving” laboratories: background
      • In a “self-driving” laboratory, an algorithm chooses the next experiment/computation and performs it automatically
      • “Active learning” ML
      • At each stage, the algorithm balances exploration and exploitation
      Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
  • 13. Example application: shape-memory alloys with low transition temperature and hysteresis
      • Using an adaptive design strategy, one can reduce the number of measurements needed to find all Pareto-optimal shape-memory alloys
      Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
  • 14. Example from our group: Rocketsled for automated computational searches
      • Rocketsled can help find optimal solutions using far fewer computations overall (less CPU), with the search parallelized over supercomputers (less time)
      Dunn, A., Brenneck, J. & Jain, A. Rocketsled: a software library for optimizing high-throughput computational searches. J. Phys. Mater. 2, 034002 (2019).
  • 15. Some opportunities for accelerating materials design using machine learning techniques
      Accelerated materials design:
      • ML surrogates for expt / comp.
      • “Self-driving laboratories”
      • Opportunities in natural language processing
  • 16. Natural language processing: background
      • Most materials science data and knowledge exist only in unstructured form (e.g., as text in journal publications)
      • Can we make use of knowledge in text format?
  • 17. Example: synthesis planning based on text mining
      1. Kim, E. et al. Data Descriptor: Machine-learned and codified synthesis parameters of oxide materials. Scientific Data 1–9 (2017).
      2. Kim, E. et al. Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning. Chemistry of Materials acs.chemmater.7b03500 (2017).
  • 18. Example from our group: using NN to predict “gaps” in materials discoveries
      • Using word2vec on a database of 3 million materials science abstracts, we can predict which words should co-occur with one another. This can be used to predict materials that should be studied for functional applications (“gaps” in the research literature)
      Tshitoyan V., Dagdelen J., Weston L., Dunn A., Rong Z., Kononova O., Persson K., Ceder G., Jain A. Unsupervised word embeddings capture latent knowledge from materials science literature. Accepted / in press, Nature
  • 19. Challenges
      • Data availability
          – Typical materials data sets range from about a dozen examples to a few thousand; it is rare to have 100,000 data points
          – No standard data sets to build models on (no equivalent of, e.g., ImageNet)
  • 20. Challenges
      • Data heterogeneity
          – There is no single data type (e.g., image data, spectral data, graph data)
          – Different materials problems have their own data types, often ones unfamiliar in computer science (e.g., periodic crystal structures)
  • 21. Challenges
      • ML model extrapolation
          – Almost all industry ML focuses on interpolation-type problems (data on almost all representative examples is in place)
          – Materials science requires extrapolation of very complex physics
          – Standard cross-validation is likely insufficient (e.g., is cluster-based cross-validation better?)
          – ML interpretability would build confidence in extrapolation
  • 22. Some relevant groups at LBNL
      • Kristin Persson (ESDR) – materials databases, ML
      • Shyam Dwaraknath (ESDR) – ML for characterization
      • Juli Mueller (CRD) – active learning
      • Dani Ushizima (CRD) – classifying materials image data
      • Tess Smidt (CRD) – crystal structure models for ML
      • Emory Chan (MSD) – automated experiments
      • Colin Ophus (MSD) – TEM image labeling
      • Gerbrand Ceder (MSD) – text mining / NLP of synthesis
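
Slide 12 describes an algorithm that picks the next experiment while balancing exploration and exploitation. A minimal sketch of such a loop follows, assuming a small Gaussian-process surrogate with an upper-confidence-bound score. Both are common choices, not necessarily what the cited works or Rocketsled use. The function `run_experiment`, the kernel length scale, and `kappa` are hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    # Prior variance is 1 for this kernel; subtract the part explained by data.
    var = 1.0 - np.sum(k_qt * np.linalg.solve(k_tt, k_qt.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def run_experiment(x):
    """Hypothetical stand-in for a real measurement or DFT calculation."""
    return float(np.sin(3.0 * x) * (1.0 - x) + 0.5 * x)

def active_learning(candidates, n_initial=2, n_steps=10, kappa=2.0, seed=0):
    """Choose experiments one at a time by maximizing mean + kappa * std."""
    rng = np.random.default_rng(seed)
    tested = list(rng.choice(len(candidates), size=n_initial, replace=False))
    results = [run_experiment(candidates[i]) for i in tested]
    for _ in range(n_steps):
        mean, std = gp_posterior(candidates[tested], np.array(results), candidates)
        score = mean + kappa * std   # exploitation term + exploration term
        score[tested] = -np.inf      # never repeat an experiment
        nxt = int(np.argmax(score))
        tested.append(nxt)
        results.append(run_experiment(candidates[nxt]))
    best = int(np.argmax(results))
    return candidates[tested[best]], results[best], tested

candidates = np.linspace(0.0, 1.0, 51)
x_best, y_best, tested = active_learning(candidates)
print(f"best candidate x={x_best:.2f}, value={y_best:.3f}, experiments={len(tested)}")
```

A larger `kappa` favors untested regions (exploration); a smaller one favors candidates the surrogate already predicts to be good (exploitation).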
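
Slide 18 uses word embeddings to suggest materials that should co-occur with an application word. One way to picture the ranking step is cosine similarity between embedding vectors, sketched below. The three-dimensional vectors and the names `material_A`, `material_B`, and `material_C` are invented purely for illustration; they are not values from the trained word2vec model, which would supply learned vectors of much higher dimension.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_candidates(embeddings, query_word, candidates):
    """Sort candidate words by similarity to the query word, best first."""
    q = embeddings[query_word]
    scored = [(name, cosine_similarity(embeddings[name], q)) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy, made-up embeddings (assumption: a real model would provide these).
toy_embeddings = {
    "thermoelectric": np.array([1.0, 0.0, 0.0]),
    "material_A": np.array([0.9, 0.1, 0.0]),
    "material_B": np.array([0.0, 1.0, 0.0]),
    "material_C": np.array([0.5, 0.5, 0.0]),
}

ranking = rank_candidates(
    toy_embeddings, "thermoelectric", ["material_A", "material_B", "material_C"]
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this picture, a "gap" would be a highly ranked material that has not yet appeared alongside the application word in the literature.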
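
Slide 21 asks whether cluster-based cross-validation is a better test of extrapolation than standard cross-validation. A minimal sketch of the idea is to group samples into clusters and hold out one whole cluster at a time, so the model is always scored on a region it has not seen. The k-means grouping, the linear least-squares model, and the synthetic data below are all assumptions made for illustration, not a method taken from the talk.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Assign each row of X to one of k clusters with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def leave_one_cluster_out(labels):
    """Yield (train_idx, test_idx) pairs, holding out one cluster per fold."""
    for c in np.unique(labels):
        test = np.flatnonzero(labels == c)
        train = np.flatnonzero(labels != c)
        if len(train) > 0:
            yield train, test

def cluster_cv_mae(X, y, labels):
    """Mean absolute error of a linear fit on each held-out cluster."""
    errors = []
    for train, test in leave_one_cluster_out(labels):
        A = np.c_[X[train], np.ones(len(train))]          # features + intercept
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.c_[X[test], np.ones(len(test))] @ coef
        errors.append(float(np.mean(np.abs(pred - y[test]))))
    return errors

# Synthetic data: three separated groups and a nonlinear target.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0.0, 3.0, 6.0)])
y = X[:, 0] ** 2 + X[:, 1]

labels = kmeans_labels(X, k=3)
fold_errors = cluster_cv_mae(X, y, labels)
print("held-out-cluster MAE per fold:", [round(e, 2) for e in fold_errors])
```

Because each test fold lies outside the training clusters, these errors probe extrapolation rather than interpolation, which is the concern the slide raises.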