Automated Machine Learning Applied to
Diverse Materials Design Problems
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
MRS Spring Meeting, 2019
Slides (already) posted to hackingmaterials.lbl.gov
There are many algorithms developed for machine learning
in materials – new ones are constantly reported!
Q: Which one is the “best” based
on all the literature reports?
A: Can’t tell! They are (almost?)
all tested on different data sets.
Difficulty of comparing different ML algorithms
• Different data sets
– Source (e.g., OQMD vs MP)
– Quantity (e.g., MP 2018 vs MP 2019)
– Subset / data filtering (e.g., ehull<X)
• Different cross validation metrics
– e.g., what fraction is test set?
• Often, this can’t be helped
– Usually can’t access training / test data of past works
– Sometimes no runnable version of a published algorithm
– Should referees be tougher on this?
[Figure: data set used in study A, data set used in study B, data set used in study C]
Outline
• Matbench: a standard test method for materials science problems
– A set of diverse materials data sets for testing
– A consistent cross-validation strategy
• Automatminer: A “black box” materials science ML algorithm
– Materials-specific descriptors using matminer
– AutoML to tune hyperparameters
A standard test method for ML algorithms in materials
• We want a test set that contains a diverse array of problems
– Smaller data versus larger data
– Different applications (electronic, mechanical, etc.)
– Composition-only or structure information available
– Classification or regression
• We also want a cross-validation metric that gives reliable error estimates
– i.e., less dependent on specific choice of splits
Overview of Matbench test set
Target Property                   Data Source            Samples   Method
Bulk Modulus                      Materials Project      10,987    DFT-GGA
Shear Modulus                     Materials Project      10,987    DFT-GGA
Band Gap                          Materials Project      106,113   DFT-GGA
Metallicity                       Materials Project      106,113   DFT-GGA
Band Gap                          Zhuo et al. [1]        6,354     Experiment
Metallicity                       Zhuo et al. [1]        6,354     Experiment
Bulk Metallic Glass formation     Landolt-Börnstein      7,190     Experiment
Refractive index                  Materials Project      4,764     DFPT-GGA
Formation Energy                  Materials Project      132,752   DFT-GGA
Perovskite Formation Energy       Castelli et al. [2]    18,928    DFT-GGA
Freq. at Last Phonon PhDOS Peak   Materials Project      1,296     DFPT-GGA
Exfoliation Energy                JARVIS-2D              636       DFT-vdW-DF
Steel yield strength              Citrine Informatics    312       Experiment
[1] doi.org/10.1021/acs.jpclett.8b00124
[2] doi.org/10.1039/C2EE22341D
Data set size bins: <1K, 1K-10K, 10K-100K, >100K
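Diversity of benchmark suite
[Figure: breakdown of the benchmark suite along four axes]
• application: mechanical, electronic, stability, optical, thermal
• data size (binned by number of samples)
• problem type: classification, regression
• data type: experiment (composition only), DFT (structure)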
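As a rough illustration of the data-size axis, the sample counts from the Matbench overview table can be grouped into the size bins the slides list (<1K, 1K-10K, 10K-100K, >100K). This is only a sketch: the names `TASKS` and `size_bin` are made up here, and how a count that falls exactly on a boundary is binned is an assumption.

```python
from collections import Counter

# (target property, data source, samples), copied from the overview table.
TASKS = [
    ("Bulk Modulus", "Materials Project", 10987),
    ("Shear Modulus", "Materials Project", 10987),
    ("Band Gap", "Materials Project", 106113),
    ("Metallicity", "Materials Project", 106113),
    ("Band Gap", "Zhuo et al.", 6354),
    ("Metallicity", "Zhuo et al.", 6354),
    ("Bulk Metallic Glass formation", "Landolt-Bornstein", 7190),
    ("Refractive index", "Materials Project", 4764),
    ("Formation Energy", "Materials Project", 132752),
    ("Perovskite Formation Energy", "Castelli et al.", 18928),
    ("Freq. at Last Phonon PhDOS Peak", "Materials Project", 1296),
    ("Exfoliation Energy", "JARVIS-2D", 636),
    ("Steel yield strength", "Citrine Informatics", 312),
]


def size_bin(n_samples):
    """Assign a sample count to a size bin (lower bound inclusive: an assumption)."""
    if n_samples < 1_000:
        return "<1K"
    if n_samples < 10_000:
        return "1K-10K"
    if n_samples < 100_000:
        return "10K-100K"
    return ">100K"


# Count how many benchmark tasks fall into each size bin.
bin_counts = Counter(size_bin(n) for _, _, n in TASKS)
for label in ("<1K", "1K-10K", "10K-100K", ">100K"):
    print(label, bin_counts[label])
```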
Most commonly used test procedure
• Training/validation is used for model selection
• Test / hold-out is used only for error estimation
(Test set should not inform model selection, i.e. “final answer”)
Nested CV – like hold-out, but varies the hold-out set
Think of it as N different “universes” – we have a different training of the model in each universe and a different hold-out.
“A nested CV procedure provides an almost unbiased estimate of the true error.”
Varma and Simon, Bias in error estimation when using cross-validation for model selection (2006)
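The nested CV idea can be sketched in a few lines of plain Python. This is an illustrative toy, not the Matbench implementation: the model (a 1-D nearest-neighbour average), the candidate hyperparameters, the fold counts, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean


def k_fold(indices, k):
    """Split a list of indices into k disjoint folds of roughly equal size."""
    return [indices[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Toy model: average the targets of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def mae(train, test, k):
    """Mean absolute error on `test` of the toy model fitted on `train`."""
    return mean(abs(knn_predict(train, x, k) - y) for x, y in test)


def nested_cv(data, candidate_ks, outer_k=5, inner_k=3, seed=0):
    """Return one hold-out error per outer fold (one per 'universe')."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    outer_scores = []
    for hold_out in k_fold(idx, outer_k):
        held = set(hold_out)
        rest = [i for i in idx if i not in held]

        def inner_score(k):
            # Inner loop: model selection only ever sees the non-hold-out data.
            scores = []
            for val in k_fold(rest, inner_k):
                val_set = set(val)
                train = [data[i] for i in rest if i not in val_set]
                scores.append(mae(train, [data[i] for i in val], k))
            return mean(scores)

        best_k = min(candidate_ks, key=inner_score)
        # Outer loop: the hold-out is used once, for error estimation only.
        outer_scores.append(
            mae([data[i] for i in rest], [data[i] for i in hold_out], best_k)
        )
    return outer_scores


# Synthetic 1-D regression data, purely for demonstration.
rng = random.Random(1)
data = [(x, 2.0 * x + rng.gauss(0, 0.1)) for x in (rng.uniform(0, 1) for _ in range(60))]
scores = nested_cv(data, candidate_ks=[1, 3, 5])
print("hold-out MAE per outer fold:", [round(s, 3) for s in scores])
print("nested CV estimate:", round(mean(scores), 3))
```

Each outer fold repeats the whole model-selection step from scratch, so the reported error never comes from data that influenced the choice of hyperparameter.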
Summary of Matbench
• Matbench is a curated set of data sets that provide a diverse set of problems representative of those found in materials science
• ML developers can work on a consistent set of test problems
• Ideally – consistent reports of error in the literature!
• Matbench v1 will be released soon …
– Let us know if you have feedback / comments / suggestions!
Typically several steps of machine learning are performed by a human researcher – can these be automated?
Descriptors developed and chosen by a researcher
ML model developed and chosen by a researcher
Why can’t we just give the computer some raw input data (compositions, crystal structures) and output properties and get back an ML model?
Automatminer is a “black box” machine learning model
Give it any data set with either composition or structure inputs, and automatminer will train an ML model (no researcher intervention)
Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties)
Featurizer: MagPie, SOAP, Sine Coulomb Matrix, + many, many more
Data cleaning:
• Dropping features with many errors
• Missing value imputation
• One-hot encoding
Feature reduction:
• PCA-based
• Correlation
• Model-based (tree)
AutoML: uses genetic algorithms to find the best machine learning model + hyperparameters
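A few of the cleaning and reduction steps named on this slide can be sketched in plain Python. This is a hedged illustration, not automatminer's code: the function names, the thresholds (0.5 and 0.95), and the feature names and values are all assumptions. One-hot encoding, PCA, and tree-based selection are left out for brevity.

```python
from statistics import mean


def drop_sparse_features(columns, max_missing_frac=0.5):
    """Drop feature columns in which too many entries failed (stored as None)."""
    return {
        name: vals
        for name, vals in columns.items()
        if sum(v is None for v in vals) / len(vals) <= max_missing_frac
    }


def impute_mean(columns):
    """Fill the remaining missing values with the mean of the known values."""
    filled = {}
    for name, vals in columns.items():
        fill = mean(v for v in vals if v is not None)
        filled[name] = [fill if v is None else v for v in vals]
    return filled


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) < threshold for other in kept.values()):
            kept[name] = vals
    return kept


# Hypothetical feature table: one list of values per feature, one entry per sample.
features = {
    "mean_atomic_mass": [12.0, 28.1, 40.1, 55.8],
    "mean_atomic_mass_doubled": [24.0, 56.2, 80.2, 111.6],  # redundant feature
    "failed_descriptor": [None, None, None, 1.0],           # mostly errors
    "electronegativity": [2.5, None, 1.6, 2.2],             # one missing value
}

cleaned = drop_correlated(impute_mean(drop_sparse_features(features)))
print(list(cleaned))
```

The order matters in this sketch: sparse columns are dropped first so that imputation is not asked to invent most of a column, and correlation is computed only after no missing values remain.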
>60 featurizer classes can generate thousands of potential descriptors that are described in the literature
Matminer contains a library of descriptors for various materials science entities
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
• compatible with scikit-learn pipelining
• automatically deploy multiprocessing to parallelize over data
• include citations to methodology papers
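The featurizer pattern described here (one object that turns an input into numbers, labels its outputs, can run over many entries, and points to its references) can be imitated with a toy class. This is not matminer code: `ToyCompositionFeaturizer` is hypothetical, threads stand in for multiprocessing, and the citation string is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard atomic masses (g/mol) for the few elements used in this toy example.
ATOMIC_MASS = {"Na": 22.990, "Cl": 35.45, "Fe": 55.845, "O": 15.999}


class ToyCompositionFeaturizer:
    """Hypothetical featurizer; it only imitates the featurize() pattern."""

    def featurize(self, composition):
        """Map one composition, e.g. {"Fe": 2, "O": 3}, to a list of numbers."""
        n_atoms = sum(composition.values())
        mean_mass = sum(ATOMIC_MASS[el] * n for el, n in composition.items()) / n_atoms
        return [mean_mass, len(composition)]

    def feature_labels(self):
        """Names of the values returned by featurize(), in the same order."""
        return ["mean atomic mass", "number of elements"]

    def citations(self):
        """Placeholder: a real featurizer would list its methodology papers."""
        return ["(reference for this descriptor would go here)"]

    def featurize_many(self, compositions, n_workers=2):
        """Featurize many entries concurrently (threads stand in for multiprocessing)."""
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(self.featurize, compositions))


feat = ToyCompositionFeaturizer()
rows = feat.featurize_many([{"Na": 1, "Cl": 1}, {"Fe": 2, "O": 3}])
for row in rows:
    print(dict(zip(feat.feature_labels(), row)))
```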
The matminer library is available for open use
Paper: Ward et al. Matminer: An open source toolkit for materials data mining. Computational Materials Science, 152, 60–69 (2018).
Docs: hackingmaterials.github.io/matminer
Support: https://groups.google.com/forum/#!forum/matminer
TPOT for AutoML
• TPOT uses genetic algorithms to determine the best ML model and hyperparameters using the training / validation set
– Also some internal feature reduction, scaling, etc. – a full pipeline of operations
• Menu of ML options is all the algorithms implemented in scikit-learn
– i.e., not neural networks
• Parameters include population size and number of generations for genetic algorithm
– Tradeoff between CPU time and performance
– Auto-convergence or early stop possible
Olson, R. S. & Moore, J. H. TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning. in Proceedings of the Workshop on Automatic Machine Learning (eds. Hutter, F., Kotthoff, L. & Vanschoren, J.) 64, 66–74 (PMLR, 2016).
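To make the role of population size and number of generations concrete, here is a toy genetic search over hyperparameters in plain Python. It is not TPOT: the search space, the made-up `validation_error` surface (standing in for a real train/validate cycle), and the selection, crossover, and mutation rules are all assumptions for illustration.

```python
import random

# Hypothetical search space; names and values are illustrative only.
SEARCH_SPACE = {
    "n_estimators": [10, 50, 100, 200, 500],
    "max_depth": [2, 4, 8, 16, 32],
    "min_samples_leaf": [1, 2, 5, 10],
}


def validation_error(cfg):
    """Stand-in for training a model and scoring it on the validation set."""
    return (
        abs(cfg["n_estimators"] - 200) / 500
        + abs(cfg["max_depth"] - 8) / 32
        + abs(cfg["min_samples_leaf"] - 2) / 10
    )


def random_config(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Child takes each hyperparameter from one of its two parents."""
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}


def mutate(cfg, rng, rate=0.2):
    """Randomly resample each hyperparameter with probability `rate`."""
    return {
        name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
        for name, value in cfg.items()
    }


def evolve(population_size=20, generations=10, seed=0):
    """Return the best configuration found and the best error per generation."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=validation_error)
        history.append(validation_error(population[0]))
        # Keep the better half unchanged (elitism), refill with mutated children.
        parents = population[: max(2, population_size // 2)]
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return min(population, key=validation_error), history


best, history = evolve(population_size=20, generations=10)
print("best configuration:", best)
print("best error per generation:", [round(h, 3) for h in history])
```

Every candidate costs one evaluation, so the total work scales roughly with population size times generations, which is the CPU time versus performance tradeoff noted on the slide.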
Comparing automatminer against state-of-the-art
• Comparison 1: CGCNN
• Comparison 2: MEGNET
• Comparison 3: Untuned random forest (“no frills”)
– MAGPIE features for composition
– MAGPIE + Sine Coulomb matrix for structure
CGCNN: Xie, T. & Grossman, J. C. Phys. Rev. Lett. 120, 145301 (2018).
MEGNET: Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. arXiv:1812.05055 (2018).
MAGPIE: Ward, L., Agrawal, A., Choudhary, A. & Wolverton, C. npj Computational Materials 2, 16028 (2016).
Matbench results for all algorithms
How does data set size affect performance?
For all structure-based regression problems, divide the mean absolute error of model by mean absolute deviation of the data set.
• Always predicting the mean would yield a value of 1.0
[Plot: MAE/MAD (0 to 0.9) versus data set size (100 to 1,000,000) for automatminer, CGCNN, and MEGNET]
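The MAE/MAD ratio is simple to compute. The sketch below assumes the mean absolute deviation is taken about the mean of the targets, which is consistent with the statement that always predicting the mean gives 1.0; the numbers are made up.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error of the predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mad(y_true):
    """Mean absolute deviation of the targets about their mean."""
    center = mean(y_true)
    return mean(abs(t - center) for t in y_true)


def scaled_error(y_true, y_pred):
    """MAE divided by MAD: 1.0 matches the 'always predict the mean' baseline."""
    return mae(y_true, y_pred) / mad(y_true)


y_true = [1.0, 2.0, 4.0, 9.0]            # made-up target values, mean = 4.0
baseline = [mean(y_true)] * len(y_true)  # always predict the mean
model = [1.5, 2.5, 4.0, 8.0]             # made-up model predictions

print(scaled_error(y_true, baseline))  # 1.0
print(scaled_error(y_true, model))     # 0.2
```

Because the ratio is dimensionless, it lets properties with different units and value ranges share one axis.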
Algorithm training time per fold on 8-16 CPU cores
• Automatminer is much faster / easier to train
– One can adjust training time of all algorithms to some extent
– Note that MEGNET is faster than CGCNN but same order of magnitude
• GPUs might greatly accelerate CGCNN / MEGNET training (no timing available)
Data set size   Automatminer      CGCNN        MEGNET
~1K             ~1 hour or less   ~few hours   ~few hours
~10K            ~few hours        ~few days    ~few days
~100K           ~12 hours         ~few weeks   ~few weeks
Getting started with automatminer
Paper: In preparation …
Docs: hackingmaterials.github.io/automatminer
Support: https://groups.google.com/forum/#!forum/matminer
Conclusions
• We proposed a diverse benchmark test suite of problems to develop and test ML algorithms against
• We presented a black-box ML algorithm, Automatminer, that performs comparably to or better than literature values on small data sets (N<10,000), but does more poorly on larger data sets
• Further upgrades to automatminer are in progress!
– See if we can do better on N>10,000 problems
– Although crystal networks might alternatively use transfer learning to tackle N<10,000 problems (e.g., MEGNET)
Acknowledgements
Alex Dunn (Graduate student)
Qi Wang (Postdoc)
Alex Ganose (Postdoc)
Alireza Faghaninia (Postdoc)
Samy Cherfaoui (Undergraduate)
Daniel Dopp (Undergraduate)
Funding: U.S. Department of Energy, Basic Energy Sciences

More Related Content

What's hot

Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials Project
Anubhav Jain
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methods
Anubhav Jain
 
Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...
Anubhav Jain
 
Overview of DuraMat software tool development
Overview of DuraMat software tool developmentOverview of DuraMat software tool development
Overview of DuraMat software tool development
Anubhav Jain
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
Anubhav Jain
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Anubhav Jain
 
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Anubhav Jain
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
Anubhav Jain
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...
Anubhav Jain
 
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV DataThe DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
Anubhav Jain
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Anubhav Jain
 
DuraMat Data Analytics
DuraMat Data AnalyticsDuraMat Data Analytics
DuraMat Data Analytics
Anubhav Jain
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
Anubhav Jain
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
Anubhav Jain
 
Machine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsMachine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methods
Anubhav Jain
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...
Anubhav Jain
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?
Anubhav Jain
 
Smart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignSmart Metrics for High Performance Material Design
Smart Metrics for High Performance Material Design
aimsnist
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...
Anubhav Jain
 
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Anubhav Jain
 

What's hot (20)

Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials Project
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methods
 
Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...
 
Overview of DuraMat software tool development
Overview of DuraMat software tool developmentOverview of DuraMat software tool development
Overview of DuraMat software tool development
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...
 
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV DataThe DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
DuraMat Data Analytics
DuraMat Data AnalyticsDuraMat Data Analytics
DuraMat Data Analytics
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Machine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsMachine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methods
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?
 
Smart Metrics for High Performance Material Design
Smart Metrics for High Performance Material DesignSmart Metrics for High Performance Material Design
Smart Metrics for High Performance Material Design
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...
 
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
 

Similar to Automated Machine Learning Applied to Diverse Materials Design Problems

The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
Anubhav Jain
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
Anubhav Jain
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Anubhav Jain
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
Anubhav Jain
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
KAMAL CHOUDHARY
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
Dmitry Grapov
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
Manuel Martín
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
Joaquin Vanschoren
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
Anubhav Jain
 
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Intel® Software
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
Sri Ambati
 
Integrative information management for systems biology
Integrative information management for systems biologyIntegrative information management for systems biology
Integrative information management for systems biology
Neil Swainston
 
2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model
aimsnist
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
Sri Ambati
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
The importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systemsThe importance of model fairness and interpretability in AI systems
The importance of model fairness and interpretability in AI systems
Francesca Lazzeri, PhD
 
Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...
Lionel Briand
 
AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...
Deakin University
 
Review of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering AlgorithmReview of Existing Methods in K-means Clustering Algorithm
Review of Existing Methods in K-means Clustering Algorithm
IRJET Journal
 
Towards Increasing Predictability of Machine Learning Research
Towards Increasing Predictability of Machine Learning ResearchTowards Increasing Predictability of Machine Learning Research
Towards Increasing Predictability of Machine Learning Research
ArtemSunfun
 

Similar to Automated Machine Learning Applied to Diverse Materials Design Problems (20)

The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
 
Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)Metabolomic Data Analysis Workshop and Tutorials (2014)
Metabolomic Data Analysis Workshop and Tutorials (2014)
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Automated Machine Learning Applied to Diverse Materials Design Problems

  • 1. Automated Machine Learning Applied to Diverse Materials Design Problems
      Anubhav Jain
      Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA
      MRS Spring Meeting, 2019
      Slides (already) posted to hackingmaterials.lbl.gov
  • 2. There are many algorithms developed for machine learning in materials, and new ones are constantly reported!
  • 3. Q: Which one is the “best” based on all the literature reports?
  • 4. Q: Which one is the “best” based on all the literature reports?
      A: Can’t tell! They are (almost?) all tested on different data sets.
  • 5. Difficulty of comparing different ML algorithms
      • Different data sets (data set used in study A vs. study B vs. study C)
          – Source (e.g., OQMD vs MP)
          – Quantity (e.g., MP 2018 vs MP 2019)
          – Subset / data filtering (e.g., ehull<X)
      • Different cross-validation metrics
          – e.g., what fraction is test set?
      • Often, this can’t be helped
          – Usually can’t access training / test data of past works
          – Sometimes no runnable version of a published algorithm
          – Should referees be tougher on this?
  • 6. Outline
      • Matbench: a standard test method for materials science problems
          – A set of diverse materials data sets for testing
          – A consistent cross-validation strategy
      • Automatminer: a “black box” materials science ML algorithm
          – Materials-specific descriptors using matminer
          – AutoML to tune hyperparameters
  • 8. A standard test method for ML algorithms in materials
      • We want a test set that contains a diverse array of problems
          – Smaller data versus larger data
          – Different applications (electronic, mechanical, etc.)
          – Composition-only or structure information available
          – Classification or regression
      • We also want a cross-validation metric that gives reliable error estimates
          – i.e., less dependent on specific choice of splits
  • 9. Overview of Matbench test set

      Target Property                  Data Source          Samples  Method
      Bulk Modulus                     Materials Project     10,987  DFT-GGA
      Shear Modulus                    Materials Project     10,987  DFT-GGA
      Band Gap                         Materials Project    106,113  DFT-GGA
      Metallicity                      Materials Project    106,113  DFT-GGA
      Band Gap                         Zhuo et al. [1]        6,354  Experiment
      Metallicity                      Zhuo et al. [1]        6,354  Experiment
      Bulk Metallic Glass formation    Landolt-Börnstein      7,190  Experiment
      Refractive index                 Materials Project      4,764  DFPT-GGA
      Formation Energy                 Materials Project    132,752  DFT-GGA
      Perovskite Formation Energy      Castelli et al. [2]   18,928  DFT-GGA
      Freq. at Last Phonon PhDOS Peak  Materials Project      1,296  DFPT-GGA
      Exfoliation Energy               JARVIS-2D                636  DFT-vDW-DF
      Steel yield strength             Citrine Informatics      312  Experiment

      [1] doi.org/10.1021/acs.jpclett.8b00124
      [2] doi.org/10.1039/C2EE22341D
  • 10. Diversity of benchmark suite
      • Application: mechanical, electronic, stability, optical, thermal
      • Data size: <1K, 1K-10K, 10K-100K, >100K
      • Problem type: classification, regression
      • Data type: experiment (composition only), DFT (structure)
  • 12. Most commonly used test procedure
      • Training/validation is used for model selection
      • Test / hold-out is used only for error estimation (test set should not inform model selection, i.e. “final answer”)
  • 13. Nested CV: like hold-out, but varies the hold-out set
      Think of it as N different “universes”: we have a different training of the model in each universe and a different hold-out.
  • 14. Nested CV: like hold-out, but varies the hold-out set
      “A nested CV procedure provides an almost unbiased estimate of the true error.”
      Varma and Simon, Bias in error estimation when using cross-validation for model selection (2006)
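
The nested procedure on slides 13 and 14 can be sketched in plain Python. This is only an illustration under stated assumptions, not the Matbench implementation: the 1-D k-nearest-neighbour model, the candidate values of k, the fold counts, and the synthetic data are all made up for the example. The point it shows is that the inner loop picks the model using training/validation data only, while each outer hold-out is used once, for error estimation.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k roughly equal, disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Toy model: mean y of the k training points nearest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

def mae(train, test, k):
    """Mean absolute error of the toy model on the test points."""
    return mean(abs(knn_predict(train, x, k) - y) for x, y in test)

def nested_cv(data, candidates=(1, 3, 5), outer_k=5, inner_k=4):
    """Return one hold-out MAE per outer fold ("universe")."""
    outer = k_fold_indices(len(data), outer_k)
    scores = []
    for i, hold_idx in enumerate(outer):
        hold = [data[j] for j in hold_idx]
        rest = [data[j] for f, fold in enumerate(outer) if f != i for j in fold]

        # Inner loop: model selection on training/validation data only.
        inner = k_fold_indices(len(rest), inner_k, seed=i + 1)

        def validation_error(k):
            errors = []
            for v, val_idx in enumerate(inner):
                val = [rest[j] for j in val_idx]
                trn = [rest[j] for f, fold in enumerate(inner) if f != v for j in fold]
                errors.append(mae(trn, val, k))
            return mean(errors)

        best_k = min(candidates, key=validation_error)

        # Outer loop: the hold-out is touched once, for error estimation.
        scores.append(mae(rest, hold, best_k))
    return scores

# Synthetic data (assumption): y = 2x plus a little Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0.0, 1.0) for _ in range(60)]
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]

scores = nested_cv(data)
print("per-fold MAE:", [round(s, 3) for s in scores])
print("mean MAE:", round(mean(scores), 3))
```

Reporting the mean (and spread) of the per-fold scores makes the estimate less dependent on any single choice of split.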
  • 15. Summary of Matbench
      • Matbench is a curated collection of data sets that provides a diverse set of problems representative of those found in materials science
      • ML developers can work on a consistent set of test problems
      • Ideally: consistent reports of error in the literature!
      • Matbench v1 will be released soon …
          – Let us know if you have feedback / comments / suggestions!
  • 17. Typically several steps of machine learning are performed by a human researcher. Can these be automated?
      • Descriptors developed and chosen by a researcher
      • ML model developed and chosen by a researcher
      • Why can’t we just give the computer some raw input data (compositions, crystal structures) and output properties and get back an ML model?
  • 18. Automatminer is a “black box” machine learning model
      Give it any data set with either composition or structure inputs, and automatminer will train an ML model (no researcher intervention)
  • 19. Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties)
      • Featurizer: MagPie, SOAP, Sine Coulomb Matrix, + many, many more
      • Data cleaning: dropping features with many errors, missing value imputation, one-hot encoding
      • Feature reduction: PCA-based, correlation, model-based (tree)
      • AutoML: uses genetic algorithms to find the best machine learning model + hyperparameters
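
Two of the steps named on slide 19, dropping features with many errors plus missing value imputation, and correlation-based feature reduction, can be sketched as follows. This is a hedged illustration and not automatminer's code: the function names, the thresholds (`max_nan_frac`, `threshold`), and the small feature table are assumptions chosen for the example, and the PCA-based and tree-based options are not shown.

```python
import math
from statistics import mean

def clean(columns, max_nan_frac=0.2):
    """Drop features whose featurization failed too often (NaN), mean-impute the rest."""
    kept = {}
    for name, values in columns.items():
        valid = [v for v in values if not math.isnan(v)]
        nan_frac = (len(values) - len(valid)) / len(values)
        if nan_frac > max_nan_frac:
            continue  # too many errors: drop the whole feature
        fill = mean(valid)
        kept[name] = [fill if math.isnan(v) else v for v in values]
    return kept

def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

nan = float("nan")
# Made-up feature table (assumption): 10 samples, 4 candidate descriptors.
features = {
    "mean_Z":    [8.0, 14.0, nan, 26.0, 29.0, 30.0, 13.0, 22.0, 40.0, 12.0],
    "mean_Z_x2": [16.0, 28.0, 44.0, 52.0, 58.0, 60.0, 26.0, 44.0, 80.0, 24.0],
    "flaky":     [nan, nan, nan, 1.0, nan, 2.0, nan, nan, 3.0, nan],
    "density":   [2.3, 7.9, 3.1, 5.0, 2.7, 8.9, 4.5, 1.8, 6.5, 7.1],
}

cleaned = clean(features)           # "flaky" is dropped, "mean_Z" is imputed
reduced = drop_correlated(cleaned)  # "mean_Z_x2" duplicates "mean_Z" and is dropped
print(list(reduced))
```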
  • 21. Matminer contains a library of descriptors for various materials science entities
      >60 featurizer classes can generate thousands of potential descriptors that are described in the literature

          feat = EwaldEnergy([options])
          y = feat.featurize([input_data])

      • compatible with scikit-learn pipelining
      • automatically deploy multiprocessing to parallelize over data
      • include citations to methodology papers
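
The construct-then-featurize pattern on slide 21 can be illustrated without installing matminer. The class below is a hypothetical stand-in, not part of matminer: `ElementFractionSketch`, its default element list, and the simple formula parser are assumptions made for this example. Only the method names `featurize`, `feature_labels`, and `citations` are meant to mirror the interface that matminer featurizers expose.

```python
import re

class ElementFractionSketch:
    """Hypothetical stand-in for a matminer-style featurizer (not matminer code)."""

    def __init__(self, elements=("Li", "Fe", "P", "O")):
        # Options are fixed at construction time, as in feat = EwaldEnergy([options]).
        self.elements = tuple(elements)

    def featurize(self, formula):
        """Return the atomic fraction of each tracked element in a simple formula."""
        counts = {}
        # Handles flat formulas such as "LiFePO4"; no parentheses or charges.
        for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
            counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
        total = sum(counts.values())
        return [counts.get(el, 0.0) / total for el in self.elements]

    def feature_labels(self):
        """Names of the returned features, in the same order as featurize()."""
        return [f"frac_{el}" for el in self.elements]

    def citations(self):
        """A real featurizer would return references to its methodology paper."""
        return []

feat = ElementFractionSketch()
for formula in ("LiFePO4", "Fe2O3"):
    print(formula, dict(zip(feat.feature_labels(), feat.featurize(formula))))
```

Because every featurizer shares the same small interface, a tool such as automatminer can loop over many of them without special-casing each descriptor.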
  • 22. The matminer library is available for open use
      • Paper: Ward et al. Matminer: An open source toolkit for materials data mining. Computational Materials Science, 152, 60–69 (2018).
      • Docs: hackingmaterials.github.io/matminer
      • Support: https://groups.google.com/forum/#!forum/matminer
  • 24. TPOT for AutoML
      • TPOT uses genetic algorithms to determine the best ML model and hyperparameters using the training / validation set
          – Also some internal feature reduction, scaling, etc.: a full pipeline of operations
      • Menu of ML options is all the algorithms implemented in scikit-learn
          – i.e., not neural networks
      • Parameters include population size and number of generations for genetic algorithm
          – Tradeoff between CPU time and performance
          – Auto-convergence or early stop possible
      Olson, R. S. & Moore, J. H. TPOT: A Tree-based Pipeline Optimization Tool for Automating Machine Learning. in Proceedings of the Workshop on Automatic Machine Learning (eds. Hutter, F., Kotthoff, L. & Vanschoren, J.) 64, 66–74 (PMLR, 2016).
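
The role of population size and number of generations can be seen in a toy genetic search. This sketch is not TPOT and does not use its API: it uses mutation and elitism only (no crossover), the search space of three model names and a depth from 1 to 12 is invented, and `made_up_error` is a stand-in for a real cross-validated validation error.

```python
import random

MODELS = ("tree", "forest", "boosting")            # hypothetical model menu
BASE_ERROR = {"tree": 0.30, "forest": 0.22, "boosting": 0.20}

def made_up_error(config):
    """Stand-in for a validation error; a real search would train and score a model."""
    model, depth = config
    return BASE_ERROR[model] + 0.01 * abs(depth - 6)

def sample(rng):
    """Draw a random (model, depth) configuration."""
    return (rng.choice(MODELS), rng.randint(1, 12))

def mutate(config, rng):
    """Change either the model or the depth of a configuration."""
    model, depth = config
    if rng.random() < 0.3:
        return (rng.choice(MODELS), depth)
    step = rng.choice((-1, 1))
    return (model, min(12, max(1, depth + step)))

def evolve(error, population_size=20, generations=15, seed=0):
    """Keep the better half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [sample(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=error)
        parents = population[: population_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return min(population, key=error)

best = evolve(made_up_error)
print("best configuration:", best, "error:", round(made_up_error(best), 3))
```

Each generation costs `population_size` model evaluations, so the total work grows with the product of the two parameters; this is the CPU time versus performance tradeoff noted on the slide.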
  • 25. Comparing automatminer against state-of-the-art
      • Comparison 1: CGCNN
      • Comparison 2: MEGNET
      • Comparison 3: Untuned random forest (“no frills”)
          – MAGPIE features for composition
          – MAGPIE + Sine Coulomb matrix for structure
      Xie, T. & Grossman, J. C. Phys. Rev. Lett. 120, 145301 (2018).
      Chen, C., Ye, W., Zuo, Y., Zheng, C. & Ong, S. P. arXiv:1812.05055 (2018).
      Ward, L., Agrawal, A., Choudhary, A. & Wolverton, C. npj Computational Materials 2, 16028 (2016).
  • 26. Matbench results for all algorithms
  • 27. How does data set size affect performance?
      For all structure-based regression problems, divide the mean absolute error of model by mean absolute deviation of the data set.
      • Always predicting the mean would yield a value of 1.0
      [Plot: MAE/MAD (0 to 0.9) versus data set size (100 to 1,000,000) for automatminer, CGCNN, and MEGNET]
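
The normalized error on slide 27 is straightforward to compute. A minimal sketch, assuming MAD means the mean absolute deviation of the target values about their mean; the function name and the four sample values are made up for the example.

```python
from statistics import mean

def mae_over_mad(y_true, y_pred):
    """Mean absolute error of the model divided by the mean absolute deviation of the data."""
    mu = mean(y_true)
    mad = mean(abs(y - mu) for y in y_true)
    mae = mean(abs(t - p) for t, p in zip(y_true, y_pred))
    return mae / mad

y_true = [1.0, 2.0, 3.0, 6.0]           # made-up target values, mean 3.0
always_mean = [3.0] * len(y_true)       # baseline: always predict the mean

print(mae_over_mad(y_true, always_mean))  # baseline scores 1.0
print(mae_over_mad(y_true, y_true))       # perfect predictions score 0.0
```

Dividing by MAD puts data sets with different units and spreads on a common scale, so values well below 1.0 indicate a model that beats the trivial baseline.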
  • 28. Algorithm training time per fold on 8-16 CPU cores

      Data set size  Automatminer     CGCNN       MEGNET
      ~1K            ~1 hour or less  ~few hours  ~few hours
      ~10K           ~few hours       ~few days   ~few days
      ~100K          ~12 hours        ~few weeks  ~few weeks

      • Automatminer is much faster / easier to train
          – One can adjust training time of all algorithms to some extent
          – Note that MEGNET is faster than CGCNN but same order of magnitude
      • GPUs might greatly accelerate CGCNN / MEGNET training (no timing available)
  • 29. Getting started with automatminer
      • Paper: In preparation …
      • Docs: hackingmaterials.github.io/automatminer
      • Support: https://groups.google.com/forum/#!forum/matminer
  • 30. Conclusions
      • We proposed a diverse benchmark test suite of problems to develop and test ML algorithms against
      • We presented a black-box ML algorithm, Automatminer, that performs comparably to or better than literature values on small data sets (N<10,000), but does more poorly on larger data sets
      • Further upgrades to automatminer are in progress!
          – See if we can do better on N>10,000 problems
          – Although crystal networks might alternatively use transfer learning to tackle N<10,000 problems (e.g., MEGNET)
  • 31. Acknowledgements
      • Alex Dunn, Graduate student
      • Qi Wang, Postdoc
      • Alex Ganose, Postdoc
      • Alireza Faghaninia, Postdoc
      • Samy Cherfaoui, Undergraduate
      • Daniel Dopp, Undergraduate
      Funding: U.S. Department of Energy, Basic Energy Sciences
      Slides (already) posted to hackingmaterials.lbl.gov