Open Source Tools for Materials Informatics
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
MRS Fall Meeting 2019
Slides (already) posted to hackingmaterials.lbl.gov
Staffing interdisciplinary research
I find a recurring dilemma and asymmetry in staffing materials informatics research
[Figure labels: Machine learning, Materials Science, Materials Informatics]
Who has a tougher job to get started?
MS&E major
• Already has background in the materials science aspects of the project
• But needs to learn the machine learning and software engineering aspects
CS major
• Already has background in software engineering and appropriate machine learning
• But needs to learn the materials science aspects
My experience is that the CS major typically has the tougher road ahead of them: it is easier to pick up / self-learn random forests & neural networks than phase diagrams & crystal structures.
There is an asymmetry in resources available
MS&E major (learning machine learning and software engineering)
• Hands-on code and examples to run and modify
• Hundreds of YouTube videos and online courses
• Code reviews from collaborators
• And the standard books, etc.
CS major (learning materials science)
• Books and research articles
• Conversations with colleagues, impromptu lectures
• Practice problems? Worked examples? Interactive code?
Outline
① Matminer: data and descriptors for producing ML structure-property relationships
② Matscholar – applying natural language processing to materials science information retrieval
How can we make it easy to develop and test ML models for composition-structure-property relationships?
• How can we quickly represent chemistry and structure as vectors?
• How do we get labeled training/test data?
• How do we know if our ML model is extraordinary?
How can we quickly represent chemistry and structure as vectors?
>60 featurizer classes can generate thousands of potential descriptors that are described in the literature
Matminer contains a library of descriptors for various
materials science entities
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
• compatible with scikit-
learn pipelining
• automatically deploy
multiprocessing to
parallelize over data
• include citations to
methodology papers
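The featurizer pattern on this slide (construct an object, then call featurize on an input) can be sketched with a toy class. This is not matminer code: the class, its method names, and the small electronegativity table are illustrative assumptions, and it only handles simple formulas such as Fe2O3.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Assumed lookup table for this sketch: Pauling electronegativities of a few elements.
ELECTRONEGATIVITY = {"Li": 0.98, "O": 3.44, "Fe": 1.83, "Na": 0.93, "Cl": 3.16}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {'Fe': 2.0, 'O': 3.0}."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


class ToyCompositionFeaturizer:
    """Hypothetical stand-in for a featurizer: one input in, one feature vector out."""

    def feature_labels(self):
        return ["n_elements", "mean_electronegativity", "max_atomic_fraction"]

    def citations(self):
        # A real featurizer would point to its methodology paper here.
        return ["(placeholder) methodology reference"]

    def featurize(self, formula):
        counts = parse_formula(formula)
        total = sum(counts.values())
        fractions = {el: n / total for el, n in counts.items()}
        mean_en = sum(ELECTRONEGATIVITY[el] * x for el, x in fractions.items())
        return [len(counts), mean_en, max(fractions.values())]

    def featurize_many(self, formulas, n_jobs=2):
        # Parallelize over entries; results keep the order of the inputs.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            return list(pool.map(self.featurize, formulas))


feat = ToyCompositionFeaturizer()
X = feat.featurize_many(["Fe2O3", "NaCl", "Li2O"])
```

Each row of X is a fixed-length vector whose columns match feature_labels(), which is the property that lets a featurizer slot into a standard ML pipeline.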
How do we get labeled training/test data?
What about data?
• Typically, a lot of attention is given to advanced algorithms for machine learning
– e.g., deep neural networks versus standard ML
• But perhaps there is not enough emphasis on developing the appropriate data sets
– with enough information to train ML algorithms
– with sufficient data quality
– easy enough for anyone to at least get started without specialized knowledge
The importance of data
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
What is ImageNet?
The ImageNet data set was collected and hand-labeled (e.g., via Amazon Mechanical Turk). The latest version has over 14 million hand-annotated images, organized into ~20,000 categories.
How data stimulates new algorithms
How can we create an
ImageNet for materials
science?
• We want a test set that contains a diverse array
of problems
– Smaller data versus larger data
– Different applications (electronic, mechanical, etc.)
– Composition-only or structure information available
– Classification or regression
• We also want a cross-validation metric that gives
reliable error estimates
– i.e., less dependent on specific choice of splits
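One common way to make an error estimate less dependent on a single choice of splits is to repeat k-fold cross-validation over several shuffles and average the results. The sketch below illustrates that idea only; it is not necessarily the protocol used for the benchmark described here, and the trivial mean-predictor model and the synthetic data are assumptions chosen to keep the example self-contained.

```python
import random
import statistics


def kfold_indices(n_samples, n_splits, rng):
    """Shuffle indices and cut them into n_splits nearly equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]


def mean_predictor_mae(y, train_idx, test_idx):
    """Fit a trivial 'predict the training mean' model and return its test MAE."""
    prediction = statistics.fmean(y[i] for i in train_idx)
    return statistics.fmean(abs(y[i] - prediction) for i in test_idx)


def repeated_kfold_mae(y, n_splits=5, n_repeats=10, seed=0):
    """Average the k-fold MAE over several independent shuffles."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(n_repeats):
        folds = kfold_indices(len(y), n_splits, rng)
        fold_errors = []
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            fold_errors.append(mean_predictor_mae(y, train_idx, test_idx))
        per_repeat.append(statistics.fmean(fold_errors))
    # Mean error, plus how much the estimate moves from one shuffle to the next.
    return statistics.fmean(per_repeat), statistics.pstdev(per_repeat)


# Synthetic target values, for illustration only.
data_rng = random.Random(42)
y = [data_rng.gauss(100.0, 20.0) for _ in range(200)]
mean_mae, spread = repeated_kfold_mae(y)
```

The second return value is the useful diagnostic: a large spread across repeats means a single split would have given an unreliable error estimate.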
An “ImageNet” for materials science
Overview of Matbench test set
Target Property                  Data Source          Samples  Method
Bulk Modulus                     Materials Project    10,987   DFT-GGA
Shear Modulus                    Materials Project    10,987   DFT-GGA
Band Gap                         Materials Project    106,113  DFT-GGA
Metallicity                      Materials Project    106,113  DFT-GGA
Band Gap                         Zhuo et al. [1]      6,354    Experiment
Metallicity                      Zhuo et al. [1]      6,354    Experiment
Bulk Metallic Glass formation    Landolt-Börnstein    7,190    Experiment
Refractive index                 Materials Project    4,764    DFPT-GGA
Formation Energy                 Materials Project    132,752  DFT-GGA
Perovskite Formation Energy      Castelli et al. [2]  18,928   DFT-GGA
Freq. at Last Phonon PhDOS Peak  Materials Project    1,296    DFPT-GGA
Exfoliation Energy               JARVIS-2D            636      DFT-vdW-DF
Steel yield strength             Citrine Informatics  312      Experiment
[1] doi.org/10.1021/acs.jpclett.8b00124
[2] doi.org/10.1039/C2EE22341D
Data size legend: <1K, 1K-10K, 10K-100K, >100K
Diversity of benchmark suite
[Figure: the benchmark tasks vary along four axes]
• application: mechanical, electronic, stability, optical, thermal
• data size
• problem type: classification, regression
• data type: experiment (composition only), DFT (structure)
How do we know if our ML model is extraordinary?
How about a benchmark algorithm?
Automatminer is a “black box” machine learning model
Give it any data set with either composition or structure inputs, and
automatminer will train an ML model (no researcher intervention)
Automatminer develops an ML model automatically given
raw data (structures or compositions plus output properties)
Featurizer
• MagPie
• SOAP
• Sine Coulomb Matrix
• + many, many more
Data cleaning
• Dropping features with many errors
• Missing value imputation
• One-hot encoding
Feature reduction
• PCA-based
• Correlation
• Model-based (tree)
Model selection
• Uses genetic algorithms to find the best machine learning model + hyperparameters
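Two of the intermediate steps listed on this slide, dropping features with many errors plus missing-value imputation, and correlation-based feature reduction, can be sketched in plain Python. This is a simplified illustration and not automatminer's implementation: the function names, the thresholds, and the feature table are assumptions, and the genetic-algorithm model search is left out.

```python
import statistics


def drop_sparse_features(columns, max_missing_fraction=0.5):
    """Drop feature columns where too many entries failed to featurize (None)."""
    return {
        name: values
        for name, values in columns.items()
        if sum(v is None for v in values) / len(values) <= max_missing_fraction
    }


def impute_mean(columns):
    """Replace remaining missing values with the column mean."""
    imputed = {}
    for name, values in columns.items():
        mean = statistics.fmean(v for v in values if v is not None)
        imputed[name] = [mean if v is None else v for v in values]
    return imputed


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


# Made-up feature table: None marks a featurization error.
raw = {
    "mean_en": [2.8, 2.0, None, 1.8, 2.5],
    "mean_en_scaled": [5.6, 4.0, 4.6, 3.6, 5.0],  # nearly redundant with mean_en
    "n_elements": [2, 2, 3, 2, 4],
    "broken_feature": [None, None, None, 1.0, None],
}
clean = drop_correlated(impute_mean(drop_sparse_features(raw)))
```

Chaining the steps in a fixed order is what allows the whole process to run without researcher intervention; the cleaned table would then be handed to the model search.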
Can actually do an apples-to-apples comparison between algorithms
If we can get a well-established “benchmark”, perhaps
interdisciplinary teams can start hammering on accuracy
A lower barrier to entry in the field means more ideas can be tested by more researchers
[Figure: Matbench test set average error versus time (today, 5 years, 10 years)]
Matminer, matbench, and automatminer can all be
accessed, used, and modified by anyone
Code and examples all on GitHub
• github.com/hackingmaterials/matminer
• github.com/hackingmaterials/matminer_examples
• github.com/hackingmaterials/automatminer
Matbench data on Figshare
• (coming soon, still finalizing)
Free support via Discourse
• https://discuss.matsci.org
Outline
① Matminer: data and descriptors for producing ML structure-property relationships
② Matscholar – applying natural language processing to materials science information retrieval
We have extracted ~2
million abstracts of
relevant scientific
articles
We use natural
language processing
algorithms to try to
extract knowledge from
all this data
Goal: collect and organize knowledge embedded in the
materials science literature
We’ve developed algorithms to automatically tag keywords
in the abstracts
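To give a feel for what tagged keywords look like as data, here is a deliberately simple dictionary-lookup tagger. It is a stand-in technique, not the algorithm behind Matscholar: the vocabulary, the tag names, and the example sentence are all assumptions made for illustration.

```python
import re

# Hypothetical mini-vocabulary; a real system would not rely on a fixed word list.
VOCABULARY = {
    "material": ["Bi2Te3", "PbTe", "graphene"],
    "property": ["thermal conductivity", "band gap", "Seebeck coefficient"],
    "application": ["thermoelectric", "photovoltaic"],
}


def tag_keywords(text):
    """Return (start, matched_text, tag) for every vocabulary term found in text."""
    found = []
    for tag, terms in VOCABULARY.items():
        for term in terms:
            # Match whole terms only, ignoring case.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append((match.start(), match.group(0), tag))
    return sorted(found)


abstract = "We report low thermal conductivity in PbTe, a promising thermoelectric material."
tags = tag_keywords(abstract)
```

A fixed word list misses new materials and ambiguous terms, which is the gap that learned taggers are meant to close.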
Application: a revised materials search engine
Auto-generated summaries of materials based on text mining
Application: materials compositions of interest …
A search for thermoelectrics that do not have Pb or Bi
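A composition filter of this kind can be sketched as a set operation on element symbols. The list of hits below is made up for illustration and is not output from the search engine, and the helper names are hypothetical.

```python
import re


def elements_in(formula):
    """Return the set of element symbols in a simple formula like 'Bi2Te3'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def exclude_elements(formulas, excluded):
    """Keep only formulas that contain none of the excluded elements."""
    excluded = set(excluded)
    return [f for f in formulas if not (elements_in(f) & excluded)]


# Hypothetical hits for a "thermoelectric" query.
hits = ["Bi2Te3", "PbTe", "SnSe", "Mg2Si", "CoSb3", "BiCuSeO"]
pb_bi_free = exclude_elements(hits, {"Pb", "Bi"})
```

Parsing into element symbols rather than matching substrings matters here: a plain text search for "B" would wrongly hit "Bi", and one for "Bi" would miss nothing but could not distinguish it from boron-containing formulas.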
• How do we get more people
benefitting from this work
and involved in improving it?
• One solution - expose an
easy-to-use web frontend,
with links to all the backend
codes in case people want to
dive further
– New tools like Plotly Dash
make this easier than ever
Using a web site as a “gateway” into the algorithms
frontend
backend
https://www.matscholar.com – demo 1
https://www.matscholar.com – demo 2
Matscholar MRS!
https://matscholar-mrs.herokuapp.com
Hopefully these frontend demos get you interested enough
to check the “About page”
Concluding thoughts
• We need more resources to help computer scientists learn about materials science topics through hands-on examples and interactive demos
• Some things that can help:
– Open-source implementations of materials science methods
– Interactive examples (e.g., Jupyter)
– Documentation and support(!)
– Labeled data sets
– Front-ends for easy exploration
Funding acknowledgements
• Matminer
– U.S. Department of Energy, Materials Science Division
• Matscholar
– Toyota Research Institute

More Related Content

What's hot

Graphene: its increasing economic feasibility
Graphene: its increasing economic feasibility Graphene: its increasing economic feasibility
Graphene: its increasing economic feasibility
Jeffrey Funk
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
Anubhav Jain
 
A*STAR Webinar on The AI Revolution in Materials Science
A*STAR Webinar on The AI Revolution in Materials ScienceA*STAR Webinar on The AI Revolution in Materials Science
A*STAR Webinar on The AI Revolution in Materials Science
University of California, San Diego
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
Anubhav Jain
 
Perovskites-based Solar Cells: The challenge of material choice for p-i-n per...
Perovskites-based Solar Cells: The challenge of material choice for p-i-n per...Perovskites-based Solar Cells: The challenge of material choice for p-i-n per...
Perovskites-based Solar Cells: The challenge of material choice for p-i-n per...
Akinola Oyedele
 
Nanotechnology
NanotechnologyNanotechnology
Nanotechnology
Techef In
 
Nano-electronics
Nano-electronicsNano-electronics
Nano-electronics
Abhishek Syal
 
Solar Power: THE PEROVSKITE
Solar Power: THE PEROVSKITESolar Power: THE PEROVSKITE
Solar Power: THE PEROVSKITECyra Mae Soreda
 
Research proposal on organic-inorganic halide perovskite light harvesting mat...
Research proposal on organic-inorganic halide perovskite light harvesting mat...Research proposal on organic-inorganic halide perovskite light harvesting mat...
Research proposal on organic-inorganic halide perovskite light harvesting mat...
Rajan K. Singh
 
Graphene and its future applications
Graphene and its future applicationsGraphene and its future applications
Graphene and its future applications
Arpit Agarwal
 
Machine Learning for Chemical Sciences
Machine Learning for Chemical SciencesMachine Learning for Chemical Sciences
Machine Learning for Chemical Sciences
Ichigaku Takigawa
 
nanotechnology presentation in college (b.tech)
 nanotechnology presentation in college (b.tech) nanotechnology presentation in college (b.tech)
nanotechnology presentation in college (b.tech)
Prashant Singh
 
Graphene ppt
Graphene pptGraphene ppt
Graphene ppt
vishal anand
 
Graphene Based Material for Biomedical Applications
Graphene Based Material for Biomedical ApplicationsGraphene Based Material for Biomedical Applications
Graphene Based Material for Biomedical Applications
Dr. Sitansu Sekhar Nanda
 
Artificial Intelligence: Context of application of AI in Chemicals
Artificial Intelligence: Context of application of AI in ChemicalsArtificial Intelligence: Context of application of AI in Chemicals
Artificial Intelligence: Context of application of AI in Chemicals
accenture
 
Nanotechnology.Opportunities&Challenges
Nanotechnology.Opportunities&ChallengesNanotechnology.Opportunities&Challenges
Nanotechnology.Opportunities&Challengeslusik
 
Organic- Inorganic Perovskite Solar Cell
Organic- Inorganic Perovskite Solar CellOrganic- Inorganic Perovskite Solar Cell
Organic- Inorganic Perovskite Solar Cell
Rajan K. Singh
 

What's hot (20)

Graphene: its increasing economic feasibility
Graphene: its increasing economic feasibility Graphene: its increasing economic feasibility
Graphene: its increasing economic feasibility
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
A*STAR Webinar on The AI Revolution in Materials Science
A*STAR Webinar on The AI Revolution in Materials ScienceA*STAR Webinar on The AI Revolution in Materials Science
A*STAR Webinar on The AI Revolution in Materials Science
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
Nanoelectronics
NanoelectronicsNanoelectronics
Nanoelectronics
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Perovskites-based Solar Cells: The challenge of material choice for p-i-n per...
Perovskites-based Solar Cells: The challenge of material choice for p-i-n per...Perovskites-based Solar Cells: The challenge of material choice for p-i-n per...
Perovskites-based Solar Cells: The challenge of material choice for p-i-n per...
 
Nanotechnology
NanotechnologyNanotechnology
Nanotechnology
 
Nano-electronics
Nano-electronicsNano-electronics
Nano-electronics
 
Solar Power: THE PEROVSKITE
Solar Power: THE PEROVSKITESolar Power: THE PEROVSKITE
Solar Power: THE PEROVSKITE
 
Research proposal on organic-inorganic halide perovskite light harvesting mat...
Research proposal on organic-inorganic halide perovskite light harvesting mat...Research proposal on organic-inorganic halide perovskite light harvesting mat...
Research proposal on organic-inorganic halide perovskite light harvesting mat...
 
Graphene and its future applications
Graphene and its future applicationsGraphene and its future applications
Graphene and its future applications
 
Machine Learning for Chemical Sciences
Machine Learning for Chemical SciencesMachine Learning for Chemical Sciences
Machine Learning for Chemical Sciences
 
nanotechnology presentation in college (b.tech)
 nanotechnology presentation in college (b.tech) nanotechnology presentation in college (b.tech)
nanotechnology presentation in college (b.tech)
 
Graphene ppt
Graphene pptGraphene ppt
Graphene ppt
 
Graphene Based Material for Biomedical Applications
Graphene Based Material for Biomedical ApplicationsGraphene Based Material for Biomedical Applications
Graphene Based Material for Biomedical Applications
 
Artificial Intelligence: Context of application of AI in Chemicals
Artificial Intelligence: Context of application of AI in ChemicalsArtificial Intelligence: Context of application of AI in Chemicals
Artificial Intelligence: Context of application of AI in Chemicals
 
Nanotechnology.Opportunities&Challenges
Nanotechnology.Opportunities&ChallengesNanotechnology.Opportunities&Challenges
Nanotechnology.Opportunities&Challenges
 
Organic- Inorganic Perovskite Solar Cell
Organic- Inorganic Perovskite Solar CellOrganic- Inorganic Perovskite Solar Cell
Organic- Inorganic Perovskite Solar Cell
 

Similar to Open Source Tools for Materials Informatics

Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
Anubhav Jain
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
Anubhav Jain
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
Ian Foster
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop Slides
Jason Hattrick-Simpers
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
Anubhav Jain
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
KAMAL CHOUDHARY
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEEFINALYEARSTUDENTPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
IEEEMEMTECHSTUDENTSPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
IEEEFINALYEARSTUDENTPROJECT
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
Anubhav Jain
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Anubhav Jain
 
2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model
aimsnist
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
Anubhav Jain
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Anubhav Jain
 
Machine Learning for Data Extraction
Machine Learning for Data ExtractionMachine Learning for Data Extraction
Machine Learning for Data Extraction
Dasha Herrmannova
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
Anubhav Jain
 
Hattrick-Simpers MRS Webinar on AI in Materials
Hattrick-Simpers MRS Webinar on AI in MaterialsHattrick-Simpers MRS Webinar on AI in Materials
Hattrick-Simpers MRS Webinar on AI in Materials
Jason Hattrick-Simpers
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
aimsnist
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
Bill Liu
 

Similar to Open Source Tools for Materials Informatics (20)

Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop Slides
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
Machine Learning for Data Extraction
Machine Learning for Data ExtractionMachine Learning for Data Extraction
Machine Learning for Data Extraction
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
 
Hattrick-Simpers MRS Webinar on AI in Materials
Hattrick-Simpers MRS Webinar on AI in MaterialsHattrick-Simpers MRS Webinar on AI in Materials
Hattrick-Simpers MRS Webinar on AI in Materials
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
 

More from Anubhav Jain

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...
Anubhav Jain
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
Anubhav Jain
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesis
Anubhav Jain
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
Anubhav Jain
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
Anubhav Jain
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...
Anubhav Jain
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
Anubhav Jain
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Open Source Tools for Materials Informatics

  • 1. Open Source Tools for Materials Informatics. Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA. MRS Fall Meeting 2019. Slides (already) posted to hackingmaterials.lbl.gov
  • 2. Staffing interdisciplinary research (Machine Learning / Materials Science / Materials Informatics): I find a recurring dilemma and asymmetry in staffing materials informatics research.
  • 3. Who has a tougher job to get started?
      MS&E major:
      – Already has background in the materials science aspects of the project
      – But needs to learn the machine learning and software engineering aspects
      CS major:
      – Already has background in software engineering and appropriate machine learning
      – But needs to learn the materials science aspects
  • 4. Who has a tougher job to get started? (MS&E major vs. CS major) My experience is that the CS major typically has the tougher road ahead of them.
  • 5. Who has a tougher job to get started? (MS&E major vs. CS major) My experience is that the CS major typically has the tougher road ahead of them: it is easier to pick up / self-learn random forests & neural networks than phase diagrams & crystal structures.
  • 6. There is an asymmetry in resources available
      MS&E major:
      – Hands-on code and examples to run and modify
      – Hundreds of YouTube videos and online courses
      – Code reviews from collaborators
      – And the standard books, etc.
      CS major:
      – Books and research articles
      – Conversations with colleagues, impromptu lectures
      – Practice problems? Worked examples? Interactive code?
  • 7. Outline
      ① Matminer: data and descriptors for producing ML structure-property relationships
      ② Matscholar – applying natural language processing to materials science information retrieval
  • 8. How can we make it easy to develop and test ML models for composition-structure-property relationships?
      – How can we quickly represent chemistry and structure as vectors?
      – How do we get labeled training/test data?
      – How do we know if our ML model is extraordinary?
  • 9. How can we make it easy to develop and test ML models for composition-structure-property relationships? How can we quickly represent chemistry and structure as vectors?
  • 10. Matminer contains a library of descriptors for various materials science entities. >60 featurizer classes can generate thousands of potential descriptors that are described in the literature.
      feat = EwaldEnergy([options])
      y = feat.featurize([input_data])
      Featurizers:
      – are compatible with scikit-learn pipelining
      – automatically deploy multiprocessing to parallelize over data
      – include citations to methodology papers
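The `feat.featurize(...)` pattern on this slide can be illustrated with a minimal, self-contained sketch. This is not matminer code: the class `ElectronegativityStats`, the helper `parse_formula`, and the five-element lookup table are hypothetical names introduced here only to show how a composition might be turned into a fixed-length numeric vector.

```python
import re

# Pauling electronegativities for a handful of elements (illustrative subset only).
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16, "Fe": 1.83, "O": 3.44, "Si": 1.90}


def parse_formula(formula):
    """Return {element: amount} for a simple formula such as 'Fe2O3' (no parentheses)."""
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


class ElectronegativityStats:
    """Hypothetical featurizer: composition string -> fixed-length numeric vector."""

    def feature_labels(self):
        return ["mean_en", "min_en", "max_en", "range_en"]

    def featurize(self, formula):
        amounts = parse_formula(formula)
        total = sum(amounts.values())
        values = [ELECTRONEGATIVITY[el] for el in amounts]
        # Mean is weighted by atomic fraction; min/max/range ignore the amounts.
        mean = sum(ELECTRONEGATIVITY[el] * n / total for el, n in amounts.items())
        return [mean, min(values), max(values), max(values) - min(values)]


feat = ElectronegativityStats()
nacl_vector = feat.featurize("NaCl")
fe2o3_vector = feat.featurize("Fe2O3")
```

Every material maps to a vector of the same length regardless of how many elements it contains, which is what lets a descriptor like this feed into a standard ML model.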
  • 11. How can we make it easy to develop and test ML models for composition-structure-property relationships? How do we get labeled training/test data?
  • 12. What about data?
      – Typically, a lot of attention is given to advanced algorithms for machine learning (e.g., deep neural networks versus standard ML)
      – But perhaps there is not enough emphasis on developing the appropriate data sets:
          – with enough information to train ML algorithms
          – with sufficient data quality
          – easy enough for anyone to at least get started without specialized knowledge
  • 13. The importance of data. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
  • 14. What is ImageNet? The ImageNet data set was collected and hand-labeled (e.g., via Amazon Mechanical Turk). The latest version has over 14 million hand-annotated images, organized into ~20,000 categories.
  • 15. How data stimulates new algorithms
  • 16. How data stimulates new algorithms. How can we create an ImageNet for materials science?
  • 17. An “ImageNet” for materials science
      – We want a test set that contains a diverse array of problems:
          – Smaller data versus larger data
          – Different applications (electronic, mechanical, etc.)
          – Composition-only or structure information available
          – Classification or regression
      – We also want a cross-validation metric that gives reliable error estimates, i.e., less dependent on specific choice of splits
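One common way to make an error estimate less dependent on a particular split is to repeat K-fold cross-validation with different shuffles and average the results. The slide does not say which scheme Matbench uses, so the sketch below is only an assumption-labeled illustration: the function names, the toy target values, and the predict-the-training-mean "model" are all made up for this example.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_folds, rng):
    """Shuffle indices and deal them into n_folds disjoint test folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def repeated_kfold_mae(y, n_folds=5, n_repeats=10, seed=0):
    """Average MAE of a predict-the-training-mean baseline over repeated K-fold splits."""
    rng = random.Random(seed)
    fold_errors = []
    for _ in range(n_repeats):
        for test_idx in kfold_indices(len(y), n_folds, rng):
            test_set = set(test_idx)
            train = [y[i] for i in range(len(y)) if i not in test_set]
            prediction = mean(train)  # baseline "model": the training-set mean
            fold_errors.append(mean([abs(y[i] - prediction) for i in test_idx]))
    return mean(fold_errors), fold_errors


toy_targets = [float(v) for v in range(20)]  # made-up property values
avg_mae, per_fold = repeated_kfold_mae(toy_targets)
```

The spread of `per_fold` shows how much a single split can vary, while `avg_mae` is the more stable summary number.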
  • 18. Overview of Matbench test set
      Target Property | Data Source | Samples | Method
      Bulk Modulus | Materials Project | 10,987 | DFT-GGA
      Shear Modulus | Materials Project | 10,987 | DFT-GGA
      Band Gap | Materials Project | 106,113 | DFT-GGA
      Metallicity | Materials Project | 106,113 | DFT-GGA
      Band Gap | Zhuo et al. [1] | 6,354 | Experiment
      Metallicity | Zhuo et al. [1] | 6,354 | Experiment
      Bulk Metallic Glass formation | Landolt-Bornstein | 7,190 | Experiment
      Refractive index | Materials Project | 4,764 | DFPT-GGA
      Formation Energy | Materials Project | 132,752 | DFT-GGA
      Perovskite Formation Energy | Castelli et al. [2] | 18,928 | DFT-GGA
      Freq. at Last Phonon PhDOS Peak | Materials Project | 1,296 | DFPT-GGA
      Exfoliation Energy | JARVIS-2D | 636 | DFT-vDW-DF
      Steel yield strength | Citrine Informatics | 312 | Experiment
      [1] doi.org/10.1021/acs.jpclett.8b00124
      [2] doi.org/10.1039/C2EE22341D
  • 19. Diversity of benchmark suite
      – application: mechanical, electronic, stability, optical, thermal
      – data size: <1K, 1K-10K, 10K-100K, >100K
      – problem type: classification, regression
      – data type: experiment (composition only), DFT (structure)
  • 20. How can we make it easy to develop and test ML models for composition-structure-property relationships? How do we know if our ML model is extraordinary?
  • 21. How about a benchmark algorithm? Automatminer is a “black box” machine learning model. Give it any data set with either composition or structure inputs, and automatminer will train an ML model (no researcher intervention).
  • 22. Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties)
      – Featurizer: MagPie, SOAP, Sine Coulomb Matrix + many, many more
      – Dropping features with many errors; missing value imputation; one-hot encoding
      – PCA-based, correlation, model-based (tree)
      – Uses genetic algorithms to find the best machine learning model + hyperparameters
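The cleaning steps listed on this slide (dropping features with many errors, missing value imputation, one-hot encoding) can be sketched in plain Python. This is not automatminer's implementation: the function `clean_features`, the 50% missing-value threshold, the choice of mean imputation, and the toy rows are all assumptions made for illustration.

```python
def clean_features(rows, max_missing_fraction=0.5):
    """rows: list of dicts mapping feature name -> float, str, or None (failed featurization)."""
    names = sorted({name for row in rows for name in row})
    cleaned = [{} for _ in rows]
    for name in names:
        column = [row.get(name) for row in rows]
        present = [v for v in column if v is not None]
        # 1. Drop features that failed for too many samples.
        if 1 - len(present) / len(rows) > max_missing_fraction:
            continue
        if all(isinstance(v, str) for v in present):
            # 2. One-hot encode categorical features.
            for category in sorted(set(present)):
                for out, v in zip(cleaned, column):
                    out[f"{name}={category}"] = 1.0 if v == category else 0.0
        else:
            # 3. Impute missing numeric values with the column mean.
            fill = sum(present) / len(present)
            for out, v in zip(cleaned, column):
                out[name] = fill if v is None else float(v)
    return cleaned


# Made-up feature table; None marks a featurizer error for that sample.
raw = [
    {"density": 2.0, "crystal_system": "cubic", "ewald": None},
    {"density": None, "crystal_system": "hexagonal", "ewald": None},
    {"density": 4.0, "crystal_system": "cubic", "ewald": -1.5},
]
table = clean_features(raw)
```

Here `ewald` is dropped because it failed for two of three samples, the missing `density` is filled with the column mean, and `crystal_system` becomes two 0/1 columns.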
  • 23. Can actually do an apples-to-apples competition between algorithms
  • 24. If we can get a well-established “benchmark”, perhaps interdisciplinary teams can start hammering on accuracy. A lower barrier to entry in the field means more ideas can be tested from more researchers. [Figure: Matbench test set average error versus time (today, 5 years, 10 years)]
  • 25. Matminer, matbench, and automatminer can all be accessed, used, and modified by anyone
      – Code / examples all on GitHub: github.com/hackingmaterials/matminer, github.com/hackingmaterials/matminer_examples, github.com/hackingmaterials/automatminer
      – Matbench data on Figshare (coming soon, still finalizing)
      – Free support via Discourse: https://discuss.matsci.org
  • 26. Outline
      ① Matminer: data and descriptors for producing ML structure-property relationships
      ② Matscholar – applying natural language processing to materials science information retrieval
  • 27. Goal: collect and organize knowledge embedded in the materials science literature. We have extracted ~2 million abstracts of relevant scientific articles. We use natural language processing algorithms to try to extract knowledge from all this data.
  • 28. We’ve developed algorithms to automatically tag keywords in the abstracts
  • 29. Application: a revised materials search engine. Auto-generated summaries of materials based on text mining.
  • 30. Application: materials compositions of interest … A search for thermoelectrics that do not have Pb or Bi
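The element-exclusion part of such a search can be sketched as a simple filter over chemical formulas. This is not the Matscholar search backend: the helper names are hypothetical, and the candidate list is a hand-picked set of formulas used only for illustration, not output from text mining.

```python
import re


def elements_in(formula):
    """Return the set of element symbols appearing in a simple formula string."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def exclude_elements(formulas, banned):
    """Keep only the formulas that contain none of the banned elements."""
    banned = set(banned)
    return [f for f in formulas if not (elements_in(f) & banned)]


# Hand-picked example formulas (assumed for illustration).
candidates = ["PbTe", "Bi2Te3", "SnSe", "Mg2Si", "CoSb3"]
lead_and_bismuth_free = exclude_elements(candidates, ["Pb", "Bi"])
```

Matching on parsed element symbols rather than raw substrings avoids false hits, for example treating the "B" in a boron compound as if it were "Bi".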
  • 31. Using a web site as a “gateway” into the algorithms [Diagram labels: frontend, backend]
      – How do we get more people benefitting from this work and involved in improving it?
      – One solution: expose an easy-to-use web frontend, with links to all the backend codes in case people want to dive further
          – New tools like Plotly Dash make this easier than ever
  • 35. Hopefully these frontend demos get you interested enough to check the “About page”
  • 36. Concluding thoughts
      – We need more resources to help computer scientists learn about materials science topics through hands-on examples and interactive demos
      – Some things that can help:
          – Open-source implementations of materials science methods
          – Interactive examples (e.g., Jupyter)
          – Documentation and support(!)
          – Labeled data sets
          – Front-ends for easy exploration
  • 37. Funding acknowledgements
      – Matminer: U.S. Department of Energy, Materials Science Division
      – Matscholar: Toyota Research Institute
      Slides (already) posted to hackingmaterials.lbl.gov