Introduction (Part I):
High-throughput computation and machine learning
applied to materials design
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
LLNL Computational Chemistry Materials Science
Summer Institute, 2018
Slides (already) posted to hackingmaterials.lbl.gov
New materials discovery for devices is difficult
•  Novel materials with enhanced performance characteristics
could make a big dent in sustainability, scalability, and cost
•  In practice, we tend to re-use the same fundamental materials
for decades
–  solar power w/Si since 1950s
–  graphite/LiCoO2 (basis of today’s Li battery electrodes) since 1990
–  Bi2Te3 and PbTe thermoelectrics first studied ~1910
•  Although there are lots of improvements to manufacturing,
microstructure, etc., there are not many new basic compositions
•  Why is discovering better materials such a challenge?
2
3
A material is defined at multiple length scales –
stick to the fundamental scale for now
5
Atoms in a box – the materials universe is huge!
•  Bag of 30 atoms
•  Each atom is one of 50
elements
•  Arrange on 10x10x10 lattice
•  Over 10^108 possibilities!
–  more than grains of sand on all
beaches (10^21)
–  more than number of atoms in
universe (10^80)
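A rough check of the "atoms in a box" estimate. This is only a sketch under one counting convention that the slide does not spell out: the 30 atoms occupy distinct sites among the 1,000 lattice points, each atom can be any of the 50 elements, and symmetry-equivalent arrangements are not removed.

```python
from math import comb, log10

N_SITES = 10 * 10 * 10   # points on the 10x10x10 lattice
N_ATOMS = 30             # atoms in the "bag"
N_ELEMENTS = 50          # element choices per atom

# Choose which sites are occupied, then assign an element to each atom.
arrangements = comb(N_SITES, N_ATOMS) * N_ELEMENTS ** N_ATOMS

# Report the order of magnitude rather than the full integer.
exponent = log10(arrangements)
print(f"about 10^{exponent:.0f} arrangements")
```

Other conventions (for example, allowing vacancies to be treated differently or removing symmetry duplicates) shift the exponent somewhat but not the conclusion that the space is far too large to enumerate.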
6
Finding the right material is like
“finding a needle in a haystack”
What constrains traditional experimentation?
7
“[The Chevrel] discovery resulted from a lot of
unsuccessful experiments of Mg ions insertion
into well-known hosts for Li+ ions insertion, as
well as from the thorough literature analysis
concerning the possibility of divalent ions
intercalation into inorganic materials.”
-Aurbach group, on discovery of Chevrel cathode
for multivalent (e.g., Mg2+) batteries
Levi, Levi, Chasid, Aurbach
J. Electroceramics (2009)
Can we invent other, faster ways of finding materials?
•  The Materials Genome
Initiative wants to discover,
develop, manufacture, and
deploy advanced materials
twice as fast at a fraction of
the cost
•  Strategy involves:
–  simulations & supercomputers
–  digital data and data mining
–  better merging computation
and experiment
8
https://obamawhitehouse.archives.gov/mgi
Outline
9
①  From quantum mechanics to density functional
theory (DFT)
②  “High-throughput” DFT
③  Calculation and property databases
④  Data mining approaches to materials design
⑤  Preview of part II (tomorrow)
The basis of density functional theory
is quantum mechanics
10
$$-\frac{\hbar^2}{2m}\nabla^2\Psi(r) + V(r)\Psi(r) = E\Psi(r)$$
Schrödinger equation describes all the properties
of a system through the wavefunction:
Time-independent, non-relativistic Schrödinger equation
•  There aren’t too many real situations where we can
get a closed solution to the Schrödinger equation
•  Let’s pretend we want to approach things
numerically for 1000 electrons
–  There are ~500,000 electron-electron interactions to worry
about.
–  Even storing the wavefunction would take ~10^8991 GB!
•  Discretize each of the x, y, z coordinates of each electron on a
1000-element grid = 1 billion positions per electron
•  Need the wavefunction output (real + imaginary part) for each
combination of all electron positions, i.e., (1E9)^1000 * 2, or
2E9000 values
•  Even at 1 byte per wavefunction value (low resolution), you need
about 2E8991 GB to store the wavefunction!
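The counting in these bullets can be followed step by step in a few lines. This is a sketch of the slide's own assumptions (1000 electrons, 1000 grid points per axis, 1 byte per stored value); it works with base-10 logarithms because the numbers are far too large for floating point.

```python
from math import comb, log10

N_ELECTRONS = 1000
GRID_POINTS_PER_AXIS = 1000

# x, y and z are each discretized, so each electron has 1000^3 positions.
positions_per_electron = GRID_POINTS_PER_AXIS ** 3

# Every distinct pair of electrons interacts.
pair_interactions = comb(N_ELECTRONS, 2)

# Stored values = 2 (real + imaginary) * positions_per_electron ** N_ELECTRONS.
log10_values = log10(2) + N_ELECTRONS * log10(positions_per_electron)

# 1 byte per value and 1e9 bytes per GB.
log10_gigabytes = log10_values - 9

print(pair_interactions)
print(f"values: 10^{log10_values:.1f}, storage: 10^{log10_gigabytes:.1f} GB")
```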
11
The wave function is formidable
Maybe Dirac said it best …
12
“The underlying physical laws necessary
for the mathematical theory of a large part
of physics and the whole of chemistry are
thus completely known, and the difficulty
is only that the exact application of these
laws leads to equations much too
complicated to be soluble.”
“It therefore becomes desirable that
approximate practical methods of applying
quantum mechanics should be developed,
which can lead to an explanation of the
main features of complex atomic systems
without too much computation.”
What is density functional theory (DFT)?
13
DFT is a method to solve for the electronic structure and energetics of arbitrary
materials starting from first-principles. It replaces many-body interactions
with a mean field interaction that reproduces the same charge density.
In theory, it is exact for the ground state. In practice, accuracy depends on the
choice of (some) parameters, the type of material, the property to be studied,
and whether the simulated system (crystal) is a good approximation of reality.
DFT resulted in the 1998 Nobel Prize in Chemistry (W. Kohn). It is
responsible for 2 of the top 10 cited papers of all time, across all sciences.
[Figure: schematic of interacting electrons (e–)]
How does one use DFT to design new materials?
14
A. Jain, Y. Shin, and K. A.
Persson, Nat. Rev. Mater.
1, 15004 (2016).
How accurate is DFT in practice?
15
Shown are typical DFT results for (i) Li
battery voltages, (ii) electronic band gaps,
and (iii) bulk modulus
(i) V. L. Chevrier, S. P. Ong, R. Armiento, M. K. Y. Chan, and G. Ceder,
Phys. Rev. B 82, 075122 (2010).
(ii) M. Chan and G. Ceder, Phys. Rev. Lett. 105, 196403 (2010).
(iii) M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst,
M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S.
Curtarolo, G. Ceder, K.A. Persson, and M. Asta, Sci. Data 2, 150009
(2015).
•  System size is essentially limited to a few thousand atoms
–  many important materials phenomena simply do not occur at this
length scale
•  Certain materials, such as those with strong electron
correlation, remain difficult to model accurately
•  Certain properties, including excited state properties
such as band gap, remain difficult to model accurately
•  These are all active areas of research and improvement
to the theory, and the situation is improving on all fronts
16
Limitations of density functional theory
Some limitations of DFT are addressed by other techniques
17
Source: NASA
Viewpoint of the DFT accuracy situation
•  Improvements to the
theory would certainly be
very helpful
–  Many researchers are
working on this problem
–  New and better methods
do appear over time, e.g.,
hybrid functionals for
solids.
•  But – let’s not wait for
perfection before we start
applying it.
18
The map is not perfect, but time
to set sail and leave port!
Outline
19
①  From quantum mechanics to density functional
theory (DFT)
②  “High-throughput” DFT
③  Calculation and property databases
④  Data mining approaches to materials design
⑤  Preview of part II (tomorrow)
20
From a needle in a haystack to …
21
… hiring an army to search through the haystack
High-throughput DFT: a key idea
22
Automate the DFT
procedure
Supercomputing
Power
FireWorks
Software for programming
general computational
workflows that can be
scaled across large
supercomputers.
NERSC
Supercomputing center,
processor count is equivalent to
~100,000 desktop
machines. Other centers
are also viable.
High-throughput
materials screening
G. Ceder & K.A.
Persson, Scientific
American (2015)
How much computer time is needed for
high-throughput DFT?
•  The answer is “it really varies a lot”
–  how big / complicated are the materials you are modeling?
–  how complex / expensive are the properties you are
modeling?
•  Ballpark numbers:
–  Low range: optimize structure of ~3-atom compounds
•  time to do a million materials ~ 10 million CPU-hours
–  Medium range: bulk modulus of ~50-atom compounds
•  time to do a million materials ~ 2 billion CPU-hours
•  The largest CPU allocations from the DOE are
typically on the order of ~100 million CPU-hours
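To put the ballpark numbers side by side, the sketch below converts them into a per-material cost and into the number of materials a single large allocation would cover. The three figures come from the slide; the function and variable names are illustrative only.

```python
ALLOCATION_CPU_HOURS = 100e6  # order of the largest DOE allocations (from the slide)

# CPU-hours quoted on the slide for one million materials in each regime.
cost_per_million = {
    "structure optimization, ~3 atoms": 10e6,
    "bulk modulus, ~50 atoms": 2e9,
}

def materials_affordable(cpu_hours_per_million, allocation=ALLOCATION_CPU_HOURS):
    """Number of materials one allocation covers at the quoted cost."""
    per_material = cpu_hours_per_million / 1e6
    return allocation / per_material

for task, cost in cost_per_million.items():
    print(f"{task}: {cost / 1e6:.0f} CPU-hours each, "
          f"~{materials_affordable(cost):,.0f} materials per allocation")
```

At these rates, one allocation covers millions of small structure optimizations but only tens of thousands of the more expensive property calculations.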
Examples of (early) high-throughput studies
24
Application                              | Researcher         | Search space | Candidates | Hit rate
-----------------------------------------|--------------------|--------------|------------|---------
Scintillators                            | Klintenberg et al. | 22,000       | 136        | 1/160
Scintillators                            | Curtarolo et al.   | 11,893       | ?          | ?
Topological insulators                   | Klintenberg et al. | 60,000       | 17         | 1/3500
Topological insulators                   | Curtarolo et al.   | 15,000       | 28         | 1/535
High-Tc superconductors                  | Klintenberg et al. | 60,000       | 139        | 1/430
Thermoelectrics (ICSD)                   | Curtarolo et al.   | 2,500        | 20         | 1/125
Thermoelectrics (Half Heusler systems)   | Curtarolo et al.   | 80,000       | 75         | 1/1055
Thermoelectrics (Half Heusler best ZT)   | Curtarolo et al.   | 80,000       | 18         | 1/4400
1-photon water splitting                 | Jacobsen et al.    | 19,000       | 20         | 1/950
2-photon water splitting                 | Jacobsen et al.    | 19,000       | 12         | 1/1585
Transparent shields                      | Jacobsen et al.    | 19,000       | 8          | 1/2375
Hg adsorbers                             | Bligaard et al.    | 5,581        | 14         | 1/400
HER catalysts                            | Greeley et al.     | 756          | 1          | 1/756*
Li ion battery cathodes                  | Ceder et al.       | 20,000       | 4          | 1/5000*
Entries marked with * have experimentally verified the candidates.
See also: Curtarolo et al., Nature Materials 12 (2013) 191–201.
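The "hit rate" column is simply candidates divided by search space, quoted as "1 in N". A small sketch using three rows from the table (the helper name `one_in` is made up for illustration):

```python
def one_in(search_space, candidates):
    """Return N such that the hit rate is 1 in N."""
    return search_space / candidates

# (application, search space, candidates) taken from the table above
rows = [
    ("Scintillators (Klintenberg et al.)", 22_000, 136),
    ("HER catalysts (Greeley et al.)", 756, 1),
    ("Li ion battery cathodes (Ceder et al.)", 20_000, 4),
]

for name, space, hits in rows:
    print(f"{name}: about 1 in {one_in(space, hits):.0f}")
```

The tabulated values are rounded, so small differences from the computed ratios are expected.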
Computations predict, experiments confirm
25
Sidorenkite-based Li-ion battery
cathodes
LED phosphors
YCuTe2 thermoelectrics
Wang, Z., Ha, J., Kim, Y. H., Im, W. Bin, McKittrick, J. &
Ong, S. P. Mining Unexplored Chemistries for Phosphors
for High-Color-Quality White-Light-Emitting Diodes.
Joule 2, 914–926 (2018).
Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.; Tang,
Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G. Sidorenkite
(Na3MnPO4CO3): A New Intercalation Cathode Material
for Na-Ion Batteries, Chem. Mater., 2013
Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs,
ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M;
Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric
Properties of Intrinsically Doped YCuTe2 with CuTe4-based
Layered Structure. J. Mat. Chem C, 2016
More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
•  All the limitations of standard DFT still apply
•  How to set DFT parameters automatically?
–  A single universal parameter set will not accurately model all
materials and all properties
–  Different parameter sets for different materials requires deciding
how to divide things up, and adds complication of “incompatibility”
between calculations
•  How to handle non-uniformity of DFT errors when doing meta
analyses?
•  How to run high-throughput efficiently on large computers?
–  The biggest supercomputers are designed for massive parallelization;
unfortunately, DFT does not scale well to many processors
26
Limitations of high-throughput DFT
Outline
27
①  From quantum mechanics to density functional
theory (DFT)
②  “High-throughput” DFT
③  Calculation and property databases
④  Data mining approaches to materials design
⑤  Preview of part II (tomorrow)
With HT-DFT, we can generate data rapidly – what to do next?
28
•  >4500 elastic tensors
M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine,
A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J.
Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and
M. Asta, Sci. Data, 2015, 2, 150009.
•  >900 piezoelectric tensors
M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A.
Persson, Sci. Data, 2015, 2, 150053.
•  >48000 Seebeck coefficients + cRTA transport
Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in
submission)
Materials Project database: putting all the data online
•  Online resource of density
functional theory simulation data
for ~85,000 inorganic materials
•  Includes band structures, elastic
tensors, piezoelectric tensors,
battery properties and more
•  ~55,000 registered users
•  Free
•  www.materialsproject.org
29
Jain et al. Commentary: The Materials Project: A
materials genome approach to accelerating
materials innovation. APL Mater. 1, 11002 (2013).
The data is re-used by the community
30
K. He, Y. Zhou, P. Gao, L. Wang, N. Pereira, G.G. Amatucci, et al.,
Sodiation via Heterogeneous Disproportionation in FeF2 Electrodes for
Sodium-Ion Batteries, ACS Nano 8 (2014) 7251–9.
M.M. Doeff, J. Cabana,
M. Shirpour, Titanate
Anodes for Sodium Ion
Batteries, J. Inorg.
Organomet. Polym. Mater.
24 (2013) 5–14.
Further examples in: A. Jain, K.A. Persson, G. Ceder. APL Materials (2016).
31
There are now many
first-principles
computational
databases, including
ones not on this list
(e.g., NIST-Jarvis,
NREL-TEDesignLab)
Lin, L. Materials Databases
Infrastructure Constructed by
First Principles Calculations: A
Review. Mater. Perform. Charact.
4, MPC20150014 (2015).
•  All the limitations of standard DFT and high-
throughput DFT still apply
•  Communicating accuracy, limitations, etc. to a
diverse user group is difficult
•  It remains difficult to merge information from
different computational databases
–  Citrine Informatics is trying, e.g., www.citrination.com
32
Limitations of DFT databases
Outline
33
①  From quantum mechanics to density functional
theory (DFT)
②  “High-throughput” DFT
③  Calculation and property databases
④  Data mining approaches to materials design
⑤  Preview of part II (tomorrow)
34
From a needle in a haystack to …
35
… hiring an army to search through the haystack to …
36
Armies with metal detectors
37
Bottom-up vs top-down approach
Small number of
general principles
Large number of
specific cases
•  Conventional theory starts
with a small number of
principles and keeps
extending / simplifying to
tackle more and more cases
(growing the theory)
•  Data mining starts from a
*very* large space of
possible models and
removes ones that are
inconsistent with the data
(“trimming” the theories)
38
Early “data mining” (not really machine learning)
“Pettifor structure maps”
D.G. Pettifor: The structures of binary compounds: I.
Phenomenological structure maps. J. Phys. C: Solid State
Phys. 19, 285–313 (1986).
“Cation coordination numbers”
Brown, I. D. What factors determine cation coordination
numbers? Acta Crystallogr. Sect. B Struct. Sci. 44, 545–553
(1988).
39
An ML model automatically learns relationships between
input variables and output variables
The ML model can find nonlinear relationships
between descriptors that accurately reproduce /
model outputs
•  Some people see machine learning as just fancy
curve fitting
–  See “Big data need big theory too” by Coveney et al.
•  I see two things distinguishing ML:
–  ML is more flexible; some see it as “writing a program”
–  Curve fitting is about good interpolation, whereas a
large part of ML is figuring out how to use the data to
be predictive or perform a function (play Go)
40
Is ML just curve fitting?
41
Getting a “predictive” fit: 3-tier design
Image credit: Joseph Gonzalez, ds100.org
Image credit: Adi Bronshtein, Towards Data Science
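The slide itself is mostly images, so the following is a sketch under the assumption that "3-tier design" refers to the common split into training, validation, and test sets: the model is fit on the first, tuned on the second, and scored once on the third. The fractions and function name are illustrative choices.

```python
import random

def three_way_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into train / validation / test portions."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    test = shuffled[:n_test]                 # held out until the very end
    val = shuffled[n_test:n_test + n_val]    # used to tune model choices
    train = shuffled[n_test + n_val:]        # used to fit the model
    return train, val, test

train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```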
•  Clustering groups
together data points
by their descriptors –
i.e., group “similar
points” together
•  No output value is
needed (unsupervised)
•  Many techniques exist,
including hierarchical
clustering, which shows groupings
as a function of cutoff
42
Unsupervised learning: clustering
W. Chen, J.-H. Pöhls, G. Hautier, D. Broberg, S.
Bajaj, U. Aydemir, Z. M. Gibbs, H. Zhu, M. Asta, G. J.
Snyder, B. Meredig, M. A. White, K. Persson, and A.
Jain, J. Mater. Chem. C 4, 4414 (2016).
clustering thermoelectric materials
by similarity
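As an illustration of "groupings as a function of cutoff", here is a minimal single-linkage hierarchical clustering sketch in plain Python. It is not the method of the cited paper; the points are invented, and a real study would use material descriptors and a library implementation.

```python
from math import dist

def single_linkage_clusters(points, cutoff):
    """Merge the two closest clusters until the closest pair exceeds cutoff."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 2-D descriptors: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]

tight = single_linkage_clusters(points, cutoff=0.5)
loose = single_linkage_clusters(points, cutoff=2.0)
print(len(tight), len(loose))
```

Raising the cutoff merges the two nearby pairs into one group while the isolated point stays on its own; no output labels are used at any step.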
•  A more modern approach is
autoencoders
•  Neural networks are forced to
start with a high-dimensional data
set and reproduce that
information using a small number of
dimensions (neurons)
•  The few dimensions that the
neural network finds tend to be
very good “descriptors”
•  Can then use these descriptors
on supervised problems
43
Unsupervised learning: autoencoders
Q. V. Le, Proc. 2013 IEEE Int. Conf. Acoust.
Speech Signal Process. 8595 (2013).
•  Regression techniques
can predict output
values for new data
•  ML commonly uses:
–  Lasso, Ridge, ElasticNet –
these are all regularization
techniques to prevent
overfitting
–  Kernel Ridge Regression,
which can find nonlinear
patterns in the data
44
Regression: beyond linear regression
K. Hansen, G. Montavon, F. Biegler, S. Fazli, M.
Rupp, M. Scheffler, O. A. Von Lilienfeld, A.
Tkatchenko, and K. R. Müller, J. Chem. Theory
Comput. 9, 3404 (2013).
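To make the regularization idea concrete, here is a sketch of ridge regression in its simplest setting: one feature, no intercept, where the closed-form solution is w = Σxy / (Σx² + α). The data are invented, and this is a toy illustration rather than the kernel method used in the cited work.

```python
def ridge_slope(xs, ys, alpha):
    """Closed-form ridge solution for y ≈ w * x (one feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

# Hypothetical data lying close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_ols = ridge_slope(xs, ys, alpha=0.0)     # ordinary least squares
w_ridge = ridge_slope(xs, ys, alpha=10.0)  # penalized, shrunk toward zero
print(w_ols, w_ridge)
```

The penalty α pulls the coefficient toward zero, trading a little bias for less sensitivity to noise, which is the sense in which these methods guard against overfitting.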
•  Tree-based techniques make a
series of “decisions” based on
input features
•  These decisions are designed to
split the data into homogeneous
groups
•  At the end, the data should be
homogeneous enough to predict
a single value
•  Pro: highly interpretable
Cons: usually not very accurate,
gives discontinuous predictions
45
Regression: tree-based techniques (1)
J. Carrete, N. Mingo, S. Wang, and S. Curtarolo,
Adv. Funct. Mater. 24, 7427 (2014).
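A single "decision" of a regression tree can be sketched as follows: try every threshold on one feature and keep the one that leaves the two groups most homogeneous (lowest summed squared error). The data and function name are illustrative; a full tree repeats this recursively within each group.

```python
def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return best[1:]

# Hypothetical data with two clearly different regimes.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

threshold, left_mean, right_mean = best_split(xs, ys)
print(threshold, left_mean, right_mean)
```

Every input below the threshold receives one predicted value and every input above it another, which is why tree predictions are easy to read but discontinuous.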
•  Random forests train multiple
trees each with slightly
different information
(different subset of input data
and features) and average or
vote on the results
•  Random forests tend to be a
good “starter” model to see
roughly how good ML will do
•  Gradient boosted trees go
one step further, fitting each
new tree to the errors left by
the previous trees
46
Regression: tree-based techniques (2) – random forest
•  These systems try to
guess the next best point
to try for an application
given the existing data
•  The next choice might try
to maximize your output,
give highest chance of
some improvement, or
might instead be tuned to
give the maximum
increase in knowledge
about the problem
47
Recommendation engines and adaptive learning
D. Xue, P. V. Balachandran, J. Hogden, J.
Theiler, D. Xue, and T. Lookman, Nat.
Commun. 7, 11241 (2016).
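One simple way to express the trade-off between "maximize your output" and "increase knowledge" is an upper-confidence-bound style score: predicted value plus a multiple of the predicted uncertainty. This is a generic sketch, not necessarily the selection rule used in the cited paper; the candidate names and numbers are invented.

```python
def pick_next(candidates, kappa):
    """Pick the candidate with the highest mean + kappa * std score.

    candidates maps a name to (predicted_mean, predicted_std).
    kappa = 0 is pure exploitation; larger kappa favors uncertain points.
    """
    return max(candidates,
               key=lambda name: candidates[name][0] + kappa * candidates[name][1])

# Hypothetical model predictions for three untested compositions.
candidates = {
    "A": (1.0, 0.1),   # best predicted value, well understood
    "B": (0.8, 0.5),   # slightly lower prediction, very uncertain
    "C": (0.5, 0.05),  # poor prediction, well understood
}

exploit_choice = pick_next(candidates, kappa=0.0)
explore_choice = pick_next(candidates, kappa=2.0)
print(exploit_choice, explore_choice)
```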
•  Neural networks are one of the
hottest topics in ML at the moment
•  Problem: *many* tunable
parameters, maybe a billion to train
•  If you try to train a deep neural
network with 1000 data points, you
might be fooling yourself
•  What to do?
–  identify problems with a lot of data
–  use “transfer learning”
–  wait for more data …
48
Neural networks
Image credit: kdnuggets.com
49
ML and “creativity” – inventing fake celebrities
None of these are real people! They are all fakes from a neural net (GAN).
Karras, T., Aila, T., Laine, S. & Lehtinen, J. Progressive Growing of GANs for
Improved Quality, Stability, and Variation. 1–26 (2017). doi:10.1002/joe.20070
•  e.g., neural net generated thesis titles
–  Reconstruction of metal-to-motion : construction of
plasma emissions
–  Computational approaches for distributed microscopy
–  Optical effects on virtual radio Projects
–  Experiments, and protein games from multiple atoms
–  Hydrogel wireless charging via nanoparticle education
–  Supersone legal questions support kits regulation on
qubits and transportation
–  Atoms and characteristics of monolithic nanocity
50
Amateur fun with AI “creativity” at http://aiweirdness.com
•  No underlying physical constraints on the model
–  the machine doesn’t know whether it’s modeling baseball
statistics, physics of particle trajectories, or housing prices
•  Hard to know how much to trust an ML result
–  Uncertainty estimates can be built into the model, but it’s not
clear that they are all that meaningful
–  Cross-validation estimates of accuracy are a “gold standard”,
but have many pitfalls, e.g., out-of-sample errors
•  Almost always a tradeoff between accuracy and
interpretability
51
Limitations and pitfalls of machine learning
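For reference, the basic bookkeeping behind k-fold cross-validation looks like the sketch below: every sample is held out exactly once while the model is trained on the rest. The function name is illustrative, and no shuffling or grouping is done here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

The pitfall is in how folds are drawn: if held-out samples closely resemble training samples, the score describes interpolation and can be far too optimistic for genuinely new cases.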
52
ML behaves in ways that are often not well understood
We still do not really understand
how many of these models
really “work”
They can often give wrong
results, with a high degree of
confidence(!), that are very
different than our own intuition
Nguyen, A., Yosinski, J. & Clune, J. Deep neural networks are
easily fooled: High confidence predictions for unrecognizable
images. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
Recognit. 07–12–June, 427–436 (2015).
•  Small data sets – typically a few thousand (not millions) –
which is a major limitation for data-driven methods
•  Materials scientists are typically looking to predict
outliers, not typical examples
–  e.g., we are not looking for materials that behave essentially like
other materials the way ML looks for “images that look like
typical faces”
–  Instead, we are looking for “outlier” materials that behave
differently than known materials, i.e., like nothing in our dataset
•  A material is a complex object to describe to a computer
53
There are many ML challenges particular to materials science
Outline
54
①  From quantum mechanics to density functional
theory (DFT)
②  “High-throughput” DFT
③  Calculation and property databases
④  Data mining approaches to materials design
⑤  Preview of part II (tomorrow)
Tomorrow I will:
•  tell you about some of my own research
•  show you how you can get involved in this field much
faster and more easily than ever before!
experiment
computation
•  Funding: DOE Basic Energy Sciences
56
Thank you!
Slides (already) posted to hackingmaterials.lbl.gov

More Related Content

What's hot

Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Anubhav Jain
 
Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...Anubhav Jain
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsAnubhav Jain
 
Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...Anubhav Jain
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Anubhav Jain
 
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsAnubhav Jain
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsAnubhav Jain
 
Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectAnubhav Jain
 
Targeted Band Structure Design and Thermoelectric Materials Discovery Using H...
Targeted Band Structure Design and Thermoelectric Materials Discovery Using H...Targeted Band Structure Design and Thermoelectric Materials Discovery Using H...
Targeted Band Structure Design and Thermoelectric Materials Discovery Using H...Anubhav Jain
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLAnubhav Jain
 
Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...Materials design using knowledge from millions of journal articles via natura...
Materials design using knowledge from millions of journal articles via natura...Anubhav Jain
 
Application of the Materials Project database and data mining towards the des...
Application of the Materials Project database and data mining towards the des...Application of the Materials Project database and data mining towards the des...
Application of the Materials Project database and data mining towards the des...Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Computational Discovery of Thermal Fluids with Enhanced Heat Capacity
Computational Discovery of Thermal Fluids with Enhanced Heat CapacityComputational Discovery of Thermal Fluids with Enhanced Heat Capacity
Computational Discovery of Thermal Fluids with Enhanced Heat CapacityAnubhav Jain
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignAnubhav Jain
 

What's hot (20)

Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...Capturing and leveraging materials science knowledge from millions of journal...
Capturing and leveraging materials science knowledge from millions of journal...
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...
 
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
Computational materials design with high-throughput and machine learning methods
Introduction (Part I): High-throughput computation and machine learning applied to materials design

  • 1. Introduction (Part I): High-throughput computation and machine learning applied to materials design Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA LLNL Computational Chemistry Materials Science Summer Institute, 2018 Slides (already) posted to hackingmaterials.lbl.gov
  • 2. New materials discovery for devices is difficult •  Novel materials with enhanced performance characteristics could make a big dent in sustainability, scalability, and cost •  In practice, we tend to re-use the same fundamental materials for decades –  solar power w/Si since 1950s –  graphite/LiCoO2 (basis of today’s Li battery electrodes) since 1990 –  Bi2Te3 and PbTe thermoelectrics first studied ~1910 •  Although there are lots of improvements to manufacturing, microstructure, etc., there are not many new basic compositions •  Why is discovering better materials such a challenge?
  • 3. A material is defined at multiple length scales – stick to the fundamental scale for now
  • 4. A material is defined at multiple length scales – stick to the fundamental scale for now
  • 5. Atoms in a box – the materials universe is huge! •  Bag of 30 atoms •  Each atom is one of 50 elements •  Arrange on 10x10x10 lattice •  Over 10^108 possibilities! –  more than grains of sand on all beaches (10^21) –  more than number of atoms in universe (10^80)
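The "over 10^108" estimate can be checked with a short count. This is a minimal sketch under one assumed counting scheme, which the slide does not spell out: pick 30 of the 1000 lattice sites, then assign one of 50 elements to each occupied site, ignoring symmetry-equivalent arrangements. The variable names are illustrative.

```python
from math import comb, log10

n_sites = 10 * 10 * 10   # 10x10x10 lattice
n_atoms = 30             # atoms in the "bag"
n_elements = 50          # choices of element per atom

# Assumed counting scheme: choose which sites are occupied,
# then choose an element independently for each occupied site.
site_choices = comb(n_sites, n_atoms)
element_choices = n_elements ** n_atoms
n_arrangements = site_choices * element_choices

# Report only the order of magnitude; the exact integer is not informative.
print(f"arrangements ~ 10^{log10(n_arrangements):.1f}")
print(f"exceeds atoms in the universe (10^80): {n_arrangements > 10**80}")
```

Under these assumptions the count lands between 10^108 and 10^109, consistent with the slide's lower bound.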
  • 6. Finding the right material is like “finding a needle in a haystack”
  • 7. What constrains traditional experimentation? “[The Chevrel] discovery resulted from a lot of unsuccessful experiments of Mg ions insertion into well-known hosts for Li+ ions insertion, as well as from the thorough literature analysis concerning the possibility of divalent ions intercalation into inorganic materials.” -Aurbach group, on discovery of Chevrel cathode for multivalent (e.g., Mg2+) batteries. Levi, Levi, Chasid, Aurbach, J. Electroceramics (2009)
  • 8. Can we invent other, faster ways of finding materials? •  The Materials Genome Initiative wants to discover, develop, manufacture, and deploy advanced materials twice as fast at a fraction of the cost •  Strategy involves: –  simulations & supercomputers –  digital data and data mining –  better merging computation and experiment https://obamawhitehouse.archives.gov/mgi
  • 9. Outline ①  From quantum mechanics to density functional theory (DFT) ②  “High-throughput” DFT ③  Calculation and property databases ④  Data mining approaches to materials design ⑤  Preview of part II (tomorrow)
  • 10. The basis of density functional theory is quantum mechanics. The Schrödinger equation describes all the properties of a system through the wavefunction: $-\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{r}) + V(\mathbf{r})\Psi(\mathbf{r}) = E\Psi(\mathbf{r})$ (time-independent, non-relativistic Schrödinger equation)
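To make the equation concrete, here is a minimal numerical sketch for the simplest case: a single particle in a 1D box with V = 0 inside. It uses atomic units (ħ = m = 1) and a three-point finite-difference stencil for the second derivative. The grid size and function name are illustrative choices, not something taken from the slides.

```python
import numpy as np

def particle_in_box_levels(n_points=200, length=1.0):
    """Energy levels of a 1D particle in a box, in atomic units (hbar = m = 1)."""
    h = length / (n_points + 1)                 # spacing between interior grid points
    # -1/2 d^2/dx^2 with a 3-point stencil; V(x) = 0 inside the box,
    # and the wavefunction is pinned to zero at both walls.
    main = np.full(n_points, 1.0 / h**2)
    off = np.full(n_points - 1, -0.5 / h**2)
    hamiltonian = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(hamiltonian)      # eigenvalues in ascending order

levels = particle_in_box_levels()
exact = np.array([1, 4, 9]) * np.pi**2 / 2      # analytic E_n = n^2 pi^2 / 2 for length = 1
print(levels[:3], exact)
```

The lowest numerical levels can be compared against the analytic values; one particle in one dimension is cheap, which is exactly what stops being true for many interacting electrons.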
  • 11. The wave function is formidable •  There aren’t too many real situations where we can get a closed-form solution to the Schrödinger equation •  Let’s pretend we want to approach things numerically for 1000 electrons –  There are ~500,000 electron-electron interactions to worry about. –  Even storing the wavefunction would take ~10^8991 GB! •  Discretize the x, y, z position of each electron into a 1000-element grid = 1 billion positions per electron •  Need the wavefunction output (real + imaginary part) for each combination of all electron positions, i.e. 1E9 ^ (1000) * 2, or 2E9000 values •  Even at 1 byte per wavefunction value (low resolution), you would need about 2E9000 bytes, i.e. about 2E8991 GB, to store the wavefunction!
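The bookkeeping on this slide can be checked with a few lines of integer arithmetic. The sketch below uses the slide's assumptions (1000 electrons, 1000 grid points per axis, real and imaginary parts, 1 byte per value) and takes 1 GB as 10^9 bytes. It reports only orders of magnitude, because the numbers are far too large to print in full.

```python
from math import log10

n_electrons = 1000
grid_per_axis = 1000

# Every pair of electrons interacts once.
pair_interactions = n_electrons * (n_electrons - 1) // 2

# Each electron has an x, y, z position on the grid.
positions_per_electron = grid_per_axis ** 3

# One value (real + imaginary part) per joint configuration of all electrons.
n_values = 2 * positions_per_electron ** n_electrons

# Orders of magnitude; log10 accepts arbitrarily large Python integers.
values_exponent = int(log10(n_values))
gb_exponent = int(log10(n_values // 10 ** 9))   # at 1 byte per value

print(pair_interactions, values_exponent, gb_exponent)
```

The pair count matches the ~500,000 interactions quoted above. The storage exponent is in the thousands, so no choice of resolution or hardware changes the conclusion.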
  • 12. Maybe Dirac said it best … “The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble.” “It therefore becomes desirable that approximate practical methods of applying quantum mechanics should be developed, which can lead to an explanation of the main features of complex atomic systems without too much computation.”
  • 13. What is density functional theory (DFT)? DFT is a method to solve for the electronic structure and energetics of arbitrary materials starting from first principles. It replaces many-body interactions with a mean-field interaction that reproduces the same charge density. In theory, it is exact for the ground state. In practice, accuracy depends on the choice of (some) parameters, the type of material, the property to be studied, and whether the simulated system (crystal) is a good approximation of reality. DFT resulted in the 1998 Nobel Prize in Chemistry (W. Kohn). It is responsible for 2 of the top 10 cited papers of all time, across all sciences.
  • 14. How does one use DFT to design new materials? A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  • 15. How accurate is DFT in practice? Shown are typical DFT results for (i) Li battery voltages, (ii) electronic band gaps, and (iii) bulk modulus. (i) V. L. Chevrier, S. P. Ong, R. Armiento, M. K. Y. Chan, and G. Ceder, Phys. Rev. B 82, 075122 (2010). (ii) M. Chan and G. Ceder, Phys. Rev. Lett. 105, 196403 (2010). (iii) M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K.A. Persson, and M. Asta, Sci. Data 2, 150009 (2015).
  • 16. Limitations of density functional theory •  System size is essentially limited to a few thousand atoms –  many important materials phenomena simply do not occur at this length scale •  Certain materials, such as those with strong electron correlation, remain difficult to model accurately •  Certain properties, including excited state properties such as band gap, remain difficult to model accurately •  These are all active areas of research and improvement to the theory, and the situation is improving on all fronts
  • 17. Some limitations of DFT are addressed by other techniques. Source: NASA
  • 18. Viewpoint of the DFT accuracy situation •  Improvements to the theory would certainly be very helpful –  Many researchers are working on this problem –  New and better methods do appear over time, e.g., hybrid functionals for solids. •  But – let’s not wait for perfection before we start applying it. The map is not perfect, but time to set sail and leave port!
  • 19. Outline ①  From quantum mechanics to density functional theory (DFT) ②  “High-throughput” DFT ③  Calculation and property databases ④  Data mining approaches to materials design ⑤  Preview of part II (tomorrow)
  • 20. From a needle in a haystack to …
  • 21. … hiring an army to search through the haystack
  • 22. High-throughput DFT: a key idea. Automating the DFT procedure and supercomputing power together enable high-throughput materials screening. •  FireWorks: software for programming general computational workflows that can be scaled across large supercomputers. •  NERSC: supercomputing center whose processor count is equivalent to ~100,000 desktop machines. Other centers are also viable. G. Ceder & K.A. Persson, Scientific American (2015)
  • 23. How much computer time is needed for high-throughput DFT? •  The answer is “it really varies a lot” –  how big / complicated are the materials you are modeling? –  how complex / expensive are the properties you are modeling? •  Ballpark numbers: –  Low range: optimize structure of ~3-atom compounds •  time to do a million materials ~ 10 million CPU-hours –  Medium range: bulk modulus of ~50-atom compounds •  time to do a million materials ~ 2 billion CPU-hours •  The largest CPU allocations from the DOE are typically on the order of ~100 million CPU-hours
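The ballpark numbers above translate directly into a per-material cost and into how many materials a single large allocation could cover. The short calculation below uses only the figures quoted on the slide; the variable names are illustrative.

```python
n_materials = 1_000_000
allocation_cpu_hours = 100_000_000          # ~100 million CPU-hours (large DOE allocation)

total_cpu_hours = {
    "low (structure optimization, ~3 atoms)": 10_000_000,
    "medium (bulk modulus, ~50 atoms)": 2_000_000_000,
}

per_material = {}
affordable = {}
for label, total in total_cpu_hours.items():
    per_material[label] = total // n_materials                        # CPU-hours per material
    affordable[label] = allocation_cpu_hours // per_material[label]   # materials per allocation
    print(f"{label}: {per_material[label]} CPU-h each, "
          f"{affordable[label]:,} materials per allocation")
```

At the low range a single allocation covers millions of materials, whereas at the medium range it covers only tens of thousands, which is why the choice of property matters so much for screening scope.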
  • 24. Examples of (early) high-throughput studies

| Application | Researcher | Search space | Candidates | Hit rate |
|---|---|---|---|---|
| Scintillators | Klintenberg et al. | 22,000 | 136 | 1/160 |
| Scintillators | Curtarolo et al. | 11,893 | ? | ? |
| Topological insulators | Klintenberg et al. | 60,000 | 17 | 1/3500 |
| Topological insulators | Curtarolo et al. | 15,000 | 28 | 1/535 |
| High TC superconductors | Klintenberg et al. | 60,000 | 139 | 1/430 |
| Thermoelectrics – ICSD | Curtarolo et al. | 2,500 | 20 | 1/125 |
| Thermoelectrics – Half Heusler systems | Curtarolo et al. | 80,000 | 75 | 1/1055 |
| Thermoelectrics – Half Heusler best ZT | Curtarolo et al. | 80,000 | 18 | 1/4400 |
| 1-photon water splitting | Jacobsen et al. | 19,000 | 20 | 1/950 |
| 2-photon water splitting | Jacobsen et al. | 19,000 | 12 | 1/1585 |
| Transparent shields | Jacobsen et al. | 19,000 | 8 | 1/2375 |
| Hg adsorbers | Bligaard et al. | 5,581 | 14 | 1/400 |
| HER catalysts | Greeley et al. | 756 | 1 | 1/756* |
| Li ion battery cathodes | Ceder et al. | 20,000 | 4 | 1/5000* |

Entries marked with * have experimentally verified the candidates. See also: Curtarolo et al., Nature Materials 12 (2013) 191–201.
  • 25. Computations predict, experiments confirm •  Sidorenkite-based Li-ion battery cathodes: Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.; Tang, Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G. Sidorenkite (Na3MnPO4CO3): A New Intercalation Cathode Material for Na-Ion Batteries, Chem. Mater., 2013 •  LED phosphors: Wang, Z., Ha, J., Kim, Y. H., Im, W. Bin, McKittrick, J. & Ong, S. P. Mining Unexplored Chemistries for Phosphors for High-Color-Quality White-Light-Emitting Diodes. Joule 2, 914–926 (2018). •  YCuTe2 thermoelectrics: Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs, ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M; Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric Properties of Intrinsically Doped YCuTe2 with CuTe4-based Layered Structure. J. Mat. Chem C, 2016 More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  • 26. Limitations of high-throughput DFT •  All the limitations of standard DFT still apply •  How to set DFT parameters automatically? –  A single universal parameter set will not accurately model all materials and all properties –  Different parameter sets for different materials require deciding how to divide things up, and add the complication of “incompatibility” between calculations •  How to handle non-uniformity of DFT errors when doing meta-analyses? •  How to run high-throughput efficiently on large computers? –  The biggest supercomputers are designed for massive parallelization; unfortunately, DFT does not scale well to many processors
  • 27. Outline ①  From quantum mechanics to density functional theory (DFT) ②  “High-throughput” DFT ③  Calculation and property databases ④  Data mining approaches to materials design ⑤  Preview of part II (tomorrow)
  • 28. With HT-DFT, we can generate data rapidly – what to do next? •  >4500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009. •  >900 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053. •  >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
  • 29. Materials Project database: putting all the data online •  Online resource of density functional theory simulation data for ~85,000 inorganic materials •  Includes band structures, elastic tensors, piezoelectric tensors, battery properties and more •  ~55,000 registered users •  Free •  www.materialsproject.org Jain et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 11002 (2013).
  • 30. The data is re-used by the community. K. He, Y. Zhou, P. Gao, L. Wang, N. Pereira, G.G. Amatucci, et al., Sodiation via Heterogeneous Disproportionation in FeF2 Electrodes for Sodium-Ion Batteries, ACS Nano. 8 (2014) 7251–9. M.M. Doeff, J. Cabana, M. Shirpour, Titanate Anodes for Sodium Ion Batteries, J. Inorg. Organomet. Polym. Mater. 24 (2013) 5–14. Further examples in: A. Jain, K.A. Persson, G. Ceder. APL Materials (2016).
  • 31. There are now many first-principles computational databases, including ones not on this list (e.g., NIST-Jarvis, NREL-TEDesignLab). Lin, L. Materials Databases Infrastructure Constructed by First Principles Calculations: A Review. Mater. Perform. Charact. 4, MPC20150014 (2015).
  • 32. Limitations of DFT databases •  All the limitations of standard DFT and high-throughput DFT still apply •  Communicating accuracy, limitations, etc. to a diverse user group is difficult •  It remains difficult to merge information from different computational databases –  Citrine Informatics is trying, e.g., www.citrination.com
  • 33. Outline ①  From quantum mechanics to density functional theory (DFT) ②  “High-throughput” DFT ③  Calculation and property databases ④  Data mining approaches to materials design ⑤  Preview of part II (tomorrow)
  • 34. From a needle in a haystack to …
  • 35. … hiring an army to search through the haystack to …
  • 36. Armies with metal detectors
  • 37. Bottom-up vs top-down approach (small number of general principles vs. large number of specific cases) •  Conventional theory starts with a small number of principles and keeps extending / simplifying to tackle more and more cases (growing the theory) •  Data mining starts from a *very* large space of possible models and removes ones that are inconsistent with the data (“trimming” the theories)
  • 38. Early “data mining” (not really machine learning) •  “Pettifor structure maps”: D.G. Pettifor: The structures of binary compounds: I. Phenomenological structure maps. J. Phys. C: Solid State Phys. 19, 285–313 (1986). •  “Cation coordination numbers”: Brown, I. D. What factors determine cation coordination numbers? Acta Crystallogr. Sect. B Struct. Sci. 44, 545–553 (1988).
  • 39. An ML model automatically learns relationships between input variables and output variables. The ML model can find nonlinear relationships between descriptors that accurately reproduce / model outputs.
  • 40. Is ML just curve fitting? •  Some people see machine learning as just fancy curve fitting –  See “Big data need big theory too” by Coveney et al. •  I see two things distinguishing ML: –  ML is more flexible; some see it as “writing a program” –  Curve fitting is about good interpolation, whereas a large part of ML is figuring out how to use the data to be predictive or perform a function (play Go)
  • 41. Getting a “predictive” fit: 3-tier design. Image credit: Joseph Gonzalez, ds100.org. Image credit: Adi Bronshtein, Towards Data Science
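The slide's figures are not in the transcript. Assuming "3-tier design" refers to the common train / validation / test split, a minimal sketch looks like the following. The synthetic data, the 60/20/20 proportions, and the polynomial model family are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 300)
y = np.sin(3 * x) + rng.normal(0, 0.1, x.size)    # synthetic data, purely illustrative

# Shuffle once, then cut into 60% train / 20% validation / 20% test.
n = x.size
idx = rng.permutation(n)
train, val, test = np.split(idx, [n * 6 // 10, n * 8 // 10])

def rmse(coeffs, subset):
    residual = np.polyval(coeffs, x[subset]) - y[subset]
    return float(np.sqrt(np.mean(residual ** 2)))

# Tier 1: fit every candidate model on the training set only.
fits = {deg: np.polyfit(x[train], y[train], deg) for deg in range(1, 10)}
# Tier 2: choose the model complexity using the validation set.
best_deg = min(fits, key=lambda deg: rmse(fits[deg], val))
# Tier 3: report the error once, on the untouched test set.
test_rmse = rmse(fits[best_deg], test)
print(best_deg, test_rmse)
```

The key design point is that the test set plays no role in either fitting or model selection, so its error is an honest estimate of predictive performance on similar data.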
  • 42. Unsupervised learning: clustering •  Clustering groups together data points by their descriptors – i.e., groups “similar points” together •  No output value is needed (unsupervised) •  Many techniques, including hierarchical clustering, which shows groupings as a function of cutoff. Example: clustering thermoelectric materials by similarity, W. Chen, J.-H. Pöhls, G. Hautier, D. Broberg, S. Bajaj, U. Aydemir, Z. M. Gibbs, H. Zhu, M. Asta, G. J. Snyder, B. Meredig, M. A. White, K. Persson, and A. Jain, J. Mater. Chem. C 4, 4414 (2016).
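"Groupings as a function of cutoff" can be illustrated with single-linkage clustering, one of several hierarchical variants and not necessarily the one used in the cited paper. At a given cutoff, the clusters are the connected components of the graph that links every pair of points closer than that cutoff. The descriptor values below are made up for illustration.

```python
from itertools import combinations
from math import dist

def cluster_at_cutoff(points, cutoff):
    """Single-linkage clusters: chains of points within `cutoff` of each other share a group."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root of i's group (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cutoff:
            parent[find(i)] = find(j)          # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical 2D descriptors for six materials.
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (3.0, 3.0), (3.1, 3.0), (9.0, 9.0)]
for cutoff in (0.15, 0.5, 5.0, 10.0):
    print(cutoff, cluster_at_cutoff(descriptors, cutoff))
```

Raising the cutoff merges groups step by step until everything is in one cluster, which is the information a dendrogram displays.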
  • 43. Unsupervised learning: autoencoders •  A more modern approach is autoencoders •  Neural networks are forced to start with a high-dimensional data set and reproduce that information with a small number of dimensions / neurons •  The few dimensions that the neural network finds tend to be very good “descriptors” •  Can then use these descriptors on supervised problems. Q. V. Le, Proc. 2013 IEEE Int. Conf. Acoust. Speech Signal Process. 8595 (2013).
  • 44. Regression: beyond linear regression •  Regression techniques can predict output values for new data •  ML commonly uses: –  Lasso, Ridge, ElasticNet – these are all regularization techniques to prevent overfitting –  Kernel Ridge Regression, which can find nonlinear patterns in the data. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O. A. Von Lilienfeld, A. Tkatchenko, and K. R. Müller, J. Chem. Theory Comput. 9, 3404 (2013).
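The difference between ridge and kernel ridge regression can be seen on a small synthetic problem. The sketch below implements both from their closed-form solutions in NumPy rather than calling a library. The target function, the RBF kernel, and the values of `lam` and `gamma` are illustrative assumptions, not taken from the cited work.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Linear ridge: w = (X^T X + lam I)^-1 X^T y (bias column included and also penalized)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda Xn: np.hstack([Xn, np.ones((len(Xn), 1))]) @ w

def rbf_kernel(A, B, gamma):
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def kernel_ridge_fit(X, y, lam, gamma):
    """Kernel ridge: alpha = (K + lam I)^-1 y, prediction = K(new, X) @ alpha."""
    alpha = np.linalg.solve(rbf_kernel(X, X, gamma) + lam * np.eye(len(X)), y)
    return lambda Xn: rbf_kernel(Xn, X, gamma) @ alpha

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, (100, 1))
y_train = np.sin(2 * X_train[:, 0]) + rng.normal(0, 0.05, 100)   # nonlinear target
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
y_test = np.sin(2 * X_test[:, 0])

def rmse(model):
    return float(np.sqrt(np.mean((model(X_test) - y_test) ** 2)))

linear_rmse = rmse(ridge_fit(X_train, y_train, lam=1e-2))
kernel_rmse = rmse(kernel_ridge_fit(X_train, y_train, lam=1e-2, gamma=1.0))
print(linear_rmse, kernel_rmse)
```

Both models use the same regularization strength; only the kernel version can follow the oscillation, because it is linear in a nonlinear feature space.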
  • 45. Regression: tree-based techniques (1) •  Tree-based techniques make a series of “decisions” based on input features •  These decisions are designed to split the data into homogeneous groups •  At the end, the data should be homogeneous enough to predict a single value •  Pro: highly interpretable •  Cons: usually not very accurate, gives discontinuous predictions. J. Carrete, N. Mingo, S. Wang, and S. Curtarolo, Adv. Funct. Mater. 24, 7427 (2014).
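A bare-bones regression tree on a single feature shows both the splitting logic and the discontinuous predictions mentioned above. This is a teaching sketch with made-up data, not the method of the cited paper. It picks each threshold by minimizing the summed squared error of the two resulting groups.

```python
def best_split(xs, ys):
    """Threshold that minimizes the summed squared error of the two sides (None if no split)."""
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs))[1:]:            # candidate thresholds keep both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t)
    return best[1] if best else None

def build_tree(xs, ys, depth):
    """Returns a leaf (mean value) or a node (threshold, left_subtree, right_subtree)."""
    t = best_split(xs, ys) if depth > 0 else None
    if t is None:
        return sum(ys) / len(ys)             # leaf: predict the group mean
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return (t, build_tree(*zip(*left), depth - 1), build_tree(*zip(*right), depth - 1))

def predict(tree, x):
    while isinstance(tree, tuple):           # walk down until a leaf value is reached
        threshold, low, high = tree
        tree = low if x < threshold else high
    return tree

xs = [1, 2, 3, 10, 11, 12]                   # hypothetical descriptor values
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]          # hypothetical property values
tree = build_tree(xs, ys, depth=1)
print(tree, predict(tree, 9.99), predict(tree, 10))
```

The prediction jumps between the two group means as the input crosses the threshold, which is the discontinuity listed among the cons.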
  • 46. Regression: tree-based techniques (2) – random forest •  Random forests train multiple trees, each with slightly different information (a different subset of input data and features), and average or vote on the results •  Random forests tend to be a good “starter” model to see roughly how good ML will do •  Gradient boosted trees go one step further
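The "different subset of input data and features" idea can be sketched with a deliberately simplified forest. Each "tree" here is a one-split stump trained on a bootstrap sample and on one randomly chosen feature, and the forest averages the stump predictions. Real random forests grow deeper trees and subsample features at every split, so treat this only as an illustration of the ensemble idea. The data are made up.

```python
import random

def fit_stump(rows, targets, feature):
    """Best single split on one feature: (feature, threshold, left_mean, right_mean)."""
    best = None
    for t in sorted({r[feature] for r in rows})[1:]:
        left = [y for r, y in zip(rows, targets) if r[feature] < t]
        right = [y for r, y in zip(rows, targets) if r[feature] >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (feature, t, lm, rm))
    if best is None:                          # sample had one distinct value: no split possible
        mean = sum(targets) / len(targets)
        return (feature, float("inf"), mean, mean)
    return best[1]

def fit_forest(rows, targets, n_trees=50, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample of the data
        feature = rng.randrange(len(rows[0]))              # random feature for this stump
        forest.append(fit_stump([rows[i] for i in picks],
                                [targets[i] for i in picks], feature))
    return forest

def predict(forest, row):
    votes = [lm if row[f] < t else rm for f, t, lm, rm in forest]
    return sum(votes) / len(votes)            # average over all stumps

# Hypothetical data: feature 0 drives the target, feature 1 is uninformative.
rows = [(float(i), (i * 7 % 10) / 10) for i in range(1, 11)]
targets = [1.0 if i <= 5 else 5.0 for i in range(1, 11)]
forest = fit_forest(rows, targets)
print(predict(forest, (1.0, 0.5)), predict(forest, (10.0, 0.5)))
```

Averaging smooths out the quirks of individual stumps, at the price of some interpretability compared with a single tree.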
  • 47. Recommendation engines and adaptive learning •  These systems try to guess the next best point to try for an application given the existing data •  The next choice might try to maximize your output, give the highest chance of some improvement, or might instead be tuned to give the maximum increase in knowledge about the problem. D. Xue, P. V. Balachandran, J. Hogden, J. Theiler, D. Xue, and T. Lookman, Nat. Commun. 7, 11241 (2016).
  • 48. Neural networks •  Neural networks are one of the hottest topics in ML at the moment •  Problem: *many* tunable parameters, maybe a billion to train •  If you try to train a deep neural network with 1000 data points, you might be fooling yourself •  What to do? –  identify problems with a lot of data –  use “transfer learning” –  wait for more data … Image credit: kdnuggets.com
  • 49. ML and “creativity” – inventing fake celebrities. None of these are real people! They are all fakes from a neural net (GAN). Karras, T., Aila, T., Laine, S. & Lehtinen, J. Progressive Growing of GANs for Improved Quality, Stability, and Variation. 1–26 (2017). doi:10.1002/joe.20070
  • 50. Amateur fun with AI “creativity” at http://aiweirdness.com •  e.g., neural net generated thesis titles –  Reconstruction of metal-to-motion : construction of plasma emissions –  Computational approaches for distributed microscopy –  Optical effects on virtual radio Projects –  Experiments, and protein games from multiple atoms –  Hydrogel wireless charging via nanoparticle education –  Supersone legal questions support kits regulation on qubits and transportation –  Atoms and characteristics of monolithic nanocity
  • 51. Limitations and pitfalls of machine learning •  No underlying physical constraints on the model –  the machine doesn’t know whether it’s modeling baseball statistics, physics of particle trajectories, or housing prices •  Hard to know how much to trust an ML result –  Uncertainty can be built into the model, but it’s not clear that these uncertainty estimates are all that meaningful –  Cross-validation estimates of accuracy are a “gold standard”, but have many pitfalls, e.g. out-of-sample errors •  Almost always a tradeoff between accuracy and interpretability
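The out-of-sample pitfall can be reproduced with a toy problem. A model is cross-validated on data drawn from one range and then asked to predict outside that range. The exponential "true property", the quadratic model, and the ranges are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_property(x):
    return np.exp(x)    # stand-in for an unknown structure-property relation

# Training data only cover descriptor values between 0 and 1.
x = rng.uniform(0.0, 1.0, 100)
y = true_property(x) + rng.normal(0.0, 0.01, x.size)

def fit_and_score(x_train, y_train, x_test, y_test, degree=2):
    coeffs = np.polyfit(x_train, y_train, degree)
    residual = np.polyval(coeffs, x_test) - y_test
    return float(np.sqrt(np.mean(residual ** 2)))

# 5-fold cross-validation: each fold is held out once and predicted from the rest.
folds = np.array_split(rng.permutation(x.size), 5)
cv_rmse = float(np.mean([
    fit_and_score(np.delete(x, fold), np.delete(y, fold), x[fold], y[fold])
    for fold in folds
]))

# Error on points that lie outside the range covered by the data.
x_far = np.linspace(2.0, 3.0, 50)
far_rmse = fit_and_score(x, y, x_far, true_property(x_far))
print(cv_rmse, far_rmse)
```

The cross-validated error looks excellent because every held-out point resembles the training data. It says nothing about the extrapolation regime, which is often where new materials are sought.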
  • 52. ML behaves in ways that are often not well understood. We still do not really understand how many of these models really “work”. They can often give wrong results, with a high degree of confidence(!), that are very different from our own intuition. Nguyen, A., Yosinski, J. & Clune, J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. 07–12–June, 427–436 (2015).
  • 53. There are many ML challenges particular to materials science •  Small data sets – typically a few thousand (not millions) – which is a major limitation for data-driven methods •  Materials scientists are typically looking to predict outliers, not typical examples –  e.g., we are not looking for materials that behave essentially like other materials the way ML looks for “images that look like typical faces” –  Instead, we are looking for “outlier” materials that behave differently than known materials, i.e., like nothing in our dataset •  A material is a complex object to describe to a computer
  • 54. Outline ①  From quantum mechanics to density functional theory (DFT) ②  “High-throughput” DFT ③  Calculation and property databases ④  Data mining approaches to materials design ⑤  Preview of part II (tomorrow)
  • 55. Tomorrow I will: •  tell you about some of my own research •  show you how you can get involved in this field much faster and easier than ever before!
  • 56. Thank you! •  Funding: DOE Basic Energy Sciences. Slides (already) posted to hackingmaterials.lbl.gov