Introduction to
Data Science in
Materials Science
Shyue Ping Ong
What is Data Science?
Data science is a multi-disciplinary field that uses
scientific methods, processes, algorithms and
systems to extract knowledge and insights from
structured and unstructured data.
NANO281
Domain
Knowledge
Computer
Science
Mathematics
Data
Science
Machine
Learning
Data
Processing
Statistical
Analysis
We are now living in the Data age…
NANO281
Materials data is growing … (stats as of Jan 1
2020)
NANO281
~ 200,000 crystals
~ 400,000 crystals
Cambridge structural
database (small-molecule
organic and metal-
organic crystal structures)
since 1972…
Source: https://www.ccdc.cam.ac.uk/solutions/csd-system/components/csd/
http://cdn.rcsb.org/rcsb-pdb/v2/about-us/rcsb-pdb-impact.pdf
Protein Data Bank (PDB)
But quantity and quality lag many other fields….
NANO281
https://supercon.nims.go.jp/
~1000+ superconductors
(many minor composition
modifications)
One of the most comprehensive handbooks on
materials data:
• Density, thermal and electrical conductivity,
melting and boiling points, etc.
• But O(100) binaries and limited ternaries
“First Principles” Materials Design
$$E\psi(\mathbf{r}) = -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r}) + V(\mathbf{r})\psi(\mathbf{r})$$
Schrödinger Equation
[Figure: energy (meV) versus diffusion coordinate for LCO and NCO]
Density functional theory (DFT) approximation
• Generally applicable to any chemistry
• Inherently scalable
Material Properties
• Phase stability¹
• Diffusion barriers²
• Surface energies and Wulff shape³
• Mechanical properties⁴
• Electronic structure⁵
• Charge densities⁶
1 Ong et al., Chem. Mater., 2008, 20, 1798–1807.
2 Ong et al., Energy Environ. Sci., 2011, 4, 3680–3688.
3 Tran et al., Sci. Data, 2016, 3, 160080.
4 Deng et al., J. Electrochem. Soc., 2016, 163, A67–A74.
5 Wang et al., Chem. Mater., 2016, 28, 4024–4031.
6 Ong et al., Phys. Rev. B, 2012, 85, 2–5.
NANO281
Electronic structure calculations are today
reliable and reasonably accurate.
tials in Quantum ESPRESSO). In this case, too,
the small Δ values indicate a good agreement
between codes. This agreement moreover encompasses
varying degrees of numerical convergence,
differences in the numerical implementation of
the particular potentials, and computational dif-
ferences beyond the pseudization scheme, most
of which are expected to be of the same order of
magnitude or smaller than the differences among
all-electron codes (1 meV per atom at most).
Conclusions and outlook
Solid-state DFT codes have evolved considerably.
The change from small and personalized codes to
widespread general-purpose packages has pushed
developers to aim for the best possible precision.
Whereas past DFT-PBE literature on the lattice
parameter of silicon indicated a spread of 0.05 Å,
the most recent versions of the implementations
discussed here agree on this value within 0.01 Å
(Fig. 1 and tables S3 to S42). By comparing codes
on a more detailed level using the Δ gauge, we
have found the most recent methods to yield
nearly indistinguishable EOS, with the associ-
ated error bar comparable to that between dif-
ferent high-precision experiments. This underpins
the validity of recent DFT EOS results and confirms
that correctly converged calculations yield reliable
predictions. The implications are moreover rele-
vant throughout the multidisciplinary set of fields
that build upon DFT results, ranging from the
physical to the biological sciences.
In spite of the absence of one absolute refer-
ence code, we were able to improve and demon-
strate the reproducibility of DFT results by means
of a pairwise comparison of a wide range of codes
and methods. It is now possible to verify whether
any newly developed methodology can reach the
same precision described here, and new DFT
applications can be shown to have used a meth-
od and/or potentials that were screened in this
way. The data generated in this study serve as a
crucial enabler for such a reproducibility-driven
paradigm shift, and future updates of available
Δ values will be presented at http://molmod.ugent.be/deltacodesdft.
The reproducibility of
reported results also provides a sound basis for
further improvement to the accuracy of DFT,
particularly in the investigation of new DFT func-
tionals, or for the development of new computa-
tional approaches. This work might therefore
Fig. 4. Δ values for comparisons between the most important DFT methods considered (in
millielectron volts per atom). Shown are comparisons of all-electron (AE), PAW, ultrasoft (USPP), and
norm-conserving pseudopotential (NCPP) results with all-electron results (methods are listed in
alphabetical order in each category). The labels for each method stand for code, code/specification (AE), or
potential set/code (PAW, USPP, and NCPP) and are explained in full in tables S3 to S42. The color coding
Lejaeghere et al. Science, 2016, 351 (6280), aad3000.
[Cropped excerpt from a Science Advances research article on the metastability of inorganic crystals and the influence of composition; line ends are cut off in the slide image.]
HAUTIER, ONG, JAIN, MOORE, AND CEDER PHYSICAL REVIEW B 85, 155208 (2012)
or meV/atom); 10 meV/atom corresponds to about 1 kJ/mol-
atom.
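The rule of thumb quoted in the excerpt (10 meV/atom is about 1 kJ/mol-atom) can be checked with a few lines of Python. This is only a sketch; the function name is made up for illustration, and the constants are the exact SI values of the electronvolt and the Avogadro constant.

```python
# Exact SI definitions (2019 redefinition)
EV_IN_J = 1.602176634e-19   # joules per electronvolt
AVOGADRO = 6.02214076e23    # entities per mole


def mev_per_atom_to_kj_per_mol_atom(e_mev: float) -> float:
    """Convert an energy in meV/atom to kJ/mol-atom (hypothetical helper)."""
    joules_per_atom = e_mev * 1e-3 * EV_IN_J
    return joules_per_atom * AVOGADRO / 1e3


# 10 meV/atom comes out slightly below 1 kJ/mol-atom
ten_mev = mev_per_atom_to_kj_per_mol_atom(10.0)
print(f"10 meV/atom = {ten_mev:.3f} kJ/mol-atom")
```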
III. RESULTS
Figure 2 plots the experimental reaction energies as a
function of the computed reaction energies. All reactions
involve binary oxides to ternary oxides and have been chosen
as presented in Sec. II. The error bars indicate the experimental
error on the reaction energy. The data points follow roughly
the diagonal and no computed reaction energy deviates from
the experimental data by more than 150 meV/atom. Figure 2
does not show any systematic increase in the DFT error with
larger reaction energies. This justifies our focus in this study
on absolute and not relative errors.
In Fig. 3, we plot a histogram of the difference between
the DFT and experimental reaction energies. GGA + U un-
derestimates and overestimates the energy of reaction with the
same frequency, and the mean difference between computed
and experimental energies is 9.6 meV/atom. The root-mean-
square (rms) deviation of the computed energies with respect
to experiments is 34.4 meV/atom. Both the mean and rms are
very different from the results obtained by Lany on reaction
energies from the elements.52
Using pure GGA, Lany found
that elemental formation energies are underestimated by GGA
with a much larger rms of 240 meV/atom. Our results are
closer to experiments because of the greater accuracy of DFT
when comparing chemically similar compounds such as binary
and ternary oxides due to errors cancellation.40
We should note
that even using elemental energies that are fitted to minimize
the error versus experiment in a large set of reactions, Lany
reports that the error is still 70 meV/atom and much larger
than what we find for the relevant reaction energies. The
rms we found is consistent with the error of 3 kJ/mol-atom
FIG. 3. (Color online) Histogram of the difference between computed ($E^{\mathrm{comp}}_{0\,\mathrm{K}}$) and experimental ($E^{\mathrm{expt}}_{0\,\mathrm{K}}$) energies of reaction (in meV/atom).
(30 meV/atom) for reaction energies from the binaries in the
limited set of perovskites reported by Martinez et al.29
Very often, instead of the exact reaction energy, one is
interested in knowing if a ternary compound is stable enough
to form with respect to the binaries. This is typically the case
when a new ternary oxide phase is proposed and tested for
stability versus the competing binary phases.18
From the 131
compounds for which reaction energies are negative according
to experiments, all but two (Al2SiO5 and CeAlO3) are also
negative according to computations. This success in predicting
stability versus binary oxides of known ternary oxides can
be related to the very large magnitude of reaction energies
from binary to ternary oxides compared to the typical errors
observed (rms of 34 meV/atom). Indeed, for the vast majority
of the reactions (109 among 131), the experimental reaction en-
ergies are larger than 50 meV/atom. It is unlikely then that the
DFT error would be large enough to offset this large reaction
energy and make a stable compound unstable versus the binary
oxides.
The histogram in Fig. 3 shows several reaction energies
with significant errors. Failures and successes of DFT are often
JSON document in the format of a Crystallographic Information File (cif), which can also be downloaded
via the Materials Project website and Crystalium web application. In addition, the weighted surface
energy (equation (2)), shape factor (equation (3)), and surface anisotropy (equation (4)) are given.
Table 2 provides a full description of all properties available in each entry as well as their corresponding
JSON key.
Technical Validation
The data was validated through an extensive comparison with surface energies from experiments and
other DFT studies in the literature. Due to limitations in the available literature, only the data on ground
state phases were compared.
Comparison to experimental measurements
Experimental determination of surface energy typically involves measuring the liquid surface tension and
solid-liquid interfacial energy of the material20
to estimate the solid surface energy at the melting
temperature, which is then extrapolated to 0 K under isotropic approximations. Surface energies for
individual crystal facets are rarely available experimentally. Figure 5 compares the weighted surface
energies of all crystals (equation (2)) to experimental values in the literature20,23,26–28
. It should be noted
that we have adopted the latest experimental values available for comparison, i.e., values were obtained
from the 2016 review by Mills et al.27
, followed by Keene28
, and finally Niessen et al.26
and Miller and
Tyson20. A one-factor linear regression line $\gamma_{\mathrm{DFT}} = \gamma_{\mathrm{EXP}} + c$ was fitted for the data points. The choice of
the one factor fit is motivated by the fact that standard broken bond models show that there is a direct
relationship between surface energies and cohesive energies, and previous studies have found no evidence
that DFT errors in the cohesive energy scale with the magnitude of the cohesive energy itself61
.
We find that the DFT weighted surface energies are in excellent agreement with experimental values,
with an average underestimation of only 0.01 J m⁻² and a standard error of the estimate (SEE) of
0.27 J m⁻². The Pearson correlation coefficient r is 0.966. Crystals with surfaces that are well-known to
undergo significant reconstruction tend to have errors in weighted surface energies that are larger than
the SEE.
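The one-factor fit described in this excerpt (slope fixed at 1, only the offset c fitted) is easy to sketch in Python. The data below are hypothetical numbers chosen for illustration, not values from the paper, and the SEE here assumes n − 1 degrees of freedom for one fitted parameter, which may differ from the definition the authors used.

```python
import math


def one_factor_fit(gamma_exp, gamma_dft):
    """Least-squares fit of gamma_dft = gamma_exp + c with the slope fixed at 1."""
    n = len(gamma_exp)
    # With a fixed unit slope, the best offset is the mean residual.
    c = sum(d - e for e, d in zip(gamma_exp, gamma_dft)) / n
    sse = sum((d - (e + c)) ** 2 for e, d in zip(gamma_exp, gamma_dft))
    see = math.sqrt(sse / (n - 1))  # assumption: one fitted parameter
    return c, see


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical surface energies in J m^-2 (illustrative only)
gamma_exp = [1.0, 1.5, 2.0, 2.5, 3.0]
gamma_dft = [0.9, 1.6, 1.9, 2.4, 3.1]

c, see = one_factor_fit(gamma_exp, gamma_dft)
r = pearson_r(gamma_exp, gamma_dft)
print(f"c = {c:.3f} J/m^2, SEE = {see:.3f} J/m^2, r = {r:.3f}")
```

A negative c corresponds to DFT underestimating the experimental values on average.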
The differences between the calculated and experimental surface energies can be attributed to three
main factors. First, there are uncertainties in the experimental surface energies. The experimental values
derived by Miller and Tyson20
are extrapolations from extreme temperatures beyond the melting point.
The surface energy of Ge, Si62
, Te63
, and Se64
were determined at 77, 77, 432 and 313 K respectively while
Figure 5. Comparison to experimental surface energies. Plot of experimental versus calculated weighted
surface energies for ground-state elemental crystals. Structures known to reconstruct have blue data points
while square data points correspond to non-metals. Points that are within the standard error of the estimate
Phase stability Formation energies
Tran, et al. Sci. Data 2016, 3, 160080.
Sun, et al. Sci. Adv. 2016, 2 (11),
e1600225.
Figure 2. Distribution of calculated volume per atom, Poisson ratio, bulk modulus and shear modulus. Vector
field-plot showing the distribution of the bulk and shear modulus, Poisson ratio and atomic volume for 1,181
metals, compounds and non-metals. Arrows pointing at 12 o’clock correspond to minimum volume-per-atom
and move anti-clockwise in the direction of maximum volume-per-atom, which is located at 6 o’clock. Bar
plots indicate the distribution of materials in terms of their shear and bulk moduli.
www.nature.com/sdata/
Surface energies Elastic constants
de Jong et al. Sci. Data 2015, 2,
150009.
Hautier et al. Phys. Rev. B 2012, 85,
155208.
NANO281
Modern
electronic
structure
codes give
relatively
consistent
equations of
state.
Software frameworks for HT electronic structure
computations
Atomic Simulation Environment
https://wiki.fysik.dtu.dk/ase
Materials Project1
https://www.materialsproject.org
Custodian
http://aflowlib.org
http://www.aiida.net
1 Jain et al. APL Mater. 2013, 1 (1), 11002.
2 Ong et al. Comput. Mater. Sci. 2013, 68, 314–319.
3 Jain et al. Concurr. Comput. Pract. Exp. 2015, 27 (17), 5037–
5059.
2
3
NANO281
Computation + Automation -> Large databases
Jain et al., APL Mater., 2013, 1, 11002.
NANO281
The Materials Project is an open science
project to make the computed properties of all
known inorganic materials publicly available to
all researchers to accelerate materials
innovation.
June 2011: Materials Genome Initiative
which aims to “fund computational tools,
software, new methods for material
characterization, and the development of
open standards and databases that will make
the process of discovery and development
of advanced materials faster, less
expensive, and more predictable”
https://www.materialsproject.org
NANO281
“Google” of Materials
1 Jain et al. APL Mater. 2013, 1 (1), 11002.
Structure
Electronic
Structure
Elastic
properties
XRD
Energetic
properties
Materials
Project DB
How do I
access MP
data?
Materials API
Pros
• Intuitive and user-friendly
• Secure
Web Apps
RESTful API
• Programmatic access for
developers and researchers
NANO281
The Materials API
An open platform for accessing Materials
Project data based on REpresentational
State Transfer (REST) principles.
Flexible and scalable to cater to a large
number of users, with different access
privileges.
Simple to use and code agnostic.
NANO281
A REST API maps a URL to a
resource.
Example:
GET https://api.dropbox.com/1/account/info
Returns information about a user’s account.
Methods: GET, POST, PUT, DELETE, etc.
Response: Usually JSON or XML or both
NANO281
Who implements REST
APIs?
NANO281
https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy
• Preamble: https://www.materialsproject.org/rest/v1
• Request type: materials
• Identifier: typically a formula (Fe2O3), id (1234) or chemical system (Li-Fe-O)
• Data type: vasp, exp, etc.
• Property: energy
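The URL anatomy on this slide can be assembled programmatically. The sketch below only builds the string and follows the v1 scheme exactly as shown here; the helper name and its defaults are assumptions, and the current Materials Project API may use a different scheme.

```python
from urllib.parse import quote

# Preamble as shown on the slide (legacy v1 REST scheme)
PREAMBLE = "https://www.materialsproject.org/rest/v1"


def build_materials_url(identifier, data_type="vasp", prop="energy",
                        request_type="materials"):
    """Join preamble / request type / identifier / data type / property.

    Hypothetical helper for illustration only.
    """
    parts = [request_type, identifier, data_type, prop]
    # Percent-encode each path segment so unusual identifiers stay valid.
    return PREAMBLE + "/" + "/".join(quote(p, safe="") for p in parts)


url = build_materials_url("Fe2O3")
print(url)
print(build_materials_url("Li-Fe-O"))
```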
NANO281
Secure access
An individual API key provides secure
access with defined privileges.
All https requests must supply an API key
as either an “x-api-key” header or a
GET/POST “API_KEY” parameter.
API key available at
https://www.materialsproject.org/dashboard
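As a rough illustration of the header option, a request object carrying the key can be prepared with Python's standard library. The key value is a placeholder, the helper name is made up, and the request is deliberately not sent, so this shows only how the header is attached.

```python
from urllib.request import Request


def make_request(url, api_key):
    """Prepare (but do not send) a GET request with the API key header."""
    return Request(url, headers={"x-api-key": api_key}, method="GET")


req = make_request(
    "https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy",
    "MY_API_KEY",  # placeholder, not a real key
)
# urllib stores header names capitalized, hence "X-api-key" on lookup.
print(req.get_method(), req.full_url, req.get_header("X-api-key"))
# Sending would be: urllib.request.urlopen(req), omitted here on purpose.
```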
NANO281
Sample output (JSON)
Intuitive response
format
Machine-readable
(JSON parsers
available for most
programming
languages)
Metadata provides
provenance for
tracking
{
  created_at: "2014-07-18T11:23:25.415382",
  valid_response: true,
  version: {
    pymatgen: "2.9.9",
    db: "2014.04.18",
    rest: "1.0"
  },
  response: [
    {
      energy: -67.16532048,
      material_id: "mp-24972"
    },
    {
      energy: -132.33035197,
      material_id: "mp-542309"
    },
    …
  ],
  copyright: "Material Project, 2012"
}
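A response of this shape can be consumed with any JSON parser. The sketch below uses Python's json module on a trimmed copy of the sample (only the two expanded entries, with keys quoted as strict JSON requires); variable names are illustrative.

```python
import json

# Trimmed copy of the sample response: the collapsed entries are left out.
sample = """
{
  "created_at": "2014-07-18T11:23:25.415382",
  "valid_response": true,
  "version": {"pymatgen": "2.9.9", "db": "2014.04.18", "rest": "1.0"},
  "response": [
    {"energy": -67.16532048, "material_id": "mp-24972"},
    {"energy": -132.33035197, "material_id": "mp-542309"}
  ]
}
"""

payload = json.loads(sample)

# Check the status flag before trusting the data.
if not payload["valid_response"]:
    raise ValueError("API reported an invalid response")

# Map material id -> energy, and keep the metadata for provenance tracking.
energies = {entry["material_id"]: entry["energy"] for entry in payload["response"]}
provenance = (payload["created_at"], payload["version"]["db"])
print(energies, provenance)
```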
NANO281
Demo of Materials Data Sources
NANO281
https://docs.google.com/spreadsheets/d/18MPVaixzX7hQN6lT0n9-FdmTnTYjQDkvRhI9T5Ym1jQ/edit?usp=sharing
Types of Materials Data
Qualitative data
• Nominal measurement (categories)
• E.g., Metal/Insulator, Stable/Unstable
• No rank or order
Ranked data
• Ordinal measurement (ordered)
• E.g., Insulator/semiconductor/conductor
• Does not indicate distance between ranks
Quantitative data
• Interval/ratio measurement (equal intervals and true 0)
• E.g., melting point, elastic constant, electrical/ionic conductivity
• Considerable information and permits meaningful arithmetic operations
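The three data types are usually encoded differently before modelling. A minimal sketch, assuming a hypothetical record layout: nominal values become one-hot flags, ordinal values become integer ranks (order only, no distance), and quantitative values are kept as numbers.

```python
# Rank order for the ordinal example from the slide
ORDINAL_RANK = {"insulator": 0, "semiconductor": 1, "conductor": 2}


def one_hot(value, categories):
    """Nominal data: one flag per category, no implied order."""
    return [1 if value == c else 0 for c in categories]


def encode(record):
    """Turn one (hypothetical) material record into a numeric feature list."""
    features = one_hot(record["stability"], ["stable", "unstable"])
    features.append(ORDINAL_RANK[record["conduction_class"]])  # ordinal rank
    features.append(float(record["melting_point_K"]))          # quantitative
    return features


# Illustrative record (silicon-like values)
record = {
    "stability": "stable",
    "conduction_class": "semiconductor",
    "melting_point_K": 1687.0,
}
encoded = encode(record)
print(encoded)
```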
NANO281
Machine learning (ML) is nothing more than
(highly) sophisticated curve fitting….
NANO281
Image: https://www.slideshare.net/awahid/big-data-and-machine-learning-for-businesses and Google
Images
Typical Materials Data Science Workflow
Identify Purpose and Target -> Data Collection -> Featurization -> Training -> Application
Active learning
Identify Purpose and Target
• Domain knowledge
- Is target learnable?
- Is target ambiguous?
Data Collection
• Data sources: existing, DIY
Featurization
• Elemental features
• Structural features
Training (supervised)
• Classification: decision tree, logistic regression, ...
• Regression: GPR, KRR, multi-linear, random forest, SVR, neural networks, graph models, ...
- Cross-validation
- Hyper-parameter optimization
Tools
• ænet, Automatminer, CGCNN, DeepChem, MEGNet, PROPhet, SchnetPack, TensorMol, ...
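Cross-validation and hyper-parameter optimization in the training step can be sketched without any ML library. The example below is an assumption-laden toy: synthetic one-dimensional data, a k-nearest-neighbour regressor standing in for the models listed on the slide, and a grid search over the number of neighbours scored by 5-fold cross-validated MAE.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def cv_mae(data, k_neighbors, n_folds=5):
    """Mean absolute error averaged over all held-out points in n_folds folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors += [abs(knn_predict(train, x, k_neighbors) - y) for x, y in test]
    return sum(errors) / len(errors)


# Synthetic data: y = x^2 plus Gaussian noise (illustrative only)
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.uniform(-2.0, 2.0)
    data.append((x, x * x + rng.gauss(0.0, 0.1)))

# Grid search over the hyper-parameter (number of neighbours)
scores = {k: cv_mae(data, k) for k in (1, 3, 5, 10, 30)}
best_k = min(scores, key=scores.get)
print(scores, "best k:", best_k)
```

Very large k averages over too wide a neighbourhood and underfits, which is the kind of trade-off the grid search is meant to expose.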
NANO281
Where is ML valuable in Materials Science?
NANO281
Things that are too slow/difficult
to compute
Relationships that are beyond our
understanding (at the moment)
[Figure: element-wise classification model mapping an input to a predicted coordination motif (e.g., CN4, CN5, CN6); motif labels range from "1: single bond" to "12: cuboctahedra"]
Data History of the Materials Project
Reasonable ML
Deep learning
(AA′)0.5(BB′)0.5O3 perovskite
2 x 2 x 2 supercell, 10 A and 10 B species
= $\left({}^{10}C_2 \times {}^{8}C_4\right)^2 \approx 10^7$
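The count on this slide can be reproduced directly. The reading assumed here is: for each of the A and B sublattices, choose 2 of the 10 candidate species, then choose which 4 of the 8 sites in the 2 x 2 x 2 supercell hold the first species. Symmetry-equivalent orderings are not removed, so this is an upper bound; the function name and arguments are illustrative.

```python
from math import comb


def n_configurations(n_species=10, n_mixed=2, n_sites=8):
    """Count species pairs times site orderings for both sublattices."""
    per_sublattice = comb(n_species, n_mixed) * comb(n_sites, n_sites // n_mixed)
    return per_sublattice ** 2  # A and B sublattices are chosen independently


total = n_configurations()
print(f"{total:,} candidate configurations (about {total:.0e})")
```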
NANO281
ratio of (634 + 34)/485 ≈ 1.38 (Supplementary Table S-II) with <5%
difference in the experimental and theoretical values. This again
agree well with those calculated from the rule of mixture (Supplemen-
tary Table-III). The experimental XRD patterns also agree well with
Fig. 2. Atomic-resolution STEM ABF and HAADF images of a representative high-entropy perovskite oxide, Sr(Zr0.2Sn0.2Ti0.2Hf0.2Mn0.2)O3. (a, c) ABF and (b, d) HAADF images at (a, b) low
and (c, d) high magnifications showing nanoscale compositional homogeneity and atomic structure. The [001] zone axis and two perpendicular atomic planes (110) and (110) are marked.
Insets are averaged STEM images.
Jiang et al. A New Class
of High-Entropy
Perovskite Oxides.
Scripta Materialia 2018,
142, 116–120.
Materials design
is combinatorial
Solution: Surrogate models for “instant” property
predictions
NANO281
Property (“target”) = f(Composition, Structure) (“descriptors/features”)
Property
• Energies (formation, Ehull, reaction, binding, etc.)
• Band gaps
• Mechanical properties
• Functional properties (e.g., ionic conductivity)
• …
Composition
• Stoichiometric attributes, e.g., number and ratio of elements, etc.
• Elemental property, e.g., mean, range, min, max, etc. of elemental properties such as atomic number, electronegativity, row, group, atomic radii, etc.
• Electronic structure, e.g., number of valence electrons, shells, etc.
• …
Structure
• Crystal/molecular symmetry
• Lattice parameters
• Atomic coordinates
• Connectivity / bonding between atoms
• …
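Composition descriptors of the kind listed above can be sketched in a few lines. This toy featurizer is an illustration only: the element table is a tiny hand-entered subset (atomic number and Pauling electronegativity), the parser handles simple formulas without brackets, and all function names are made up.

```python
import re

# Tiny lookup: symbol -> (atomic number, Pauling electronegativity)
ELEMENT_DATA = {"Li": (3, 0.98), "O": (8, 3.44), "P": (15, 2.19), "Fe": (26, 1.83)}


def parse_formula(formula):
    """Parse a bracket-free formula such as 'Fe2O3' into element counts."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def featurize(formula):
    """Stoichiometric and elemental-property statistics for one composition."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {el: n / total for el, n in counts.items()}
    en = [ELEMENT_DATA[el][1] for el in fractions]
    return {
        "n_elements": len(fractions),
        "mean_Z": sum(f * ELEMENT_DATA[el][0] for el, f in fractions.items()),
        "mean_en": sum(f * ELEMENT_DATA[el][1] for el, f in fractions.items()),
        "min_en": min(en),
        "max_en": max(en),
        "range_en": max(en) - min(en),
    }


features = featurize("Fe2O3")
print(features)
```

The resulting dictionary is the kind of fixed-length descriptor vector that the composition-based models on the following slides take as input.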
Composition-based models
NANO281
Zheng, X., et al (2018). Chem. Sci., 9(44), 8426-8432.
Jha et al. (2018) Sci. Rep., 8(1), 17593.
Meredig et al. (2014) Phys. Rev. B 89, 094104
Feature engineering Deep Learning
Structure-based models
NANO281
Property-labelled materials fragments + gradient boosting decision tree
Isayev et al. (2017) Nature Comm., 8, 15679 Xie et al. (2018) Phys. Rev. Lett. 120, 145301
Crystal graph + graph convolutional neural networks
Smooth overlap of atom positions (SOAP)
Rosenbrock et al. npj Comput. Mater. (2017), 3, 29
State of the art: Graph-based representations
Figure 4: Pearson correlations between elemental embedding vectors. Elements are arranged in order of increasing Mendeleev number49 for easier visualization of trends.
Nissan Motor Co.
NANO281
Performance on 130,462 QM9 molecules
80%-10%-10% train-validation-test split
Only Z as atomic feature, i.e., feature selection helps model learn, but is not critical!

Property        MEGNET¹   MEGNET-Simple¹   SchNet²   “Chemical Accuracy”
U0 (meV)        9         12               14        43
G (meV)         10        12               14        43
εHOMO (eV)      0.038     0.043            0.041     0.043
εLUMO (eV)      0.031     0.044            0.034     0.043
Cv (cal/molK)   0.030     0.029            0.033     0.05

State-of-the-art performance surpassing chemical accuracy in 11 of 13 properties!
1 Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294.
2 Schütt et al. J. Chem. Phys. 148, 241722 (2018)
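The 80%-10%-10% split and the error metric behind tables like this one can be sketched as follows. It is a generic illustration, not the authors' pipeline: the seed, the rounding of split sizes, and the function names are assumptions, and the errors are taken to be mean absolute errors.

```python
import random


def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def mean_absolute_error(y_true, y_pred):
    """MAE: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))

# Toy targets and predictions (illustrative numbers only)
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.1, 1.9, 3.3])
print(f"MAE = {mae:.4f}")
```

The validation part is used for model selection and early stopping, while the reported error should come only from the held-out test part.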
Performance on Materials Project Crystals
NANO281

Property                         MEGNet            SchNet¹   CGCNN²
Formation energy Ef (meV/atom)   28 (60,000)       35        39 (28,046)
Band gap Eg (eV)                 0.330 (36,720)    -         0.388 (16,485)
log10 KVRH (GPa)                 0.050 (4,664)     -         0.054 (2,041)
log10 GVRH (GPa)                 0.079 (4,664)     -         0.087 (2,041)
Metal classifier                 78.9% (55,391)    -         80% (28,046)
Non-metal classifier             90.6% (55,391)    -         95% (28,046)

1 Schütt et al. J. Chem. Phys. 148, 241722 (2018)
2 Xie et al. PRL. 120.14 (2018): 145301.
The Scale Challenge in Computational Materials
Science
Many real-world materials problems are not
related to bulk crystals.
Huang et al. ACS Energy Lett. 2018, 3 (12), 2983–
2988.
Tang et al. Chem. Mater. 2018, 30 (1), 163–173.
Electrode-electrolyte interfaces Catalysis Microstructure and segregation
Need linear-scaling with ab initio accuracy.
NANO281
Machine Learning: A solution to the Scale Challenge
in Computational Materials Science?
[Figure: accuracy and transferability versus length scale for first principles methods, empirical potentials, and finite element / continuum models. Time scale axis: < ps, ns, µs, ms, with atomic vibrations, ion dynamics and reaction dynamics marked.]
Critical challenge
Bridging the 10⁻¹⁰ → 10⁻⁶ m or 10⁻¹² → 10⁻⁶ s scales in a manner that retains transferability and accuracy, and is scalable.
NANO281
Machine learning the potential energy surface
NANO281
symmetry functions (ACSF)39 to represent the atomic local environments and fully connected neural networks to describe the PES with respect to symmetry functions.11,12 A separate neural network is used for each atom. The neural network is defined by the number of hidden layers and the nodes in each layer, while the descriptor space is given by the following symmetry functions:

$$G_i^{\mathrm{atom,rad}} = \sum_{j \neq i}^{N_{\mathrm{atom}}} e^{-\eta (R_{ij} - R_s)^2} \cdot f_c(R_{ij}),$$

$$G_i^{\mathrm{atom,ang}} = 2^{1-\zeta} \sum_{j,k \neq i}^{N_{\mathrm{atom}}} (1 + \cos\theta_{ijk})^{\zeta} \cdot e^{-\eta' (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \cdot f_c(R_{ij}) \cdot f_c(R_{ik}) \cdot f_c(R_{jk}),$$

where $R_{ij}$ is the distance between atom i and neighbor atom j, $\eta$ is the width of the Gaussian and $R_s$ is the position shift over all neighboring atoms within the cutoff radius $R_c$, $\eta'$ is the width of the Gaussian basis and $\zeta$ controls the angular resolution. $f_c(R_{ij})$ is a cutoff function, defined as follows:

$$f_c(R_{ij}) = \begin{cases} 0.5 \cdot \left[\cos\left(\dfrac{\pi R_{ij}}{R_c}\right) + 1\right], & \text{for } R_{ij} \le R_c \\ 0.0, & \text{for } R_{ij} > R_c. \end{cases}$$

These hyperparameters were optimized to minimize the mean absolute errors of energies and forces for each chemistry. The NNP model has shown great performance for Si,11 TiO2,40 water41 and solid-liquid interfaces,42 metal-organic frameworks,43 and has been extended to incorporate long-range electrostatics for ionic systems such as ZnO44 and Li3PO4.45

Gaussian Approximation Potential (GAP). The GAP calculates the similarity between atomic configurations based on a smooth-overlap of atomic positions (SOAP)10,46 kernel, which is then used in a Gaussian process model. In SOAP, the Gaussian-smeared atomic neighbor densities $\rho_i(\mathbf{R})$ are expanded in spherical harmonics as follows:

$$\rho_i(\mathbf{R}) = \sum_j f_c(R_{ij}) \cdot \exp\left(-\frac{|\mathbf{R} - \mathbf{R}_{ij}|^2}{2\sigma_{\mathrm{atom}}^2}\right) = \sum_{nlm} c_{nlm}\, g_n(R)\, Y_{lm}(\hat{\mathbf{R}}),$$

The spherical power spectrum vector, which is in turn the square of expansion coefficients,

$$p_{n_1 n_2 l}(\mathbf{R}_i) = \sum_{m=-l}^{l} c^{*}_{n_1 l m}\, c_{n_2 l m},$$

can be used to construct the SOAP kernel while raised to a positive integer power $\zeta$ (which is 4 in present case) to accentuate the sensitivity of the kernel,10

$$K(\mathbf{R}, \mathbf{R}') = \sum_{n_1 n_2 l} \left(p_{n_1 n_2 l}(\mathbf{R})\, p_{n_1 n_2 l}(\mathbf{R}')\right)^{\zeta},$$

In the above equations, $\sigma_{\mathrm{atom}}$ is a smoothness controlling the Gaussian smearing, and
Descriptors
• Distances and angles
• Neighbor density
ML models
• Linear regression
• Kernel regression
• Neural networks
Neural Network Potential (NNP)¹
Moment Tensor Potential (MTP)²
Gaussian Approximation Potential (GAP)³
Spectral Neighbor Analysis Potential (SNAP)⁴
1 Behler et al. PRL. 98.14 (2007): 146401.
2 Shapeev MultiScale Modeling and Simulation 14, (2016).
3 Bartók et al. PRL. 104.13 (2010): 136403.
4 Thompson et al. J. Comput. Phys. 285, 316–330 (2015)
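To make the "distances" descriptor concrete, here is a minimal sketch of the cosine cutoff function and the radial symmetry function from the NNP excerpt above. It is a plain-Python illustration on a hand-made three-atom configuration, without periodic boundary conditions; parameter values and function names are arbitrary assumptions, not those used in the cited work.

```python
import math


def cutoff(r_ij, r_c):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = r_c."""
    if r_ij > r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r_ij / r_c) + 1.0)


def g_rad(positions, i, eta, r_s, r_c):
    """Radial symmetry function of atom i: sum of shifted Gaussians times cutoff."""
    total = 0.0
    for j, pos in enumerate(positions):
        if j == i:
            continue
        r_ij = math.dist(positions[i], pos)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c)
    return total


# Toy configuration (angstrom): one neighbour inside the cutoff, one outside
positions = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
g0 = g_rad(positions, i=0, eta=1.0, r_s=2.5, r_c=5.0)
print(f"G_rad for atom 0: {g0:.3f}")
```

Because the function depends only on interatomic distances, the value is unchanged by translating or rotating the whole configuration, which is the property that makes it usable as a descriptor.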
Standardized workflow for ML-IAP construction
and evaluation
[Figure: workflow. Pymatgen generates training structures from a crystal structure (elastic deformation -> distorted structures; surface generation -> surface structures; vacancy + AIMD and AIMD at low T and high T -> trajectory snapshots). Fireworks + VASP runs DFT static calculations to build the dataset of DFT properties Y (energy, force, stress). Atomic descriptors X of the local environment of each site within the cutoff radius rc feed a machine learning model Y = f(X). Hyperparameters such as cutoff radius, expansion width, energy weights and degrees of freedom are tuned by grid search or an evolutionary algorithm, with property fitting against, e.g., elastic and phonon properties.]
NANO281
Available open source on Github: https://github.com/materialsvirtuallab/mlearn
Zuo, Y.; Chen, C.; Li, X.; Deng, Z.; Chen, Y.; Behler, J.; Csányi, G.; Shapeev, A. V.; Thompson, A. P.; Wood, M. A.; et al. A Performance and Cost
Assessment of Machine Learning Interatomic Potentials. arXiv:1906.08888 2019.
Ni-Mo SNAP performance
NANO281
• SNAP significantly outperforms in binary and bcc Mo for energy and elastic constants.
[Figure panels: Energy, Forces, Elastic constants]
Ni-Mo phase diagram
NANO281
EAM completely fails to
reproduce Ni-Mo phase
diagram
Ni3Mo
Ni4Mo
Solid-liquid equilibrium
Application: Investigating Hall-Petch strengthening in Ni-Mo
NANO281
• ~20,000 to ~455,000 atoms
• Uniaxially strained with a strain rate of 5×10⁸ s⁻¹
• SNAP reproduces the Hall-Petch relationship, consistent with experiment[1].
[1] Hu et al. Science, 2017, 355, 1292
ML-IAP:Accuracy vs Cost
NANO281
[Figure: test error (meV/atom) versus computational cost (s/(MD step atom)) for the Mo dataset, panels a and b; annotated settings include Jmax = 3, 2000 kernels, 20 polynomial powers, and hidden layers [16, 16]]
Zuo et al. A Performance and Cost Assessment of Machine Learning Interatomic Potentials. arXiv:1906.08888 2019.
Modeling relationships that are too complex to
understand right now….
Oct 10 2019
What’s the equivalent of
this problem in materials
characterization?
[Figure: ensemble-learned spectrum matching workflow. Identify absorption species; each of Learner 1 … Learner N applies peak shifting / alignment, spectra normalization (optional), feature transformation, intensity normalization and a similarity measure against the spectrum database to give Rank 1 … Rank N, which are merged into a combined rank and a probability for each spectrum.]
Zheng et al. Automated generation and
ensemble-learned matching of X-ray
absorption spectra. npj Comput. Mater.
2018, 4 (1), 12
500,000 computed
K-edge XANES of >
50,000 crystals (In
progress: L edges
and EXAFS)
~84% accuracy in
identifying correct
oxidation state and
coordination
environment!
Random Forest Coordination Environment
Classification
Oct 10 2019
Alkali
TM
Post-TM
Metalloid
Carbon
Alkaline
Other examples
NANO281
Oviedo et al. (2019) npj Comput. Mater. 5, 60.
Classification of crystal structures from XRD
Schütt et al. (2019) Nature Comm. 10, 5024
Prediction of wavefunctions

More Related Content

What's hot

Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Computational Discovery of Thermal Fluids with Enhanced Heat Capacity
Computational Discovery of Thermal Fluids with Enhanced Heat CapacityComputational Discovery of Thermal Fluids with Enhanced Heat Capacity
Computational Discovery of Thermal Fluids with Enhanced Heat CapacityAnubhav Jain
 
Kinetic Analysis of TL Spectrum of ϒ-IrradiatedSrAl2O4:Eu2+, Dy3+ Nanophosphor
Kinetic Analysis of TL Spectrum of ϒ-IrradiatedSrAl2O4:Eu2+, Dy3+ NanophosphorKinetic Analysis of TL Spectrum of ϒ-IrradiatedSrAl2O4:Eu2+, Dy3+ Nanophosphor
Kinetic Analysis of TL Spectrum of ϒ-IrradiatedSrAl2O4:Eu2+, Dy3+ NanophosphorIJLT EMAS
 
Lattice dynamics and normal coordinate analysis of htsc tl ca3ba2cu4o11
Lattice dynamics and normal coordinate analysis of htsc tl ca3ba2cu4o11Lattice dynamics and normal coordinate analysis of htsc tl ca3ba2cu4o11
Lattice dynamics and normal coordinate analysis of htsc tl ca3ba2cu4o11Alexander Decker
 
Electron Diffusion and Phonon Drag Thermopower in Silicon Nanowires
Electron Diffusion and Phonon Drag Thermopower in Silicon NanowiresElectron Diffusion and Phonon Drag Thermopower in Silicon Nanowires
Electron Diffusion and Phonon Drag Thermopower in Silicon NanowiresAI Publications
 
BIOS 203 Lecture 4: Ab initio molecular dynamics
BIOS 203 Lecture 4: Ab initio molecular dynamicsBIOS 203 Lecture 4: Ab initio molecular dynamics
BIOS 203 Lecture 4: Ab initio molecular dynamicsbios203
 
Electronic structure of strongly correlated materials
NANO281 Lecture 01 - Introduction to Data Science in Materials Science

  • 1. Introduction to Data Science in Materials Science Shyue Ping Ong
  • 2. What is Data Science? Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. NANO281
  • 4. We are now living in the Data age…
  • 5. Materials data is growing… (stats as of Jan 1, 2020)
      • ~200,000 crystals
      • ~400,000 crystals
      • Cambridge Structural Database (small-molecule organic and metal-organic crystal structures), since 1972…
      • Protein Data Bank (PDB)
      Sources: https://www.ccdc.cam.ac.uk/solutions/csd-system/components/csd/ and http://cdn.rcsb.org/rcsb-pdb/v2/about-us/rcsb-pdb-impact.pdf
  • 6. But quantity and quality lag many other fields…
      • https://supercon.nims.go.jp/: ~1000+ superconductors (many are minor composition modifications)
      • One of the most comprehensive handbooks on materials data:
          • Density, thermal and electrical conductivity, melting and boiling points, etc.
          • But only O(100) binaries and limited ternaries
  • 7. “First Principles” Materials Design
      Schrödinger equation: $E\psi(\mathbf{r}) = -\frac{\hbar^2}{2m}\nabla^2\psi(\mathbf{r}) + V(\mathbf{r})\psi(\mathbf{r})$
      Density functional theory (DFT) approximation
      • Generally applicable to any chemistry
      • Inherently scalable
      Material properties:
      • Phase stability [1]
      • Diffusion barriers [2]
      • Surface energies and Wulff shape [3]
      • Mechanical properties [4]
      • Electronic structure [5]
      • Charge densities [6]
      [Figure: energy (meV) versus diffusion coordinate for LCO and NCO]
      [1] Ong et al., Chem. Mater., 2008, 20, 1798–1807.
      [2] Ong et al., Energy Environ. Sci., 2011, 4, 3680–3688.
      [3] Tran et al., Sci. Data, 2016, 3, 160080.
      [4] Deng et al., J. Electrochem. Soc., 2016, 163, A67–A74.
      [5] Wang et al., Chem. Mater., 2016, 28, 4024–4031.
      [6] Ong et al., Phys. Rev. B, 2012, 85, 2–5.
  • 8. Electronic structure calculations are today reliable and reasonably accurate. Modern electronic structure codes give relatively consistent equations of state (Lejaeghere et al. Science, 2016, 351 (6280), aad3000). [Figure: excerpts from the cited papers, one panel per property. Phase stability: Sun, et al. Sci. Adv. 2016, 2 (11), e1600225. Formation energies: Hautier et al. Phys. Rev. B 2012, 85, 155208. Surface energies: Tran, et al. Sci. Data 2016, 3, 160080. Elastic constants: de Jong et al. Sci. Data 2015, 2, 150009.] Points legible in the excerpts: the most recent DFT-PBE implementations agree on the lattice parameter of silicon within 0.01 Å, against a spread of 0.05 Å in past literature; GGA+U reaction energies from binary to ternary oxides deviate from experiment with an rms of 34.4 meV/atom; DFT weighted surface energies match experiment with a standard error of the estimate of 0.27 J m^-2 and a Pearson correlation coefficient of 0.966.
  • 9. Software frameworks for high-throughput (HT) electronic structure computations: Atomic Simulation Environment (https://wiki.fysik.dtu.dk/ase); Materials Project1 (https://www.materialsproject.org); Custodian; http://aflowlib.org; http://www.aiida.net. 1 Jain et al. APL Mater. 2013, 1 (1), 11002. 2 Ong et al. Comput. Mater. Sci. 2013, 68, 314–319. 3 Jain et al. Concurr. Comput. Pract. Exp. 2015, 27 (17), 5037–5059.
  • 10. Computation + Automation -> Large databases. Jain et al., APL Mater., 2013, 1, 11002.
  • 11. The Materials Project is an open science project to make the computed properties of all known inorganic materials publicly available to all researchers to accelerate materials innovation. June 2011: Materials Genome Initiative, which aims to “fund computational tools, software, new methods for material characterization, and the development of open standards and databases that will make the process of discovery and development of advanced materials faster, less expensive, and more predictable”. https://www.materialsproject.org
  • 12. “Google” of Materials1: structure; electronic structure; elastic properties; XRD; energetic properties. 1 Jain et al. APL Mater. 2013, 1 (1), 11002.
  • 13. How do I access MP data? Materials Project DB -> Materials API -> web apps and RESTful API. Pros: intuitive and user-friendly; secure; programmatic access for developers and researchers.
  • 14. The Materials API: An open platform for accessing Materials Project data based on REpresentational State Transfer (REST) principles. Flexible and scalable to cater to a large number of users, with different access privileges. Simple to use and code agnostic.
  • 15. A REST API maps a URL to a resource. Example: GET https://api.dropbox.com/1/account/info returns information about a user’s account. Methods: GET, POST, PUT, DELETE, etc. Response: usually JSON, XML, or both.
  • 18. https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy. Parts of the URL, left to right: preamble (https://www.materialsproject.org/rest/v1); request type (materials); identifier, typically a formula (Fe2O3), id (1234) or chemical system (Li-Fe-O); data type (vasp, exp, etc.); property (energy).
  • 19. Secure access: An individual API key provides secure access with defined privileges. All HTTPS requests must supply the API key as either an “x-api-key” header or a GET/POST “API_KEY” parameter. API key available at https://www.materialsproject.org/dashboard
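The URL layout and the API-key header from the two slides above can be put together in a few lines. This is a minimal sketch, not the official client: the helper name `build_mp_request` and the placeholder key are assumptions, and the v1 preamble is copied from the slide and may no longer be served. The request is only assembled, never sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Preamble copied from the slide; the v1 endpoint may have been retired since.
PREAMBLE = "https://www.materialsproject.org/rest/v1"


def build_mp_request(identifier, data_type, prop, api_key, request_type="materials"):
    """Assemble (but do not send) a GET request for one property.

    Hypothetical helper for illustration: it follows the slide's layout
    preamble / request type / identifier / data type / property.
    """
    parts = [request_type, identifier, data_type, prop]
    url = PREAMBLE + "/" + "/".join(quote(part, safe="") for part in parts)
    # The slide allows an "x-api-key" header or an "API_KEY" parameter;
    # the header keeps the key out of the URL.
    return Request(url, headers={"x-api-key": api_key}, method="GET")


req = build_mp_request("Fe2O3", "vasp", "energy", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would issue the call; that step is left out here because it needs a valid key and a live endpoint.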
  • 20. Sample output (JSON). Intuitive response format. Machine-readable (JSON parsers available for most programming languages). Metadata provides provenance for tracking. Sample: {created_at: "2014-07-18T11:23:25.415382", valid_response: true, version: {pymatgen: "2.9.9", db: "2014.04.18", rest: "1.0"}, response: [{energy: -67.16532048, material_id: "mp-24972"}, {energy: -132.33035197, material_id: "mp-542309"}, ... (further entries collapsed on the slide)], copyright: "Material Project, 2012"}
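To show what "machine-readable" buys in practice, the sketch below parses a response shaped like the sample on the slide. The JSON text is a reconstruction of the slide's screenshot (only the two visible entries are kept, and the nesting of `version` is an assumption), so treat the structure as illustrative.

```python
import json

# Reconstructed from the slide; the collapsed entries are omitted.
sample = """
{
  "created_at": "2014-07-18T11:23:25.415382",
  "valid_response": true,
  "version": {"pymatgen": "2.9.9", "db": "2014.04.18", "rest": "1.0"},
  "response": [
    {"energy": -67.16532048, "material_id": "mp-24972"},
    {"energy": -132.33035197, "material_id": "mp-542309"}
  ],
  "copyright": "Material Project, 2012"
}
"""

payload = json.loads(sample)
if not payload["valid_response"]:
    raise ValueError("API reported an invalid response")

# Map each material id to its energy, then pick the most negative value.
energies = {entry["material_id"]: entry["energy"] for entry in payload["response"]}
lowest_id = min(energies, key=energies.get)
print(lowest_id, energies[lowest_id])
print("database version:", payload["version"]["db"])
```

The `version` block is the provenance metadata the slide mentions: storing it next to the extracted values makes it possible to tell later which database release a number came from.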
  • 21. Demo of Materials Data Sources: https://docs.google.com/spreadsheets/d/18MPVaixzX7hQN6lT0n9-FdmTnTYjQDkvRhI9T5Ym1jQ/edit?usp=sharing
  • 22. Types of Materials Data. Qualitative data: nominal measurement (categories), e.g., metal/insulator, stable/unstable; no rank or order. Ranked data: ordinal measurement (ordered), e.g., insulator/semiconductor/conductor; does not indicate distance between ranks. Quantitative data: interval/ratio measurement (equal intervals and true 0), e.g., melting point, elastic constant, electrical/ionic conductivity; carries considerable information and permits meaningful arithmetic operations.
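The three data types call for different numerical encodings before they reach a model. A minimal sketch, assuming made-up records and hypothetical field names: nominal values become one-hot vectors (no order implied), ordinal values become integer ranks (order only), and quantitative values are used as they are.

```python
# Illustrative records; field names and values are assumptions for this sketch.
records = [
    {"stability": "Stable", "conduction": "Insulator", "melting_point_K": 2345.0},
    {"stability": "Unstable", "conduction": "Conductor", "melting_point_K": 1811.0},
    {"stability": "Stable", "conduction": "Semiconductor", "melting_point_K": 1687.0},
]

NOMINAL_LEVELS = ["Stable", "Unstable"]                         # no rank or order
ORDINAL_LEVELS = ["Insulator", "Semiconductor", "Conductor"]    # ordered


def one_hot(value, levels):
    """Nominal data: one indicator per category, so no order is implied."""
    return [1 if value == level else 0 for level in levels]


def encode(record):
    """Return [one-hot stability..., conduction rank, melting point]."""
    return (
        one_hot(record["stability"], NOMINAL_LEVELS)
        + [ORDINAL_LEVELS.index(record["conduction"])]  # rank only, not a distance
        + [record["melting_point_K"]]                   # arithmetic is meaningful
    )


features = [encode(record) for record in records]
for row in features:
    print(row)
```

The integer rank is a convenience: a model that treats it as a number implicitly assumes equal spacing between ranks, which the slide warns the data does not guarantee.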
  • 23. Machine learning (ML) is nothing more than (highly) sophisticated curve fitting. Image: https://www.slideshare.net/awahid/big-data-and-machine-learning-for-businesses and Google Images
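The simplest instance of that curve fitting is an ordinary least-squares line. The sketch below uses made-up data points and a hypothetical helper `fit_line`; more elaborate ML models differ in the family of curves and the fitting procedure, not in the basic idea of minimizing an error over known examples.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training data, roughly following y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # extrapolate to an unseen input
print(round(slope, 3), round(intercept, 3), round(prediction, 3))
```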
  • 24. Typical Materials Data Science Workflow. Steps: identify purpose and target; data collection; featurization; training; application; active learning. Identify purpose and target: domain knowledge; is the target learnable? is the target ambiguous? Data collection: data sources, existing or DIY. Featurization: elemental features; structural features. Training (supervised; cross-validation; hyper-parameter optimization): classification (decision tree, logistic regression, ...); regression (GPR, KRR, multi-linear, random forest, SVR, neural networks, graph models, ...). Tools: ænet, Automatminer, CGCNN, DeepChem, MEGNet, PROPhet, SchnetPack, TensorMol, ...
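The training step pairs cross-validation with hyper-parameter optimization. A minimal sketch of that loop, under stated assumptions: a one-parameter ridge model stands in for the regressors listed on the slide, the data are made up, and the folds are assigned by index rather than at random.

```python
def fit_ridge_1d(xs, ys, lam):
    """Ridge fit of y = w * x (no intercept): w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def k_fold_mae(xs, ys, lam, k=3):
    """Mean absolute validation error averaged over k folds."""
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        valid = [i for i in range(len(xs)) if i % k == fold]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(sum(abs(ys[i] - w * xs[i]) for i in valid) / len(valid))
    return sum(fold_errors) / k


# Made-up data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

grid = [0.0, 0.1, 1.0, 10.0]  # candidate values of the hyper-parameter lam
scores = {lam: k_fold_mae(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)
print(best_lam, scores)
```

The hyper-parameter is chosen on held-out folds, never on the data used to fit the weight; a separate test set is still needed for an honest final error estimate.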
  • 25. Where is ML valuable in Materials Science? Things that are too slow/difficult to compute. Relationships that are beyond our understanding (at the moment). Example: (AA’)0.5(BB’)0.5O3 perovskite, 10 A and 10 B species = (10C2 × 8C4)^2 ≈ 10^7. [Figure: element-wise classification model, input -> prediction, over coordination motifs (CN4, CN5, CN6 shown). Motif labels by coordination number: 1: single bond. 2: L-shaped; water-like; bent 120 degrees; bent 150 degrees; linear. 3: T-shaped; trigonal planar; trigonal non-coplanar. 4: square co-planar; tetrahedral; rectangular see-saw; see-saw like; trigonal pyramidal. 5: pentagonal planar; square pyramidal; trigonal bipyramidal. 6: hexagonal planar; octahedral; pentagonal pyramidal. 7: hexagonal pyramidal; pentagonal bipyramidal. 8: body-centered cubic; hexagonal bipyramidal. 12: cuboctahedra.]
  • 26. Materials design is combinatorial. (AA’)0.5(BB’)0.5O3 perovskite, 2 x 2 x 2 supercell, 10 A and 10 B species = (10C2 × 8C4)^2 ≈ 10^7. [Figure: data history of the Materials Project, with regions labelled “Reasonable ML” and “Deep learning”.] [Figure: Fig. 2 of Jiang et al., atomic-resolution STEM ABF and HAADF images of a representative high-entropy perovskite oxide, Sr(Zr0.2Sn0.2Ti0.2Hf0.2Mn0.2)O3, at low and high magnifications, showing nanoscale compositional homogeneity and atomic structure.] Jiang et al. A New Class of High-Entropy Perovskite Oxides. Scripta Materialia 2018, 142, 116–120.
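The ≈ 10^7 figure can be checked directly. The sketch below reads the slide's expression as follows (an interpretation, not something the slide spells out): pick 2 of 10 species for a sublattice, place one of them on 4 of the 8 sites that sublattice has in a 2 x 2 x 2 supercell, and do this independently for the A and B sublattices. Symmetry-equivalent orderings are not removed.

```python
from math import comb

n_species = 10  # candidate species per sublattice (from the slide)
n_sites = 8     # A (or B) sites in a 2 x 2 x 2 perovskite supercell

pairs = comb(n_species, 2)                  # 10C2: which two species share a sublattice
arrangements = comb(n_sites, n_sites // 2)  # 8C4: which half of the sites gets the first one
per_sublattice = pairs * arrangements
total = per_sublattice ** 2                 # A and B sublattices chosen independently

print(pairs, arrangements, per_sublattice, total)
```

The total is 9,922,500 candidate orderings for a single 50/50 stoichiometry, which is what makes exhaustive first-principles screening impractical here.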
  • 27. Solution: Surrogate models for “instant” property predictions. Property = f(Composition, Structure), where the property is the “target” and composition and structure supply the “descriptors/features”. Property: energies (formation, Ehull, reaction, binding, etc.); band gaps; mechanical properties; functional properties (e.g., ionic conductivity); ... Composition: stoichiometric attributes, e.g., number and ratio of elements, etc.; elemental property statistics, e.g., mean, range, min, max, etc. of elemental properties such as atomic number, electronegativity, row, group, atomic radii, etc.; electronic structure, e.g., number of valence electrons, shells, etc.; ... Structure: crystal/molecular symmetry; lattice parameters; atomic coordinates; connectivity / bonding between atoms; ...
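The composition descriptors listed above reduce to simple statistics over elemental properties. A minimal sketch, assuming a hand-typed three-element lookup table (atomic numbers and Pauling electronegativities) and a hypothetical `featurize` helper; packages such as Automatminer, named on the workflow slide, automate this with far larger property tables.

```python
# Small lookup table for the sketch: atomic number Z and Pauling electronegativity X.
ELEMENT_DATA = {
    "Li": {"Z": 3, "X": 0.98},
    "O": {"Z": 8, "X": 3.44},
    "Fe": {"Z": 26, "X": 1.83},
}


def featurize(composition):
    """Turn {element: amount} into stoichiometric and elemental-property features."""
    total = sum(composition.values())
    fractions = {el: amount / total for el, amount in composition.items()}
    features = {"n_elements": len(composition)}
    for prop in ("Z", "X"):
        values = [ELEMENT_DATA[el][prop] for el in composition]
        # Mean weighted by atomic fraction; min, max and range over the elements present.
        features[prop + "_mean"] = sum(fractions[el] * ELEMENT_DATA[el][prop] for el in composition)
        features[prop + "_min"] = min(values)
        features[prop + "_max"] = max(values)
        features[prop + "_range"] = max(values) - min(values)
    return features


fe2o3 = featurize({"Fe": 2, "O": 3})
for name, value in fe2o3.items():
    print(name, round(value, 3))
```

The resulting fixed-length vector is the X in Property = f(X); any of the regressors from the workflow slide can be trained on it.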
  • 28. Composition-based models. Approaches: feature engineering; deep learning. Zheng, X., et al. (2018). Chem. Sci., 9(44), 8426-8432. Jha et al. (2018) Sci. Rep., 8(1), 17593. Meredig et al. (2014) Phys. Rev. B 89, 094104.
  • 29. Structure-based models. Property-labelled materials fragments + gradient boosting decision tree: Isayev et al. (2017) Nature Comm., 8, 15679. Crystal graph + graph convolutional neural networks: Xie et al. (2018) Phys. Rev. Lett. 120, 145301. Smooth overlap of atomic positions (SOAP): Rosenbrock et al. npj Comput. Mater. (2017), 3, 29.
  • 30. State of the art: Graph-based representations. Figure 4: Pearson correlations between elemental embedding vectors. Elements are arranged in order of increasing Mendeleev number [49] for easier visualization of trends. Nissan Motor Co.
  • 31. Performance on 130,462 QM9 molecules. 80%-10%-10% train-validation-test split. Only Z as atomic feature, i.e., feature selection helps model learn, but is not critical! State-of-the-art performance surpassing chemical accuracy in 11 of 13 properties!
    Property | MEGNet1 | MEGNet-Simple1 | SchNet2 | “Chemical accuracy”
    U0 (meV) | 9 | 12 | 14 | 43
    G (meV) | 10 | 12 | 14 | 43
    εHOMO (eV) | 0.038 | 0.043 | 0.041 | 0.043
    εLUMO (eV) | 0.031 | 0.044 | 0.034 | 0.043
    Cv (cal/(mol K)) | 0.030 | 0.029 | 0.033 | 0.05
    1 Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294. 2 Schütt et al. J. Chem. Phys. 148, 241722 (2018).
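An 80%-10%-10% split of 130,462 molecules can be reproduced with a seeded shuffle of the indices. A minimal sketch: the helper `split_indices` and the seed are assumptions for illustration and say nothing about how the cited papers drew their splits.

```python
import random


def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle range(n) reproducibly and cut it into train/validation/test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n)
    n_valid = int(fractions[1] * n)
    # The remainder goes to the test set, so every index is used exactly once.
    return (
        indices[:n_train],
        indices[n_train:n_train + n_valid],
        indices[n_train + n_valid:],
    )


train, valid, test = split_indices(130462)
print(len(train), len(valid), len(test))
```

Fixing the seed keeps the three subsets identical across runs, which matters when errors from different models are compared in one table.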
  • 32. Performance on Materials Project Crystals.
    Property | MEGNet | SchNet1 | CGCNN2
    Formation energy Ef (meV/atom) | 28 (60,000) | 35 | 39 (28,046)
    Band gap Eg (eV) | 0.330 (36,720) | - | 0.388 (16,485)
    log10 KVRH (GPa) | 0.050 (4,664) | - | 0.054 (2,041)
    log10 GVRH (GPa) | 0.079 (4,664) | - | 0.087 (2,041)
    Metal classifier | 78.9% (55,391) | - | 80% (28,046)
    Non-metal classifier | 90.6% (55,391) | - | 95% (28,046)
    1 Schütt et al. J. Chem. Phys. 148, 241722 (2018). 2 Xie et al. PRL. 120.14 (2018): 145301.
  • 33. The Scale Challenge in Computational Materials Science. Many real-world materials problems are not related to bulk crystals: electrode-electrolyte interfaces; catalysis; microstructure and segregation. Need linear scaling with ab initio accuracy. Huang et al. ACS Energy Lett. 2018, 3 (12), 2983–2988. Tang et al. Chem. Mater. 2018, 30 (1), 163–173.
  • 34. Machine Learning: A solution to the Scale Challenge in Computational Materials Science? [Figure: methods placed by length scale and time scale, with accuracy and transferability as the competing qualities: first principles methods; empirical potentials; finite element / continuum models. Time-scale markers: atomic vibrations (<ps); ns; µs; ms; ion dynamics; reaction dynamics.] Critical challenge: bridging the 10^-10 → 10^-6 m or 10^-12 → 10^-6 s scales in a manner that retains transferability and accuracy, and is scalable.
  • 35. Machine learning the potential energy surface. Descriptors: distances and angles; neighbor density. ML models: linear regression; kernel regression; neural networks. Potentials: Neural Network Potential (NNP)1; Moment Tensor Potential (MTP)2; Gaussian Approximation Potential (GAP)3; Spectral Neighbor Analysis Potential (SNAP)4.
    Excerpt shown on the slide: ... symmetry functions (ACSF) [39] to represent the atomic local environments and fully connected neural networks to describe the PES with respect to symmetry functions [11,12]. A separate neural network is used for each atom. The neural network is defined by the number of hidden layers and the nodes in each layer, while the descriptor space is given by the following symmetry functions:
    $$G_i^{\mathrm{atom,rad}} = \sum_{j \neq i}^{N_{\mathrm{atom}}} e^{-\eta (R_{ij} - R_s)^2} \cdot f_c(R_{ij}),$$
    $$G_i^{\mathrm{atom,ang}} = 2^{1-\zeta} \sum_{j,k \neq i}^{N_{\mathrm{atom}}} (1 + \cos\theta_{ijk})^{\zeta} \cdot e^{-\eta'(R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \cdot f_c(R_{ij}) \cdot f_c(R_{ik}) \cdot f_c(R_{jk}),$$
    where $R_{ij}$ is the distance between atom $i$ and neighbor atom $j$, $\eta$ is the width of the Gaussian and $R_s$ is the position shift over all neighboring atoms within the cutoff radius $R_c$, $\eta'$ is the width of the Gaussian basis and $\zeta$ controls the angular resolution. $f_c(R_{ij})$ is a cutoff function, defined as follows:
    $$f_c(R_{ij}) = \begin{cases} 0.5 \cdot \left[\cos\left(\frac{\pi R_{ij}}{R_c}\right) + 1\right], & \text{for } R_{ij} \leq R_c \\ 0.0, & \text{for } R_{ij} > R_c. \end{cases}$$
    These hyperparameters were optimized to minimize the mean absolute errors of energies and forces for each chemistry. The NNP model has shown great performance for Si [11], TiO2 [40], water [41] and solid-liquid interfaces [42], metal-organic frameworks [43], and has been extended to incorporate long-range electrostatics for ionic systems such as ZnO [44] and Li3PO4 [45].
    Gaussian Approximation Potential (GAP). The GAP calculates the similarity between atomic configurations based on a smooth-overlap of atomic positions (SOAP) [10,46] kernel, which is then used in a Gaussian process model. In SOAP, the Gaussian-smeared atomic neighbor densities $\rho_i(R)$ are expanded in spherical harmonics as follows:
    $$\rho_i(R) = \sum_j f_c(R_{ij}) \cdot \exp\left(-\frac{|R - R_{ij}|^2}{2\sigma_{\mathrm{atom}}^2}\right) = \sum_{nlm} c_{nlm}\, g_n(R)\, Y_{lm}(\hat{R}),$$
    The spherical power spectrum vector, which is in turn the square of expansion coefficients,
    $$p_{n_1 n_2 l}(R_i) = \sum_{m=-l}^{l} c^{*}_{n_1 l m}\, c_{n_2 l m},$$
    can be used to construct the SOAP kernel while raised to a positive integer power $\zeta$ (which is 4 in present case) to accentuate the sensitivity of the kernel [10],
    $$K(R, R') = \sum_{n_1 n_2 l} \left(p_{n_1 n_2 l}(R)\, p_{n_1 n_2 l}(R')\right)^{\zeta},$$
    In the above equations, $\sigma_{\mathrm{atom}}$ is a smoothness controlling the Gaussian smearing, and ...
    1 Behler et al. PRL. 98.14 (2007): 146401. 2 Shapeev, Multiscale Modeling and Simulation 14 (2016). 3 Bartók et al. PRL. 104.13 (2010): 136403. 4 Thompson et al. J. Comput. Phys. 285, 316–330 (2015).
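The cutoff function and the radial symmetry function in the excerpt translate almost line by line into code. This is a minimal sketch of those two formulas only: the neighbor distances and the values of eta, R_s and R_c are made up, and the angular term and the neural network itself are left out.

```python
import math


def cutoff(r, r_c):
    """f_c(R_ij): cosine cutoff that decays smoothly to zero at r = r_c."""
    if r > r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_symmetry_function(distances, eta, r_s, r_c):
    """G_i^{atom,rad}: sum over neighbors j of exp(-eta (R_ij - R_s)^2) * f_c(R_ij)."""
    return sum(
        math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c)
        for r in distances
    )


# Hypothetical neighbor distances of one atom, in angstrom; the last lies beyond r_c.
neighbor_distances = [2.35, 2.35, 3.84, 6.50]

g_rad = radial_symmetry_function(neighbor_distances, eta=0.5, r_s=2.0, r_c=5.0)
print(round(g_rad, 4))
```

Evaluating the function for several (eta, R_s) pairs gives a fixed-length fingerprint of the atom's surroundings that does not change under rotation, translation or reordering of neighbors, which is what lets a per-atom network consume it.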
  • 36. Standardized workflow for ML-IAP construction and evaluation. [Workflow figure. Data generation with Pymatgen and Fireworks + VASP: crystal structure; elastic deformation -> distorted structures; surface generation -> surface structures; vacancy + AIMD -> trajectory snapshots; AIMD (low T, high T) -> trajectory snapshots; DFT static -> dataset. Descriptors: local environment of sites S1, S2, ..., Sn within cutoff radius rc -> atomic descriptors X1(r1j … r1n), X2(r2k … r2m), ..., Xn(rnj … rnm). Machine learning: Y = f(X; ω), with Y (energy, force, stress) taken from the DFT properties. Hyperparameters (energy weights, degrees of freedom, cutoff radius, expansion width, ...) tuned by grid search or evolutionary algorithm; property fitting, e.g., elastic, phonon.] Available open source on GitHub: https://github.com/materialsvirtuallab/mlearn. Zuo, Y.; Chen, C.; Li, X.; Deng, Z.; Chen, Y.; Behler, J.; Csányi, G.; Shapeev, A. V.; Thompson, A. P.; Wood, M. A.; et al. A Performance and Cost Assessment of Machine Learning Interatomic Potentials. arXiv:1906.08888 2019.
  • 37. Ni-Mo SNAP performance. • SNAP significantly outperforms in binary and bcc Mo for energy and elastic constants. [Figure panels: energy; forces; elastic constants.]
  • 38. Ni-Mo phase diagram. EAM completely fails to reproduce the Ni-Mo phase diagram. [Figure labels: Ni3Mo; Ni4Mo; solid-liquid equilibrium.]
  • 39. Application: Investigating Hall-Petch strengthening in Ni-Mo. • ~20,000 to ~455,000 atoms. • Uniaxially strained with a strain rate of 5×10^8 s^-1. • SNAP reproduces the Hall-Petch relationship, consistent with experiment [1]. [1] Hu et al. Science, 2017, 355, 1292.
  • 40. ML-IAP: Accuracy vs Cost. [Figure, panels a and b, Mo dataset: test error (meV/atom) versus computational cost (s/(MD step atom)). Annotations: Jmax = 3; 2000 kernels; 20 polynomial powers; hidden layers [16, 16].] Zuo et al. A Performance and Cost Assessment of Machine Learning Interatomic Potentials. arXiv:1906.08888 2019.
  • 41. Modeling relationships that are too complex to understand right now. Oct 10 2019. What’s the equivalent of this problem in materials characterization? Identify absorption species. [Workflow figure: each spectrum is compared against the spectrum database by learners 1 … N; each learner applies peak shifting / alignment, spectra normalization, (optional) feature transformation, intensity normalization and a similarity measure, and returns rank 1 … rank N; the ranks are merged into a combined rank and probability.] Zheng et al. Automated generation and ensemble-learned matching of X-ray absorption spectra. npj Comput. Mater. 2018, 4 (1), 12. 500,000 computed K-edge XANES of > 50,000 crystals (in progress: L edges and EXAFS). ~84% accuracy in identifying correct oxidation state and coordination environment!
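The rank-combination idea in the workflow figure can be illustrated with a toy example. The sketch below is not the method of Zheng et al.: it assumes two simple similarity measures (cosine similarity and negative Euclidean distance), unit-maximum intensity normalization, made-up five-point "spectra", and a plain sum of ranks as the combination rule. It skips peak alignment and the probability estimate.

```python
import math


def normalize(spectrum):
    """Scale intensities to a unit maximum (one simple intensity normalization)."""
    peak = max(spectrum)
    return [value / peak for value in spectrum]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neg_euclidean(a, b):
    """Negated distance, so that larger always means more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(target, database, similarity):
    """Return {name: rank}; rank 0 is the most similar reference."""
    scores = {name: similarity(target, ref) for name, ref in database.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered)}


def combined_ranking(target, database, learners):
    """Sum the ranks given by every learner and sort, best match first."""
    target = normalize(target)
    database = {name: normalize(ref) for name, ref in database.items()}
    totals = {name: 0 for name in database}
    for similarity in learners:
        for name, rank in rank_candidates(target, database, similarity).items():
            totals[name] += rank
    return sorted(totals, key=totals.get)


# Made-up reference spectra and an unknown spectrum resembling ref_A.
database = {
    "ref_A": [0.1, 0.5, 1.0, 0.6, 0.3],
    "ref_B": [0.1, 0.2, 0.4, 1.0, 0.7],
    "ref_C": [1.0, 0.6, 0.3, 0.2, 0.1],
}
target = [0.2, 1.0, 2.1, 1.1, 0.6]

ranking = combined_ranking(target, database, [cosine_similarity, neg_euclidean])
print(ranking)
```

Combining ranks rather than raw scores sidesteps the fact that different similarity measures live on different scales, which is one reason an ensemble of preprocessing and similarity choices can be merged at all.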
  • 42. Random Forest Coordination Environment Classification. [Figure panels by element group: alkali; TM; post-TM; metalloid; carbon; alkaline.]
  • 43. Other examples. Classification of crystal structures from XRD: Oviedo et al. (2019) npj Comput. Mater. 5, 60. Prediction of wavefunctions: Schütt et al. (2019) Nature Comm. 10, 5024.