The document describes the Materials Project computation infrastructure, which uses the Atomate framework to automatically run density functional theory simulations on over 85,000 materials in a high-throughput manner, with the results stored in a MongoDB database for users to explore and analyze in order to accelerate materials innovation. The Materials Project infrastructure aims to make it easy for researchers to generate large amounts of computational data on materials properties through standardized and scalable workflows.
Materials Project computation and database infrastructure
1. Materials Project computation and database infrastructure
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Presentation given to Delaware Energy Institute, 2018
Slides (already) posted to https://hackingmaterials.lbl.gov
2. Outline
① Introduction to the Materials Project
② Materials Project computation infrastructure
③ Database considerations
3. The Materials Project database
• Online resource of density functional theory simulation data for ~85,000 inorganic materials
• Includes band structures, elastic tensors, piezoelectric tensors, battery properties and more
• >60,000 registered users
• Free
• www.materialsproject.org
Jain et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
4. Many data sets are available!
M. De Jong et al. Sci. Data, 2015, 2, 150009.
6. Outline
① Introduction to the Materials Project
② Materials Project computation infrastructure
③ Database considerations
7. A “black-box” view of performing a calculation
[Figure: researcher → “something” → Results!, for the question “What is the GGA-PBE elastic tensor of GaAs?”]
8. Unfortunately, the inside of the “black box” is usually tedious and “low-level”
[Figure: researcher → lots of tedious, low-level work… → Results!, for the question “What is the GGA-PBE elastic tensor of GaAs?”; callouts: “Input file flags”, “SLURM format”, “how to fix ZPOTRF?”]
☐ set up the structure coordinates
☐ write input files, double-check all the flags
☐ copy to supercomputer
☐ submit job to queue
☐ deal with supercomputer headaches
☐ monitor job
☐ fix error jobs, resubmit to queue, wait again
☐ repeat process for subsequent calculations in workflow
☐ parse output files to obtain results
☐ copy and organize results, e.g., into Excel
9. What would be a better way?
[Figure: researcher → “something” → Results!, for the question “What is the GGA-PBE elastic tensor of GaAs?”]
10. What would be a better way?
[Figure: researcher → Results!, for the question “What is the GGA-PBE elastic tensor of GaAs?”]
Workflows to run:
☐ band structure
☐ surface energies
✓ elastic tensor
☐ Raman spectrum
☐ QH thermal expansion
11. Ideally the method should scale to millions of calculations
[Figure: researcher → Results!, for the request “Start with all binary oxides, replace O->S, run several different properties”]
Workflows to run:
✓ band structure
✓ surface energies
✓ elastic tensor
☐ Raman spectrum
☐ QH thermal expansion
☐ spin-orbit coupling
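The “replace O->S, run several different properties” request amounts to a fan-out: every substituted composition is paired with every requested property. A minimal sketch in plain Python, assuming compositions are simple element-to-count dicts (a hypothetical stand-in for pymatgen objects) and that each (formula, property) pair becomes one workflow:

```python
from itertools import product


def substitute(composition: dict[str, int], old: str, new: str) -> dict[str, int]:
    """Return a copy of `composition` with element `old` replaced by `new`."""
    result: dict[str, int] = {}
    for element, count in composition.items():
        key = new if element == old else element
        result[key] = result.get(key, 0) + count  # merge if `new` was already present
    return result


def formula(composition: dict[str, int]) -> str:
    """Render a simple formula string, omitting counts of 1."""
    return "".join(f"{el}{n if n != 1 else ''}" for el, n in composition.items())


def plan_workflows(oxides, properties):
    """Pair every O->S substituted composition with every requested property."""
    sulfides = [substitute(c, "O", "S") for c in oxides]
    return [(formula(c), prop) for c, prop in product(sulfides, properties)]


# Hypothetical inputs: two binary oxides and three properties give 6 workflows.
plan = plan_workflows(
    [{"Zn": 1, "O": 1}, {"Al": 2, "O": 3}],
    ["band structure", "surface energies", "elastic tensor"],
)
for item in plan:
    print(item)
```

The number of workflows grows as (materials × properties), which is why the method has to scale to very large numbers of calculations.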
12. Atomate tries to make it easy, automatic, and flexible to generate data with existing simulation packages
[Figure: researcher → Results!, for the request “Run many different properties of many different materials!”]
13. Atomate contains a library of simulation procedures
VASP-based
• band structure
• spin-orbit coupling
• hybrid functional calcs
• elastic tensor
• piezoelectric tensor
• Raman spectra
• NEB
• GIBBS method
• QH thermal expansion
• AIMD
• ferroelectric
• surface adsorption
• work functions
Other
• BoltzTraP
• FEFF method
• LAMMPS MD
K. Mathew et al., Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
14. Each simulation procedure translates high-level instructions into a series of low-level tasks
Quickly and automatically translate PI-style (minimal) specifications into well-defined FireWorks workflows.
[Figure: example specification “What is the GGA-PBE elastic tensor of GaAs?”]
M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al., Charting the complete elastic properties of inorganic crystalline compounds, Sci. Data 2 (2015).
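As one concrete example of such a translation, an elastic-tensor calculation needs a set of strained copies of the relaxed cell. The sketch below generates them with plain Python lists. The strain magnitudes, the Voigt ordering, and the small-strain form F = I + ε are assumptions for illustration, not atomate's actual defaults; computing the stresses and fitting the tensor are left to the later DFT and fitting steps.

```python
def identity() -> list[list[float]]:
    return [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


# Voigt order: xx, yy, zz, yz, xz, xy
VOIGT_PAIRS = [(0, 0), (1, 1), (2, 2), (1, 2), (0, 2), (0, 1)]


def strained_lattices(lattice, magnitudes=(-0.01, -0.005, 0.005, 0.01)):
    """Return (voigt_index, strain, new_lattice) for each component and magnitude.

    `lattice` is a 3x3 list whose rows are lattice vectors. Each deformation
    uses F = I + eps with a symmetric strain tensor eps (small-strain form),
    so the new lattice rows are the old rows multiplied by F^T.
    The default magnitudes are hypothetical.
    """
    results = []
    for v, (i, j) in enumerate(VOIGT_PAIRS):
        for eps in magnitudes:
            F = identity()
            F[i][j] += eps
            if i != j:
                F[j][i] += eps  # keep the strain tensor symmetric
            F_T = [[F[c][r] for c in range(3)] for r in range(3)]
            results.append((v, eps, matmul(lattice, F_T)))
    return results


# Hypothetical cubic cell with a 5.0 Å edge.
cell = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
deformations = strained_lattices(cell)
print(len(deformations), "strained cells to compute")
```

Each strained cell would then become its own calculation job, which is exactly the kind of bookkeeping the workflow layer is meant to hide from the researcher.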
15. Atomate thus encodes and standardizes knowledge from domain experts about running various kinds of simulations
All past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations
Contributors: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain
16. Full operation diagram
[Figure: structure → workflow (jobs 1–4) → database of all workflows → automatically submit + execute → output files + database]
17. Full operation diagram
[Figure: structure → workflow (jobs 1–4) → database of all workflows → automatically submit + execute → output files + database]
18. Crystal structure generation via pymatgen
• Pymatgen can retrieve crystal structures from the Materials Project database (MPRester class)
• It can also manipulate crystal structures
– substitutions
– supercell creation
– order-disorder (shown at right)
– interstitial finding
– surface / slab generation
• A visual interface to many of these tools is available in Materials Project’s “Crystal Toolkit” app
Example: Order-disorder
Resolve partial or mixed occupancies into a fully ordered crystal structure (e.g., split a mixed oxide-fluoride site into separate oxygen and fluorine sites)
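pymatgen's Structure class offers these manipulations directly (for example, make_supercell and replace_species). To show what two of them do under the hood without requiring pymatgen, here is a plain-Python sketch that builds a diagonal supercell from fractional coordinates and performs a species substitution. The toy two-atom cell and the function names are hypothetical and are not pymatgen's API.

```python
from itertools import product


def make_supercell(lattice, sites, scaling=(2, 1, 1)):
    """Repeat a cell na x nb x nc times along its lattice vectors.

    lattice: 3x3 list, rows are lattice vectors.
    sites: list of (species, (fa, fb, fc)) in fractional coordinates.
    Fractional coordinates are rescaled so they stay within [0, 1) of the
    new cell, assuming the inputs lie in [0, 1).
    """
    new_lattice = [[x * n for x in row] for row, n in zip(lattice, scaling)]
    na, nb, nc = scaling
    new_sites = []
    for species, (fa, fb, fc) in sites:
        for i, j, k in product(range(na), range(nb), range(nc)):
            new_sites.append((species, ((fa + i) / na, (fb + j) / nb, (fc + k) / nc)))
    return new_lattice, new_sites


def replace_species(sites, old, new):
    """Substitute every site of species `old` with `new`, keeping positions."""
    return [(new if sp == old else sp, frac) for sp, frac in sites]


# Hypothetical two-atom cubic cell.
lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
sites = [("Mg", (0.0, 0.0, 0.0)), ("O", (0.5, 0.5, 0.5))]

big_lattice, big_sites = make_supercell(lattice, sites, (2, 1, 1))
sulfide_sites = replace_species(big_sites, "O", "S")
print(big_lattice[0], len(big_sites))
```

Order-disorder resolution and slab generation involve considerably more (e.g., ranking candidate orderings by energy), which is where a maintained library is worth using instead of hand-rolled code.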
19. Full operation diagram
[Figure: structure → workflow (jobs 1–4) → database of all workflows → automatically submit + execute → output files + database]
20. Atomate’s main goal – convert structures to workflows
Workflows consist of a series of jobs (“FireWorks”), each with multiple tasks. Atomate jobs typically (i) run a calculation and (ii) store the results in a database.
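To make the job/task structure concrete, here is a plain-Python sketch (not the FireWorks or atomate API) of a workflow as jobs containing ordered tasks, with parent-to-child links similar in spirit to the “nodes” and “links” keys of a workflow document. The job names, task names, and the four-job elastic-tensor shape are hypothetical; graphlib requires Python 3.9+.

```python
from graphlib import TopologicalSorter

# Hypothetical elastic-tensor-style workflow. Each job ("FireWork") holds an
# ordered list of tasks; each job ends by storing results in a database.
JOBS = {
    "optimize structure": ["write inputs", "run calculation", "store results in database"],
    "deformation 1": ["write inputs", "run calculation", "store results in database"],
    "deformation 2": ["write inputs", "run calculation", "store results in database"],
    "fit elastic tensor": ["collect stresses", "fit tensor", "store results in database"],
}
# Parent -> children dependencies, analogous to a workflow's "links".
LINKS = {
    "optimize structure": ["deformation 1", "deformation 2"],
    "deformation 1": ["fit elastic tensor"],
    "deformation 2": ["fit elastic tensor"],
}


def execution_order(nodes, links):
    """Return job names in an order that respects every parent -> child link."""
    predecessors = {node: set() for node in nodes}
    for parent, children in links.items():
        for child in children:
            predecessors[child].add(parent)
    return list(TopologicalSorter(predecessors).static_order())


def run_workflow(jobs, links):
    """Simulate running each job's tasks in dependency order; returns a log."""
    log = []
    for job in execution_order(jobs, links):
        for task in jobs[job]:
            log.append((job, task))  # a real launcher would execute the task here
    return log


for job, task in run_workflow(JOBS, LINKS):
    print(f"{job}: {task}")
```

Keeping dependencies as explicit links (rather than hard-coding a sequence) is what lets the two deformation jobs run independently once the optimization finishes.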
21. Full operation diagram
[Figure: structure → workflow (jobs 1–4) → database of all workflows → automatically submit + execute → output files + database]
22. FireWorks allows you to write your workflow once and execute (almost) anywhere
• Execute workflows locally or at a supercomputing center
• Queue systems supported
– PBS
– SGE
– SLURM
– IBM LoadLeveler
– NEWT (a REST-based API at NERSC)
– Cobalt (Argonne LCF)
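The “write once, execute anywhere” idea comes down to keeping the job description independent of the queue system and filling a per-queue template only at submission time. A minimal sketch, assuming simplified SLURM and PBS headers (#SBATCH and #PBS directives) and a hypothetical launch script ./run_job.sh; this is not FireWorks' actual queue-adapter code or template format.

```python
# Hypothetical, simplified submission templates; FireWorks' real queue
# adapters use their own template files and parameters.
TEMPLATES = {
    "SLURM": (
        "#!/bin/bash\n"
        "#SBATCH --job-name={name}\n"
        "#SBATCH --nodes={nodes}\n"
        "#SBATCH --time={walltime}\n"
        "\n{command}\n"
    ),
    "PBS": (
        "#!/bin/bash\n"
        "#PBS -N {name}\n"
        "#PBS -l nodes={nodes}\n"
        "#PBS -l walltime={walltime}\n"
        "\n{command}\n"
    ),
}


def render_submit_script(queue: str, **params: str) -> str:
    """Fill the template for `queue` with the same job parameters."""
    try:
        template = TEMPLATES[queue]
    except KeyError:
        raise ValueError(f"no template for queue system {queue!r}") from None
    return template.format(**params)


job = {"name": "elastic_GaAs", "nodes": "2", "walltime": "24:00:00",
       "command": "./run_job.sh"}  # hypothetical launch command
for queue in ("SLURM", "PBS"):
    print(render_submit_script(queue, **job))
```

Supporting another queue system then means adding a template, not rewriting the workflow.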
24. Other features
• Job provenance and automatic metadata storage
• Detect and rerun failures
• “Dynamic” workflows that change behavior based on results
• Customize job priorities
• Much more…
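“Detect and rerun failures” can be pictured as a detect-fix-rerun loop. This is a minimal sketch with a hypothetical job and handler, similar in spirit to the error handlers of the custodian library (which define check() and correct() methods); it is not the FireWorks or custodian API.

```python
from typing import Callable

Params = dict
Handler = tuple[Callable[[str], bool], Callable[[Params], Params]]


def run_with_recovery(run_job, handlers: list[Handler], params: Params, max_attempts: int = 3):
    """Run `run_job(params)`; if any handler detects an error in the output,
    apply its correction to the parameters and rerun, up to `max_attempts`.

    Returns (final_params, history) where history lists (attempt, params, output).
    """
    history = []
    for attempt in range(1, max_attempts + 1):
        output = run_job(params)
        history.append((attempt, dict(params), output))
        fixes = [fix for detect, fix in handlers if detect(output)]
        if not fixes:
            return params, history  # no handler fired: treat the run as successful
        for fix in fixes:
            params = fix(params)
    raise RuntimeError(f"job still failing after {max_attempts} attempts")


# Hypothetical job that "diverges" when its step size is too large.
def fake_job(params: Params) -> str:
    return "ERROR: diverged" if params["step_size"] > 0.5 else "converged"


# Hypothetical handler: detect the divergence message, then halve the step size.
halve_step: Handler = (
    lambda out: "diverged" in out,
    lambda p: {**p, "step_size": p["step_size"] / 2},
)

final, history = run_with_recovery(fake_job, [halve_step], {"step_size": 1.0})
print(final, len(history))
```

Recording every attempt in `history` mirrors the provenance point above: the final result carries a record of which corrections were needed to reach it.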
25. Full operation diagram
[Figure: structure → workflow (jobs 1–4) → database of all workflows → automatically submit + execute → output files + database]
27. The atomate database makes it easy to perform various analyses with pymatgen
[Figure: atomate output database(s) feeding phase diagrams, Pourbaix diagrams, diffusivity via MD, and band structure analysis]
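As an example of analysis on top of stored energies, a phase diagram reduces to the lower convex hull of formation energy versus composition, and a compound's distance above that hull measures its thermodynamic instability. pymatgen's PhaseDiagram handles the general multicomponent case; the sketch below covers only a hypothetical binary system with made-up energies, in plain Python.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull of (x, energy) points via Andrew's monotone chain."""
    hull = []
    for p in sorted(set(points)):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def energy_above_hull(x, energy, hull):
    """Vertical distance from (x, energy) to the hull, by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
        if x0 <= x <= x1:
            return energy - (y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    raise ValueError("x lies outside the hull's composition range")


# Hypothetical A-B system: x = fraction of B, y = formation energy (eV/atom).
# The pure elements at x = 0 and x = 1 are the zero reference.
entries = [(0.0, 0.0), (0.25, -0.3), (0.5, -1.0), (0.75, -0.4), (1.0, 0.0)]
hull = lower_hull(entries)
for x, e in entries:
    print(f"x={x:.2f}  E_above_hull={energy_above_hull(x, e, hull):.3f} eV/atom")
```

Here the compound at x = 0.5 sits on the hull (stable), while the ones at x = 0.25 and x = 0.75 lie above it and would be predicted to decompose.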
28. Many research groups have run tens of thousands of materials science workflows with atomate
Also used by:
• Persson research group, UC Berkeley
• Ong research group, UC San Diego
• Neaton research group, UC Berkeley
• Liu research group, Penn State
• Groups not involved in developing atomate!
– e.g., see “Thermal expansion of quaternary nitride coatings” by Tasnadi et al.
Atomate now powers the Materials Project and will be used to run hundreds of thousands of simulations in the next year (www.materialsproject.org)
29. Outline
① Introduction to the Materials Project
② Materials Project computation infrastructure
③ Database considerations
30. About a decade ago, we were using a SQL infrastructure
Main problems we ran into:
• Too static – every time we wanted to store a new kind of data, the DB master needed to “design and update” the database schema
• Too difficult for newcomers – constructing queries (joins, etc.) was hard. We actually designed a system to help people make queries, which is common
31. Since then, we have switched to MongoDB – a “NoSQL” database
Major advantages
• Very dynamic – easy to add new data types without interfering with old data types or redesigning everything. No central “database master” needed
• Easy for newcomers – easy syntax, no complex “joins”, easy to visualize results
• Easy object-relational mapping – we built our pymatgen code so that any objects (e.g., band structures, crystal structures, etc.) could be exported to a database or imported from a database easily
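The object-database mapping works because every object can be turned into a plain dictionary and rebuilt from one. pymatgen objects follow this pattern through as_dict()/from_dict() (via monty's MSONable, which adds "@module"/"@class" keys). The sketch below imitates the idea with a hypothetical dataclass and uses a JSON string in place of an actual MongoDB insert and query.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SimpleStructure:
    """Hypothetical minimal crystal structure (not pymatgen's Structure)."""
    lattice: list[list[float]]
    species: list[str]
    frac_coords: list[list[float]]

    def as_dict(self) -> dict:
        d = asdict(self)
        d["@class"] = type(self).__name__  # type tag so a reader knows what to rebuild
        return d

    @classmethod
    def from_dict(cls, d: dict) -> "SimpleStructure":
        return cls(**{k: v for k, v in d.items() if not k.startswith("@")})


s = SimpleStructure(
    lattice=[[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
    species=["Mg", "O"],
    frac_coords=[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]],
)
# JSON text stands in for a MongoDB document: export, then import again.
stored = json.dumps(s.as_dict())
restored = SimpleStructure.from_dict(json.loads(stored))
print(restored == s)
```

Because the dictionary form is just nested keys, values, and arrays, it maps directly onto a MongoDB document with no schema to design up front.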
32. How we store computed data
Data is stored in “collections”. Each collection is a set of documents that can be queried.
Each document consists of nested key-value pairs (“dictionaries”) or arrays.
E.g., one can search for {"tags": "phosphides"} to retrieve all documents tagged with “phosphides”.
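The array-matching behavior behind that example query can be imitated in a few lines. This plain-Python sketch implements only equality matching with dotted keys (it ignores operators such as $gt and matching inside arrays of sub-documents); with the real database the same query would be passed to pymongo's collection.find({"tags": "phosphides"}). The documents are hypothetical.

```python
_MISSING = object()


def get_path(doc, dotted_key):
    """Follow a dotted key such as "input.parameters" through nested dicts."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc, query):
    """Simplified subset of MongoDB equality matching.

    A scalar query value matches an array field if any element equals it,
    which is why {"tags": "phosphides"} finds documents whose "tags" list
    contains "phosphides".
    """
    for key, wanted in query.items():
        value = get_path(doc, key)
        if value is _MISSING:
            return False
        if isinstance(value, list) and not isinstance(wanted, list):
            if wanted not in value:
                return False
        elif value != wanted:
            return False
    return True


# Hypothetical documents in a "materials" collection.
materials = [
    {"formula": "InP", "tags": ["phosphides", "semiconductors"]},
    {"formula": "GaAs", "tags": ["arsenides", "semiconductors"]},
    {"formula": "Zn3P2", "tags": ["phosphides"]},
]
print([d["formula"] for d in materials if matches(d, {"tags": "phosphides"})])
```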
33. Each collection has a set of standard keys
Data is stored in “collections”. Each collection is a set of documents that can be queried.
• materials collection – each document represents a material, with keys like “formula” and “band_gap”
• tasks collection – each document represents a DFT calculation, with keys like “dir_name” and “input.parameters”
• workflows collection – each document represents a calculation workflow, with keys like “nodes” and “links”
Typically, each document within a collection will be of a uniform format, but this is not a hard requirement in MongoDB.
34. Two approaches to store data in MongoDB
1. As described previously: for each data type (a “material”, “task”, “workflow”, etc.) decide on a set of fields that describe each instance of that data type. In MongoDB, these fields can easily be changed or added to later if needed.
2. Try to create a single collection and document format that can handle any kind of materials data!
– example 1: “PIF” file format from Citrine [1]
– example 2: MPContribs from Materials Project [2]
[1] J. O’Mara, B. Meredig, K. Michel, Materials Data Infrastructure: A Case Study of the Citrination Platform to Examine Data Import, Storage, and Access, JOM (2016).
[2] P. Huck, D. Gunter, S. Cholia, D. Winston, A.T. N’Diaye, K. Persson, User applications driven by the community contribution framework MPContribs in the Materials Project, Concurr. Comput. Pract. Exp. 22 (2015).
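To contrast the two approaches, the sketch below converts an approach-1 document (fixed, per-type keys) into a single generic shape: a formula plus a list of named properties with units. The generic layout, field names, units table, and values are hypothetical placeholders; they are not the actual PIF or MPContribs schemas.

```python
# Hypothetical unit table for converting a per-type document.
UNITS = {"band_gap": "eV", "density": "g/cm^3"}


def to_generic(material_doc: dict, units: dict) -> dict:
    """Approach 1 -> approach 2: turn fixed, per-type fields into a generic
    list of named properties that any kind of materials data could use."""
    properties = [
        {"name": key, "value": value, "units": units.get(key)}
        for key, value in material_doc.items()
        if key != "formula"
    ]
    return {"formula": material_doc["formula"], "properties": properties}


def get_property(generic_doc: dict, name: str):
    """Look up one property by name; returns None if absent."""
    for prop in generic_doc["properties"]:
        if prop["name"] == name:
            return prop
    return None


# Approach 1: a "materials" document with fixed keys (placeholder values).
doc = {"formula": "ABC", "band_gap": 1.5, "density": 4.2}
generic = to_generic(doc, UNITS)
print(generic)
```

The trade-off: the generic form accepts any new property without changing the document layout, at the cost of queries that must search inside the properties list rather than match a top-level key.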
37. Who to talk to next!
The current “Guardians of the MP infrastructure”
Funding: DOE-BES Materials Science Division; Computing: NERSC