This document discusses open source tools for materials informatics, including Matminer and Matscholar. Matminer is a library of descriptors for materials science data that can generate features for machine learning models. It includes over 60 featurizer classes and supports scikit-learn. Matscholar applies natural language processing to over 2 million materials science abstracts to extract keywords and enable improved literature searching. The document argues that open datasets like Matbench and automated tools like Automatminer could help lower barriers for developing machine learning models in materials science by making it easier to obtain training data and evaluate model performance.
Open Source Tools for Materials Informatics
1. Open Source Tools for Materials Informatics
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
MRS Fall Meeting 2019
Slides (already) posted to hackingmaterials.lbl.gov
2. Staffing interdisciplinary research
I find a recurring dilemma and asymmetry in staffing materials informatics research
[Diagram labels: Machine learning, Materials Science, Materials Informatics]
3. Who has a tougher job to get started?
MS&E major:
• Already has background in the materials science aspects of the project
• But needs to learn the machine learning and software engineering aspects
CS major:
• Already has background in software engineering and appropriate machine learning
• But needs to learn the materials science aspects
5. Who has a tougher job to get started?
MS&E major vs. CS major
My experience is that the CS major typically has the tougher road ahead of them
Easier to pick up / self-learn random forests & neural networks than phase diagrams & crystal structures
6. There is an asymmetry in resources available
MS&E major:
• Hands-on code and examples to run and modify
• Hundreds of YouTube videos and online courses
• Code reviews from collaborators
• And the standard books, etc.
CS major:
• Books and research articles
• Conversations with colleagues, impromptu lectures
• Practice problems? Worked examples? Interactive code?
7. Outline
① Matminer: data and descriptors for producing ML structure-property relationships
② Matscholar – applying natural language processing to materials science information retrieval
8. How can we make it easy to develop and test ML models for composition-structure-property relationships?
• How can we quickly represent chemistry and structure as vectors?
• How do we get labeled training/test data?
• How do we know if our ML model is extraordinary?
9. How can we make it easy to develop and test ML models for composition-structure-property relationships?
• How can we quickly represent chemistry and structure as vectors?
10. Matminer contains a library of descriptors for various materials science entities
>60 featurizer classes can generate thousands of potential descriptors that are described in the literature
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
• compatible with scikit-learn pipelining
• automatically deploy multiprocessing to parallelize over data
• include citations to methodology papers
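To make the featurizer pattern on this slide concrete, here is a minimal, self-contained Python sketch. It does not call matminer. `ToyCompositionFeaturizer`, `parse_formula`, the small element table, and the feature names are all hypothetical stand-ins that only mimic the shape of the interface shown above: construct a featurizer with options, call `featurize` on one input, and expose labels and citations alongside the numbers.

```python
import re

# Approximate atomic masses and Pauling electronegativities for a few elements.
# Toy lookup table for illustration only; a real library ships complete tables.
ELEMENT_DATA = {
    "H": {"mass": 1.008, "electronegativity": 2.20},
    "O": {"mass": 15.999, "electronegativity": 3.44},
    "Na": {"mass": 22.990, "electronegativity": 0.93},
    "Cl": {"mass": 35.45, "electronegativity": 3.16},
    "Fe": {"mass": 55.845, "electronegativity": 1.83},
}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {'Fe': 2.0, 'O': 3.0}."""
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


class ToyCompositionFeaturizer:
    """Hypothetical featurizer: composition string -> fixed-length vector."""

    def __init__(self, properties=("mass", "electronegativity")):
        self.properties = tuple(properties)

    def feature_labels(self):
        """Names of the returned features, in the same order as featurize()."""
        labels = []
        for prop in self.properties:
            labels += [f"mean {prop}", f"range {prop}"]
        return labels

    def featurize(self, formula):
        """Return the amount-weighted mean and the range of each property."""
        amounts = parse_formula(formula)
        total = sum(amounts.values())
        features = []
        for prop in self.properties:
            values = [ELEMENT_DATA[el][prop] for el in amounts]
            weighted = sum(ELEMENT_DATA[el][prop] * n for el, n in amounts.items())
            features += [weighted / total, max(values) - min(values)]
        return features

    def featurize_many(self, formulas):
        """Featurize a list of inputs (serially here; a library could parallelize)."""
        return [self.featurize(f) for f in formulas]

    def citations(self):
        """Placeholder for the reference to the descriptor's methodology paper."""
        return ["(placeholder citation)"]


feat = ToyCompositionFeaturizer()
X = feat.featurize_many(["NaCl", "Fe2O3"])
for label, value in zip(feat.feature_labels(), X[0]):
    print(f"{label}: {value:.3f}")
```

Because every input becomes a vector of the same length with stable labels, the rows of `X` can be passed directly to a standard ML estimator, which is the point of the descriptor library described on the slide.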
11. How can we make it easy to develop and test ML models for composition-structure-property relationships?
• How do we get labeled training/test data?
12. What about data?
• Typically, a lot of attention is given to advanced algorithms for machine learning
– e.g., deep neural networks versus standard ML
• But perhaps there is not enough emphasis on developing the appropriate data sets
– with enough information to train ML algorithms
– with sufficient data quality
– easy enough for anyone to at least get started without specialized knowledge
13. The importance of data
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
14. What is ImageNet?
The ImageNet data set was collected and hand-labeled (e.g., via Amazon Mechanical Turk). The latest version has over 14 million hand-annotated images, organized into ~20,000 categories.
16. How data stimulates new algorithms
How can we create an ImageNet for materials science?
17. An “ImageNet” for materials science
• We want a test set that contains a diverse array of problems
– Smaller data versus larger data
– Different applications (electronic, mechanical, etc.)
– Composition-only or structure information available
– Classification or regression
• We also want a cross-validation metric that gives reliable error estimates
– i.e., less dependent on specific choice of splits
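One common way to make an error estimate less dependent on a specific choice of splits is repeated K-fold cross-validation: reshuffle the data several times, run K-fold each time, and report the mean and spread across repeats. The sketch below illustrates that idea with the standard library only. It is an assumption chosen for illustration, not necessarily the procedure used for the test set described here; the function names, the synthetic data, and the mean-value baseline "model" are all hypothetical.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, n_folds, rng):
    """Shuffle indices and deal them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def mean_baseline_mae(y, test_idx):
    """Predict the training-set mean for every test sample; return the test MAE."""
    test = set(test_idx)
    train_mean = mean(v for i, v in enumerate(y) if i not in test)
    return mean(abs(y[i] - train_mean) for i in test_idx)


def repeated_k_fold_mae(y, n_folds=5, n_repeats=10, seed=0):
    """Return one cross-validated MAE per repeat; each repeat reshuffles the splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        folds = k_fold_indices(len(y), n_folds, rng)
        scores.append(mean(mean_baseline_mae(y, fold) for fold in folds))
    return scores


# Synthetic target values standing in for a small materials data set.
data_rng = random.Random(42)
y = [data_rng.gauss(100.0, 20.0) for _ in range(60)]

scores = repeated_k_fold_mae(y)
print(f"MAE per repeat: min={min(scores):.2f}, max={max(scores):.2f}")
print(f"Reported error: {mean(scores):.2f} +/- {stdev(scores):.2f}")
```

The spread between the minimum and maximum per-repeat MAE shows how much a single split could mislead, which matters most for the smaller data sets in a benchmark suite.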
18. Overview of Matbench test set
Target Property | Data Source | Samples | Method
Bulk Modulus | Materials Project | 10,987 | DFT-GGA
Shear Modulus | Materials Project | 10,987 | DFT-GGA
Band Gap | Materials Project | 106,113 | DFT-GGA
Metallicity | Materials Project | 106,113 | DFT-GGA
Band Gap | Zhuo et al. [1] | 6,354 | Experiment
Metallicity | Zhuo et al. [1] | 6,354 | Experiment
Bulk Metallic Glass formation | Landolt-Börnstein | 7,190 | Experiment
Refractive index | Materials Project | 4,764 | DFPT-GGA
Formation Energy | Materials Project | 132,752 | DFT-GGA
Perovskite Formation Energy | Castelli et al. [2] | 18,928 | DFT-GGA
Freq. at Last Phonon PhDOS Peak | Materials Project | 1,296 | DFPT-GGA
Exfoliation Energy | JARVIS-2D | 636 | DFT-vdW-DF
Steel yield strength | Citrine Informatics | 312 | Experiment
[1] doi.org/10.1021/acs.jpclett.8b00124
[2] doi.org/10.1039/C2EE22341D
19. Diversity of benchmark suite
• application: mechanical, electronic, stability, optical, thermal
• data size: <1K, 1K-10K, 10K-100K, >100K
• problem type: classification, regression
• data type: experiment (composition only), DFT (structure)
20. How can we make it easy to develop and test ML models for composition-structure-property relationships?
• How do we know if our ML model is extraordinary?
21. How about a benchmark algorithm?
Automatminer is a “black box” machine learning model
Give it any data set with either composition or structure inputs, and automatminer will train an ML model (no researcher intervention)
22. Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties)
Featurizer: MagPie, SOAP, Sine Coulomb Matrix, + many, many more
• Dropping features with many errors
• Missing value imputation
• One-hot encoding
• PCA-based
• Correlation
• Model-based (tree)
Uses genetic algorithms to find the best machine learning model + hyperparameters
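Two of the steps listed on this slide, dropping features with many errors and correlation-based feature reduction (with mean imputation in between), can be sketched in a few lines of plain Python. This is an illustrative sketch, not automatminer's implementation: `clean_and_reduce`, its thresholds (`max_missing`, `max_corr`), and the example feature table are assumptions, and one-hot encoding, PCA, tree-based selection, and the genetic-algorithm model search are left out.

```python
from math import sqrt


def impute_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_a * var_b)


def clean_and_reduce(table, max_missing=0.5, max_corr=0.95):
    """table maps feature name -> values; None marks a failed featurization."""
    # Step 1: drop features where too many rows failed to featurize.
    kept = {
        name: col for name, col in table.items()
        if sum(v is None for v in col) / len(col) <= max_missing
    }
    # Step 2: fill the remaining gaps with the column mean.
    kept = {name: impute_missing(col) for name, col in kept.items()}
    # Step 3: drop any feature highly correlated with one already selected.
    selected = {}
    for name, col in kept.items():
        if all(abs(pearson(col, other)) < max_corr for other in selected.values()):
            selected[name] = col
    return selected


features = {
    "mean_mass": [10.0, 20.0, None, 40.0, 50.0],
    "mean_mass_x2": [20.0, 40.0, 60.0, 80.0, 100.0],  # redundant with mean_mass
    "mostly_failed": [None, None, None, 1.0, None],   # too many errors
    "electronegativity_range": [2.2, 0.4, 1.9, 0.7, 1.1],
}
reduced = clean_and_reduce(features)
print(sorted(reduced))
```

In this toy table, `mostly_failed` is dropped in step 1 and `mean_mass_x2` in step 3, leaving two features. The order of the steps matters: imputing before the correlation check means every surviving column is fully numeric when correlations are computed.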
24. If we can get a well-established “benchmark”, perhaps interdisciplinary teams can start hammering on accuracy
[Figure: Matbench test set average error versus time (today, 5 years, 10 years)]
A lower barrier to entry in the field means more ideas can be tested from more researchers
25. Matminer, matbench, and automatminer can all be accessed, used, and modified by anyone
Code / Examples all on GitHub
• github.com/hackingmaterials/matminer
• github.com/hackingmaterials/matminer_examples
• github.com/hackingmaterials/automatminer
Matbench data on Figshare
• (coming soon, still finalizing)
Free support via Discourse
• https://discuss.matsci.org
26. Outline
① Matminer: data and descriptors for producing ML structure-property relationships
② Matscholar – applying natural language processing to materials science information retrieval
27. Goal: collect and organize knowledge embedded in the materials science literature
We have extracted ~2 million abstracts of relevant scientific articles
We use natural language processing algorithms to try to extract knowledge from all this data
31. Using a web site as a “gateway” into the algorithms
• How do we get more people benefitting from this work and involved in improving it?
• One solution: expose an easy-to-use web frontend, with links to all the backend codes in case people want to dive further
– New tools like Plotly Dash make this easier than ever
[Diagram labels: frontend, backend]
36. Concluding thoughts
• We need more resources to help computer scientists learn about materials science topics through hands-on examples and interactive demos
• Some things that can help:
– Open-source implementations of materials science methods
– Interactive examples (e.g., Jupyter)
– Documentation and support(!)
– Labeled data sets
– Front-ends for easy exploration
37. Funding acknowledgements
• Matminer
– U.S. Department of Energy, Materials Science Division
• Matscholar
– Toyota Research Institute
Slides (already) posted to hackingmaterials.lbl.gov