The document discusses the Materials Genome Initiative (MGI) and the High-Throughput Experimental Materials Collaboratory (HTE-MC). It describes NIST's role in supporting MGI through developing a materials innovation infrastructure. It outlines the vision for HTE-MC, which would integrate high-throughput synthesis and characterization tools across multiple institutions through a shared network and data management platform. This would provide broader access to experimental facilities and materials data to support accelerated materials discovery. A workshop was held in 2018 to discuss establishing the HTE-MC concept and defining its technical, operational and business models.
Hattrick-Simpers: TMS Machine Learning Workshop Slides
1. The MGI & Data-driven High-Throughput Synthesis and Characterization
Brian DeCost, Zachary Trautt, Martin Green, Gilad Kusne,
Jason Hattrick-Simpers
NIST Gaithersburg
Jason.Hattrick-Simpers@nist.gov
@jae3goals
Any mention of commercial products within this talk is for information only; it does not imply recommendation or endorsement by NIST.
2. Outline
• The Materials Genome Initiative (MGI) and NIST’s Role
• The High-Throughput Experimental Materials Collaboratory (HTE-MC)
• Accelerated Discovery of (High-Hardness & Corrosion-Resistant) Metallic Glasses
• Iterative HTE and AI
• Vision for the Future
• Look Ma No Hands (Experimentation)!!
• Conclusions
3. Decrease time-to-market by 50% at a fraction of the cost
• Develop a Materials Innovation Infrastructure
• Achieve national goals in energy, security, and human welfare with advanced materials
• Equip the next-generation materials workforce
Materials Genome Initiative for Global Competitiveness
9. Examples of Cultural Implementation and Successes of the MGI
• Argonne Collaboration – phase identification at aluminum interfaces
• Lund Boats – MGI on the plant floor
• Casting Simulation (MAGMA) – MGI in R&D, tool shop, & plant floor
• Timken Steel – Premium Air Melt Practice, putting premium-quality, cost-conscious steel into the hands of our customers
• BASF – Foaming simulations based on first principles
• ERCo – Laser Induced Breakdown Spectroscopy for real-time melt composition (ARPA-E)
11. Standards Are Important
• The NIST MGI Program is taking a very careful approach to consensus standards for data representation
• Most of this space has a long track record of failed standardization efforts
• The exception is highly structured data (e.g. ICSD)
• This should be done top-down, not bottom-up
12. MGI Directions to Date
• Materials by Design projects: DOE EFRCs, EMNs; NSF DMREFs
• HT computational databases
• Need: high-throughput experimental data
14. Workshop: “Fulfilling the Promise of the Materials Genome Initiative via High-Throughput Experimentation” – 2014
15. Workshop Conclusions
A large portion of the MGI program thus far has been devoted to modeling and simulation. Prodigious amounts of experimental data will be required to inform and validate modeling and simulation, to “power the MGI computational engine.”
HTE can rapidly establish relationships between composition, structure, and properties for a wide variety of materials classes, and therefore is:
a) uniquely suited to rapidly generate high-quality, consistent data sets
b) the key enabling counterpart to modeling and simulation for bringing the MGI to fruition
“Enable broad access to HTE methodologies and data”
17. High Throughput Experimental Materials Collaboratory (HTE-MC)
• Necessary because even one “brick and mortar” HTE facility would be very costly, and multiple facilities dedicated to different materials classes (e.g. catalysts, photovoltaics, lightweight structural materials, etc.) are needed
• Enable researchers at national laboratories, universities, and industry to have access to HTE facilities
• The HTE-MC would facilitate MGI-driven research while leveraging investment
• Complement new science investments (EMNs, NNMI, MURI, etc.)
18. How?
• Collaboratory: a 1989 neologism (William A. Wulf, Computer Scientist at University of Virginia):
“defined by… a center without walls, ‘in which the nation’s researchers can perform their research without regard to physical location, interacting with colleagues, accessing instrumentation, sharing data and computational resources, … accessing information in digital libraries.’”
• An HTE-MC would consist of:
• An integrated, delocalized network of high-throughput synthesis and characterization tools
• A best-in-class materials data management platform, consisting of NIST (and other) software
19. HTE-MC 1st Steps: NIST – NREL Round Robin
Sample synthesis and measurements:
• Synthesize: Zn-Sn-Ti-O composition spread sample libraries using combinatorial PLD (@NIST) or sputtering (@NREL)
• Measure: chemical composition, crystal structure, electrical conductivity, optical transmittance, band gap
• Exchange: sample libraries and associated data; repeat measurements
[Figure: Zn-Sn-Ti-O maps of chemical composition, crystal structure, electrical conductivity, optical transmittance, and work function for NREL samples and a NIST sample]
Goal: test and improve the standards for exchanging data and samples among participating labs
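One way to make the round robin's goal concrete is to pair each lab's measurement of the same library position and property and flag disagreements. The sketch below assumes a shared sample/position ID scheme and a 5% relative tolerance. The `Measurement` record, the `compare_labs` function, the IDs, the tolerance, and the numbers are all hypothetical. They are not part of the NIST-NREL protocol.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    sample_id: str  # hypothetical shared ID for a library position
    prop: str       # property name, e.g. "band_gap_eV"
    value: float
    lab: str


def compare_labs(a, b, rel_tol=0.05):
    """Pair measurements of the same (sample_id, prop) from two labs.

    Returns (sample_id, prop, relative_difference, flagged) tuples. The relative
    difference is taken against the larger magnitude of the pair. flagged is True
    when it exceeds rel_tol, which is an assumed tolerance.
    """
    index_b = {(m.sample_id, m.prop): m for m in b}
    report = []
    for m in a:
        other = index_b.get((m.sample_id, m.prop))
        if other is None:
            continue  # only one lab measured this; nothing to compare
        scale = max(abs(m.value), abs(other.value))
        rel = abs(m.value - other.value) / scale if scale else 0.0
        report.append((m.sample_id, m.prop, rel, rel > rel_tol))
    return report


# Made-up values for illustration only.
nist = [Measurement("lib1-pos07", "band_gap_eV", 3.20, "NIST"),
        Measurement("lib1-pos07", "resistivity_ohm_cm", 0.012, "NIST")]
nrel = [Measurement("lib1-pos07", "band_gap_eV", 3.28, "NREL"),
        Measurement("lib1-pos07", "resistivity_ohm_cm", 0.015, "NREL")]
for row in compare_labs(nist, nrel):
    print(row)
```

A per-property tolerance would be more realistic, because transport measurements typically scatter far more than optical band gaps.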
20. Addressing FAIR Principles
To be Findable:
• (meta)data are assigned a globally unique and persistent identifier
• data are described with rich metadata
• metadata clearly and explicitly include the identifier of the data it describes
• (meta)data are registered or indexed in a searchable resource
To be Accessible:
• (meta)data are retrievable by their identifier using a standardized communications protocol
– the protocol is open, free, and universally implementable
– the protocol allows for an authentication and authorization procedure, where necessary
• metadata are accessible, even when the data are no longer available
To be Interoperable:
• (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
• (meta)data use vocabularies that follow FAIR principles
• (meta)data include qualified references to other (meta)data
To be Reusable:
• (meta)data are richly described with a plurality of accurate and relevant attributes
– (meta)data are released with a clear and accessible data usage license
– (meta)data are associated with detailed provenance
– (meta)data meet domain-relevant community standards
Wilkinson, Mark D., et al. "The FAIR Guiding Principles for scientific data management and stewardship." Scientific Data 3 (2016). DOI: 10.1038/sdata.2016.18
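Several of these principles can be checked mechanically before a dataset is published. The sketch below tests a few simple stand-ins for F1–F4, R1.1, and R1.2 against a dataset record. The record fields, the set of required metadata keys, the accepted identifier prefixes, and the made-up DOI are all assumptions, not an HTE-MC schema. Passing these checks does not make data FAIR; it only catches obvious gaps.

```python
from dataclasses import dataclass, field

# Hypothetical minimum set of descriptive keys standing in for "rich metadata" (F2).
REQUIRED_KEYS = {"composition", "technique", "instrument"}
PID_PREFIXES = ("doi:", "hdl:", "https://doi.org/")  # assumed persistent-ID schemes


@dataclass
class DatasetRecord:
    identifier: str
    title: str
    license: str = ""
    provenance: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def fair_gaps(rec, registry):
    """List the FAIR checks rec fails. registry stands in for a searchable index."""
    gaps = []
    if not rec.identifier.startswith(PID_PREFIXES):
        gaps.append("F1: identifier is not a recognized persistent ID")
    missing = REQUIRED_KEYS - set(rec.metadata)
    if missing:
        gaps.append(f"F2: metadata missing {sorted(missing)}")
    if rec.metadata.get("identifier") != rec.identifier:
        gaps.append("F3: metadata does not include the data identifier")
    if rec.identifier not in registry:
        gaps.append("F4: not registered in a searchable resource")
    if not rec.license:
        gaps.append("R1.1: no usage license")
    if not rec.provenance:
        gaps.append("R1.2: no provenance")
    return gaps


rec = DatasetRecord(
    identifier="doi:10.0000/example-zsto-lib1",  # made-up DOI
    title="Zn-Sn-Ti-O composition spread, library 1",
    license="CC-BY-4.0",
    provenance=["PLD deposition", "composition mapping"],
    metadata={"identifier": "doi:10.0000/example-zsto-lib1",
              "composition": "Zn-Sn-Ti-O", "technique": "XRD",
              "instrument": "lab diffractometer"},
)
print(fair_gaps(rec, {"doi:10.0000/example-zsto-lib1"}))  # []
print(fair_gaps(DatasetRecord("lib1", "untitled"), set()))
```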
21. HTE-MC
Stakeholder groups (diagram):
• Government agencies
• Members: academia, national labs, industry, small business
• Users: industry, small business, academia, national labs, Manufacturing USA Institutes, Energy Materials Networks
• Contributors: academia, national labs, HTE-MC users (after embargo period)
• Visitors / public: industry, small business, academia, educators, national labs, Manufacturing USA Institutes, Energy Materials Networks
Flows labeled on the diagram: provide structural funding; provide science infrastructure; provide data infrastructure; provide students/staff; receive funding ($); pay tiered access fees ($); generate new data; publish open-access data; receive benefits; access AI-ready public data
Outcomes: next-generation workforce; new knowledge; materials solutions
22. HTE Materials Collaboratory
Problems
• Experimental databases are not keeping pace with computational databases
• HTE is out of reach for most due to high startup and operating costs
• Materials are diverse; no single institution can have all the necessary equipment
Solution
• Integrate HTE laboratories with materials cyberinfrastructure
• HTE as a shared resource; operated on demand and funded by access fees and core funding
• HTE as a federated resource; enable connectivity via cyberinfrastructure
23. Technical Stakeholder Types and Population
• Member: provides infrastructure
• User: utilizes infrastructure; creates new data; may choose to publish data
• Contributor: publishes data
• Visitor: consumes public data
Population (diagram): Visitors, Contributors, Users, Members
(defines action, not access)
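The note "(defines action, not access)" suggests treating a stakeholder type as a set of permitted actions. The stakeholder lists on slide 21 also show the same institution types, such as academia and national labs, in several groups. Below is a minimal sketch of that reading. The `Role` enum, the action names, and the union rule are illustrative assumptions, not an HTE-MC specification.

```python
from enum import Enum


class Role(Enum):
    MEMBER = "member"
    USER = "user"
    CONTRIBUTOR = "contributor"
    VISITOR = "visitor"


# Hypothetical action names transcribed from the stakeholder descriptions above.
ACTIONS = {
    Role.MEMBER: {"provide_infrastructure"},
    Role.USER: {"use_infrastructure", "create_data", "publish_data"},  # publishing is optional
    Role.CONTRIBUTOR: {"publish_data"},
    Role.VISITOR: {"consume_public_data"},
}


def actions_for(roles):
    """An organization may hold several roles; its permitted actions are the union."""
    allowed = set()
    for role in roles:
        allowed |= ACTIONS[role]
    return allowed


# e.g. a university that hosts tools (member) and also runs experiments (user)
print(sorted(actions_for({Role.MEMBER, Role.USER})))
```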
25. HTE-MC architecture (diagram)
• Member Institutes and a User Institute, with components including a Laboratory Information Management System, Instruments/Computing, a File/Collection Repository, and a Database / Structured Data / Metadata store
• Institutes connected through a Data Transfer Grid
• Data Dissemination: Database / Structured Data / Metadata; File/Collection Repository
• Registries: Materials Resource Registry; High-Throughput Experiment Resource Registry
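The diagram places registries (a Materials Resource Registry and a High-Throughput Experiment Resource Registry) alongside data dissemination. A registry's job is discovery: a user institute should be able to ask which member institutes offer a given technique for a given materials class. The in-memory sketch below assumes each record holds an institution, a technique, and a set of materials classes. The class names, fields, and institutes are hypothetical. A real registry would sit behind a networked, searchable service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HTEResource:
    institution: str
    technique: str
    material_classes: frozenset


class ResourceRegistry:
    """Minimal in-memory stand-in for a high-throughput experiment resource registry."""

    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def find(self, technique=None, material_class=None):
        """Return resources matching every filter that is given (None means any)."""
        return [
            r for r in self._resources
            if (technique is None or r.technique == technique)
            and (material_class is None or material_class in r.material_classes)
        ]


registry = ResourceRegistry()
registry.register(HTEResource("Member Institute A", "combinatorial sputtering",
                              frozenset({"oxides", "metallic glasses"})))
registry.register(HTEResource("Member Institute B", "XRD mapping",
                              frozenset({"metallic glasses"})))
print([r.institution for r in registry.find(material_class="metallic glasses")])
```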
26. High-Throughput Experimental Materials Collaboratory (HTE-MC) Workshop
• Held: February 2018
• Workshop Goals:
• Socialize the HTE-MC concept among government, academic and industry stakeholders
• Expand HTE-MC membership
• Define technical, operational and business models for the HTE-MC
• Facilitated Breakout Sessions:
• Define the Vision of HTE-MC
• Define the value proposition for participation
• Identify major barriers to successful participation
• Identify and prioritize pilot use cases
• Identify and describe modes of interaction of users
• Define governance and business models for HTE-MC
• Workshop Report: In preparation
27. A Multi-Agency, Multi-Year Program Plan in Advanced Energy Materials Discovery, Development, and Process Design
• Held: July 2018
• Workshop Goals:
• Determine how best to coordinate next steps within the Federal Government
• Efficiently leverage the ongoing research in advanced materials conducted in academia, industry, and government research laboratories
• Facilitated Breakout Sessions:
• Priorities in Energy Materials R&D: Barriers, Timeline, and Metrics
• Database infrastructure needs in AI and Energy Materials R&D: Moving Materials Discovery through Materials Processes
• Expansion of the Collaboratory Network for Energy Materials Discovery and Process Design
• Integration of AI, ML, and Experimentation for Energy Materials Design and Processing
• Workshop Report: In preparation
28. Iterative Machine Learning – High Throughput Experimental Approach to Discovering Novel Amorphous Alloys
Fang Ren¹, Logan Ward², Travis Williams³, Kevin J. Laws⁴, Christopher M. Wolverton², Jason Hattrick-Simpers⁵, Apurva Mehta¹
¹SLAC National Accelerator Laboratory, ²Northwestern University, ³University of South Carolina, ⁴UNSW Australia, ⁵National Institute of Standards and Technology, ⁶University of Chicago
Science Advances, Vol. 4, No. 4 (2018)
30. Metallic Glasses Are Interesting
http://vitreloy.caltech.edu/development.htm
West US 7998286 B2
E Ma. Nature Materials. 14, 2015.
Metallic glass (MG) is a solid metallic material, usually an alloy, with a disordered atomic-scale structure (amorphous).
31. The Palette of Potential Metallic Glasses
Usually contain 3 or more elements
30 non-toxic, earth-friendly elements give > 4000 ternaries and > 4 million compositions
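The counts on this slide follow from simple combinatorics. The sketch below is minimal and assumes compositions are sampled on a uniform ternary grid. The slide does not state the grid spacing, so the step (here a hypothetical 2 at.%) is left as a parameter. The per-system count also double-counts the binary edges that neighbouring ternaries share, so the total is only an upper-bound estimate.

```python
from math import comb


def ternary_systems(n_elements: int) -> int:
    """Number of distinct 3-element systems drawn from n_elements."""
    return comb(n_elements, 3)


def grid_points_per_ternary(divisions: int) -> int:
    """Compositions on a ternary simplex sampled with step 1/divisions.

    Counts non-negative integer solutions of a + b + c = divisions,
    i.e. C(divisions + 2, 2); includes binary edges and pure-element corners.
    """
    return comb(divisions + 2, 2)


n_ternaries = ternary_systems(30)                 # 4060 systems
per_system_2pct = grid_points_per_ternary(50)     # hypothetical 2 at.% step -> 1326 points
total_2pct = n_ternaries * per_system_2pct        # 5,383,560 (edges double-counted)
print(n_ternaries, per_system_2pct, total_2pct)   # 4060 1326 5383560
```

A finer 1 at.% grid (`grid_points_per_ternary(100)`, 5151 points) pushes the total past 20 million. The "> 4 million" figure is therefore sensitive to the assumed spacing.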
33. Building the Machine Learning Model
Ref: Ward et al. npj Comp. Mater. (2016), 28.
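Figure (pipeline): Experimental Data (5739 measurements) → Composition-based Representation (145 attributes, e.g. $\mu(Z)$, $\Delta\chi$, $\sigma(T_m)$, $\max(r_{cov})$, computed from element fractions $x_{\mathrm{H}}, x_{\mathrm{He}}, \ldots$) → Machine Learning Algorithm (Random Forest; example split: $\sigma(r) < 1.1$ Å → MG / Not MG)
$$\mathrm{GFA} = f(x_{\mathrm{H}}, x_{\mathrm{He}}, \ldots)$$
Screening: 24 Million Ternary Alloys → 74520 potential MGs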
34. Select Experiments that Involve Contradiction
Selection Criteria
1.) None of the models 100% disagree
2.) Some experimental data existed
3.) Inexpensive, low vapor pressure materials
Figure panels: Yang Model | Efficient Packing Model | ML Predictions
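The selection step can be sketched as a filter over candidate systems. This sketch rests on an assumption: it reads "contradiction" (and criterion 1) as "the Yang, efficient-packing, and ML predictions are not unanimous", and the slide's wording may mean something narrower. The record type, field names, and system names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSystem:
    # Hypothetical record for one ternary system; all field names are illustrative.
    name: str
    yang_predicts_mg: bool
    packing_predicts_mg: bool
    ml_predicts_mg: bool
    has_experimental_data: bool
    cheap_low_vapor_pressure: bool


def models_conflict(s: CandidateSystem) -> bool:
    """Assumed reading of 'contradiction': the three predictors are not unanimous."""
    votes = {s.yang_predicts_mg, s.packing_predicts_mg, s.ml_predicts_mg}
    return len(votes) > 1


def select_experiments(systems: List[CandidateSystem]) -> List[str]:
    """Keep systems that satisfy all three selection criteria."""
    return [
        s.name
        for s in systems
        if models_conflict(s) and s.has_experimental_data and s.cheap_low_vapor_pressure
    ]


candidates = [
    CandidateSystem("System A", True, False, True, True, True),    # conflict, passes
    CandidateSystem("System B", True, True, True, True, True),     # unanimous, skipped
    CandidateSystem("System C", False, True, False, False, True),  # no prior data, skipped
]
print(select_experiments(candidates))  # ['System A']
```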
41. Case Example X-Y-Al: Breaking from Convention AND Property Prediction
No “deep” eutectics necessary!
Massalski “Binary Alloy Phase Diagrams” (1990)
42. But How to Create Property Models?
• There is no Landolt-Börnstein (L-B)-type data set for properties of MG
• NLP/data extraction from figures is in its infancy
• Manually scrape the literature
• 2000+ entries
• Errant measurements
• Many different groups
• Inconsistent definition of “amorphous”
Feature Importance
• Average Ground State Volume: 0.37
• Minimum Ground State Volume: 0.24
• Minimum Covalent Radius: 0.12
• Mean Melting Temperature: 0.036
• Highest Melting Temperature: 0.017
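The sketch below shows one way to read these importances. It assumes they are normalized to sum to 1, as scikit-learn's impurity-based `feature_importances_` are; the slide does not say how they were computed. Under that assumption the five listed features account for about 0.78 of the total, and all remaining features share roughly 0.22.

```python
# Importances as listed on the slide (top five features only).
importances = {
    "Average ground state volume": 0.37,
    "Minimum ground state volume": 0.24,
    "Minimum covalent radius": 0.12,
    "Mean melting temperature": 0.036,
    "Highest melting temperature": 0.017,
}

# Rank features and print a running (cumulative) share.
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
running = 0.0
for name, value in ranked:
    running += value
    print(f"{name:32s} {value:6.3f}  cumulative {running:6.3f}")

top_five = sum(importances.values())   # about 0.783
# Only meaningful if importances are normalized to sum to 1 (an assumption).
remainder = 1.0 - top_five             # about 0.217 left for all other features
```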
46. “In the next 5 years, AI-driven, autonomous materials research is going to fundamentally change how we do materials science.”
-Jim Warren, Technical Program Director for Materials Genomics, NIST
50. Active clustering for autonomous XRD phase mapping
Think carefully about modeling to remove researcher degrees of freedom
DeCost et al., to be submitted
51. Conclusions
• AI & ML are already prevalent in the design of new materials, materials synthesis, data capture/cleaning, and knowledge extraction
• Neither AI nor ML is a panacea that will replace human intuition and creativity; they are enablers
• In some cases, an order-of-magnitude increase in materials exploration/discovery is possible
• Perhaps a fairer metric of AI’s influence will be the rate of hypothesis generation and (in)validation
• AI needs FAIR data, including negative results, to be effective
• Not part of the solution = consigned to obscurity
• Full materials research autonomy (for specific problems) has already been demonstrated
52. Acknowledgements
USC
Travis Williams
SLAC
Dr. Apurva Mehta
Dr. Fang Ren
Dr. Suchismita
Northwestern
Prof. Wolverton
Dr. Logan Ward
UNSW
Prof. Kevin Laws
NIST
Dr. James Warren
Dr. Martin Green
Dr. Zachary Trautt
Dr. Gilad Kusne
Dr. Brian DeCost
Mr. Ryan Smith
NREL
Dr. Andriy Zakutayev
CSM
Prof. Packard
Dr. Schoeppner
53. Demonstrations and Talks by (confirmed speakers):
• Theory
• Computational Approaches
• Experimental Approaches
Andrew Millis (Columbia)
Antoine Georges (CCQ)
Karin Rabe (Rutgers)
Sasha Balatsky (LANL)
Roger Melko (Waterloo)
Shoucheng Zhang (Stanford)
Stefano Curtarolo (Duke)
Gus Hart (BYU)
Ichiro Takeuchi (UMD)
Sergei Kalinin (ORNL)
Benji Maruyama (AFRL)
Jiun-Haw Chu (Univ. Washington)
Giuseppe Carleo (Flatiron)
Miles Stoudenmire (Flatiron)
Bootcamp: Machine Learning for Materials Research & Workshop: Machine Learning Quantum Materials
• Dates: July 30 – Aug 3, 2018
• Location: IBBR (Gaithersburg, Maryland)
MLMR introduces researchers from industry, national labs, and academia to machine learning theory and tools for rapid data analysis.
https://nanocenter.umd.edu/events/mlmr/
Bootcamp
Three days of lectures and hands-on exercises covering a range of data analysis topics, from data pre-processing through advanced machine learning analysis techniques. Example topics include:
• Identifying important features in complex/high-dimensional data
• Visualizing high-dimensional data to facilitate user analysis
• Identifying the fabrication ‘descriptors’ that best predict variance in functional properties
• Quantifying similarities between materials using complex/high-dimensional data
The hands-on exercises will demonstrate practical use of machine learning tools on real materials data (scalar values, spectra, micrographs, etc.).
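One bootcamp topic is quantifying similarities between materials from high-dimensional data such as spectra. Cosine similarity between intensity vectors is one simple, common choice, sketched below. The patterns are hypothetical, and the bootcamp's actual material may use other metrics, for example ones tolerant to peak shifts.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(a) != len(b):
        raise ValueError("patterns must be sampled on the same grid")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Hypothetical diffraction intensities on a shared 2-theta grid.
pattern_a = [0.0, 1.0, 0.2, 0.0, 0.5]
pattern_b = [0.0, 2.0, 0.4, 0.0, 1.0]   # same peak shape, twice as intense
pattern_c = [1.0, 0.0, 0.0, 1.0, 0.0]   # peaks in entirely different places

print(cosine_similarity(pattern_a, pattern_b))  # ~1.0: overall intensity scale is ignored
print(cosine_similarity(pattern_a, pattern_c))  # 0.0: no overlapping peaks
```

Cosine similarity ignores overall intensity scale, which is convenient for patterns collected with different exposure times. However, it treats a small peak shift as a complete mismatch.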
Editor's Notes
I think we have a great opportunity for you to give attendees an overview of your work in data-driven HT synthesis and characterization. You should also feel free to provide forward-looking vision, e.g., if you'd like to highlight the emerging HT collaboratory concept led by NIST. Finally, the audience may also find it interesting to hear a bit of introductory content about NIST's role in MGI and materials data broadly.
Old story – how do we combine experiment, computation, and digital data to develop the materials that fit critical needs but do so cheaper and faster than ever before?
Gist:
We can’t forget that this is about the full discovery-to-deployment cycle. It doesn’t serve our purposes to compartmentalize and focus only on isolated materials discovery; we also need to consider how a material will eventually move into application.
MGI ideas aren’t necessarily new, but are following a natural progression that began in 1988 with COTA.
The idea is that through computation-guided experimentation we can achieve our goals more quickly than through only experimentation.
The emphasis is that this started as a multi-agency initiative, coordinated across the agencies, with each agency taking its own approach to implementation.
Materials are complex, and multiple length scales are important. We use simulations to look up length scales and experiments to look down. Both generate and consume data that inform models, which in turn generate data and inform the experiment-simulation loop. From this loop we would like to arrive at new and outstanding materials. The data and the models can live anywhere, ideally somewhere FAIR, but in reality they are scattered across notebooks, hard drives, and someone’s memory.
When this method of producing materials works, it can be powerful.
Alloys designed by Apple using QuesTek IP, centered on an ICME/MGI approach to materials design and deployment.
So let’s take the idea from the previous slide, abstract it a bit, and ask where does NIST fit into this MGI equation?
DOC’s smiling face towards industry. If we think that there are hundreds (thousands) of such MGI loops in the country all producing data and models, then NIST’s fit is clear.
First of all, we have to help industry, academia, and government labs exchange data. We can help set up repositories but first we have to ask what are meaningful (standard) ways of interchanging materials data from disparate sources?
Second, NIST is driven by measurement technology, and UNCERTAINTY and QUALITY assessment and improvement are key directives in this space.
But we are talking about materials data and models in repositories and a key question remains, “where are the curated, homogeneous and high-quality materials data for model development and validation coming from?”
This is a big problem within the MGI, because a great deal of effort has gone to the top half of the Venn diagram. Our (and a number of others’) contention is that HT experimentation is the potential driving force for the MGI engine.
This started with a review paper by Marty, Ichiro, and me about how HTE has revolutionized the way people search for and optimize new materials. This caught the attention of the White House OSTP, and we were asked to organize a workshop bringing together some of the best in the field.
This slide covers one of the outcomes from the workshop (held in San Francisco in May 2014).
Can we turn me, the high-throughput experimentalist, into the rate limiting step in an intelligent search for new amorphous alloys?
Emphasize that this is a moonshot, but that my off-ramp is in the field of coatings.
Meshing is important, but a reasonably dense sampling would take:
~1000 years making bulk alloys (5/day)
~10 years via HTE alone
~2 years
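The notes leave the sample count implicit, but it can be worked out backward. Assuming synthesis runs every day of the year, ~1000 years at 5 bulk alloys per day corresponds to roughly 1.8 million compositions. Covering that in ~10 years with HTE alone then implies about 500 compositions per day. The arithmetic is sketched below; the helper names are hypothetical.

```python
DAYS_PER_YEAR = 365  # assumes year-round synthesis


def implied_samples(years: float, per_day: float) -> float:
    """Number of compositions a given daily rate covers in `years`."""
    return years * DAYS_PER_YEAR * per_day


def implied_rate(samples: float, years: float) -> float:
    """Compositions per day needed to cover `samples` in `years`."""
    return samples / (years * DAYS_PER_YEAR)


dense_sampling = implied_samples(1000, 5)    # ~1.8 million compositions
hte_rate = implied_rate(dense_sampling, 10)  # ~500 compositions per day
print(dense_sampling, hte_rate)
```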
Start at the bottom of this image and work my way clockwise.
Stoichiometric attributes capture the fractions, but not the types, of the elements present.
Elemental property attributes (atomic row, Mendeleev number, atomic weight, total number of unfilled states, etc.), each with weighted average, maximum, minimum, range, average deviation, and mode
Valence orbital occupation attributes
Ionic compound attributes…
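The sketch below is a minimal pure-Python version of the first two attribute families, applied to a small case. Its property table is a toy three-element table of approximate literature values for Z, melting temperature, and covalent radius; the property names and helper functions are hypothetical. Stoichiometric attributes are computed as L^p norms of the element fractions, which depend only on the fractions. Elemental-property attributes are computed as fraction-weighted statistics. Taking "mode" as the property of the most abundant element is our assumption. The full attribute set (Ward et al.) covers many more properties, plus the valence-orbital and ionic-compound attributes omitted here.

```python
from typing import Dict

# Toy elemental property table (approximate literature values), for illustration only.
ELEMENT_PROPS: Dict[str, Dict[str, float]] = {
    "Al": {"Z": 13, "T_melt_K": 933.47, "r_cov_A": 1.21},
    "Cu": {"Z": 29, "T_melt_K": 1357.77, "r_cov_A": 1.32},
    "Zr": {"Z": 40, "T_melt_K": 2128.0, "r_cov_A": 1.75},
}


def normalize(comp: Dict[str, float]) -> Dict[str, float]:
    """Convert amounts to fractions that sum to 1, dropping absent elements."""
    total = sum(comp.values())
    return {el: amt / total for el, amt in comp.items() if amt > 0}


def stoichiometric_attributes(fracs, ps=(2, 3, 5, 7, 10)):
    """L^p norms of the fractions: sensitive to fractions, blind to element identity."""
    feats = {"L0": float(len(fracs))}  # number of elements present
    for p in ps:
        feats[f"L{p}"] = sum(x ** p for x in fracs.values()) ** (1.0 / p)
    return feats


def property_attributes(fracs, prop):
    """Fraction-weighted statistics of one elemental property."""
    vals = {el: ELEMENT_PROPS[el][prop] for el in fracs}
    mean = sum(fracs[el] * vals[el] for el in fracs)
    avg_dev = sum(fracs[el] * abs(vals[el] - mean) for el in fracs)
    lo, hi = min(vals.values()), max(vals.values())
    mode_el = max(fracs, key=fracs.get)  # assumed: property of the most abundant element
    return {
        f"mean_{prop}": mean,
        f"avgdev_{prop}": avg_dev,
        f"min_{prop}": lo,
        f"max_{prop}": hi,
        f"range_{prop}": hi - lo,
        f"mode_{prop}": vals[mode_el],
    }


def featurize(comp):
    """Combine stoichiometric and elemental-property attributes for one composition."""
    fracs = normalize(comp)
    feats = stoichiometric_attributes(fracs)
    for prop in ("Z", "T_melt_K", "r_cov_A"):
        feats.update(property_attributes(fracs, prop))
    return feats


features = featurize({"Cu": 50, "Zr": 50})
print(features["mean_Z"], features["avgdev_Z"], features["range_Z"])  # 34.5 5.5 11
```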
Ask before getting into this, if anyone isn’t familiar with roughly how Random Forest works.
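For readers who want the mechanics: a random forest trains many decision trees, each on a bootstrap resample of the data and each restricted to a random subset of features, and predicts by majority vote. The toy version below keeps each tree to a single split (a stump) so it fits in a few lines. The data, labels, and function names are hypothetical. A real glass-forming-ability model would use a library implementation with deeper trees and the full attribute set.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Best single-threshold split (lowest weighted Gini) over the given features."""
    majority = Counter(y).most_common(1)[0][0]
    best = None
    for j in feature_ids:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                l_lab = Counter(left).most_common(1)[0][0] if left else majority
                r_lab = Counter(right).most_common(1)[0][0] if right else majority
                best = (score, j, t, l_lab, r_lab)
    _, j, t, l_lab, r_lab = best
    return j, t, l_lab, r_lab


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]    # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(l if row[j] <= t else r for j, t, l, r in forest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data set, separable on either feature.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.35], [0.4, 0.3],
     [0.6, 0.7], [0.7, 0.6], [0.8, 0.9], [0.9, 0.8]]
y = ["not MG"] * 4 + ["MG"] * 4
forest = fit_forest(X, y)
print(predict(forest, [0.95, 0.9]), predict(forest, [0.05, 0.05]))  # MG not MG
```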
Left melt-spun model, right stacked model
Main data set is unbalanced
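The notes do not say how the imbalance was handled, so the sketch below shows two common remedies rather than the method actually used. The first weights each class by n_samples / (n_classes × class_count), the same heuristic behind scikit-learn's `class_weight="balanced"`. The second scores with balanced accuracy (the mean per-class recall), so a model that always predicts the majority class cannot look good. The labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so the majority class cannot dominate the score."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


labels = ["not MG"] * 8 + ["MG"] * 2          # hypothetical 80/20 imbalance
weights = balanced_class_weights(labels)       # minority class gets 4x the weight
always_not = ["not MG"] * 10                   # trivial majority-class predictor
print(weights)
print(balanced_accuracy(labels, always_not))   # 0.5, versus 0.8 plain accuracy
```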
The relationship between the liquidus, FWHM, and GFR, shown in fig. S7, suggests a strong correlation between glass formation and the C15 (MgCu2 prototype) Laves and B2 liquidus phase fields common to all four systems. These results indicate that these particular ordered phases are difficult to crystallize quickly, resulting in glass formation. For instance, for the Co-Zr–containing ternaries, despite the ZrCo2 C15 Laves phase having a high melting point relative to surrounding phases, the exceptional correlation between the GFR and the ZrCo2 liquidus phase field as it extends into the ternary composition space suggests a high kinetic barrier to crystallization. These correlations further suggest that a large mismatch in ionic sizes, and the presence of larger atoms such as Zr in these structures, hinders crystallization even more.
What do the circle and/or the box mean?
MAE 9.2 MPa
MRE 10%
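For reference, the two error metrics above can be computed as sketched below. The sketch assumes MRE means the mean absolute relative error, |predicted − measured| / |measured|, averaged over entries; the notes do not define it. The values are hypothetical, not data from the talk.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the same units as the data (e.g. MPa)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_relative_error(y_true, y_pred):
    """Mean of |error| / |true value|; undefined if any true value is 0."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical property values in MPa.
measured = [100.0, 80.0, 120.0]
predicted = [110.0, 72.0, 120.0]
mae = mean_absolute_error(measured, predicted)   # (10 + 8 + 0) / 3 = 6.0 MPa
mre = mean_relative_error(measured, predicted)   # (0.1 + 0.1 + 0.0) / 3, about 6.7%
print(mae, mre)
```

MAE carries the data's units and is dominated by large-valued entries. MRE is unitless but inflates errors on small measured values, so reporting both, as the notes do, gives a fuller picture.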
Here are some thoughts that I have:
1.) How much of the scatter is due to repeats for a given entry?
2.)
If you’re interested in machine learning, we have an annual bootcamp at the University of Maryland that teaches a wide variety of these techniques. You’ll learn things like how to identify important features in your data, how to visualize complex or high-dimensional data, and how to identify descriptors.
Each morning there are lectures, and the afternoons are hands-on activities applying machine learning to real materials data.
Ichiro Takeuchi, some collaborators, and I have also organized an annual bootcamp.
At this bootcamp we teach an introduction to machine learning and the most common techniques. Half of each day is hands-on training where you learn how to write code to analyze real data. Some examples of what we teach: how to …