The document discusses using machine learning to accelerate materials discovery. Specifically:
- Scientists developed a system combining machine learning algorithms trained on experimental data with high-throughput experiments to discover new metallic glass alloys 200 times faster than before.
- The system uses machine learning models to predict optimal new material compositions and processing parameters based on large datasets of materials properties and compositions.
- As an example, the document discusses applying random-forest machine learning to a dataset of 2722 hydrogen storage alloy compositions and properties (the Hydpark dataset) to predict promising new alloys for hydrogen storage applications.
New learning technologies seem likely to transform much of science, as they are already doing for many areas of industry and society. We can expect these technologies to be used, for example, to obtain new insights from massive scientific data and to automate research processes. However, success in such endeavors will require new learning systems: scientific computing platforms, methods, and software that enable the large-scale application of learning technologies. These systems will need to enable learning from extremely large quantities of data; the management of large and complex data, models, and workflows; and the delivery of learning capabilities to many thousands of scientists. In this talk, I review these challenges and opportunities and describe systems that my colleagues and I are developing to enable the application of learning throughout the research process, from data acquisition to analysis.
Going Smart and Deep on Materials at ALCF (Ian Foster)
As we acquire large quantities of science data from experiment and simulation, it becomes possible to apply machine learning (ML) to those data to build predictive models and to guide future simulations and experiments. Leadership Computing Facilities need to make it easy to assemble such data collections and to develop, deploy, and run associated ML models.
We describe and demonstrate here how we are realizing such capabilities at the Argonne Leadership Computing Facility. In our demonstration, we use large quantities of time-dependent density functional theory (TDDFT) data on proton stopping power in various materials maintained in the Materials Data Facility (MDF) to build machine learning models, ranging from simple linear models to complex artificial neural networks, that are then employed to manage computations, improving their accuracy and reducing their cost. We highlight the use of new services being prototyped at Argonne to organize and assemble large data collections (MDF in this case), associate ML models with data collections, discover available data and models, work with these data and models in an interactive Jupyter environment, and launch new computations on ALCF resources.
Materials Data in the 21st Century: From Mishmash to Moneyball (bmeredig)
The scientific method consists of generating and analyzing data to create knowledge. Indeed, every materials scientist uses data from syntheses, characterization, and models to explain and optimize materials behavior. Yet, despite the centrality of data to progress in materials, the world’s immense body of materials data remains unstandardized, unstructured, and trapped in myriad publications, isolated repositories, and private computers. This disaggregation (the mishmash) not only prevents materials scientists from standing on the shoulders of giants, but also limits our ability to use large-scale data analytics to dramatically accelerate materials modeling, discovery, and manufacture (à la Moneyball).
Citrine Informatics is a team of materials scientists dedicated to uniting all materials data on a single platform within a single data standard, and putting user-friendly, data-driven tools into the hands of all materials researchers. The company’s vision is to make the full materials R&D pipeline—from initial discovery to scale-up and commercialization—ten times faster than it is today. In this talk, we will review the present state of affairs in materials data, notable progress to date, opportunities for the future, and the challenges likely to arise along the way.
The new IMI Labs service bridges this gap, opening up the IMI high-throughput experimentation platform, materials expertise, and analytics to industry to accelerate and de-risk the exploration, discovery, characterization, and selection of advanced materials.
Materials Data Facility as Community Database to Share Nano-manufacturing Rec... (Globus)
This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Ben Galewsky from the National Center for Supercomputing Applications (NCSA).
In this deck from the HPC User Forum, Rick Stevens from Argonne presents: AI for Science.
"Artificial Intelligence (AI) is making strides in transforming how we live. From the tech industry embracing AI as the most important technology for the 21st century to governments around the world growing efforts in AI, initiatives are rapidly emerging in the space. In sync with these emerging initiatives including U.S. Department of Energy efforts, Argonne has launched an “AI for Science” initiative aimed at accelerating the development and adoption of AI approaches in scientific and engineering domains with the goal to accelerate research and development breakthroughs in energy, basic science, medicine, and national security, especially where we have significant volumes of data and relatively less developed theory. AI methods allow us to discover patterns in data that can lead to experimental hypotheses and thus link data driven methods to new experiments and new understanding."
Watch the video: https://wp.me/p3RLHQ-kQi
Learn more: https://www.anl.gov/topic/science-technology/artificial-intelligence and http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this slidecast, Jason Stowe from Cycle Computing describes the company's recent record-breaking Petascale CycleCloud HPC production run.
"For this big workload, a 156,314-core CycleCloud behemoth spanning 8 AWS regions, totaling 1.21 petaFLOPS (RPeak, not RMax) of aggregate compute power, to simulate 205,000 materials, crunched 264 compute years in only 18 hours. Thanks to Cycle's software and Amazon's Spot Instances, a supercomputing environment worth $68M if you had bought it, ran 2.3 Million hours of material science, approximately 264 compute-years, of simulation in only 18 hours, cost only $33,000, or $0.16 per molecule."
Learn more: http://blog.cyclecomputing.com/2013/11/back-to-the-future-121-petaflopsrpeak-156000-core-cyclecloud-hpc-runs-264-years-of-materials-science.html
Watch the video presentation: http://wp.me/p3RLHQ-aO9
Green Shoots: Research Data Management Pilot at Imperial College London (Torsten Reimer)
This presentation by Ian McArdle and Torsten Reimer was given at the 10th International Digital Curation Conference in London (10th February 2015). It describes a "Green Shoots" research data management pilot programme at Imperial College London.
Accelerating Discovery via Science Services (Ian Foster)
[A talk presented at Oak Ridge National Laboratory on October 15, 2015]
We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In big-science projects in high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: exploding data volumes and powerful simulation tools mean that many more (ultimately most?) researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to develop suites of science services to which researchers can dispatch mundane but time-consuming tasks, and thus to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today's researchers. I use examples from Globus and other projects to demonstrate what can be achieved.
ML in materials discovery
1. ML in Materials Discovery
Brian DeCost, Zachary Trautt, Martin Green, Gilad Kusne, Jason Hattrick-Simpers
NIST Gaithersburg
Jason.Hattrick-Simpers@nist.gov
@jae3goals
2.
3. The Time is Now for AI
• Head-spinning AI breakthroughs
• AI is accessible
• (Scientific) data is available
4. Scientists Use Machine Learning to Speed Discovery of Metallic Glass
The system they developed combines machine learning (a form of artificial intelligence in which computer algorithms glean knowledge by ingesting enormous amounts of data) with experiments that quickly make and screen hundreds of sample materials at a time. This allowed them to discover three new blends of ingredients that form metallic glass 200 times faster than was possible before.
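A minimal sketch of the kind of loop described above: train a model on measured samples, rank untested candidates, make and screen the most promising batch, and retrain. This is our illustration on toy data, not the authors' actual pipeline; the objective function, batch size, and candidate space are all assumptions.

```python
# Sketch of an ML-guided high-throughput screening loop (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def measure(x):
    # Stand-in for a batch of high-throughput experiments: a toy figure of merit.
    return np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1]) + 0.05 * rng.normal(size=len(x))

candidates = rng.uniform(0, 1, size=(5000, 2))        # hypothetical composition space
tested = list(rng.choice(len(candidates), size=20, replace=False))
y = measure(candidates[tested])

for rnd in range(5):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(candidates[tested], y)
    untested = np.setdiff1d(np.arange(len(candidates)), tested)
    ranked = untested[np.argsort(model.predict(candidates[untested]))]
    batch = ranked[-10:]                              # screen the 10 best predictions
    y = np.concatenate([y, measure(candidates[batch])])
    tested.extend(batch.tolist())
    print(f"round {rnd}: best figure of merit so far = {y.max():.3f}")
```

Each round spends experimental effort only on the model's top-ranked candidates, which is where the reported speedup over exhaustive screening comes from.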
5. Beware of the AI/ML Hyperbole!
• Did Google guess my last name with just my picture?
• Clearly no!
• ML models are often interpolative and correlative, but our INTERPRETATIONS can build in causation that doesn't exist.
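To ground the interpolation point, here is a tiny demonstration we are adding (not from the deck): a random forest fits a smooth function well inside its training range but returns a flat, wrong answer outside it, because trees can only average target values they have already seen.

```python
# Random forests interpolate but do not extrapolate.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

x_train = np.linspace(0, 1, 200).reshape(-1, 1)
y_train = x_train.ravel() ** 2                    # true function: y = x^2

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

print(model.predict([[0.5]]))  # ~0.25: interpolation inside the training range works
print(model.predict([[2.0]]))  # ~1.0, not 4.0: extrapolation beyond the range fails
```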
6. Many Technologies Await Materials Solutions
• Corrosion-resistant materials: corrosion costs ~3% of US GDP; impacts safety
• Gas/steam turbines: new higher-temperature alloys
• Clean energy: hydrogen storage/delivery; higher-efficiency and higher-selectivity CO2 catalysts
• Additive manufacturing: AM Inconel 718 is not Inconel 718; difficult to process
Using AI to Determine Optimal New Materials and Process Parameters
7. Hydrogen: The Fuel of the Future
• Hydrogen is an energy carrier
• Explored for vehicles since the 1970s
• Advantages: abundant; produces energy cleanly; huge gravimetric energy density
• Disadvantages: mostly found in H2O; low volumetric energy density; lack of delivery infrastructure
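To make the gravimetric/volumetric contrast concrete, here is a short worked comparison using approximate literature values we are supplying (they do not appear on the slides):

```python
# Rough numbers (approximations from standard references, not from the deck)
# showing why hydrogen wins gravimetrically but loses volumetrically.
h2_lhv_mj_per_kg = 120.0        # lower heating value of H2
gasoline_mj_per_kg = 44.0
liquid_h2_kg_per_l = 0.071      # density of liquefied H2
gasoline_mj_per_l = 32.0

print(f"gravimetric ratio H2/gasoline: {h2_lhv_mj_per_kg / gasoline_mj_per_kg:.1f}x")
print(f"liquid H2: {h2_lhv_mj_per_kg * liquid_h2_kg_per_l:.1f} MJ/L "
      f"vs gasoline: {gasoline_mj_per_l} MJ/L")
```

Hydrogen carries roughly 2.7 times gasoline's energy per kilogram, but even liquefied it stores only about 8.5 MJ/L versus roughly 32 MJ/L for gasoline, which is why dense solid-state storage in alloys is attractive.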
10. Hydrogen Storage Alloys: Ahead of Their Time
• Diverse storage methods: adsorption, interstitial, chemical, etc.
• The original Hydpark dataset contains 2722 materials
• The original high-entropy alloys (HEAs): LaNi5; La0.4Ce0.2Ca0.5Ni3.55Co0.75Mn0.4Al0.3
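As a concrete illustration (a hypothetical helper of ours, not code from the slides), alloy formulas like those above can be parsed into normalized elemental fractions, the raw input for the composition-based representation on the next slide:

```python
# Parse an alloy formula string into normalized atomic fractions.
import re

def parse_composition(formula):
    # Matches element symbols followed by an optional amount, e.g. "Ni3.55".
    tokens = re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula)
    amounts = {el: float(n) if n else 1.0 for el, n in tokens}
    total = sum(amounts.values())
    return {el: n / total for el, n in amounts.items()}

print(parse_composition("LaNi5"))
# -> La ~0.167, Ni ~0.833
print(parse_composition("La0.4Ce0.2Ca0.5Ni3.55Co0.75Mn0.4Al0.3"))
```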
11. Building the Machine Learning Model
Ref: Ward et al. npj Comp. Mater. (2016), 28.
Pipeline: Experimental Data → Composition-based Representation → Machine Learning Algorithm
[Slide figure: a toy decision split, σ_r < 1.1 Å, separating MG from not-MG; example composition-based attributes include the mean atomic number (μ_Z), electronegativity spread (ΔΧ), deviation of the melting temperature (σ_Tm), and the maximum covalent radius (max r_cov)]
Glass-forming ability is modeled as a function of the elemental fractions: GFA = f(x_H, x_He, ...)
Training data: 5739 measurements, 145 attributes, random forest model
Screening: 24 million ternary alloys evaluated, yielding 74,520 potential MGs
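A simplified sketch of this kind of composition-based representation (ours: three illustrative attributes and a three-element property table, not the 145 attributes or the full feature set of Ward et al.):

```python
# Fraction-weighted statistics of elemental properties, Ward-style (simplified).
import numpy as np

# Illustrative lookup: (atomic number Z, Pauling electronegativity, covalent radius in Å)
ELEMENTS = {
    "La": (57, 1.10, 2.07),
    "Ni": (28, 1.91, 1.24),
    "Al": (13, 1.61, 1.21),
}

def featurize(fractions):
    """fractions: dict mapping element -> atomic fraction (summing to 1)."""
    els, x = zip(*fractions.items())
    x = np.array(x)
    props = np.array([ELEMENTS[e] for e in els])   # one row of properties per element
    mean = x @ props                               # fraction-weighted means, e.g. mu_Z
    dev = np.abs(props - mean).T @ x               # weighted mean absolute deviations
    return {"mu_Z": mean[0], "spread_chi": dev[1], "max_r_cov": props[:, 2].max()}

print(featurize({"La": 1 / 6, "Ni": 5 / 6}))       # LaNi5
```

Stacking many such statistics (mean, deviation, max, min, range, mode) over many elemental properties is how a composition string becomes the fixed-length attribute vector a random forest can consume.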
12. What is a Random Forest?
Simple Decision Tree
http://blog.citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics
13. What is a Random Forest?
http://blog.citizennet.com/blog/2012/11/10/random-forests-ensembles-and-performance-metrics
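A random forest is an ensemble of decision trees, each trained on a bootstrap sample of the data with random feature subsets; predictions are averaged (regression) or majority-voted (classification). A minimal scikit-learn demonstration on synthetic data (ours, not from the linked post):

```python
# Single decision tree vs. random forest ensemble on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

print("single tree accuracy:", tree.score(X_te, y_te))
print("random forest accuracy:", forest.score(X_te, y_te))
```

Averaging over many decorrelated trees typically reduces variance, which is why the forest usually outscores any single tree on held-out data.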
Editor's Notes
Emphasize the use of ML in my work, but also the rampant skepticism.
Stoichiometric attributes capture the fraction, but not the type, of elements present.
Elemental property attributes: atomic row, Mendeleev number, atomic weight, total # of unfilled states, etc., each taken as the weighted average, max, min, range, average deviation, and mode.
Valence orbital occupation attributes.
Ionic compound attributes…
Ask before getting into this whether anyone is unfamiliar with roughly how a Random Forest works.