We presented these slides at the NIH Data Commons kickoff meeting, showing some of the technologies that we propose to integrate in our "full stack" pilot.
Deep learning is finding applications in science, such as the prediction of material properties. DLHub is being developed to facilitate sharing of deep learning models, data, and code for science. It will collect, publish, serve, and enable retraining of models on new data. This will help address challenges of applying deep learning to science, such as accessing relevant resources and integrating models into workflows. The goal is to deliver deep learning capabilities to thousands of scientists through software for managing data, models, and workflows.
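To make the intended interaction concrete, here is a minimal Python sketch of the publish-then-invoke pattern that DLHub aims to support. The ModelHubClient class, its methods, and the model names are hypothetical placeholders for illustration, not the actual DLHub SDK API.

```python
# Illustrative publish-then-invoke pattern for a model-sharing service.
# ModelHubClient, publish(), invoke(), and the model names are hypothetical
# stand-ins, not the real DLHub SDK; they only show the interaction pattern.

class ModelHubClient:
    """Toy in-memory stand-in for a hosted model-serving service."""
    def __init__(self):
        self._models = {}

    def publish(self, name, fn, metadata):
        # Register a callable model along with descriptive metadata.
        self._models[name] = {"fn": fn, "metadata": metadata}
        return name

    def invoke(self, name, inputs):
        # Run a previously published model on new inputs.
        return self._models[name]["fn"](inputs)


def bandgap_model(compositions):
    # Placeholder "trained model": a constant prediction per input.
    return [0.5 for _ in compositions]


client = ModelHubClient()
client.publish("someuser/bandgap-predictor", bandgap_model,
               metadata={"domain": "materials science", "units": "eV"})
print(client.invoke("someuser/bandgap-predictor", ["SiO2", "GaAs"]))
```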
Plenary talk at the international Synchrotron Radiation Instrumentation conference in Taiwan, on work with great colleagues Ben Blaiszik, Ryan Chard, Logan Ward, and others.
Rapidly growing data volumes at light sources demand increasingly automated data collection, distribution, and analysis processes, in order to enable new scientific discoveries while not overwhelming finite human capabilities. I present here three projects that use cloud-hosted data automation and enrichment services, institutional computing resources, and high-performance computing facilities to provide cost-effective, scalable, and reliable implementations of such processes. In the first, Globus cloud-hosted data automation services are used to implement data capture, distribution, and analysis workflows for Advanced Photon Source and Advanced Light Source beamlines, leveraging institutional storage and computing. In the second, such services are combined with cloud-hosted data indexing and institutional storage to create a collaborative data publication, indexing, and discovery service, the Materials Data Facility (MDF), built to support a host of informatics applications in materials science. The third integrates components of the previous two projects with machine learning capabilities provided by the Data and Learning Hub for science (DLHub) to enable on-demand access to machine learning models from light source data capture and analysis workflows, and provides simplified interfaces to train new models on data from sources such as MDF on leadership-scale computing resources. I draw conclusions about best practices for building next-generation data automation systems for future light sources.
Research Automation for Data-Driven Discovery (Globus)
This document discusses research automation and data-driven discovery. It notes that data volumes are growing much faster than computational power, creating a productivity crisis in research. However, most labs have limited resources to handle these large data volumes. The document proposes applying lessons from industry to create cloud-based science services with standardized APIs that can automate and outsource common tasks like data transfer, sharing, publishing, and searching. This would help scientists focus on their core research instead of computational infrastructure. Examples of existing services from Globus and the Materials Data Facility are presented. The goal is to establish robust, scalable, and persistent cloud platforms to help address the challenges of data-driven scientific discovery.
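As one concrete example of such a standardized API, the sketch below submits a managed file transfer with the Globus Python SDK. It assumes the globus-sdk package and a valid access token; the endpoint UUIDs and paths are placeholders.

```python
# Minimal sketch: submitting a managed transfer via the Globus Python SDK.
# Assumes `pip install globus-sdk` and a valid transfer access token;
# the endpoint UUIDs and paths below are placeholders.
import globus_sdk

TRANSFER_TOKEN = "..."  # obtained via a Globus Auth login flow
SRC_ENDPOINT = "ddb59aef-6d04-11e5-ba46-22000b92c6ec"  # placeholder UUID
DST_ENDPOINT = "ddb59af0-6d04-11e5-ba46-22000b92c6ec"  # placeholder UUID

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN))

# Describe the transfer: source and destination endpoints plus one folder.
task_data = globus_sdk.TransferData(
    tc, SRC_ENDPOINT, DST_ENDPOINT, label="beamline data capture")
task_data.add_item("/data/scan_0001/", "/archive/scan_0001/", recursive=True)

# Submit; the service retries and verifies the transfer on the user's behalf.
task = tc.submit_transfer(task_data)
print("Submitted transfer task:", task["task_id"])
```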
A Data Ecosystem to Support Machine Learning in Materials Science (Globus)
This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Ben Blaiszik of the University of Chicago and the Argonne National Laboratory Data Science and Learning Division.
Materials Data Facility: Streamlined and automated data sharing, discovery, ... (Ian Foster)
Reviews recent results from the Materials Data Facility. Thanks in particular to Ben Blaiszik, Jonathon Goff, Logan Ward, and the Globus data search team. Some features shown here are still in beta. We are grateful to NIST for its support.
Accelerating Discovery via Science Services (Ian Foster)
[A talk presented at Oak Ridge National Laboratory on October 15, 2015]
We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In big-science projects in high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to develop suites of science services to which researchers can dispatch mundane but time-consuming tasks, and thus to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers. I use examples from Globus and other projects to demonstrate what can be achieved.
1) Scientists at the Advanced Photon Source use the Argonne Leadership Computing Facility for data reconstruction and analysis from experimental facilities in real-time or near real-time. This provides feedback during experiments.
2) Using the Swift parallel scripting language and ALCF supercomputers like Mira, scientists can process terabytes of data from experiments in minutes rather than hours or days (the general fan-out pattern is sketched after this list). This enables errors to be detected and addressed during experiments.
3) Key applications discussed include near-field high-energy X-ray diffraction microscopy, X-ray nano/microtomography, and determining crystal structures from diffuse scattering images through simulation and optimization. The workflows developed provide significant time savings and improved experimental outcomes.
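The Swift scripts themselves are not reproduced in the summary above; the following Python sketch only illustrates the general pattern they rely on: fanning independent per-file reconstruction tasks out across many workers so results are available while the experiment is still running. The reconstruct() function and file names are hypothetical.

```python
# Conceptual sketch of the parallel fan-out pattern behind these workflows:
# independent per-file reconstruction tasks run concurrently across workers.
# reconstruct() and the file list are hypothetical stand-ins.
from concurrent.futures import ProcessPoolExecutor

def reconstruct(projection_file):
    # Placeholder for an expensive per-projection reconstruction step.
    return f"{projection_file} -> reconstructed"

projection_files = [f"scan_{i:04d}.h5" for i in range(64)]

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:
        for result in pool.map(reconstruct, projection_files):
            print(result)  # feedback is available as each task completes
```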
High Performance Data Analytics and a Java Grande Run Time (Geoffrey Fox)
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development.
However, the same is not so true for data-intensive computing, even though commercial clouds devote many more resources to data analytics than supercomputers devote to simulations.
Here we use a sample of over 50 big data applications to identify characteristics of data-intensive applications and to deduce the needed runtimes and architectures.
We propose a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks.
Our analysis builds on the Apache software stack that is well used in modern cloud computing.
We give some examples including clustering, deep-learning and multi-dimensional scaling.
One suggestion from this work is the value of a high-performance Java (Grande) runtime that supports both simulations and big data.
Accelerating Data-driven Discovery in Energy Science (Ian Foster)
A talk given at the US Department of Energy, covering our work on research data management and analysis. Three themes:
(1) Eliminate data friction (use of SaaS for research data management)
(2) Liberate scientific data (research on data extraction, organization, publication)
(3) Create discovery engines at DOE facilities (services that organize data + computation)
My talk at the Winter School on Big Data in Tarragona, Spain.
Abstract: We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to leverage the “cloud” (whether private or public) to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers.
The document discusses using artificial intelligence (AI) to accelerate materials innovation for clean energy applications. It outlines six elements needed for a Materials Acceleration Platform: 1) automated experimentation, 2) AI for materials discovery, 3) modular robotics for synthesis and characterization, 4) computational methods for inverse design, 5) bridging simulation length and time scales, and 6) data infrastructure. Examples of opportunities include using AI to bridge simulation scales, assist complex measurements, and enable automated materials design. The document argues that a cohesive infrastructure is needed to make effective use of AI, data, computation, and experiments for materials science.
This document describes two case studies of health and status monitoring systems used to monitor large, complex datasets and detect anomalies. In the first case study, a system monitored thousands of servers in a data center and detected dead or slow nodes that reduced overall performance. The second case study monitored billions of payment card transactions and developed over 15,500 statistical models to detect data quality issues and interoperability problems, improving approval rates and saving millions. Both cases highlighted the importance of executive support, dashboards, governance programs, and developing numerous statistical models tailored to different data segments.
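The case studies do not include their model code; the sketch below only illustrates the underlying idea of fitting a simple statistical model per data segment and flagging observations that deviate from it. The segments, values, and threshold are invented.

```python
# Illustrative per-segment anomaly detection: fit a mean/std per segment,
# then flag new observations more than 3 standard deviations away.
# The data and threshold are invented; production systems use richer models.
from statistics import mean, stdev

history = {
    "server_rack_A": [101, 98, 103, 99, 102, 100],
    "server_rack_B": [250, 247, 255, 251, 249, 252],
}
models = {segment: (mean(v), stdev(v)) for segment, v in history.items()}

def is_anomaly(segment, value, z_threshold=3.0):
    mu, sigma = models[segment]
    return abs(value - mu) > z_threshold * sigma

print(is_anomaly("server_rack_A", 160))  # True: likely a dead or slow node
print(is_anomaly("server_rack_B", 250))  # False: within the normal range
```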
Comparing Big Data and Simulation Applications and Implications for Software ... (Geoffrey Fox)
At eScience in the Cloud 2014, Redmond WA, April 30 2014
There is perhaps a broad consensus as to important issues in practical parallel computing as applied to large scale simulations; this is reflected in supercomputer architectures, algorithms, libraries, languages, compilers and best practice for application development.
However, the same is not so true for data-intensive computing, even though commercial clouds devote far more resources to data analytics than supercomputers devote to simulations.
We look at a sample of over 50 big data applications to identify characteristics of data-intensive applications and to deduce the needed runtimes and architectures.
We suggest a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks.
Our analysis builds on combining HPC and the Apache software stack that is well used in modern cloud computing.
Initial results on Azure and HPC Clusters are presented
This document provides an overview of Bionimbus and the Open Cloud Consortium (OCC). Bionimbus is an open source cloud for biomedical research that provides services like elastic computing, databases, data transport and analysis pipelines. The OCC operates open clouds and develops standards to bridge private and public clouds. It runs an Open Cloud Testbed and is working to build an Open Science Data Cloud. The OCC aims to develop interoperable cloud architectures and operate infrastructure at data center scale to support open science.
The Open Science Data Cloud is a hosted, managed, distributed facility that allows scientists to manage and archive medium and large datasets, provide computational resources to analyze the data, and share the data with colleagues and the public. It currently consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage across 4 locations with 10G networks. Projects using the Open Science Data Cloud include Bionimbus for hosting genomics data and Matsu 2 for providing flood data to disaster response teams. The goal is to build it out over the next 10 years into a small data center for science that can preserve data like libraries and museums preserve collections.
Research results in peer-reviewed publications are reproducible, right? If only it was so clear cut. With high profile paper retractions and pushes for better data sharing by funders, publishers and the community, the spotlight is now focussing on the whole way research is conducted around the world.
This talk from the Software Sustainability Institute's Collaborations Workshop 2014 describes how cloud computing, with Microsoft Azure, is helping researchers realize the goals of scientific reproducibility.
Find out more at www.azure4research.com
Large Scale On-Demand Image Processing For Disaster Relief (Robert Grossman)
This is a status update (as of Feb 22, 2010) of a new Open Cloud Consortium project that will provide on-demand, large scale image processing to assist with disaster relief efforts.
These are the slides from a plenary panel that I participated in at IEEE Cloud 2011 on July 5, 2011 in Washington, D.C. I discussed the Open Science Data Cloud and concluded the talk with three research questions.
Classification of Big Data Use Cases by different Facets (Geoffrey Fox)
Ogres classify Big Data applications by multiple facets, each with several exemplars and features. This gives a guide to the breadth and depth of Big Data and allows one to examine which Ogres a particular architecture/software stack supports.
In 2001, as early high-speed networks were deployed, George Gilder observed that “when the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.” Two decades later, our networks are 1,000 times faster, our appliances are increasingly specialized, and our computer systems are indeed disintegrating. As hardware acceleration overcomes speed-of-light delays, time and space merge into a computing continuum. Familiar questions like “where should I compute,” “for what workloads should I design computers,” and "where should I place my computers” seem to allow for a myriad of new answers that are exhilarating but also daunting. Are there concepts that can help guide us as we design applications and computer systems in a world that is untethered from familiar landmarks like center, cloud, edge? I propose some ideas and report on experiments in coding the continuum.
The Discovery Cloud: Accelerating Science via Outsourcing and Automation (Ian Foster)
Director's Colloquium at Los Alamos National Laboratory, September 18, 2014.
We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to leverage the “cloud” (whether private or public) to achieve economies of scale and reduce cognitive load. In this talk, I explore the past, current, and potential future of large-scale outsourcing and automation for science.
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert Grossman)
The document summarizes Sector, an open-source large data cloud computing platform, and compares it to Hadoop. Sector uses a file-based storage system instead of Hadoop's block-based HDFS, and features a more flexible UDF programming model compared to MapReduce. Benchmark results show Sector outperforming Hadoop on the Terasort and MalStone benchmarks, with speedups of up to 19x, due to its dataflow balancing, UDP-based transport, and other architectural advantages over Hadoop for data-intensive computing at scale. Lessons learned include the importance of data locality, load balancing, and fault tolerance in large-scale systems.
Keynote talk at the International Conference on Supercomputing 2009, at IBM Yorktown in New York. This is a major update of a talk first given in New Zealand last January. The abstract follows.
The past decade has seen increasingly ambitious and successful methods for outsourcing computing. Approaches such as utility computing, on-demand computing, grid computing, software as a service, and cloud computing all seek to free computer applications from the limiting confines of a single computer. Software that thus runs "outside the box" can be more powerful (think Google, TeraGrid), dynamic (think Animoto, caBIG), and collaborative (think FaceBook, myExperiment). It can also be cheaper, due to economies of scale in hardware and software. The combination of new functionality and new economics inspires new applications, reduces barriers to entry for application providers, and in general disrupts the computing ecosystem. I discuss the new applications that outside-the-box computing enables, in both business and science, and the hardware and software architectures that make these new applications possible.
What is the "Big Data" version of the Linpack Benchmark?; What is “Big Data...Geoffrey Fox
Advances in high-performance/parallel computing in the 1980s and '90s were spurred by the development of quality high-performance libraries, e.g., ScaLAPACK, as well as by well-established benchmarks such as Linpack.
Similar efforts to develop libraries for high-performance data analytics are underway. In this talk we argue that such benchmarks should be motivated by frequent patterns encountered in high-performance analytics, which we call Ogres.
Based upon earlier work, we propose that doing so will enable adequate coverage of the "Apache" big data stack as well as most common application requirements, whilst building upon parallel computing experience.
Given the spectrum of analytic requirements and applications, there are multiple "facets" that need to be covered, and thus we propose an initial set of benchmarks - by no means currently complete - that covers these characteristics.
We hope this will encourage debate
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ... (Ian Foster)
This document discusses computing challenges posed by rapidly increasing data scales in scientific applications and high performance computing. It introduces the concept of online data analysis and reduction as an alternative to traditional offline analysis to help address these challenges. The key message is that dramatic changes in HPC system geography, caused by the different growth rates of underlying technologies, are driving new application structures and computational logistics problems, which in turn present exciting new computer science opportunities in online data analysis and reduction.
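As a toy illustration of the online-reduction idea, the sketch below keeps only running statistics (Welford's algorithm) over a simulated data stream, so a small summary is stored instead of every value. The random stream stands in for real simulation or detector output.

```python
# Toy illustration of online reduction: maintain a running mean/variance
# (Welford's algorithm) rather than writing every simulation value to disk.
# The random stream stands in for real simulation or detector output.
import random

count, mean, m2 = 0, 0.0, 0.0
for _ in range(1_000_000):  # the "simulation" produces values one at a time
    x = random.gauss(5.0, 2.0)
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)

variance = m2 / (count - 1)
print(f"reduced {count} values to mean={mean:.3f}, variance={variance:.3f}")
```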
Presentation by Steffen Zeuch, Researcher at German Research Center for Artificial Intelligence (DFKI) and Post-Doc at TU Berlin (Germany), at the FogGuru Boot Camp training in September 2018.
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre (HPCC Systems)
Data-Centric Approach: Our platform is built on the premise of absorbing data from multiple data sources and transforming them into highly intelligent social network graphs that can be processed to reveal non-obvious relationships.
Introduction to streaming data, the difference between batch processing and stream processing, research issues in streaming data processing, performance evaluation metrics, and tools for stream processing.
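To make the batch-versus-stream distinction concrete, this small sketch computes the same average two ways: once over the complete batch, and once incrementally over a sliding window as events arrive. The event values are invented.

```python
# Batch vs. stream processing of the same (invented) event values:
# the batch job sees all data at once, while the streaming job updates
# a sliding-window average incrementally as each event arrives.
from collections import deque

events = [3, 7, 4, 9, 12, 5, 8, 6, 10, 2]

# Batch: one pass over the complete dataset after it has been collected.
print("batch average:", sum(events) / len(events))

# Stream: keep a window of the most recent 4 events, updated per event.
window = deque(maxlen=4)
for value in events:
    window.append(value)
    print(f"event={value:2d}  windowed average={sum(window) / len(window):.2f}")
```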
Time to Science/Time to Results: Transforming Research in the Cloud (Amazon Web Services)
This session demonstrates how the cloud can accelerate breakthroughs in scientific research by providing on-demand access to powerful computing. You will gain insight into how scientific researchers are using the cloud to solve complex science, engineering, and business problems that require high bandwidth, low latency networking and very high compute capabilities. You will hear how leveraging the cloud reduces the costs and time to conduct large scale, worldwide collaborative research. Researchers can then access computational power, data storage, supercomputing resources, and data sharing capabilities in a cost-efficient manner without implementation delays. Disease research can be accomplished in a fraction of the time, and innovative researchers in small schools or distant corners of the world have access to the same computing power as those at major research institutions by leveraging Amazon EC2, Amazon S3, optimizing C3 instances, and more to increase collaboration. This session will provide best practices and insight from the UC Berkeley AMP Lab on the services used to connect disparate sets of data to drive meaningful new insight and impact.
This document discusses developing analytics applications using machine learning on Azure Databricks and Apache Spark. It begins with an introduction to the presenter, Richard Garris, and the agenda. It then covers the data science lifecycle, including data ingestion, understanding, modeling, and integrating models into applications. Finally, it demonstrates end-to-end examples of predicting power output, scoring leads, and predicting ratings from reviews.
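The demo notebooks themselves are not reproduced here; the sketch below shows the general shape of such a pipeline using PySpark's ML API, with invented feature columns and a numeric label standing in for power output.

```python
# Sketch of a PySpark ML pipeline in the spirit of the demo: assemble
# feature columns, fit a regression model, and score the data.
# The column names and rows are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("power-output-demo").getOrCreate()

df = spark.createDataFrame(
    [(25.0, 60.0, 1010.0, 430.0), (18.0, 45.0, 1015.0, 465.0),
     (30.0, 70.0, 1005.0, 410.0), (22.0, 55.0, 1012.0, 445.0)],
    ["temperature", "humidity", "pressure", "power_output"])

assembler = VectorAssembler(
    inputCols=["temperature", "humidity", "pressure"], outputCol="features")
lr = LinearRegression(featuresCol="features", labelCol="power_output")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("power_output", "prediction").show()
```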
Best practices at BGI for the Challenges in the Era of Big Genomics Data (Xing Xu)
BGI is the world's largest genome sequencing center, with over 150 sequencers and a sequencing throughput of 6 TB per day. It also has the largest computing and storage center for genomics in China, with over 20,000 CPU cores, 19 GPUs, 220+ teraflops of peak performance, and 17 petabytes of data storage. BGI faces challenges from the exponential growth of genomic data, complex data analysis processes, and widely distributed data. It addresses these challenges through solutions like high-speed data transfer, cloud computing platforms like EasyGenomics, and distributed algorithms and infrastructure using Hadoop and GPU acceleration.
A talk at the RPI-NSF Workshop on Multiscale Modeling of Complex Data, September 12, 2011, Troy NY, USA.
We have made much progress over the past decade toward effectively harnessing the collective power of IT resources distributed across the globe. In fields such as high-energy physics, astronomy, and climate, thousands benefit daily from tools that manage and analyze large quantities of data produced and consumed by large collaborative teams.

But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that far more--ultimately most?--researchers will soon require capabilities not so different from those used by these big-science teams. How is the general population of researchers and institutions to meet these needs? Must every lab be filled with computers loaded with sophisticated software, and every researcher become an information technology (IT) specialist? Can we possibly afford to equip our labs in this way, and where would we find the experts to operate them?

Consumers and businesses face similar challenges, and industry has responded by moving IT out of homes and offices to so-called cloud providers (e.g., GMail, Google Docs, Salesforce), slashing costs and complexity. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity. More importantly, we can free researchers from the burden of managing IT, giving them back their time to focus on research and empowering them to go beyond the scope of what was previously possible.

I describe work we are doing at the Computation Institute to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date and suggest a path towards large-scale delivery of these capabilities.
Introduction to Machine Learning with H2O and Python (Jo-fai Chow)
This document provides an introduction and overview of machine learning with H2O and Python. It begins with background information about the presenter, Joe Chow, including his work experience and side projects. The agenda then outlines topics to be covered, including an introduction to H2O.ai the company and machine learning platform, followed by a Python tutorial and examples. The tutorial will cover importing and manipulating data, basic and advanced regression and classification models, and using H2O in the cloud.
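The tutorial notebooks are not included here; the sketch below shows a typical minimal H2O-in-Python session of the kind such a tutorial walks through. It assumes the h2o package is installed, and the CSV file and column names are hypothetical.

```python
# Minimal H2O-in-Python session of the kind covered in such a tutorial.
# Assumes `pip install h2o`; the CSV path and column names are hypothetical.
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()  # start or connect to a local H2O cluster

frame = h2o.import_file("training_data.csv")  # hypothetical dataset
frame["label"] = frame["label"].asfactor()    # treat the target as categorical

predictors = [c for c in frame.columns if c != "label"]
train, valid = frame.split_frame(ratios=[0.8], seed=42)

gbm = H2OGradientBoostingEstimator(ntrees=50, max_depth=5, seed=42)
gbm.train(x=predictors, y="label", training_frame=train, validation_frame=valid)

print(gbm.model_performance(valid))  # AUC, logloss, and related metrics
```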
The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big... (inside-BigData.com)
In this Deck from the 2018 Swiss HPC Conference, Dave Turek from IBM presents: The Transformation of HPC: Simulation and Cognitive Methods in the Era of Big Data.
"There is a shift underway where HPC is beginning to be addressed with novel techniques and technologies including cognitive and analytic approaches to HPC problems and the arrival of the first quantum systems. This talk will showcase how IBM is merging cognitive, analytics, and quantum with classic simulation and modeling to create a new path for computational science."
Watch the video: https://wp.me/p3RLHQ-ik7
Learn more: http://ibm.com and http://www.hpcadvisorycouncil.com/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ... (QuantUniversity)
As the complexity of AI and machine learning processes increases, robust data pipelines need to be developed for industrial-scale model development and deployment. In regulated industries such as finance and healthcare, where automated decision making is increasingly used, tracking the design of experiments from inception to deployment is critical to ensure a robust process is adopted. Model life-cycle management solutions are proposed to track experiments; design robust experiments for hyperparameter tuning, optimization, and model selection; and support monitoring. The number of choices and parameters that need to be tracked makes it significantly challenging to trace experiments and to address reproducibility concerns.

In this talk, we discuss QuTrack, a blockchain-based approach to track experiment and model changes, primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
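QuTrack's own implementation is not shown here; the sketch below only illustrates the core idea of a hash-chained, append-only ledger of model changes, in which tampering with any earlier record invalidates every later hash. The record fields are invented.

```python
# Conceptual sketch of a hash-chained ledger of model changes (the basic
# blockchain idea behind approaches like QuTrack, not its actual code).
# Each entry embeds the previous entry's hash, so edits become detectable.
import hashlib
import json
import time

def add_entry(chain, record):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"record": record, "timestamp": time.time(), "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)

def verify(chain):
    for i, entry in enumerate(chain):
        body = {k: v for k, v in entry.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["hash"] != expected:
            return False
        if i > 0 and entry["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_entry(chain, {"model": "credit-risk-v1", "change": "initial training"})
add_entry(chain, {"model": "credit-risk-v1", "change": "learning_rate 0.1 -> 0.05"})
print(verify(chain))                       # True
chain[0]["record"]["change"] = "tampered"  # rewrite history...
print(verify(chain))                       # False: the chain no longer verifies
```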
This document provides an agenda for a presentation on data analytics. It includes an introduction to data analytics concepts and examples of applications in various industries. A demo of Instant Insights, a cloud-based analytics tool, is presented. The document emphasizes that building analytics solutions through testing and iteration is better than just discussing ideas. Users should test solutions in the real world and measure user behavior to make data-driven decisions.
Manage Data with Assurance
- Globus is a data transfer service that aims to increase the efficiency of researchers by enabling sustainable data sharing.
- It has over 138,000 registered users and has transferred over 600 petabytes of data between endpoints.
- The document discusses recent enhancements to Globus including new connectors, improved security features for regulated data, and plans to further develop the platform capabilities.
Linking Scientific Instruments and Computation (Ian Foster)
[Talk presented at Monterey Data Conference, August 31, 2022]
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Thus, methods are required for configuring and running distributed computing pipelines—what we call flows—that link instruments, computers (e.g., for analysis, simulation, AI model training), edge computing (e.g., for analysis), data stores, metadata catalogs, and high-speed networks. We review common patterns associated with such flows and describe methods for instantiating these patterns. We present experiences with the application of these methods to the processing of data from five different scientific instruments, each of which engages powerful computers for data inversion, machine learning model training, or other purposes. We also discuss implications of such methods for operators and users of scientific facilities.
(1) Amundsen is a data discovery platform developed by Lyft to help users find, understand, and use data.
(2) The platform addresses challenges around data discovery such as lack of understanding about what data exists and where to find it.
(3) Amundsen provides searchable metadata about data resources, previews of data, and usage statistics to help data scientists and others explore and understand data.
The title of this talk is a crass attempt to be catchy and topical, by referring to the recent victory of Watson in Jeopardy.
My point (perhaps confusingly) is not that new computer capabilities are a bad thing. On the contrary, these capabilities represent a tremendous opportunity for science. The challenge that I speak to is how we leverage these capabilities without computers and computation overwhelming the research community in terms of both human and financial resources. The solution, I suggest, is to get computation out of the lab—to outsource it to third party providers.
Abstract follows:
We have made much progress over the past decade toward effective distributed cyberinfrastructure. In big-science fields such as high energy physics, astronomy, and climate, thousands benefit daily from tools that enable the distributed management and analysis of vast quantities of data. But we now face a far greater challenge. Exploding data volumes and new research methodologies mean that many more--ultimately most?--researchers will soon require similar capabilities. How can we possibly supply information technology (IT) at this scale, given constrained budgets? Must every lab become filled with computers, and every researcher an IT specialist?
I propose that the answer is to take a leaf from industry, which is slashing both the costs and complexity of consumer and business IT by moving it out of homes and offices to so-called cloud providers. I suggest that by similarly moving research IT out of the lab, we can realize comparable economies of scale and reductions in complexity, empowering investigators with new capabilities and freeing them to focus on their research.
I describe work we are doing to realize this approach, focusing initially on research data lifecycle management. I present promising results obtained to date, and suggest a path towards large-scale delivery of these capabilities. I also suggest that these developments are part of a larger "revolution in scientific affairs," as profound in its implications as the much-discussed "revolution in military affairs" resulting from more capable, low-cost IT. I conclude with some thoughts on how researchers, educators, and institutions may want to prepare for this revolution.
Get Started with Data Science by Analyzing Traffic Data from California Highways (Aerospike, Inc.)
This document summarizes an effort to analyze traffic data from California highways to better understand data science techniques. The researchers searched for an open dataset, eventually finding sensor data from California highways. They analyzed the data format and values to understand it. To detect traffic incidents, they framed it as a classification problem and prepared training data by labeling sensor records near incidents as positive examples. They trained classifiers on this data but initial results were poor. After refining the features and balancing the training data, the classifiers showed more promising results.
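The dataset and code from the talk are not reproduced here; the sketch below illustrates, with scikit-learn and synthetic data, the two refinements the summary mentions: a small positive class labeled from incident records and class weighting to compensate for the imbalance. The feature values are invented.

```python
# Illustrative sketch of the approach described above: a heavily imbalanced
# binary "incident vs. normal" classification problem, trained with class
# weighting to compensate. Synthetic features stand in for sensor readings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)

# Synthetic sensor features [occupancy, speed_drop]; incidents are rare.
normal = rng.normal([0.3, 5.0], [0.1, 3.0], size=(950, 2))
incident = rng.normal([0.7, 25.0], [0.1, 5.0], size=(50, 2))
X = np.vstack([normal, incident])
y = np.array([0] * 950 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test)))
```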
eScience: A Transformed Scientific Method (Duncan Hull)
The document discusses the concept of eScience, which involves synthesizing information technology and science. It explains how science is becoming more data-driven and computational, requiring new tools to manage large amounts of data. It recommends that organizations foster the development of tools to help with data capture, analysis, publication, and access across various scientific disciplines.
Developing Distributed High-performance Computing Capabilities of an Open Sci... (Globus)
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Similar to Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales
Global Services for Global Science, March 2023 (Ian Foster)
We are on the verge of a global communications revolution based on ubiquitous high-speed 5G, 6G, and free-space optics technologies. The resulting global communications fabric can enable new ultra-collaborative research modalities that pool sensors, data, and computation with unprecedented flexibility and focus. But realizing these modalities requires new services to overcome the tremendous friction currently associated with any actions that traverse institutional boundaries. The solution, I argue, is new global science services to mediate between user intent and infrastructure realities. I describe our experiences building and operating such services and the principles that we have identified as needed for successful deployment and operations.
The Earth System Grid Federation: Origins, Current State, Evolution (Ian Foster)
The Earth System Grid Federation (ESGF) is a distributed network of climate data servers that archives and shares model output data used by scientists worldwide. ESGF has led data archiving for the Coupled Model Intercomparison Project (CMIP) since its inception. The ESGF Holdings have grown significantly from CMIP5 to CMIP6 and are expected to continue growing rapidly. A new ESGF2 project funded by the US Department of Energy aims to modernize ESGF to handle exabyte scale data volumes through a new architecture based on centralized Globus services, improved data discovery tools, and data proximate computing capabilities.
Better Information Faster: Programming the Continuum (Ian Foster)
This document discusses the computing continuum and efforts to enable better information faster through computation. It provides examples of how techniques like executing tasks closer to data sources or on specialized hardware can significantly accelerate applications. Programming models and managed services are explored for specifying and executing workloads across diverse infrastructure. There are still open questions around optimizing networks, algorithms, and applications for the computing continuum.
ESnet6 provides an ultra-fast and reliable network that enables new smart instruments for 21st century science. The network capacity has increased dramatically over time, with 2022 bandwidth being 500,000 times greater than 1993. This network allows rapid data transfer between facilities, such as replicating 7 petabytes of climate data between three labs. It also enables fast assembly and use of new instruments like high energy diffraction microscopy that can perform an analysis in 31 seconds. The integrated research infrastructure provided by Globus further supports use of remote resources and smart instruments that will drive scientific discovery.
A Global Research Data Platform: How Globus Services Enable Scientific Discovery (Ian Foster)
Talk in the National Science Data Fabric (NSDF) Distinguished Speaker Series
The Globus team has spent more than a decade developing software-as-a-service methods for research data management, available at globus.org. Globus transfer, sharing, search, publication, identity and access management (IAM), automation, and other services enable reliable, secure, and efficient managed access to exabytes of scientific data on tens of thousands of storage systems. For developers, flexible and open platform APIs reduce greatly the cost of developing and operating customized data distribution, sharing, and analysis applications. With 200,000 registered users at more than 2,000 institutions, more than 1.5 exabytes and 100 billion files handled, and 100s of registered applications and services, the services that comprise the Globus platform have become essential infrastructure for many researchers, projects, and institutions. I describe the design of the Globus platform, present illustrative applications, and discuss lessons learned for cyberinfrastructure software architecture, dissemination, and sustainability.
Video is at https://www.youtube.com/watch?v=p8pCHkFFq1E
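For a sense of what "platform APIs" means in practice, here is a minimal sketch that uses the open-source globus_sdk Python package to authenticate and submit a managed transfer. The client ID, endpoint UUIDs, and paths are placeholders, and exact call signatures vary across SDK versions; this is an illustration, not the talk's own code.

```python
import globus_sdk

# Native-app OAuth2 login flow: print a URL, user pastes back an authorization code
CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"          # placeholder
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Authorization code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Submit a reliable, checksummed transfer between two endpoints (UUIDs are placeholders)
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)
SRC_ENDPOINT_ID = "SOURCE-ENDPOINT-UUID"
DST_ENDPOINT_ID = "DESTINATION-ENDPOINT-UUID"
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT_ID, DST_ENDPOINT_ID, sync_level="checksum")
tdata.add_item("/source/data/", "/dest/data/", recursive=True)
result = tc.submit_transfer(tdata)
print("Task ID:", result["task_id"])             # Globus manages retries and integrity checks
```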
Daniel Lopresti, Bill Gropp, Mark D. Hill, Katie Schuman, and I put together a white paper on "Building a National Discovery Cloud" for the Computing Community Consortium (http://cra.org/ccc). I presented these slides at a Computing Research Association "Best Practices on using the Cloud for Computing Research Workshop" (https://cra.org/industry/events/cloudworkshop/).
Abstract from White Paper:
The nature of computation and its role in our lives have been transformed in the past two decades by three remarkable developments: the emergence of public cloud utilities as a new computing platform; the ability to extract information from enormous quantities of data via machine learning; and the emergence of computational simulation as a research method on par with experimental science. Each development has major implications for how societies function and compete; together, they represent a change in technological foundations of society as profound as the telegraph or electrification. Societies that embrace these changes will lead in the 21st Century; those that do not, will decline in prosperity and influence. Nowhere is this stark choice more evident than in research and education, the two sectors that produce the innovations that power the future and prepare a workforce able to exploit those innovations, respectively. In this article, we introduce these developments and suggest steps that the US government might take to prepare the research and education system for its implications.
Big Data, Big Computing, AI, and Environmental ScienceIan Foster
I presented to the Environmental Data Science group at UChicago, with the goal of getting them excited about the opportunities inherent in big data, big computing, and AI--and to think about how to collaborate with Argonne in those areas. We had a great and long conversation about Takuya Kurihana's work on unsupervised learning for cloud classification. I also mentioned our work making NASA and CMIP data accessible on AI supercomputers.
Data Tribology: Overcoming Data Friction with Cloud AutomationIan Foster
A talk at the CODATA/RDA meeting in Gaborone, Botswana. I made the case that the biggest barriers to effective data sharing and reuse are often those associated with "data friction" and that cloud automation can be used to overcome those barriers.
The image on the first slide shows a few of the more than 20,000 active Globus endpoints.
Research Automation for Data-Driven DiscoveryIan Foster
This document discusses research automation and data-driven discovery. It notes that data volumes are growing much faster than computational power, creating a productivity crisis in research. However, most labs have limited resources to handle these large data volumes. The document proposes applying lessons from industry to create cloud-based science services with standardized APIs that can automate and outsource common tasks like data transfer, sharing, publishing, and searching. This would help scientists focus on their core research instead of computational infrastructure. Examples of existing services from Argonne National Lab and the University of Chicago Globus project are provided. The goal is to establish robust, scalable, and persistent cloud platforms to help address the challenges of data-driven scientific discovery.
Scaling collaborative data science with Globus and JupyterIan Foster
The Globus service simplifies the utilization of large and distributed data on the Jupyter platform. Ian Foster explains how to use Globus and Jupyter to seamlessly access notebooks using existing institutional credentials, connect notebooks with data residing on disparate storage systems, and make data securely available to business partners and research collaborators.
Team Argon proposes a commons platform using reusable components to promote continuous FAIRness of data. These components include Globus Connect Server for standardized data access and transfer across storage systems, Globus Auth for authentication and authorization, and BDBags for exchange of query results and cohorts using a common manifest format. Together these aim to provide uniform, secure, and reliable access, transfer, and sharing of data while supporting identification, search, and virtualization of derived data products.
This document discusses lessons learned for achieving interoperability. It recommends having a clear purpose, starting with basic conventions like identifiers, monitoring commitments to build trust, and focusing on outward-facing interoperability through simple APIs and platforms rather than full software stacks. Adopting industry practices such as standard authentication methods and cloud-based platforms is also advised, to promote rapid development and distribution of applications.
Going Smart and Deep on Materials at ALCFIan Foster
As we acquire large quantities of science data from experiment and simulation, it becomes possible to apply machine learning (ML) to those data to build predictive models and to guide future simulations and experiments. Leadership Computing Facilities need to make it easy to assemble such data collections and to develop, deploy, and run associated ML models.
We describe and demonstrate here how we are realizing such capabilities at the Argonne Leadership Computing Facility. In our demonstration, we use large quantities of time-dependent density functional theory (TDDFT) data on proton stopping power in various materials maintained in the Materials Data Facility (MDF) to build machine learning models, ranging from simple linear models to complex artificial neural networks, that are then employed to manage computations, improving their accuracy and reducing their cost. We highlight the use of new services being prototyped at Argonne to organize and assemble large data collections (MDF in this case), associate ML models with data collections, discover available data and models, work with these data and models in an interactive Jupyter environment, and launch new computations on ALCF resources.
Software Infrastructure for a National Research PlatformIan Foster
A presentation at the First National Research Platform workshop. "The purpose of this workshop is to bring together representatives from interested institutions to discuss implementation strategies for deployment of interoperable Science DMZs at a national scale." I present eight desirable properties for a software infrastructure for such a platform, and describe our experience realizing these properties in the Globus system.
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Ian Foster
The Advanced Photon Source (APS) at Argonne National Laboratory produces intense beams of x-rays for scientific research. Experimental data from the APS is growing dramatically due to improved detectors and a planned upgrade. This is creating data and computation challenges across the entire experimental process. Efforts are underway to accelerate the experimental feedback loop through automated data analysis, optimized data streaming, and computer-steered experiments to minimize data collection. The goal is to enable real-time insights and knowledge-driven experiments.
Globus Auth: A Research Identity and Access Management PlatformIan Foster
Globus Auth is a foundational identity and access management platform service designed to address unique needs of the science and engineering community. It serves to broker authentication and authorization interactions between end-users, identity providers, resource servers (services), and clients (including web, mobile, desktop, and command line applications, and other services). Globus Auth thus makes it easy, for example, for a researcher to authenticate with one credential, connect to a specific remote storage resource with another identity, and share data with colleagues based on yet another identity. By eliminating friction associated with the frequent need for multiple accounts, identities, credentials, and groups when using distributed cyberinfrastructure, Globus Auth streamlines the creation, integration, and use of advanced research applications and services. Globus Auth builds upon the OAuth 2 and OpenID Connect specifications to enable standards-compliant integration using existing client libraries. It supports identity federation models that enable diverse identities to be linked together, and provides delegation, via which client services can obtain short-term delegated tokens to access other services. We describe the design and implementation of Globus Auth, and report on experiences integrating it with a range of research resources and services, including the JetStream cloud, XSEDE, NCAR's Research Data Archive, and FaceBase.
Streamlined data sharing and analysis to accelerate cancer researchIan Foster
Advances in genomics and data analytics create new opportunities for cancer research and personalized medical treatment via large-scale federation of genomic, clinical, imaging and other data from many thousands of patients across institutions around the world. Despite these opportunities and promising early results, cancer research is often stymied by information technology barriers. One major barrier is a lack of tools for the reliable, secure, rapid, and easy transfer, sharing, and management of large collections of human data. In the absence of such tools, security and performance concerns often prevent sharing altogether or force researchers to resort to slow and error prone shipping of physical media. If data are received, timely analysis is further impeded by the difficulties inherent in verifying data integrity and managing who can access data and for what purpose. I will discuss how the mature Globus data management platform addresses these obstacles to discovery and explain how its intuitive, web-based interfaces enable use by researchers without specialized IT knowledge. I also describe how Globus technologies can be extended to meet the security requirements of human data so as to enable use in data-intensive cancer research.
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Ian Foster
Ever more data- and compute-intensive science makes computing increasingly important for research. But for advanced computing infrastructure to benefit more than the scientific 1%, we need new delivery methods that slash access costs, new sustainability models beyond direct research funding, and new platform capabilities to accelerate the development of new, interoperable tools and services.
The Globus team has been working towards these goals since 2010. We have developed software-as-a-service methods that move complex and time-consuming research IT tasks out of the lab and into the cloud, thus greatly reducing the expertise and resources required to use them. We have demonstrated a subscription-based funding model that engages research institutions in supporting service operations. And we are now also showing how the platform services that underpin Globus applications can accelerate the development and use of an integrated ecosystem of advanced science applications, such as NCAR’s Research Data Archive and OSG Connect, thus enabling access to powerful data and compute resources by many more people than is possible today.
In this talk, I introduce Globus services and the underlying Globus platform. I present representative applications and discuss opportunities that this platform presents for both small science and large facilities.
building global software/earthcube->sciencecloudIan Foster
My lunchtime talk at the EarthCube all hands meeting. I made the case that we need to rethink how science software is developed and delivered, leveraging the software-as-a-service (SaaS) methods that have proved so successful in industry to reduce both costs and barriers to use. [The beautiful (IMHO) maps were created by me with Python matplotlib, showing the locations of (a subset of) Globus endpoints.]
Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales
1. Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales
Ian Foster, Argonne National Lab & University of Chicago
December 21, 2017, HiPC, Jaipur, India
https://www.researchgate.net/publication/317703782
foster@anl.gov
2. Earth to be paradise; distance to lose enchantment
"If, as it is said to be not unlikely in the near future, the principle of sight is applied to the telephone as well as that of sound, earth will be in truth a paradise, and distance will lose its enchantment by being abolished altogether." (Arthur Mee, 1898)
5. Automating research data lifecycle
[Globus by the numbers: 5 major services; 13 national labs use Globus; 340 PB transferred; 10,000 active endpoints; 50 billion files processed; 75,000 registered users; 99.5% uptime; 65+ institutional subscribers; 1 PB largest single transfer to date; 3 months longest continuously managed transfer; 300+ federated campus identities; 12,000 active users/year]
6. Transferring 1 PB in a day: Argonne → NCSA
• Cosmology simulation on Mira @ Argonne produces 1 PB in 24 hours
• Data streamed to Blue Waters for analytics
• Application reveals feasibility of real-time streaming at scale
[Chart: transfer throughput with and without checksums]
15. Exascale climate goal: ensembles of 1 km models at 15 simulated years / 24 hours
Full state once per model day means 260 TB every 16 seconds, or 1.4 EB/day
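Those rates are mutually consistent, as a quick check shows (assuming 365-day model years):

```python
sim_years_per_day = 15
model_days = sim_years_per_day * 365              # 5475 model days per wall-clock day
seconds_per_snapshot = 24 * 3600 / model_days     # ~15.8 s between full-state dumps
daily_volume_eb = 1.4
snapshot_tb = daily_volume_eb * 1e6 / model_days  # EB -> TB, per snapshot
print(f"{seconds_per_snapshot:.1f} s per snapshot, ~{snapshot_tb:.0f} TB each")
```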
17. Time to discovery
[Figure: time to discovery versus simulation time; ultra-scale resources, leadership-class facilities, and smaller systems; data space tools for population, navigation, manipulation, and dissemination]
18.-20. Slides 18-20 repeat the same figure with progressive highlights.
21. The need for online data analysis and reduction
Traditional approach: simulate, output, analyze
• Write simulation output to secondary storage; read it back for analysis
• Decimate in time when the simulation output rate exceeds the output rate of the computer
Online: y = F(x)
Offline: a = A(y), b = B(y), …
22. The need for online data analysis and reduction (continued)
New approach: online data analysis and reduction
• Co-optimize simulation, analysis, and reduction for performance and information output
• Substitute CPU cycles for I/O, via online data (de)compression and/or analysis
a) Online: a = A(F(x)), b = B(F(x)), …
b) Online: r = R(F(x)); Offline: a = A'(r), b = B'(r), or a = A(U(r)), b = B(U(r))
[R = reduce, U = un-reduce]
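To make the contrast concrete, here is a minimal Python sketch of the three patterns. It is illustrative only: F, A, B, and the reduce/un-reduce pair are hypothetical stand-ins for a simulation, two analyses, and a lossy reducer.

```python
import numpy as np

def F(x):                                      # stand-in "simulation": produces a large field
    return np.sin(np.outer(x, x))

def A(y): return y.mean()                      # stand-in analysis 1
def B(y): return y.std()                       # stand-in analysis 2
def R(y): return y[::4, ::4]                   # stand-in lossy reduction (decimation)
def U(r): return np.kron(r, np.ones((4, 4)))   # crude "un-reduce" (reconstruction)

x = np.linspace(0.0, 1.0, 1024)

# Traditional: write the full output y to storage, analyze offline
y = F(x)                                       # ~8 MB per step at float64
a_offline, b_offline = A(y), B(y)

# Online analysis: compute a and b in situ, never store y
a_online, b_online = A(F(x)), B(F(x))

# Online reduction: store only r = R(F(x)), reconstruct approximately offline
r = R(F(x))                                    # 16x fewer values to move and store
a_approx, b_approx = A(U(r)), B(U(r))
print(a_offline, a_online, a_approx)
```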
23. Exascale computing at Argonne by 2021
Drivers: precision medicine; data from sensors and scientific instruments; simulation and modeling of materials and physical systems
Support for three types of computing:
• Traditional: HPC simulation and modeling
• Learning: machine learning, deep learning, AI
• Data: data analytics, data science, big data
[Artist's impression]
24. Real-time analysis and experimental steering
• Current protocols process and validate data only after the experiment, which can lead to undetected errors and prevents online steering
• Instead, process data streamed from the beamline to a supercomputer; a control feedback loop makes decisions during the experiment
• Tests in the TXM beamline (32-ID @ APS) in a cement wetting experiment (2 experiments, each with 8 hours of data acquisition time)
[Charts: sustained projections/second versus circular buffer size and reconstruction frequency; image quality (similarity score) versus number of streamed projections reconstructed]
Tekin Bicer et al., eScience 2017
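As a rough illustration of the feedback pattern (not the authors' implementation; the buffer size, reconstruction trigger, stand-in functions, and stopping rule are all hypothetical):

```python
from collections import deque
import numpy as np

BUFFER_SIZE = 64        # hypothetical circular-buffer size
RECON_EVERY = 16        # reconstruct after every N newly streamed projections

def acquire_projection(i):           # stand-in for a detector read from the beamline
    return np.random.rand(128, 128)

def reconstruct(projections):        # stand-in for a streaming tomographic reconstruction
    return np.mean(projections, axis=0)

def quality(image):                  # stand-in image-quality / similarity score
    return float(np.std(image))

buffer = deque(maxlen=BUFFER_SIZE)
prev_score = None
for i in range(256):
    buffer.append(acquire_projection(i))
    if (i + 1) % RECON_EVERY == 0:
        score = quality(reconstruct(np.stack(buffer)))
        print(f"after {i + 1} projections: quality score {score:.4f}")
        if prev_score is not None and abs(score - prev_score) < 1e-3:
            # Decision made *during* the experiment: steer, re-aim, or stop acquisition
            print("reconstruction has stabilized; experiment could be steered or stopped here")
            break
        prev_score = score
```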
25. Deep learning for precision medicine
https://de.mathworks.com/company/newsletters/articles/cancer-diagnostics-with-deep-learning-and-photonic-time-stretch.html
28. Using learning to optimize simulation studies
[Diagram: simulation data feeds learning methods, which yield new capabilities that drive new simulations]
Logan Ward and Ben Blaiszik
29. Synopsis: Applications are changing
From a single program with offline analysis toward multiple programs with online analysis: simulation + analysis, multiple simulations, and multiple simulations + analyses.
A few or many tasks:
• Loosely or tightly coupled
• Hierarchical or not
• Static or dynamic
• Fail-stop or recoverable
• Shared state
• Persistent and transient state
• Scheduled or data driven
30. Many interesting codesign problems
Application trends: big simulation, machine learning, deep learning, streaming, online analysis, online reduction, heterogeneity
Software: programming models (many task, streaming); libraries (analysis, reduction, communications); system software (fault tolerance, resource management)
Hardware: complex nodes (many core, accelerators, heterogeneous); NVRAM; networks (internal and external)
Codesign dimensions: node configuration, internal networks, external networks, memory hierarchy, storage systems, heterogeneity, operating policies
31. Reduction comes with challenges
• Handling high entropy
• Performance: no benefit otherwise
• Not only the error in the variable itself, E ≡ ‖f − f̃‖, where f̃ is the reduced representation of f
• Must also consider the impact on derived quantities: E ≡ ‖g_l^t(f(x, t)) − g_l^t(f̃_l^t(x, t))‖
S. Klasky
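A tiny numpy illustration of why derived quantities matter (the field, the reduction, and the derived quantity g below are hypothetical): a reduction with a small pointwise error can still distort a gradient-based quantity far more.

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 2048)
f = np.sin(x) + 0.01 * np.sin(40 * x)         # hypothetical field with fine-scale detail

# Hypothetical lossy reduction: keep every 8th point, reconstruct by linear interpolation
xr = x[::8]
f_tilde = np.interp(x, xr, f[::8])

def g(field):                                  # derived quantity: numerical derivative
    return np.gradient(field, x)

err_f = np.max(np.abs(f - f_tilde))            # error in the variable itself
err_g = np.max(np.abs(g(f) - g(f_tilde)))      # error in the derived quantity
print(f"max |f - f~|       = {err_f:.2e}")
print(f"max |g(f) - g(f~)| = {err_g:.2e}")     # typically much larger than err_f
```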
32. Reduction comes with challenges
Key research challenge: how do we manage the impact of errors on derived quantities? ("Where did it go???")
S. Klasky
33. CODAR: Codesign center for Online Data Analysis and Reduction
• Infrastructure development and deployment
  - Enable rapid composition of applications and "data services" (data reduction methods, data analysis methods, etc.)
  - Support CODAR-developed and other data services
• Method development: new reduction and analysis routines
  - Motif-specific: e.g., finite difference mesh vs. particles vs. finite elements
  - Application-specific: e.g., reduced physics to understand deltas
• Application engagement
  - Understand data analysis and reduction requirements
  - Integrate, deploy, evaluate impact
https://codarcode.github.io   codar-info@cels.anl.gov
34. Cross-cutting research questions
• What are the best data analysis and reduction algorithms for different application classes, in terms of speed, accuracy, and resource needs? How can we implement those algorithms to achieve scalability and performance portability?
• What are the tradeoffs in analysis accuracy, resource needs, and overall application performance between using various data reduction methods online prior to offline data reconstruction and analysis vs. performing more online data analysis? How do the tradeoffs vary with hardware and software choices?
• How do we effectively orchestrate online data analysis and reduction to reduce the associated overheads? How can hardware and software help with orchestration?
35. Prototypical data analysis and reduction pipeline
[Diagram: a running simulation feeds the CODAR data API and CODAR runtime, which route data to CODAR data analysis (multivariate statistics, feature analysis, outlier detection), CODAR data reduction (application-aware transforms, encodings, error calculation, refinement hints), and CODAR data monitoring; reduced output and reconstruction info flow through the I/O system to offline data analysis. Simulation knowledge (application, models, numerics, performance optimization, …) informs every stage.]
36. Overarching data reduction challenges
• Understanding the science requires massive data reduction
• How do we reduce
  - the time spent in reducing the data to knowledge,
  - the amount of data moved on the HPC platform,
  - the amount of data read from the storage system, and
  - the amount of data stored in memory, on the storage system, or moved over the WAN,
  without removing the knowledge?
• Requires deep dives into application post-processing routines and simulations
• Goal is to create both (a) codesign infrastructure and (b) reduction and analysis routines
  - General: e.g., reduce N bytes to M bytes, with M << N
  - Motif-specific: e.g., finite difference mesh vs. particles vs. finite elements
  - Application-specific: e.g., reduced physics allows us to understand deltas
37. HPC floating point compression
• Current interest is in lossy algorithms, some of which use preprocessing; lossless compression may achieve only up to ~3x reduction
• Compress each variable separately: ISABELA, SZ, ZFP, linear auditing, SVD, adaptive gradient methods
• Compress several variables simultaneously: PCA, tensor decomposition, …
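As a toy illustration of the "several variables simultaneously" idea (this is not any of the listed compressors; the sizes and rank are arbitrary), a truncated SVD keeps a few shared components across co-located variables:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical snapshot: 5 correlated variables sampled at 100,000 grid points
basis = rng.standard_normal((100_000, 3))
data = basis @ rng.standard_normal((3, 5)) + 0.01 * rng.standard_normal((100_000, 5))

# Reduce: keep only the top-k singular triplets, shared across all variables
k = 3
U, s, Vt = np.linalg.svd(data, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]      # store these instead of `data`

# Un-reduce (approximate reconstruction) and check the error
approx = U_k @ np.diag(s_k) @ Vt_k
rel_err = np.linalg.norm(data - approx) / np.linalg.norm(data)
stored = U_k.size + s_k.size + Vt_k.size
print(f"kept {stored:,} values instead of {data.size:,}; relative error {rel_err:.1e}")
```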
38. Lossy compression with SZ
• No existing compressor can reduce hard-to-compress datasets by more than a factor of 2
• Objective 1: reduce hard-to-compress datasets by one order of magnitude
• Objective 2: add user-required error controls (error bound, shape of the error distribution, spectral behavior of the error function, etc.)
[Examples of what we need to compress, whose bit maps of 128 floating point numbers look like random noise: NCAR atmosphere simulation output (1.5 TB), WRF hurricane simulation output, Advanced Photon Source mouse brain data]
Franck Cappello
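A minimal sketch of the error-bounded idea behind such compressors: uniform quantization within an absolute error bound. This is not SZ's actual prediction-based algorithm, just the simplest possible error-controlled reduction.

```python
import numpy as np

def reduce_abs(data, err_bound):
    """Quantize so every value can be reconstructed within +/- err_bound."""
    step = 2.0 * err_bound
    codes = np.round(data / step).astype(np.int32)  # small integers compress well losslessly
    return codes, step

def unreduce_abs(codes, step):
    return codes.astype(np.float64) * step

field = np.random.default_rng(1).normal(size=1_000_000)
codes, step = reduce_abs(field, err_bound=1e-3)
restored = unreduce_abs(codes, step)
print("max abs error:", np.max(np.abs(field - restored)), "(bound was 1e-3)")
print("distinct codes:", np.unique(codes).size)      # entropy coding would shrink these further
```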
41. Z-checker: Analysis of data reduction error
Community tool to enable comprehensive assessment of lossy data reduction error:
• Collection of data quality criteria from applications
• Community repository for datasets, reduction quality requirements, and compression performance
• Modular design enables contributed analysis modules (C and R) and format readers (ADIOS, HDF5, etc.)
• Offline/online parallel statistical, spectral, and point-wise distortion analysis with static and dynamic visualization
Franck Cappello, Julie Bessac, Sheng Di
42. Z-Checker computations
• Normalized root mean squared error
• Peak signal-to-noise ratio
• Distribution of error
• Pearson correlation between raw and reduced datasets
• Power spectrum distortion
• Auto-correlation of the compression error
• Maximum error
• Point-wise error bound (relative or absolute)
• Preservation of derivatives
• Structural similarity (SSIM) index
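A few of these metrics are easy to sketch with numpy (illustrative only; Z-Checker itself provides many more analyses, and definitions such as the PSNR peak value vary by convention, here the value range is used):

```python
import numpy as np

def zchecker_like_metrics(raw, reduced):
    diff = raw - reduced
    value_range = raw.max() - raw.min()
    rmse = np.sqrt(np.mean(diff ** 2))
    nrmse = rmse / value_range                                   # normalized RMSE
    psnr = 20.0 * np.log10(value_range / rmse)                   # peak signal-to-noise ratio (dB)
    pearson = np.corrcoef(raw.ravel(), reduced.ravel())[0, 1]    # raw vs. reduced correlation
    max_err = np.max(np.abs(diff))                               # maximum point-wise error
    return {"nrmse": nrmse, "psnr_db": psnr, "pearson": pearson, "max_abs_error": max_err}

raw = np.random.default_rng(2).normal(size=100_000)
reduced = raw + np.random.default_rng(3).normal(scale=1e-3, size=raw.size)  # stand-in lossy result
print(zchecker_like_metrics(raw, reduced))
```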
43. Science-driven optimizations
• Information-theoretically derived methods like SZ, ISABELA, and ZFP make for good generic capabilities
• If scientists can provide additional details on how to determine features of interest, we can use those to drive further optimizations, e.g., if they can select:
  - regions of high gradient
  - regions near turbulent flow
  - particles with velocities > two standard deviations
• How can scientists help define features?
44. Multilevel compression techniques
A hierarchical reduction scheme produces multiple levels of partial decompression of the data, so that users can work with reduced representations that require minimal storage whilst achieving a user-specified tolerance.
[Charts: compression vs. user-specified tolerance; results for a turbulence dataset that is extremely large, inherently non-smooth, and resistant to compression]
Mark Ainsworth
45. Manifold learning for change detection and adaptive sampling
• A single molecular dynamics trajectory can generate 32 PB
• Use online data analysis to detect relevant or significant events
• Project MD trajectories across time into a two-dimensional manifold space (dimensionality reduction)
• Change detection in manifold space is more robust than in the original full coordinate space, as it removes local vibrational noise
• Apply an adaptive sampling strategy based on accumulated changes of trajectories
[Figure: low-dimensional manifold projection of different states of MD trajectories]
Shinjae Yoo
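Schematically, the pattern looks like the toy numpy sketch below. This is not the actual manifold-learning pipeline: PCA stands in for the manifold projection, and the synthetic frames, event location, and threshold are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic "trajectory": 500 frames x 300 coordinates, with a conformational shift at frame 250
frames = rng.normal(scale=0.05, size=(500, 300))
frames[250:] += rng.normal(size=300)            # the event we hope to detect online

# Project to a 2-D "manifold" via PCA (top two principal components)
centered = frames - frames.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
proj = centered @ Vt[:2].T                       # shape (500, 2)

# Change detection in the low-dimensional space: distance between consecutive projections
steps = np.linalg.norm(np.diff(proj, axis=0), axis=1)
threshold = steps.mean() + 5 * steps.std()
events = np.where(steps > threshold)[0] + 1
print("change detected near frames:", events)    # expect a detection around frame 250
```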
46. Tracking blobs in XGC fusion simulations
Blobs, regions of high turbulence that can damage the Tokamak, can run along the edge wall down toward the divertor and damage it. Blob extraction and tracking enables the exploration and analysis of high-energy blobs across timesteps.
• Access data with high-performance ADIOS I/O
• Precondition the input data with robust PCA
• Detect blobs as local extrema with topology analysis
• Track blobs over time with the combinatorial feature flow field method
[Figures: critical points extracted with topology analysis; data preconditioning with robust PCA; tracking graph that visualizes the dynamics of blobs (birth, merge, split, and death) over time]
Extracting, tracking, and visualizing blobs in large 5D gyrokinetic Tokamak simulations. Hanqi Guo, Tom Peterka
47. Reduction for visualization
"an extreme scale simulation … calculates temperature and density over 1000 time steps. For both variables, a scientist would like to visualize 10 isosurface values and X, Y, and Z cut planes for 10 locations in each dimension. One hundred different camera positions are also selected, in a hemisphere above the dataset pointing towards the data set. We will run the in situ image acquisition for every time step. These parameters will produce: 2 variables x 1000 time steps x (10 isosurface values + 3 x 10 cut planes) x 100 camera positions x 3 images (depth, float, lighting) = 2.4 x 10^7 images." (J. Ahrens et al., SC'14)
Full state: 10^3 time steps x 10^15 B of state per time step = 10^18 B
Rendered images: 2.4 x 10^7 images x 1 MB/image (megapixel, 4 B/pixel) = 2.4 x 10^13 B
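The arithmetic is easy to check (the counts come from the quoted scenario; the 1 MB/image size is the slide's assumption):

```python
time_steps = 1000
variables = 2
isosurfaces = 10
cut_planes = 3 * 10
camera_positions = 100
image_kinds = 3                       # depth, float, lighting

images = variables * time_steps * (isosurfaces + cut_planes) * camera_positions * image_kinds
print(f"images rendered in situ: {images:.2e}")            # 2.4e7

state_bytes = time_steps * 10**15                            # ~1 PB of state per time step
image_bytes = images * 10**6                                 # ~1 MB per megapixel image
print(f"full state: {state_bytes:.1e} B, images: {image_bytes:.1e} B")
print(f"reduction factor: {state_bytes / image_bytes:.0f}x")
```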
48. Fusion whole device model
[Diagram: XGC and GENE coupled via an interpolator, with analysis tasks attached; 100+ PB total; PB/day on Titan today and 10+ PB/day in the future; 10 TB/day on Titan today and 100+ TB/day in the future; each analysis reads 10-100 PB]
http://bit.ly/2fcyznK
50. Fusion whole device model: integrating multiple technologies
• ADIOS staging (DataSpaces) for coupling
• Sirius (ADIOS + Ceph) for storage
• ZFP, SZ, Dogstar for reduction
• VTK-M services for visualization
• TAU for instrumenting the code
• Cheetah + Savanna to test the different configurations (same node, different node, hybrid combination) and determine where to place the different services
• Flexpath for staged writes from XGC to storage
• Ceph + ADIOS to manage the storage hierarchy
• Swift for workflow automation
[Diagram: XGC and GENE coupled via an interpolator, each instrumented with TAU and followed by reduction, visualization, and output stages; comparative and performance visualization; NVRAM, PFS, and tape storage tiers; Cheetah + Savanna drive codesign experiments]
51. Codesign questions to be addressed
• How can we couple multiple codes? Files, staging on the same node or on different nodes, synchronous or asynchronous?
• How can we test different placement strategies for memory and performance optimizations?
• What are the best reduction technologies to allow us to capture all relevant information during a simulation (e.g., performance vs. accuracy)?
• How can we create visualization services that work on the different architectures and use the data models in the codes?
• How do we manage data across storage hierarchies?
52. Savannah: Swift workflows coupled with ADIOS
[Diagram: co-design experiment architecture. Cheetah handles experiment configuration, dispatch, and job launch; multi-node workflow components (science app, reduce, analysis, Z-Checker and duplicates) communicate over ADIOS; users monitor and control multiple pipeline instances; Chimbuko captures co-design performance data; application data, ADIOS output, experiment metadata, and other co-design output (e.g., from Z-Checker) are stored under a CODAR campaign definition.]
53. Tasks demand new systems capabilities
(Recap of slide 29.) From a single program with offline analysis toward multiple programs with online analysis: simulation + analysis, multiple simulations, and multiple simulations + analyses, comprising a few or many tasks that may be loosely or tightly coupled, hierarchical or not, static or dynamic, fail-stop or recoverable, with shared, persistent, and transient state, scheduled or data driven.
54. Codesign of MPI interfaces in support of HPC workflows
Challenge: enable isolation, fault tolerance, and composability for ensembles of scientific simulation/analysis pipelines.
• Defined an MPIX_Comm_launch() call to enable vendors to support dynamic workflow pipelines, in which parallel applications of various sizes are coupled in complex ways. Key use case: ADIOS-based in situ analysis.
• Integrated this feature with Swift/T, a scalable, MPI-based workflow system, allowing ease of development when coupling existing codes.
• Working to have this mode of operation supported in the Cray OS.
[Figure: depiction of a workflow of simulation/analysis pipelines. Clusters of boxes are MPI programs passing output data downstream; an algorithm such as parameter optimization controls progress. The launch feature was scaled to 192 nodes with a challenging workload for performance analysis, and is in use by the CODES network simulation team for its resilience capabilities.]
Dorier, Wozniak, and Ross. Supporting Task-level Fault-tolerance in HPC Workflows by Launching MPI Jobs inside MPI Jobs. WORKS @ SC, 2017.
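MPIX_Comm_launch() is a research extension described in the cited paper, not standard MPI. As a rough analogue using only standard dynamic process management, launching a child MPI job from inside a running MPI program looks like this with mpi4py (the child program name and process count are placeholders, and the MPI runtime must support spawning):

```python
# parent.py, run with: mpiexec -n 1 python parent.py
# "analysis.py" is a hypothetical child program that calls MPI.Comm.Get_parent()
# and receives the broadcast configuration from this parent.
from mpi4py import MPI

# Launch a 4-process child job (e.g., an in situ analysis stage) from the parent job
child = MPI.COMM_SELF.Spawn("python", args=["analysis.py"], maxprocs=4)
child.bcast({"step": 42, "reduction": "sz"}, root=MPI.ROOT)  # hand work to the child job
child.Disconnect()
```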
55. EMEWS: Extreme-scale Model Exploration With Swift
EMEWS hyperparameter optimization. Many ways to extend:
• Hyperband (Li et al., arXiv:1603.0656)
• Population-based training (Jaderberg et al., arXiv:1711.09846)
Justin Wozniak and Jonathan Ozik
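For intuition, here is a minimal successive-halving loop of the kind Hyperband generalizes; the objective, budgets, and configuration space are synthetic, and EMEWS itself would drive such a loop as a Swift/T workflow over ensembles of HPC model runs rather than as a single Python script.

```python
import random

def evaluate(config, budget):
    """Stand-in model evaluation: lower is better; more budget gives a less noisy estimate."""
    true_loss = (config["lr"] - 0.1) ** 2 + 0.1 * config["depth"]
    return true_loss + random.gauss(0.0, 0.5 / budget)

random.seed(0)
configs = [{"lr": random.uniform(0.001, 1.0), "depth": random.randint(1, 8)} for _ in range(27)]

budget = 1
while len(configs) > 1:
    scores = sorted((evaluate(c, budget), c) for c in configs)   # cheap evaluations first
    configs = [c for _, c in scores[: max(1, len(configs) // 3)]]  # keep the best third
    budget *= 3                                                   # give survivors more budget
print("selected configuration:", configs[0])
```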
56. Co-evolution of HPC applications and systems …
… demands new applications, software, and hardware …
… resulting in exciting new computer science challenges
foster@anl.gov
Thanks to the US Department of Energy and the CODAR team