I presented this keynote talk at the WorldComp conference in Las Vegas, on July 13, 2009. In it, I summarize what grid is about (focusing in particular on the "integration" function, rather than the "outsourcing" function--what people call "cloud" today), using biomedical examples.
Recent Upgrades to ARM Data Transfer and Delivery Using Globus (Globus)
This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Giri Prakash from the ARM Data Center at Oak Ridge National Laboratory.
Enabling Secure Data Discoverability (SC21 Tutorial) (Globus)
Major research instruments are generating orders of magnitude more data in relatively short timeframes. As a result, the research enterprise is increasingly challenged by what should be mundane tasks: describing data for discovery and making data securely accessible to the broader research community. The ad hoc methods currently employed place undue burden on scientists and system administrators alike, and it is clear that a more robust, scalable approach is required.
Bespoke data portals (and science gateways/data commons) are becoming more prominent as a means of enabling access to large datasets. In this tutorial, we demonstrate how services for authentication, authorization, metadata management, and search may be integrated with popular web frameworks, and used in combination with fast, well-architected networks to make data discoverable and accessible. Outcomes: build a simple but functional data portal that facilitates flexible data description, faceted data search, and secure data access.
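The faceted-search pattern at the heart of such a portal can be illustrated with a small in-memory index. This is a minimal pure-Python sketch, not the Globus Search API; the record fields and facet names are invented for illustration:

```python
from collections import Counter

# A toy metadata index: each record describes one dataset.
RECORDS = [
    {"title": "Mouse brain scans", "instrument": "CryoEM", "year": 2021},
    {"title": "Protein structures", "instrument": "CryoEM", "year": 2020},
    {"title": "Beamline images", "instrument": "APS", "year": 2021},
]

def faceted_search(records, filters=None, facet_fields=("instrument", "year")):
    """Return matching records plus facet counts for drill-down."""
    filters = filters or {}
    hits = [r for r in records
            if all(r.get(k) == v for k, v in filters.items())]
    facets = {f: Counter(r[f] for r in hits if f in r) for f in facet_fields}
    return hits, facets

# Filter to one year; facet counts tell the UI what drill-downs remain.
hits, facets = faceted_search(RECORDS, {"year": 2021})
print(len(hits))  # 2
```

A real portal would delegate this to a search service over an authenticated API, but the contract is the same: a filtered result set plus per-field counts that drive the faceted UI.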
A Data Ecosystem to Support Machine Learning in Materials Science (Globus)
This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Ben Blaiszik from University of Chicago and Argonne National Laboratory Data Science and Learning Division.
We presented these slides at the NIH Data Commons kickoff meeting, showing some of the technologies that we propose to integrate in our "full stack" pilot.
Keynote presentation at GlobusWorld 2021. Highlights product updates and roadmap, as well as user success stories in research data management. Presented by Ian Foster, Rachana Ananthakrishnan, Kyle Chard and Vas Vasiliadis.
20160922 Materials Data Facility TMS Webinar (Ben Blaiszik)
Fall 2016 TMS Webinar on Data Curation Tools. Slides for the Materials Data Facility presentation on data services (publish and discover) as described by Ben Blaiszik. See http://www.materialsdatafacility.org for more information.
Screenshots prepared by Ben Blaiszik and Kyle Chard, used in our Globus publication demo at GlobusWorld 2014. See https://www.globus.org/data-publication for more information and the notes on the slides for details.
Gateways 2020 Tutorial - Instrument Data Distribution with Globus (Globus)
We describe the requirements for, and challenges of, distributing datasets at scale, e.g. from instruments such as CryoEM and advanced light sources. We demonstrate a web application that uses Globus to perform large-scale data distribution. We introduce and walk through a Jupyter notebook highlighting the relevant code to incorporate into a science gateway.
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus (Globus)
We describe the automated data ingest scenario, referencing current and past research teams and their challenges. We demonstrate a web application that uses Globus to perform automated data ingest and present a faceted search interface that can be used by science gateways to simplify data discovery. We also walk through the application's GitHub repository and highlight relevant components.
Automating Research Data Management at Scale with Globus (Globus)
Research computing facilities, such as the national supercomputing centers, and shared instruments, such as cryo electron microscopes and advanced light sources, are generating large volumes of data daily. These growing data volumes make it challenging for researchers to perform what should be mundane tasks: move data reliably, describe data for subsequent discovery, and make data accessible to geographically distributed collaborators. Most employ some set of ad hoc methods, which are not scalable, and it is clear that some level of automation is required for these tasks.
Globus is an established service from the University of Chicago that is widely used for managing research data in national laboratories, campus computing centers, and HPC facilities. While its intuitive web app addresses simple file transfer and sharing scenarios, automation at scale requires integrating Globus data management platform services into custom science gateways, data portals, and other web applications in service of research. Such applications should enable automated ingest of data from diverse sources, launching of analysis runs on diverse computing resources, extraction and addition of metadata for creating search indexes, assignment of persistent identifiers, faceted search for rapid data discovery, and point-and-click downloading of datasets by authorized users — all protected by an authentication and authorization substrate that allows the implementation of flexible data access policies for both metadata and data alike.
We describe current and emerging Globus services that facilitate these automated data flows while ensuring a streamlined user experience. We also demonstrate Petreldata.net, a data management portal and gateway to multiple computing resources that supports large-scale research at the Advanced Photon Source.
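The automation pattern described above — ingest, metadata extraction, identifier assignment, indexing — can be sketched as a minimal pipeline. The functions and the "toy:" identifier scheme below are illustrative stand-ins, not the Globus Flows API or a real persistent-identifier service:

```python
import hashlib

def extract_metadata(filename, contents):
    # Stand-in for a real extractor (e.g. instrument-specific parsers).
    return {"name": filename, "size": len(contents)}

def assign_identifier(metadata):
    # Stand-in for a persistent-identifier service (DOI/ARK/minid).
    digest = hashlib.sha256(metadata["name"].encode()).hexdigest()[:8]
    return f"toy:{digest}"

def ingest(files, index):
    """Run each file through the extract -> identify -> index steps."""
    for filename, contents in files.items():
        md = extract_metadata(filename, contents)
        md["identifier"] = assign_identifier(md)
        index[md["identifier"]] = md
    return index

index = ingest({"scan_001.tif": b"\x00" * 128}, {})
print(list(index.values())[0]["size"])  # 128
```

In a production flow each step would be a separately authorized service invocation; the value of the automation is that the chain runs without a human touching any step.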
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus (Globus)
We describe the large-scale data transfer scenario, referencing current and past research teams and their challenges. We demonstrate a web application that uses Globus to perform large-scale data transfers, and walk through a code repository with the web application’s code.
Talk given at the XSEDE 2012 conference in Chicago. The highlights were Dan Milroy's and Brock Palen's presentations on experiences at Colorado and Michigan.
Paper is at https://www.globusonline.org/files/2012/07/XSEDE12-Globus-Campus-Bridging.pdf
As science becomes more computation and data intensive, computing needs often exceed campus capacity. Thus we see a desire to scale from the local environment to other campuses, to national cyberinfrastructure providers such as XSEDE, and/or to cloud providers—in other words, to “bridge” to the wider world. But given the realities of limited resources, time, and expertise, campus bridging methods must be exceedingly easy to use: as easy, for example, as are Netflix and Amazon movie streaming services. We report here on experiences with a service called Globus Online, which seeks to do for campus bridging what Netflix and Amazon do for movies: that is, use powerful cloud-hosted services and simple, intuitive web interfaces to make it “so easy that your grandparent can do it.” Specifically, we describe Globus Transfer, which addresses the important campus bridging use case of moving or synchronizing data across institutional boundaries. We describe how Globus Transfer achieves both ease of use for researchers and ease of administration for campus IT staff. We provide technical details on the Globus solution; quantitative data on usage by more than 25 early adopter campuses; and experience reports from two early adopters, the University of Michigan and the University of Colorado Boulder.
Simplified Research Data Management with the Globus Platform (Globus)
Overview of the Globus research data management platform, as presented at the Fall 2018 Membership Meeting of the Coalition for Networked Information (CNI), held in Washington, D.C., December 10-11, 2018
An introduction deck for the Web of Data to my team, including basic semantic web, Linked Open Data, primer, and then DBpedia, Linked Data Integration Framework (LDIF), Common Crawl Database, Web Data Commons.
This talk was given at the IIPC General Assembly in Paris in May 2014. It introduces the distributed, parallel extraction framework provided by the Web Data Commons project. The framework is publicly accessible and tailored to the Amazon Web Services stack. The presentation also includes an excerpt of the datasets, extracted from over 100 TB of crawl data, which are likewise available at http://webdatacommons.org.
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary ... (Robert Meusel)
Promoted by major search engines, schema.org has become a widely adopted standard for marking up structured data in HTML web pages. In this paper, we use a series of large-scale Web crawls to analyze the evolution and adoption of schema.org over time. The availability of data from different points in time for both the schema and the websites deploying data allows for a new kind of empirical analysis of standards adoption, which has not been possible before. To conduct our analysis, we compare different versions of the schema.org vocabulary to the data that was deployed on hundreds of thousands of Web pages at different points in time. We measure both top-down adoption (i.e., the extent to which changes in the schema are adopted by data providers) as well as bottom-up evolution (i.e., the extent to which the actually deployed data drives changes in the schema). Our empirical analysis shows that both processes can be observed.
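The paper's two measurements can be illustrated with simple set arithmetic over toy snapshots of the vocabulary and the deployed data. The type names and years below are invented for illustration, not the paper's actual figures:

```python
# Types defined by two schema versions, and types observed in two crawls.
schema_2012 = {"Person", "Product", "Review"}
schema_2013 = {"Person", "Product", "Review", "Recipe"}
deployed_2012 = {"Person", "Product", "Breadcrumb"}
deployed_2013 = {"Person", "Product", "Recipe", "Breadcrumb"}

# Top-down adoption: share of newly defined types that providers deploy.
new_types = schema_2013 - schema_2012
top_down = len(new_types & deployed_2013) / len(new_types)

# Bottom-up evolution: deployed-but-undefined types later standardized.
undefined = deployed_2012 - schema_2012
bottom_up = undefined & schema_2013

print(top_down)   # 1.0  (Recipe was adopted)
print(bottom_up)  # set() (Breadcrumb was not standardized in this toy data)
```

The real study runs this comparison over hundreds of thousands of pages and many vocabulary versions, but the two directions of influence reduce to exactly these set differences.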
Grid computing is an emerging technology. These slides cover the fundamentals of grid computing and present its various architectures with straightforward explanations.
Grid Computing - Collection of computer resources from multiple locations (Dibyadip Das)
Grid computing is the collection of computer resources from multiple locations to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files.
Grid computing is the application of several computers to a single problem at the same time. This presentation deals with the idea of grid computing, its design considerations, how a grid works, and some of the existing grids in the world today.
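The core idea — several machines attacking one problem simultaneously — can be sketched on a single host with a worker pool standing in for grid nodes. This is a minimal illustration of the divide-and-distribute pattern, not a grid middleware API:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """One 'grid node' sums its assigned slice of the problem."""
    lo, hi = bounds
    return sum(range(lo, hi))

# Split one big job (sum of 0..999999) into four independent chunks,
# farmed out to workers standing in for machines at different sites.
chunks = [(i * 250_000, (i + 1) * 250_000) for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))
print(total)  # 499999500000
```

A real grid adds what this sketch omits: scheduling across administrative domains, data staging, fault tolerance, and security — which is precisely what grid middleware exists to provide.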
68th ICREA Colloquium "The Worldwide LHC Computing Grid: Riding the computing..." (ICREA)
The World Wide Web was invented at CERN in 1991. Construction of CERN's LHC was approved in 1994. Building the data processing system required by the LHC's detectors in 1994 would have cost more than the accelerator itself. CERN and data centres from around the world started collaborating in 1999 to prototype and deploy the LHC Computing Grid, the first planetary-scale high-performance data processing system, which enabled the discovery of the Higgs boson in 2012. A review is made of these developments and their relationship to current areas of interest in data processing, such as "Big Data" and digitally supported collaborative science.
This presentation describes grid computing in depth: what a grid is, what grid computing means, why we need it, and how it works. It also covers the history and architecture of grid computing, along with advantages, disadvantages, and a conclusion.
The Grid means the infrastructure for the Advanced Web, for computing, collaboration and communication.
The goal is to create the illusion of a simple yet large and powerful self managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources.
“Grid” computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, a high-performance orientation.
We presented the Grid concept, drawing an analogy with the electrical power grid, along with the Grid vision.
This presentation provides a basic introduction to cloud computing and grid computing, focusing mainly on a comparison of the two. It draws on several research papers.
Carl Kesselman and I (along with our colleagues Stephan Erberich, Jonathan Silverstein, and Steve Tuecke) participated in an interesting workshop at the Institute of Medicine on July 14, 2009. Along with Patrick Soon-Shiong, we presented our views on how grid technologies can help address the challenges inherent in healthcare data integration.
A Framework for Geospatial Web Services for Public Health by Dr. Leslie Lenert (Wansoo Im)
A Framework for Geospatial Web Services for Public Health
by Leslie Lenert, MD, MS, FACMI, Director
National Center for Public Health Informatics, CCHIS, CDC
June 8, 2009, URISA Public Health Conference
uploaded by Wansoo Im, Ph.D.
URISA Membership Committee Chair
http://www.gisinpublichealth.org
What is Data Commons and How Can Your Organization Build One? (Robert Grossman)
This is a talk that I gave at the Molecular Medicine Tri Conference on data commons and data sharing to accelerate research discoveries and improve patient outcomes. It also covers how your organization can build a data commons using the Open Commons Consortium's Data Commons Framework and the University of Chicago's Gen3 data commons platform.
These slides were presented in a session that we organized at the American Association for Advancement of Science (AAAS) meeting in Chicago, February 2009.
Abstract: New laboratory devices, sensor networks, high-throughput instruments, and numerical simulation systems are producing data at rates that are both without precedent and rapidly growing. The resulting increases in the size, number, and variety of data are revolutionizing scientific practice. These changes demand new computing infrastructures and tools. Until recently, most laboratories and collaborations managed their own data, operated their own computers, and used remote high-performance computers only when required. We are moving to a paradigm in which data will primarily be located and managed on remote clusters, grids, and data centers. In this symposium, we will examine the computing infrastructure designed to serve this emerging era of data-intensive computing from three perspectives: (1) that of grid computing, which enables the creation of virtual organizations that can share remote and distributed resources over the Internet; (2) that of data centers, which are transitioning to providers of integrated storage, data, compute, and collaboration services (the offering of one or more of these integrated services over the Internet is beginning to be called cloud computing); and (3) that of e-science, in which grids, Web 2.0 technologies, and new collaboration and analysis services are merging and changing the way science is conducted. Each speaker will focus on one perspective but also compare and contrast with the others.
From Data Platforms to Dataspaces: Enabling Data Ecosystems for Intelligent S... (Edward Curry)
The Real-time Linked Dataspace (RLD) is an enabling platform for data management for intelligent systems within smart environments that combines the pay-as-you-go paradigm of dataspaces, linked data, and knowledge graphs with entity-centric real-time query capabilities.
The RLD contains all the relevant information within a data ecosystem including things, sensors, and data sources and has the responsibility for managing the relationships among these participants.
It manages sources without presuming pre-existing semantic integration among them, using specialised dataspace support services that provide loose administrative proximity and semantic integration for event and stream systems. Support services leverage approximate and best-effort techniques and operate under a 5-star model for “pay-as-you-go” incremental data management.
"Infrastructure, relationships, trust, and RDA" presentation given by Mark Parsons, RDA Secretary General at the eInfrastructures & RDA for Data Intensive Science Workshop - held prior to the RDA 6th Plenary, Paris, 22 September 2015.
Conceptual Architecture for USDA and NSF Terrestrial Observation Network Inte...Brian Wee
In light of the challenges facing agriculture over the next few decades, USDA and NEON leaders have been exchanging information on strategies for leveraging existing investments. In late 2012, the USDA launched its Long-Term Agro-Ecosystem Research (LTAR) network with an initial configuration of ten sites, three of which are co-located with NEON. Discussions have focused on the establishment of partnerships and the sharing of techniques, protocols, best practices, and physical infrastructure. This poster outlines some of those ideas.
Data Mesh is a decentralized architecture in which the unit of architecture is a domain-driven data set treated as a product. Each data product is owned by the domain or team that knows the data most intimately, whether by creating it or by consuming and re-sharing it, with specific roles given the accountability and responsibility for providing that data as a product. Complexity is abstracted away into a self-serve infrastructure layer so that teams can create these products much more easily.
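The data-as-a-product idea can be sketched as a small interface contract: each domain publishes its data behind a product object with a named owner, a declared schema, and a read API, rather than exposing raw storage. All names and fields below are illustrative, not any particular data-mesh framework:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """A domain-owned dataset exposed as a product."""
    name: str
    owner: str         # the accountable domain team
    schema: dict       # the published contract consumers rely on
    _rows: list = field(default_factory=list)

    def publish(self, row):
        # The owning domain validates against its own schema.
        assert set(row) == set(self.schema), "row violates schema"
        self._rows.append(row)

    def read(self):
        # Consumers get data through the product API, not raw storage.
        return list(self._rows)

orders = DataProduct("orders", owner="sales-team",
                     schema={"id": int, "total": float})
orders.publish({"id": 1, "total": 9.99})
print(len(orders.read()))  # 1
```

The self-serve infrastructure layer in a real data mesh would generate the storage, access control, and discovery plumbing behind this interface, so each team only maintains the contract.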
Accelerating Discovery via Science Services (Ian Foster)
[A talk presented at Oak Ridge National Laboratory on October 15, 2015]
We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In big-science projects in high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to develop suites of science services to which researchers can dispatch mundane but time-consuming tasks, and thus to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today’s researchers. I use examples from Globus and other projects to demonstrate what can be achieved.
Leveraging Open Source Technologies to Enable Scientific Archiving and Discovery; Steve Hughes, NASA; Data Publication Repositories
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Global Services for Global Science, March 2023 (Ian Foster)
We are on the verge of a global communications revolution based on ubiquitous high-speed 5G, 6G, and free-space optics technologies. The resulting global communications fabric can enable new ultra-collaborative research modalities that pool sensors, data, and computation with unprecedented flexibility and focus. But realizing these modalities requires new services to overcome the tremendous friction currently associated with any actions that traverse institutional boundaries. The solution, I argue, is new global science services to mediate between user intent and infrastructure realities. I describe our experiences building and operating such services and the principles that we have identified as needed for successful deployment and operations.
The Earth System Grid Federation: Origins, Current State, Evolution (Ian Foster)
I describe the origins, current state and potential future directions for the Earth System Grid Federation, an international consortium that develops infrastructure for sharing of climate simulation and related datasets.
Keynote talk at 2022-10-11 ESnet6 launch. A lovely event by a great team. It was a pleasure to talk about how ESnet6 will enable new "smart instruments"--and some of the work that we are doing to that end.
Linking Scientific Instruments and Computation (Ian Foster)
[Talk presented at Monterey Data Conference, August 31, 2022]
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Thus, methods are required for configuring and running distributed computing pipelines—what we call flows—that link instruments, computers (e.g., for analysis, simulation, AI model training), edge computing (e.g., for analysis), data stores, metadata catalogs, and high-speed networks. We review common patterns associated with such flows and describe methods for instantiating these patterns. We present experiences with the application of these methods to the processing of data from five different scientific instruments, each of which engages powerful computers for data inversion, machine learning model training, or other purposes. We also discuss implications of such methods for operators and users of scientific facilities.
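The online-analysis pattern described above — keeping only interesting subsets of a fast data stream — can be sketched as a generator pipeline. The "frames", signal scores, and threshold below are invented; a real deployment would run the filter on edge hardware near the instrument:

```python
import random

def detector_stream(n_frames, seed=0):
    """Stand-in for a detector: yields frames with a signal score."""
    rng = random.Random(seed)
    for i in range(n_frames):
        yield {"frame": i, "signal": rng.random()}

def online_filter(frames, threshold=0.9):
    """Discard uninteresting frames before they ever hit storage."""
    for f in frames:
        if f["signal"] >= threshold:
            yield f

# Only high-signal frames flow downstream to storage and analysis.
kept = list(online_filter(detector_stream(10_000)))
print(len(kept) < 10_000)  # True: most frames are discarded at the edge
```

The flows discussed in the talk generalize this single filter into multi-step pipelines that also trigger model training, simulation, and instrument steering.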
A Global Research Data Platform: How Globus Services Enable Scientific Discovery (Ian Foster)
Talk in the National Science Data Fabric (NSDF) Distinguished Speaker Series
The Globus team has spent more than a decade developing software-as-a-service methods for research data management, available at globus.org. Globus transfer, sharing, search, publication, identity and access management (IAM), automation, and other services enable reliable, secure, and efficient managed access to exabytes of scientific data on tens of thousands of storage systems. For developers, flexible and open platform APIs reduce greatly the cost of developing and operating customized data distribution, sharing, and analysis applications. With 200,000 registered users at more than 2,000 institutions, more than 1.5 exabytes and 100 billion files handled, and 100s of registered applications and services, the services that comprise the Globus platform have become essential infrastructure for many researchers, projects, and institutions. I describe the design of the Globus platform, present illustrative applications, and discuss lessons learned for cyberinfrastructure software architecture, dissemination, and sustainability.
Video is at https://www.youtube.com/watch?v=p8pCHkFFq1E
Daniel Lopresti, Bill Gropp, Mark D. Hill, Katie Schuman, and I put together a white paper on "Building a National Discovery Cloud" for the Computing Community Consortium (http://cra.org/ccc). I presented these slides at a Computing Research Association "Best Practices on using the Cloud for Computing Research Workshop" (https://cra.org/industry/events/cloudworkshop/).
Abstract from White Paper:
The nature of computation and its role in our lives have been transformed in the past two decades by three remarkable developments: the emergence of public cloud utilities as a new computing platform; the ability to extract information from enormous quantities of data via machine learning; and the emergence of computational simulation as a research method on par with experimental science. Each development has major implications for how societies function and compete; together, they represent a change in technological foundations of society as profound as the telegraph or electrification. Societies that embrace these changes will lead in the 21st Century; those that do not will decline in prosperity and influence. Nowhere is this stark choice more evident than in research and education, the two sectors that produce the innovations that power the future and prepare a workforce able to exploit those innovations, respectively. In this article, we introduce these developments and suggest steps that the US government might take to prepare the research and education system for their implications.
Big Data, Big Computing, AI, and Environmental ScienceIan Foster
I presented to the Environmental Data Science group at UChicago, with the goal of getting them excited about the opportunities inherent in big data, big computing, and AI--and to think about how to collaborate with Argonne in those areas. We had a great and long conversation about Takuya Kurihana's work on unsupervised learning for cloud classification. I also mentioned our work making NASA and CMIP data accessible on AI supercomputers.
In 2001, as early high-speed networks were deployed, George Gilder observed that “when the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.” Two decades later, our networks are 1,000 times faster, our appliances are increasingly specialized, and our computer systems are indeed disintegrating. As hardware acceleration overcomes speed-of-light delays, time and space merge into a computing continuum. Familiar questions like “where should I compute,” “for what workloads should I design computers,” and “where should I place my computers” seem to allow for a myriad of new answers that are exhilarating but also daunting. Are there concepts that can help guide us as we design applications and computer systems in a world that is untethered from familiar landmarks like center, cloud, edge? I propose some ideas and report on experiments in coding the continuum.
Data Tribology: Overcoming Data Friction with Cloud AutomationIan Foster
A talk at the CODATA/RDA meeting in Gaborone, Botswana. I made the case that the biggest barriers to effective data sharing and reuse are often those associated with "data friction" and that cloud automation can be used to overcome those barriers.
The image on the first slide shows a few of the more than 20,000 active Globus endpoints.
Research Automation for Data-Driven DiscoveryIan Foster
Talk presented at Workshop on Maximizing the Scientific Return of NASA Data. Makes the case that automation and outsourcing of data management tasks to cloud services is essential for effective data-driven discovery. Describes how the Globus research data management platform addresses this need.
Scaling collaborative data science with Globus and JupyterIan Foster
The Globus service simplifies the utilization of large and distributed data on the Jupyter platform. Ian Foster explains how to use Globus and Jupyter to seamlessly access notebooks using existing institutional credentials, connect notebooks with data residing on disparate storage systems, and make data securely available to business partners and research collaborators.
New learning technologies seem likely to transform much of science, as they are already doing for many areas of industry and society. We can expect these technologies to be used, for example, to obtain new insights from massive scientific data and to automate research processes. However, success in such endeavors will require new learning systems: scientific computing platforms, methods, and software that enable the large-scale application of learning technologies. These systems will need to enable learning from extremely large quantities of data; the management of large and complex data, models, and workflows; and the delivery of learning capabilities to many thousands of scientists. In this talk, I review these challenges and opportunities and describe systems that my colleagues and I are developing to enable the application of learning throughout the research process, from data acquisition to analysis.
Plenary talk at the international Synchrotron Radiation Instrumentation conference in Taiwan, on work with great colleagues Ben Blaiszik, Ryan Chard, Logan Ward, and others.
Rapidly growing data volumes at light sources demand increasingly automated data collection, distribution, and analysis processes, in order to enable new scientific discoveries while not overwhelming finite human capabilities. I present here three projects that use cloud-hosted data automation and enrichment services, institutional computing resources, and high-performance computing facilities to provide cost-effective, scalable, and reliable implementations of such processes. In the first, Globus cloud-hosted data automation services are used to implement data capture, distribution, and analysis workflows for Advanced Photon Source and Advanced Light Source beamlines, leveraging institutional storage and computing. In the second, such services are combined with cloud-hosted data indexing and institutional storage to create a collaborative data publication, indexing, and discovery service, the Materials Data Facility (MDF), built to support a host of informatics applications in materials science. The third integrates components of the previous two projects with machine learning capabilities provided by the Data and Learning Hub for science (DLHub) to enable on-demand access to machine learning models from light source data capture and analysis workflows, and provides simplified interfaces to train new models on data from sources such as MDF on leadership scale computing resources. I draw conclusions about best practices for building next-generation data automation systems for future light sources.
Going Smart and Deep on Materials at ALCFIan Foster
As we acquire large quantities of science data from experiment and simulation, it becomes possible to apply machine learning (ML) to those data to build predictive models and to guide future simulations and experiments. Leadership Computing Facilities need to make it easy to assemble such data collections and to develop, deploy, and run associated ML models.
We describe and demonstrate here how we are realizing such capabilities at the Argonne Leadership Computing Facility. In our demonstration, we use large quantities of time-dependent density functional theory (TDDFT) data on proton stopping power in various materials maintained in the Materials Data Facility (MDF) to build machine learning models, ranging from simple linear models to complex artificial neural networks, that are then employed to manage computations, improving their accuracy and reducing their cost. We highlight the use of new services being prototyped at Argonne to organize and assemble large data collections (MDF in this case), associate ML models with data collections, discover available data and models, work with these data and models in an interactive Jupyter environment, and launch new computations on ALCF resources.
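As a toy illustration of the simplest end of the model spectrum mentioned above, a linear surrogate can be fit in closed form to (entirely synthetic) data standing in for expensive TDDFT results, and then queried instead of recomputing. The data and fitted quantities here are made up for illustration; nothing below reflects the actual MDF datasets or models.

```python
# Closed-form least-squares fit of y = a*x + b (pure Python, no ML framework).
# xs/ys are synthetic stand-ins for expensive simulation results.

def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = cov(x, y) / var(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Pretend each y cost hours of TDDFT; the surrogate answers instantly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 7.9]  # roughly y = 2x
a, b = fit_linear(xs, ys)

def predict(x):
    return a * x + b

print(round(predict(5.0), 2))  # prints 9.9 for these synthetic data
```

The same pattern scales up: replace the closed-form fit with a neural network trained on MDF data, and use its predictions to steer or prune the expensive computations.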
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by Rik Marselis and me at the DASA Connect conference on 30 May 2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants explored different ways to think about quality and testing in the different parts of the DevOps infinity loop.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell us all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details of how to best design a sturdy architecture within ODC.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
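One piece of the approach described above, capturing a deployment bill of materials (DBOM) and gating releases against it, can be sketched as follows. This is an illustrative sketch only: the function names and policy are invented, and a real deployment firewall would consult signed metadata and vulnerability feeds rather than a hard-coded set.

```python
# Sketch of a DBOM-based deployment gate (all names invented for illustration):
# record artifact digests at deploy time, and block the deployment if any
# artifact matches a known-vulnerable digest.
import hashlib

def digest(artifact_bytes):
    return hashlib.sha256(artifact_bytes).hexdigest()

def build_dbom(artifacts):
    """artifacts: {name: bytes}. Returns {name: sha256 digest}, the DBOM record."""
    return {name: digest(data) for name, data in artifacts.items()}

def gate_deployment(dbom, vulnerable_digests):
    """Return the blocked artifacts; an empty list means the deploy may proceed."""
    return [name for name, d in dbom.items() if d in vulnerable_digests]

artifacts = {"app.jar": b"app-v1", "lib.so": b"lib-v3"}
dbom = build_dbom(artifacts)
blocked = gate_deployment(dbom, {digest(b"lib-v3")})
print(blocked)  # the vulnerable library is flagged before release
```

The key design point is that the DBOM is captured at deployment time, so the gate reflects what actually ships rather than what the build manifest claimed.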
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
- Create a campaign using Mailchimp with merge tags/fields
- Send an interactive Slack channel message (using buttons)
- Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
- Your campaign sent to target colleagues for approval
- If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
- If the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
- State of global ICS asset and network exposure
- Sectoral targets and attacks, as well as the cost of ransom
- Global APT activity, AI usage, actor and tactic profiles, and implications
- Rise in volumes of AI-powered cyberattacks
- Major cyber events in 2024
- Malware and malicious payload trends
- Cyberattack types and targets
- Vulnerability exploit attempts on CVEs
- Attacks on counties – USA
- Expansion of bot farms – how, where, and why
- In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
- Why are attacks on smart factories rising?
- Cyber risk predictions
- Axis of attacks – Europe
- Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
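For reference, the integration demonstrated in such setups is typically wired up through JMeter's stock Backend Listener with its InfluxDB client. A minimal parameter set looks roughly like the following; the host, database, and application names are placeholders, not values from the webinar.

```text
# Backend Listener implementation (shipped with JMeter):
#   org.apache.jmeter.visualizers.backend.influxdb.InfluxdbBackendListenerClient

influxdbUrl = http://localhost:8086/write?db=jmeter   # InfluxDB 1.x write endpoint
application = my_app        # tag used to filter dashboards in Grafana
measurement = jmeter        # InfluxDB measurement to write into
summaryOnly = false         # send per-sampler metrics, not just totals
percentiles = 90;95;99      # response-time percentiles to record
```

Grafana then reads the same database through an InfluxDB data source, so live test metrics appear on dashboards as the load test runs.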
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Grid Computing July 2009
1. Grid computing Ian Foster Computation Institute Argonne National Lab & University of Chicago
2. “When the network is as fast as the computer’s internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001)
4. “Computation may someday be organized as a public utility … The computing utility could become the basis for a new and important industry.” John McCarthy (1961)
9. We need to function in the zone of complexity (Ralph Stacey, Complexity and Creativity in Organizations, 1996). [Chart: agreement about outcomes vs. certainty about outcomes, each from low to high; “plan and control” at high agreement/certainty, “chaos” at low, and the zone of complexity between them.]
17. The Grid paradigm and information integration. [Diagram: data sources (radiology, medical records, pathology, genomics, labs, RHIO) and platform services that name resources and move data around; make resources usable and useful; make resources accessible over the network; and manage who can do what.]
18. The Grid paradigm and information integration (continued). [Diagram: platform services now comprise management, integration, publication, and security and policy; above them, services transform data into knowledge, enhance user cognitive processes, and incorporate results into business processes.]
19. The Grid paradigm and information integration (continued). [Diagram: data sources; platform services (management, integration, publication, security and policy); value services (analysis, cognitive support); applications.]
22. Identity-based authZ: most simple, not scalable. Unix Access Control Lists (Discretionary Access Control, DAC): groups, directories, simple admin. POSIX ACLs/MS-ACLs: finer-grained admin policy. Role-Based Access Control (RBAC): separation of role/group from rule admin. Mandatory Access Control (MAC): clearance, classification, compartmentalization. Attribute-Based Access Control (ABAC): generalization of attributes. (Axis: increasing policy language abstraction level and expressiveness.)
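The progression on this slide, from identity lists to roles to attributes, can be made concrete with a tiny sketch. All users, roles, and attributes below are invented for illustration.

```python
# Toy contrast between RBAC and ABAC checks (all data invented).

# RBAC: permissions attach to roles, and users are assigned roles.
role_perms = {"clinician": {"read_record"},
              "admin": {"read_record", "delete_record"}}
user_roles = {"alice": {"clinician"}, "bob": {"admin"}}

def rbac_allows(user, perm):
    """A user may act if any of their roles carries the permission."""
    return any(perm in role_perms[r] for r in user_roles.get(user, ()))

# ABAC: a policy is a predicate over arbitrary subject/resource attributes.
def abac_allows(subject, resource):
    """Example policy: same-department clinicians may read non-restricted records."""
    return (subject["dept"] == resource["dept"]
            and "clinician" in subject["roles"]
            and not resource["restricted"])

print(rbac_allows("alice", "read_record"),    # True: clinicians may read
      rbac_allows("alice", "delete_record"))  # False: only admins may delete
```

The ABAC predicate illustrates the slide's point about expressiveness: the policy can reference any attribute of subject and resource, not just a pre-enumerated role list.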
29. Children’s Oncology Group enterprise/grid interface service. [Diagram: DICOM protocols on one side, grid protocols (web services) on the other; plug-in adapters for DICOM, XDS, HL7, and vendor-specific interfaces; a wide-area service actor bridges the two.]
31. As of Oct 19, 2008: 122 participants; 105 services (70 data, 35 analytical).
36. Health Object Identifier (HOI) naming system: uri:hdl://888.us.npi.1234567890.dicom/8A648C33-A5…4939EBE. The HOI’s URI schema identifier is based on Handle. 888 is CHI’s top-level naming authority; the National Provider Id (1234567890) is used in the hierarchical identifier namespace; the application context’s namespace (dicom) is governed by the provider naming authority; the identifier body is a random string, PHI-free and guaranteed unique.
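The structure of such an identifier can be illustrated by splitting a handle-style URI into its naming authority and random body. The field layout below follows the slide, but the example identifier itself is fabricated (the slide's real identifier body is elided), and the parser is a sketch, not any real HOI implementation.

```python
# Split a handle-style Health Object Identifier (HOI) into its parts.
# Layout assumed from the slide: uri:hdl://<authority>/<body>, with
# authority = <prefix>.<country>.<id-scheme>.<provider-id>.<context>.
# The example identifier is fabricated for illustration.

def parse_hoi(hoi):
    scheme_prefix = "uri:hdl://"
    assert hoi.startswith(scheme_prefix), "not a handle-style URI"
    authority, body = hoi[len(scheme_prefix):].split("/", 1)
    prefix, country, id_scheme, provider, context = authority.split(".")
    return {"prefix": prefix,        # top-level naming authority (e.g., CHI's 888)
            "country": country,
            "id_scheme": id_scheme,  # e.g., "npi" = National Provider Id
            "provider": provider,
            "context": context,      # application context, governed by the provider
            "body": body}            # random, PHI-free identifier body

parts = parse_hoi("uri:hdl://888.us.npi.1234567890.dicom/8A648C33A5B14939EBE")
print(parts["provider"], parts["context"])  # prints: 1234567890 dicom
```

Because the body is random, nothing about the patient can be recovered from the name itself; resolution of the handle is where access control applies.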
39. Integration: making information useful. [Chart: degree of communication (0–100%) vs. degree of prior syntactic and semantic agreement (0–100%), with three approaches plotted: rigid standards-based, loosely coupled, and adaptive.]
41. ECOG 5202 integrated sample management ECOG CC ECOG PCO MD Anderson Web portal OGSA-DQP OGSA-DAI OGSA-DAI OGSA-DAI Mediator
44. Many, many tasks: identifying potential drug targets. 2M+ ligands x protein target(s). (Mike Kubal, Benoit Roux, and others)
45. Docking workflow. For one target: ~4 million tasks, ~500,000 CPU-hours (~50 CPU-years). Inputs: PDB protein descriptions (1 protein, 1 MB) and ZINC 3-D structures (2M ligands, 6 GB). Stages: manually prep the DOCK6 and FRED receptor files (one per protein, defining the pocket to bind to); run DOCK6 and FRED screening (~4M tasks x 60 s x 1 CPU, ~60K CPU-hours); select best ~5K; Amber scoring via BuildNABScript and a NAB script template (parameters define flexible residues and #MD steps), with steps 1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. perl: generate NAB script, 5. RunNABScript (~10K tasks x 20 min x 1 CPU, ~3K CPU-hours); select best ~500; GCMC (~500 tasks x 10 h x 100 CPUs, ~500K CPU-hours); report.
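The back-of-the-envelope numbers on this slide can be checked directly:

```python
# Check the slide's order-of-magnitude cost estimates for one protein target.

def cpu_hours(tasks, seconds_per_task, cpus_per_task=1):
    return tasks * seconds_per_task * cpus_per_task / 3600

dock  = cpu_hours(4_000_000, 60)        # DOCK6/FRED: ~4M tasks x 60 s
amber = cpu_hours(10_000, 20 * 60)      # Amber: ~10K tasks x 20 min
gcmc  = cpu_hours(500, 10 * 3600, 100)  # GCMC: ~500 tasks x 10 h x 100 CPUs

# DOCK/FRED lands near the slide's ~60K figure, Amber near ~3K, GCMC at 500K,
# for a total on the order of the slide's ~500,000 CPU-hours (~50 CPU-years).
print(round(dock), round(amber), round(gcmc))  # prints: 66667 3333 500000
```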
47. Scaling POSIX to petascale. [Diagram: a large dataset on the global file system is staged over torus and tree interconnects onto a CN-striped intermediate file system, using Chirp (multicast) and MosaStore (striping), and then to local LFS on compute nodes holding local datasets.]
48. Efficiency for 4-second tasks and varying data size (1 KB to 1 MB) for CIO and GPFS, up to 32K processors.
53. Functioning in the zone of complexity (Ralph Stacey, Complexity and Creativity in Organizations, 1996). [Chart: agreement about outcomes vs. certainty about outcomes, each from low to high; “plan and control” at high agreement/certainty, “chaos” at low.]
54. The Grid paradigm and information integration. [Diagram: data sources (radiology, medical records, pathology, genomics, labs, RHIO); platform services (management, integration, publication, security and policy); value services (analysis, cognitive support); applications.]
55. “The computer revolution hasn’t happened yet.” Alan Kay, 1997
56. [Chart: connectivity (log scale) over time, with curves for science, enterprise, and consumer, annotated “Grid,” “Cloud,” and “????”.] “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001)
With high-speed networks, the Internet becomes more than a communications device—it becomes a computing device. We can disintegrate the computer, outsourcing computing and storage, for example. And we can aggregate capabilities (data and software; computing and storage) from many places. The outsourcing/on-demand part is what people have called grid, utility computing, and more recently infrastructure as a service or cloud. It seems to be going mainstream, which is very exciting (and about time!). It’s worth remembering that these ideas are old.
What I want to focus on today is the aggregation part, and in particular on the “virtual organization” concept. Let me remind us of another comment made back in 2001.
Early on, people realized that it didn’t make sense for people to travel to computers—that we should be able to compute outside the box. For example, AI pioneer John McCarthy spoke in these terms in 1961, at the launch of Project MAC (?) Here he is a couple of years ago, as such an industry is just emerging. It takes a while.
We cite [Rouse, Health Care as a CAS: Implications for Design…, NAE 2008] for the right-hand side part. Must support: dynamic composition for a specific purpose; an evolving community, function, and environment; messy data, failure, and incomplete knowledge. Nice, but insufficient: data standards, platform standards, federal policies.
Another perspective on the problem. A few words of explanation: if we are deploying a hospital IT system, we have… [Add other regions of agreement.] You can’t achieve success via central planning. (Quoted in Crossing the Quality Chasm, p. 312.)
We could show these things as moving if we wanted to be really clever Over time, things change, these groups evolve. If we are successful, they merge
Foster, Kesselman, and Tuecke claimed that grids were all about “virtual organizations.” The way one should interpret that claim, I would assert, is in the context of Gilder’s comments. Things are distributed, for one reason or another—either via a deliberate disintegration process, via outsourcing, or because they just started out distributed. Now we need to reassemble them in a controlled manner. We gave some examples.
The first encompasses what people are tending to call “cloud” today. The fourth of course we are quite familiar with! Today, I would use some additional examples, taken from healthcare—a field that I believe will be the “killer app” for VO technologies
In particular, the organizational behavior and management community, who have studied virtual organizations for many years. Our VOs have a lot in common with theirs, but also differences—we’re not just about people, and maybe not even particularly about people. Fortunately we were able to speak to a lot of these people a couple of years ago, via some NSF workshops we organized.
The results are online – “a blueprint for advancing the design, development, and evaluation of virtual organizations.” One interesting anecdote: I found that just as computer scientists can resent being brought into collaborative projects to “write code,” so organizational people can resent being brought in to “fix organizations.” One thing I learned was that …
Technology that has been under development for some years. [Include Globus logo.] Examples: caGrid, BIRN, LHC.
Sharing relationships form and devolve dynamically—e.g., temporally. [Picture on left?]
“Make data usable and useful”: initially, I had “Address syntactic, semantic differences.”
Talk about API vs. protocol. Add “ilities” and function benefits to stack.
[Create an image here.] For example, DICOM and HL7 combine messaging and data model in the same interoperability standard. People are contextualizing this problem at the data interoperability level; systems interoperability is often neglected. An area of differentiation: bringing best practice in industry and science into the health care space. Open source platform. Experience with systems interoperability standards: IETF, OASIS, W3C.
Attribute authorities emerge as an important system component. Bridge between local and global: honest broker is an example. Not sure what “policy in the network” means.
List services from
DO SOMETHING INTERESTING ON THE RIGHT Scaling via automating data adapters Representations of those things and semantics of those representations. Talk about how services are published, data modeling, etc. Publish data bases Publish services Name published objects
Why childhood cancer? Rare. Five-year survival rates for all childhood cancers combined increased from 58.1 percent in 1975–77 to 79.6 percent in 1996–2003.
Built using the same mechanisms used to build SOI: PKI, delegation, attribute-based authorization; registries, monitoring. Operating a service is a pain! It would be nice to outsource, but services need to be near the data, which also has privacy concerns. So things become complicated.
Objects are published; they need to be named; then they can be moved around without losing track of them. Bulk data movement. Fine-grain access for data integration.
GridFTP = high-performance data movement, multiple protocols, credential delegation, restart. RLS = P2P system, soft state, Bloom filters. BUT: the services themselves are operated by the LIGO community. Running persistent, reliable, scalable services is expensive and difficult.
Clinical, administrative, research. Issues are often hidden and escalate. Uniqueness: no guaranteed global uniqueness. Name ownership: no ability to prove that a certain entity issued that name. PHI-tainted names: filenames for some images have the patient ID embedded; sharing the name alone may constitute a HIPAA violation.
Talk about handle….
TO PUT IN A SLIDE? Loose coupling and encapsulation. Interoperability through integration based on data mediation. Evolutionary in nature. A set of scalable systems and methods. Explicit in architecture: a data integration layer. Demonstrated in GSI, GridFTP, MDS, ECOG.
This would be a good place for a graphic, perhaps showing top down vs. bottom up.
No coordinated data systems: Excel spreadsheet, web service to application, Oracle database.
Workflows are becoming a widespread mechanism for coordinating the execution of scientific services and linking scientific resources: analytical and data processing pipelines. Is this stuff real? EBI saw 3 million+ web service API submissions in 2007. A lot? We want to publish workflows as services. Think of caBIG services as service providers that in turn invoke grid services to execute work (e.g., via TeraGrid gateways).
"Docking" is the identification of the low-energy binding modes of a small molecule (ligand) within the active site of a macromolecule (receptor) whose structure is known. A compound that interacts strongly with (i.e., binds) a receptor associated with a disease may inhibit its function and thus act as a drug. Typical workload: application size 7 MB (static binary); static input data 35 MB (binary and ASCII text); dynamic input data 10 KB (ASCII text); output data 10 KB (ASCII text); expected execution time 5-5,000 seconds; parameter space 1 billion tasks.
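The workload figures above imply a striking aggregate: even though each task touches only ~20 KB of dynamic data, a billion tasks add up. A quick back-of-envelope check (static data is excluded, since it is cached once per node):

```python
# Per-task figures from the workload description above.
tasks = 1_000_000_000        # parameter space: 1 billion tasks
per_task_io_kb = 10 + 10     # dynamic input (10 KB) + output (10 KB)

# Aggregate dynamic I/O across the whole sweep, in TB (1 TB = 1024**3 KB).
total_tb = tasks * per_task_io_kb / 1024**3
print(f"~{total_tb:.1f} TB of per-task I/O")  # ~18.6 TB
```

So the sweep moves on the order of tens of terabytes of small files, which is exactly the regime where per-task overheads dominate and frameworks like Falkon earn their keep.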
More precisely, step 3 is "GCMC + hydration." Mike Kubal says: "This task is a Free Energy Perturbation computation using the Grand Canonical Monte Carlo algorithm for modeling the transition of the ligand (compound) between different potential states and the General Solvent Boundary Partition to explicitly model the water molecules in the volume around the ligand and pocket of the protein. The result is a binding energy just like the task at the top of the funnel; it is just a more rigorous attempt to model the actual interaction of protein and compound. To refer to the task in shorthand, you can use 'GCMC + hydration'. This is a method that Benoit has pioneered."
Application efficiency was computed between the 16-rack and 32-rack runs. Sustained utilization is the utilization achieved during the part of the experiment in which there was enough work to do, 0 to 5,300 sec. Overall utilization is the number of CPU hours used divided by the total number of CPU hours allocated. The experiment included caching the 36 MB (52 MB uncompressed) archive on each node at first access. We use "dd" to move data to and from GPFS. The application itself had some bad I/O patterns on the write path, which prevented it from scaling well, so we decided to write to RAM and then dd back to GPFS. For this particular run, we had 464 Falkon services running on 464 I/O nodes, 118K workers (256 per Falkon service), and 1 client on a login node. The 32-rack job took 15 minutes to start. It took the client 6 minutes to establish a connection and set up the corresponding state with all 464 Falkon services. It took the client 40 seconds to dispatch 118K tasks to 118K CPUs. The rest can be seen from the graph and slide text.
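The two utilization metrics defined above differ only in the denominator's time window. A tiny sketch with invented numbers (these are illustrative, not the figures from the run described above):

```python
def utilization(cpu_hours_used, cpu_hours_allocated):
    """Fraction of allocated CPU hours actually spent on work."""
    return cpu_hours_used / cpu_hours_allocated

# Hypothetical example: 118,000 CPUs held for 2.0 hours, of which
# 1.6 hours per CPU went to application work. "Sustained" would use
# only the window in which work was available as the denominator.
allocated = 118_000 * 2.0
used = 118_000 * 1.6
print(f"overall utilization = {utilization(used, allocated):.0%}")  # overall utilization = 80%
```

The gap between sustained and overall utilization is thus a direct measure of ramp-up and tail effects (e.g., the 15-minute start and 6-minute connection setup noted above).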
We could show these things as moving if we wanted to be really clever. Over time, things change and these groups evolve. If we are successful, they merge.
Talk about API vs. protocol. Add the "ilities" and functional benefits to the stack.
Because we are still mostly computing inside the box
Why now? The law of unexpected consequences: as with the Web, it was not just Tim Berners-Lee's genius but also disk drive capacity. What will happen when ubiquitous high-speed wireless means we can all reach any service anytime, and powerful tools mean we can author our own services? A fascinating set of challenges: -- What sort of services? Applications? -- What does openness mean in this context? -- How do we address interoperability, portability, composition? -- Accounting, security, audit?