Gladier: The Globus Architecture for
Data Intensive Experimental Research
October 15, 2021
Agenda
• Large scale experiments
• Gladier
• funcX
• Globus Flows
• Gladier toolkit
• Building your own Gladier pipeline
• Example deployments
• Demo a real client
The brilliance arms race...
K. Wille, The Physics of Particle Accelerators: An Introduction, Oxford University Press, Oxford, UK (2000). J. B. Parise and G. E. Brown, Jr., Elements, 2, 37-42 (2006).
Argonne Leadership Computing Facility
Advanced Photon Source
Different facilities, different people..
• ALCF and APS have very distinct…
– Research Statements
– Entry Curve
– Skill Requirements
– Allocation System
– Support Staff
– Time Scales
– Etc.. Yada.. Yada.. Yada..
Canonical research automation flow for instruments
Figure: stages shown are Data Generation, Data Capture, Data Staging, Data Analysis / Model in the Loop, Data Staging, Metadata Extraction and Data Cataloging, and Publication, with Catalog and Feedback links.
Examples
• Serial X-ray crystallography
• X-ray photon correlation spectroscopy
• High-energy diffraction microscopy
• High-throughput ptychography
• High-energy X-ray diffraction
Which is only as simple as the amount of data you acquire
Local vs Distributed
Acquire
Process
Visualize
Experiment Location
Normal experiments rely on having storage
and processing units “close” to the
acquisition machines.
Distributed systems allow the beamline to
focus only on the experimental apparatus.
Local vs Distributed
Acquire
Process
Visualize
Transfer Raw Data
Transfer Instructions
Remote Location
Transfer Results
Process
Experiment Location
Gladier: The Globus Architecture for Data-Intensive
Experimental Research
• Accelerate and simplify
flow development and
deployment
• Combine tools into
reliable, flexible, secure,
distributed flows
• Bridge instruments and
computing facilities
• Automate data collection
and publication to create
FAIR data
Gladier: Globus + ALCF framework
for online, data-intensive,
large-scale experimental science
Gladier is a framework for combining
instruments, storage, and compute using
loosely coupled services
Reference implementation to gather
experiences
Globus: Remote data management
Flows: Workflows that span time and space
funcX: Remote (scalable) execution on
diverse HPC-edge systems
ALCF Community Data Co-Op portals:
Indexing and visualizing scientific data
ALCF Eagle: User-managed storage
Globus Services for Research Data Management
Unified Data Access Data Transfer Platform as a Service
Auth
Transfer
Share
Search
…
Distributed Automation
Remote Execution
Data Publication
funcX: managed and federated FaaS
• Cloud-hosted service for managing compute
• Register and share compute endpoints
• Register and share Python functions
• Reliably, scalably, and securely execute functions on
remote endpoints
• Integrated with Globus Auth and data ecosystem
Transform laptops, clusters, clouds into function
serving endpoints
• Python-based agent, pip-installable
locally or in a Conda environment
• Elastically provisions resources
from local, cluster, or cloud system
• Manages concurrent execution on
provisioned resources
• Optionally manages execution in
Docker, Singularity, Shifter
containers
• Share endpoints with collaborators
$ pip install funcx-endpoint
$ funcx-endpoint configure myep
$ funcx-endpoint start myep
Register and share functions
Create funcX client (and authn)
def compute(input_args):
    # do something with input_args
    results = input_args
    return results
Define and register Python function
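A minimal sketch of these steps (create a client, register a function, run it on an endpoint), assuming the funcX Python SDK of the period with `FuncXClient`, `register_function`, and `run`. The body of `compute`, the `submit` helper, and the endpoint ID passed to it are illustrative placeholders, and the SDK import is deferred so that `compute` can be tried locally without funcX installed.

```python
def compute(input_args):
    """Toy stand-in for an analysis function: summarize a list of numbers.

    funcX serializes the function and runs it remotely, so anything it
    needs should be imported inside the function body.
    """
    values = input_args["values"]
    return {"total": sum(values), "count": len(values)}


def submit(endpoint_id, input_args):
    """Register `compute` and submit it to a funcX endpoint.

    Hypothetical helper: requires the funcX SDK and a Globus login.
    """
    from funcx.sdk.client import FuncXClient  # deferred: only needed for remote runs

    fxc = FuncXClient()  # authenticates through Globus Auth on first use
    function_id = fxc.register_function(compute)
    task_id = fxc.run(input_args, endpoint_id=endpoint_id, function_id=function_id)
    return fxc, task_id


# Local check of the function before registering it.
print(compute({"values": [1, 2, 3]}))
```

After `submit` returns, `fxc.get_result(task_id)` retrieves the result once the remote task has completed.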
funcX Demo
Try funcX on Binder
https://funcx.org/binder
Data (and compute) automation
• Flows: A platform service for defining, applying, and
sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
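To make "Flows comprise Actions" concrete, here is a hedged sketch of a two-step flow written as a Python dictionary. Flow definitions are state machines with a `StartAt` state and a `States` map. The `ActionUrl` values, parameter names, and input keys below are assumptions based on the services as they existed around the time of this talk, not details taken from the slides. The `action_order` helper is hypothetical and only walks the definition locally.

```python
# Hypothetical two-step flow: stage data, then analyze it.
# ActionUrl values and parameter names are assumptions for illustration.
FLOW_DEFINITION = {
    "Comment": "Transfer raw data, then run an analysis function",
    "StartAt": "TransferRawData",
    "States": {
        "TransferRawData": {
            "Type": "Action",
            "ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.input.source_endpoint_id",
                "destination_endpoint_id.$": "$.input.destination_endpoint_id",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.destination_path",
                        "recursive": False,
                    }
                ],
            },
            "ResultPath": "$.TransferRawData",
            "Next": "AnalyzeData",
        },
        "AnalyzeData": {
            "Type": "Action",
            "ActionUrl": "https://automate.funcx.org",
            "Parameters": {
                "tasks": [
                    {
                        "endpoint.$": "$.input.funcx_endpoint_compute",
                        "function.$": "$.input.analyze_funcx_id",
                        "payload.$": "$.input",
                    }
                ]
            },
            "ResultPath": "$.AnalyzeData",
            "End": True,
        },
    },
}


def action_order(definition):
    """Follow Next links from StartAt and return the state names in order."""
    order = []
    name = definition["StartAt"]
    while name is not None:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        state = definition["States"][name]
        order.append(name)
        name = None if state.get("End") else state["Next"]
    return order


print(action_order(FLOW_DEFINITION))
```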
Extending the ecosystem: Action providers
• Action Provider is a
service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit
action-provider-tools.readthedocs.io/en/latest
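The five operations listed above can be illustrated with a toy, in-memory model. This is a sketch of the lifecycle only, not the Action Provider Toolkit API. The class name and the `finish` and `need_input` helpers are hypothetical, and treating a cancelled action as `FAILED` and allowing release only after completion are assumptions of this sketch.

```python
import uuid

# Status names used by Globus actions; the transitions below are a simplification.
TERMINAL = {"SUCCEEDED", "FAILED"}


class ToyActionProvider:
    """In-memory model of Run, Status, Cancel, Release, and Resume."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start an action and return its identifier."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {"status": "ACTIVE", "body": body, "details": None}
        return action_id

    def status(self, action_id):
        return self._actions[action_id]["status"]

    def cancel(self, action_id):
        """Stop an unfinished action (assumption: it ends up FAILED)."""
        action = self._actions[action_id]
        if action["status"] not in TERMINAL:
            action["status"] = "FAILED"
            action["details"] = "cancelled"
        return action["status"]

    def resume(self, action_id):
        """Move a stalled (INACTIVE) action back to ACTIVE."""
        action = self._actions[action_id]
        if action["status"] == "INACTIVE":
            action["status"] = "ACTIVE"
        return action["status"]

    def release(self, action_id):
        """Forget a completed action and return its final state."""
        if self._actions[action_id]["status"] not in TERMINAL:
            raise RuntimeError("only completed actions can be released")
        return self._actions.pop(action_id)

    def need_input(self, action_id):
        """Hypothetical helper: the action stalls waiting for intervention."""
        self._actions[action_id]["status"] = "INACTIVE"

    def finish(self, action_id, details):
        """Hypothetical helper standing in for the real work completing."""
        action = self._actions[action_id]
        action["status"] = "SUCCEEDED"
        action["details"] = details
```

A flow calls `run`, polls `status` until the action reaches a terminal state, and then calls `release` so the provider can discard its record.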
Figure: action providers shown (legend: Globus provided, custom built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form.
Applying the Globus
platform to science at
the APS
Figure: deployment map. Sites: Advanced Photon Source (APS Computing, Orthros cluster, APS DM system); Argonne Leadership Computing Facility (Theta, Cooley); Laboratory Computing Resource Center (Bebop cluster); plus the Petrel store and two portal servers. Key: funcX agent, Globus Connect. Flow: Action 1, Action 2, Action 3, Action 4.
Gladier Toolkit
● Function registration
● Flow registration
● Re-registration on file change
● Automate auth
● Input Validation
● Metadata Injection
● Interactive Progress Reporting
● Error handling
Gladier provides structure for running
Actions in Globus Automate flows by
wrapping them as a reusable Tool
Actions can be funcX functions,
Transfers, triggers, or any HTTP action
provider.
Our toolbox provides two things: a set of commonly used experimental tools and a
Client that orchestrates how they run and interact with the experiments.
- The Gladier Tools define the work to be done
- The Gladier Base defines a collection of Gladier Tools, and ensures all of
the requirements for using them have been met.
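The split between tools and client can be sketched as follows, assuming the Gladier package of the period with `GladierBaseTool`, `GladierBaseClient`, `generate_flow_definition`, and the class attributes `funcx_functions`, `required_input`, and `gladier_tools`. The `summarize` function, the class names, and the input keys are hypothetical. The Gladier import is deferred so the analysis function can be exercised on its own.

```python
def summarize(**data):
    """Work to be done remotely; imports stay inside because funcX ships the function."""
    import statistics

    values = data["values"]
    return {"mean": statistics.mean(values), "n": len(values)}


def build_client():
    """Define a tool and a client that uses it (requires the gladier package)."""
    from gladier import GladierBaseClient, GladierBaseTool, generate_flow_definition

    @generate_flow_definition
    class Summarize(GladierBaseTool):
        # The tool defines the work: one funcX function and the inputs it needs.
        funcx_functions = [summarize]
        required_input = ["values", "funcx_endpoint_compute"]

    @generate_flow_definition
    class ExampleClient(GladierBaseClient):
        # The client collects tools and checks that their requirements are met.
        gladier_tools = [Summarize]

    return ExampleClient()


def run_example(endpoint_id, values):
    """Start the flow; endpoint_id is a placeholder for a real funcX endpoint."""
    client = build_client()
    flow_input = {"input": {"values": values, "funcx_endpoint_compute": endpoint_id}}
    return client.run_flow(flow_input=flow_input)


# Local check of the analysis function before wrapping it in a tool.
print(summarize(values=[1, 2, 3]))
```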
https://github.com/globus-gladier/gladier
pip install gladier
Gladier Client
Provides a concise
configuration of Gladier
tools to be used in a flow
Tools
• Automatically registers
funcX functions
• Automatically registers
Automate flows
• Watches for changes,
and re-registers
anything as needed
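One way to picture "watches for changes and re-registers" is a checksum cache: register a function only when its code differs from what was registered last time. The sketch below illustrates that idea and is not Gladier's actual implementation. `RegistrationCache`, `function_checksum`, and the `register` callable are hypothetical names.

```python
import hashlib


def function_checksum(fn):
    """Hash a function's compiled code, constants, and referenced names."""
    code = fn.__code__
    payload = (
        code.co_code
        + repr(code.co_consts).encode()
        + repr(code.co_names).encode()
    )
    return hashlib.sha256(payload).hexdigest()


class RegistrationCache:
    """Call `register` only for functions that are new or have changed."""

    def __init__(self, register):
        self._register = register  # callable taking a function, returning an ID
        self._entries = {}         # function name -> (checksum, registered ID)

    def ensure_registered(self, fn):
        checksum = function_checksum(fn)
        cached = self._entries.get(fn.__name__)
        if cached is not None and cached[0] == checksum:
            return cached[1]  # unchanged: reuse the existing registration
        registered_id = self._register(fn)
        self._entries[fn.__name__] = (checksum, registered_id)
        return registered_id
```

In a real deployment `register` would wrap a call to the remote service, and the cache would be persisted between sessions.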
Let's try it!
https://jupyter.demo.globus.org/
XPCS
Figure: XPCS pipeline spanning APS, Argonne JLSE, the Argonne Leadership Computing Facility, and the ALCF Data Portal. Numbered steps: 1. Imaging (Lab Server), 2. Acquisition, 3. XPCS-Eigen, 4. Plot results, 5. Publication, 6. Science!
With Suresh Narayanan et al. APS Sector 8-ID
XPCS
Figure: flow steps shown are Data Acquisition; Globus Transfer (transfer input data); funcX (analyze images); funcX (visualize results); Globus Transfer (return results); Globus Search (catalog results); Searchable Portal. Outcome: high-quality FAIR data.
Online XPCS
• Integration with the APS DM system to trigger a
Globus Automate flow
• The flow moves data to ALCF, performs analysis, and publishes
results
• Metadata and plots are dynamically extracted and
integrated into the ALCF portal, allowing users to
monitor experiments and reprocess data
Serial Crystallography Automation
With Andrzej Joachimiak, Darren Sherrell et al. APS Sector 19
Closing the loop
“These data services have taken the time to solve a structure
from weeks to days and now to hours”
Darren Sherrell, SBC beamline scientist APS Sector 19
4 structures available in PDB – Scientific paper
forthcoming
ALCF + APS capabilities were used to determine the room-temperature
structure of >4 viral surface proteins
Next steps: develop Nature Methods paper, continue running flow,
provide DOE highlights
ALCF Data Services in the DOE COVID19 Fight
Example: Rapid Training of Deep Neural Networks
using Remote Resources
• DNN at the edge for fast
processing, filtering, QC
• Requires tight coupling
with simulation and
training with real-time data
• Globus Flow:
– Globus to rapidly move data for training
– funcX for simulation and model training
– Globus to move models to the edge
– (Future) funcX for inference at the edge
Zhengchun Liu, Jana Thayer, et al.
Ptychography
Automated flows leveraging ThetaGPU for 2D
and 3D reconstructions
- Total size: 1.32 TB, 3082 scans
- 100 iterations: 199GB,1602 scans
- 500 iterations: 502GB, 383 scans
- 1000 iterations: 616GB, 1097 scans
- Inverse problem: the ML iterations needed for the
reconstruction to converge run concurrently, and the
faster-processed results are sent to the scientist immediately
- Size depends on output frequency
- Scans are reconstructed during ALCF
reservation, however, additional data are
acquired after reservation expires
- Opportunistic reconstruction via backfill and
standard queues
- 3082 workflows executed
- Single workflow: 3 transfers + 1 funcX call
to run reconstruction on ThetaGPU
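The figures above are mutually consistent, which a few lines of arithmetic confirm. The sketch assumes decimal units (1 TB = 1000 GB) and one workflow per scan, which matches the equal counts of 3082 scans and 3082 workflows.

```python
# iterations -> (output size in GB, number of scans), as listed above
runs = {
    100: (199, 1602),
    500: (502, 383),
    1000: (616, 1097),
}

total_gb = sum(size for size, _ in runs.values())
total_scans = sum(scans for _, scans in runs.values())

# Each workflow performs 3 transfers and 1 funcX call.
transfers = total_scans * 3
funcx_calls = total_scans * 1
total_actions = transfers + funcx_calls

print(f"total size: {total_gb} GB = {total_gb / 1000:.2f} TB")
print(f"total scans: {total_scans}")
print(f"transfers: {transfers}, funcX calls: {funcx_calls}, actions: {total_actions}")
```

The totals come to 1317 GB (about 1.32 TB) over 3082 scans, and 9246 transfers plus 3082 funcX calls, or 12328 actions in all.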
ALCF
High-Energy X-ray Diffraction Microscopy
Requirement to select where MIDAS
analyses are executed: APS Orthros,
ALCF ThetaGPU, or ALCF Cooley
● MIDAS: tomography reconstruction,
near-field and far-field diffraction analysis
Flow:
● Globus to transfer input data to
destination
● funcX dynamically provisions resources
and runs analysis at scale
● Deploy containers with MIDAS software
to perform tasks
● Results assembled and returned to APS
Extending the system to allow users to
run analysis at their home institution
Hemant Sharma, et al.
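Selecting where the analysis runs can be as simple as choosing which endpoint ID goes into the flow input. The sketch below assumes that approach. The endpoint UUIDs are placeholders, and the `funcx_endpoint_compute` and `input_path` keys and the `build_flow_input` helper are hypothetical.

```python
# Placeholder endpoint IDs; real values come from each site's funcX endpoint.
ENDPOINTS = {
    "orthros": "00000000-0000-0000-0000-000000000001",
    "thetagpu": "00000000-0000-0000-0000-000000000002",
    "cooley": "00000000-0000-0000-0000-000000000003",
}


def build_flow_input(site, input_path):
    """Return flow input that routes the analysis to the chosen site."""
    try:
        endpoint_id = ENDPOINTS[site.lower()]
    except KeyError:
        known = ", ".join(sorted(ENDPOINTS))
        raise ValueError(f"unknown site {site!r}; expected one of: {known}") from None
    return {
        "input": {
            "funcx_endpoint_compute": endpoint_id,
            "input_path": input_path,
        }
    }


print(build_flow_input("ThetaGPU", "/data/hedm/scan_001"))
```

Because the flow definition itself does not change, the same flow can be pointed at a different cluster for each run.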
Experiment integration with pyEPICS
In the future, Gladier resources can influence
experiments by directly controlling the local experiment.
Figure: steps shown are Data Acquisition; Transfer input data; Analyze images; Return results; Search; Viz; Decision; Experiment control; Searchable Portal.
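A hedged sketch of the decision and experiment-control steps: an analysis result is turned into a new setpoint, which is then written to a process variable with pyEPICS `caput`. The decision rule, the PV name in the usage note, and both function names are hypothetical. The rule assumes a shot-noise-limited measurement, where signal-to-noise grows with the square root of exposure time.

```python
def decide_exposure(current_exposure_s, signal_to_noise,
                    target_snr=10.0, max_exposure_s=5.0):
    """Pick the next exposure time from the measured signal-to-noise ratio.

    Assumes SNR scales with sqrt(exposure), so reaching the target needs
    exposure scaled by (target / measured) squared, capped at a maximum.
    """
    if signal_to_noise <= 0:
        return max_exposure_s
    scaled = current_exposure_s * (target_snr / signal_to_noise) ** 2
    return min(max_exposure_s, scaled)


def apply_decision(pv_name, value):
    """Write the new setpoint to the instrument (requires pyEPICS and a live PV)."""
    from epics import caput  # deferred: only needed at the beamline

    return caput(pv_name, value, wait=True)


# Decision step only; no hardware is touched here.
print(decide_exposure(1.0, 5.0))
```

At the beamline the result would be passed on with something like `apply_decision("BL:DET:ExposureTime", new_value)`, where the PV name is a made-up example.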
A framework for multiplication
• Each new Gladier repository lowers the entry barrier for the next one.
• New experiments leveraging this work will allow us to scale the
capabilities across the APS and other facilities.
• Expertise and capabilities from this project allowed the team to play key
roles in DOE’s COVID-19 response, and the new National Virtual
Biotechnology Lab (NVBL)
• The newly funded DOE ASCR project (Braid) will allow new modular
capabilities (e.g., a rule-based engine to support continuum computing
concepts) to be developed and added to Gladier, making it easier for
scientists to access ALCF resources
• Gladier can integrate with any Python-based experimental capability,
e.g. pyEPICS, Bluesky, or other automation tools.
Real life Gladier
XPCS in real-time

Gladier: The Globus Architecture for Data Intensive Experimental Research (APS Workshop)

  • 1. Gladier: The Globus Architecture for Data Intensive Experimental Research October 15, 2021
  • 2. Agenda • Large scale experiments • Gladier • funcX • Globus Flows • Gladier toolkit • Building your own Gladier pipeline • Example deployments • Demo a real client
  • 3. The brilliance arms race... K. Wille, The Physics of Particle Accelerators: An Introduction, Oxford University Press, Oxford, UK (2000). J. B. Parise and G. E. Brown, Jr., Elements, 2, 37-42 (2006).
  • 4. Argonne Leadership Computing Facility Advanced Photon Source
  • 5. Different facilities, different people.. • ALCF and APS have very distinct… – Research Statements – Entry Curve – Skill Requirements – Allocation System – Support Staff – Time Scales – Etc.. Yada.. Yada.. Yada..
  • 6. Canonical research automation flow for instruments Data Capture Data Analysis / Model in the Loop Publication Data Staging Metadata Extraction And Data Cataloging Data Staging Catalog Feedback Data Generation Examples • Serial X-Ray Crystallography • X-Ray Photon Correlation Spectroscopy • High energy diffraction microscopy • High throughput ptychography • High energy x-ray diffraction Which is only as simple as the amount of data you acquire
  • 7. Local vs Distributed Acquire Process Visualize Experiment Location Normal experiments rely on having storage and processing units “close” to the acquisition machines. Distributed systems allow the beamline to focus only on the experimental apparatus.
  • 8. Local vs Distributed Acquire Process Visualize Transfer Raw Data Transfer Instructions Remote Location Transfer Results Process Experiment Location
  • 9. Gladier: The Globus Architecture for Data-Intensive Experimental Research • Accelerate and simplify flow development and deployment • Combine tools into reliable, flexible, secure, distributed flows • Bridge instruments and computing facilities • Automate data collection and publication to create FAIR data
  • 10. Gladier: Globus+ALCF framework for online, data-intensive, large-scale experiment science Gladier is a framework for combining instruments, storage, and compute using loosely coupled services Reference implementation to gather experiences Globus: Remote data management Flows: Workflows that span time and space funcX: Remote (scalable) execution on diverse HPC-edge systems ALCF Community Data Co-Op portals: Indexing and visualizing scientific data ALCF Eagle: User-managed storage
  • 11. Globus Services for Research Data Management Unified Data Access Data Transfer Platform as a Service Auth Transfer Share Search … Distributed Automation Remote Execution Data Publication
  • 12. funcX: managed and federated FaaS • Cloud-hosted service for managing compute • Register and share compute endpoints • Register and share Python functions • Reliably, scalably, and securely execute functions on remote endpoints • Integrated with Globus Auth and data ecosystem
  • 13. Transform laptops, clusters, clouds into function serving endpoints • Python-based agent and pip installable locally or in Conda • Elastically provisions resources from local, cluster, or cloud system • Manages concurrent execution on provisioned resources • Optionally manages execution in Docker, Singularity, Shifter containers • Share endpoints with collaborators $ pip install funcx-endpoint $ funcx-endpoint configure myep $ funcx-endpoint start myep
  • 14. Register and share functions Create funcX client (and authn) Define and register Python function def compute(input_args): # do something return results
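The register, run, and fetch-result pattern described on slides 12 to 14 can be sketched without any remote service. The class below is a hypothetical in-process stand-in built on the standard library; `LocalFunctionService` and its method names are assumptions chosen for illustration and are not the funcX SDK.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class LocalFunctionService:
    """Hypothetical in-process stand-in for a function-serving service."""

    def __init__(self, max_workers=2):
        self._functions = {}  # function id -> callable
        self._tasks = {}      # task id -> Future
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def register_function(self, func):
        # Registration hands back an opaque id that can be shared with others.
        function_id = str(uuid.uuid4())
        self._functions[function_id] = func
        return function_id

    def run(self, function_id, *args, **kwargs):
        # Submission is asynchronous: the caller only receives a task id.
        task_id = str(uuid.uuid4())
        func = self._functions[function_id]
        self._tasks[task_id] = self._pool.submit(func, *args, **kwargs)
        return task_id

    def get_result(self, task_id, timeout=None):
        # Blocks until the task finishes, or raises on timeout.
        return self._tasks[task_id].result(timeout=timeout)

    def close(self):
        self._pool.shutdown(wait=True)


def compute(input_args):
    # Placeholder for the "do something" step on the slide.
    return sum(input_args)


service = LocalFunctionService()
function_id = service.register_function(compute)
task_id = service.run(function_id, [1, 2, 3])
result = service.get_result(task_id, timeout=5)
service.close()
```

In a real deployment the thread pool would be replaced by a remote endpoint, and the ids would be issued by the hosted service rather than generated locally.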
  • 15. funcX Demo Try funcx on Binder https://funcx.org/binder
  • 16. Data (and compute) automation • Flows: A platform service for defining, applying, and sharing distributed research automation flows • Flows comprise Actions • Action Providers: Called by Flows to perform tasks • Triggers*: Start flows based on events * In development
  • 17. Extending the ecosystem: Action providers • Action Provider is a service endpoint – Run – Status – Cancel – Release – Resume • Action Provider Toolkit action-provider-tools.readthedocs.io/en/latest Search Transfer Notification ACLs Identifier Delete Ingest User Form Describe Xtract funcX Web Form Custom built Globus Provided
  • 18. Applying the Globus platform to science at the APS Advanced Photon Source Key: funcX agent Globus Connect Theta Bebop Cluster Argonne Leadership Computing Facility Laboratory Computing Research Center Petrel store APS Computing Orthros Cluster APS DM system Portal server Portal server Cooley Action 1 Action 2 Action 3 Action 4
  • 19. Gladier Toolkit ● Function registration ● Flow registration ● Re-registration on file change ● Automate auth ● Input Validation ● Metadata Injection ● Interactive Progress Reporting ● Error handling Gladier provides structure for running Actions in Globus Automate flows by wrapping them as reusable Tools. Actions can be funcX functions, Transfers, triggers, or any HTTP action provider. Our toolbox provides two things: a set of commonly used experimental tools and a Client to orchestrate how they will run and interact with the experiments. - The Gladier Tools define the work to be done - The Gladier Base defines a collection of Gladier Tools, and ensures all of the requirements for using them have been met. https://github.com/globus-gladier/gladier pip install gladier
  • 20. Gladier Client Provides a concise configuration of Gladier tools to be used in a flow Tools • Automatically registers funcX functions • Automatically registers Automate flows • Watches for changes, and re-registers anything as needed
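One way "watch for changes and re-register as needed" could work is to cache a checksum of each function alongside its registration id. The sketch below is an assumption about the general technique, not Gladier's actual implementation; `function_checksum`, `RegistrationCache`, and `fake_register` are hypothetical names.

```python
import hashlib


def function_checksum(func):
    """Hash the compiled body of a function so that edits can be detected."""
    code = func.__code__
    # Bytecode, constants, and referenced names cover most edits. Nested
    # functions or lambdas would need recursive handling in a real tool.
    payload = (
        code.co_code
        + repr(code.co_consts).encode()
        + repr(code.co_names).encode()
    )
    return hashlib.sha256(payload).hexdigest()


class RegistrationCache:
    """Re-register a function only when its checksum has changed."""

    def __init__(self, register):
        self._register = register  # callable that performs the registration
        self._entries = {}         # function name -> (checksum, function id)

    def get_id(self, func):
        checksum = function_checksum(func)
        cached = self._entries.get(func.__name__)
        if cached is not None and cached[0] == checksum:
            return cached[1]
        function_id = self._register(func)
        self._entries[func.__name__] = (checksum, function_id)
        return function_id


registered = []


def fake_register(func):
    # Hypothetical registration call: records the name and returns a new id.
    registered.append(func.__name__)
    return f"id-{len(registered)}"


cache = RegistrationCache(fake_register)


def analyze(x):
    return x + 1


first = cache.get_id(analyze)
again = cache.get_id(analyze)   # unchanged, so the cached id is reused


def analyze(x):                 # simulates editing the function body
    return x + 2


second = cache.get_id(analyze)  # changed, so it is registered again
```

The same idea extends to flow definitions by hashing their serialized form instead of function bytecode.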
  • 22. XPCS ALCF Data Portal Argonne JLSE Argonne Leadership Computing Facility APS Publication 5 Imaging 1 Lab Server 1 Acquisition 2 Plot results 4 XPCS-Eigen 3 Science! 6 With Suresh Narayanan et al. APS Sector 8-ID
  • 24. Online XPCS • Integration with the APS DM system to trigger Globus Automate flow. • Flow moves data to ALCF, performs analysis, and publishes results • Metadata and plots are dynamically extracted and integrated into ALCF portal, allowing users to monitor experiments and reprocess data
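The three steps of the online XPCS flow (move data, analyze, publish) map naturally onto a flow definition made of Action states chained with `Next`. The following is a minimal sketch under stated assumptions: the `example.org` URLs, state names, and parameter names are placeholders invented here, and only the overall layout (`StartAt`, `States`, `Type`, `ActionUrl`, `Parameters`, `ResultPath`, `Next`, `End`) follows the Globus Flows definition format.

```python
# Placeholder action URLs and parameter names: substitute the values that
# the action providers in your own deployment actually expect.
flow_definition = {
    "Comment": "Sketch of an XPCS-style flow: transfer, analyze, publish",
    "StartAt": "TransferToCompute",
    "States": {
        "TransferToCompute": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/transfer",
            "Parameters": {
                "source.$": "$.input.raw_data_location",
                "destination.$": "$.input.compute_location",
            },
            "ResultPath": "$.TransferResult",
            "Next": "Analyze",
        },
        "Analyze": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/compute",
            "Parameters": {
                "function_id.$": "$.input.analysis_function_id",
                "endpoint_id.$": "$.input.compute_endpoint_id",
            },
            "ResultPath": "$.AnalysisResult",
            "Next": "Publish",
        },
        "Publish": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/publish",
            "Parameters": {"metadata.$": "$.AnalysisResult"},
            "ResultPath": "$.PublishResult",
            "End": True,
        },
    },
}


def action_order(definition):
    """Walk the states from StartAt to the End state and return their names."""
    order = []
    name = definition["StartAt"]
    while True:
        if name in order:
            raise ValueError(f"cycle detected at state {name!r}")
        state = definition["States"][name]
        order.append(name)
        if state.get("End"):
            return order
        name = state["Next"]


steps = action_order(flow_definition)
```

Walking the definition before submitting it is a cheap check that every `Next` points at a state that exists and that the chain terminates.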
  • 25. Serial Crystallography Automation With Andrzej Joachimiak, Darren Sherrell et al. APS Sector 19
  • 27. “These data services have taken the time to solve a structure from weeks to days and now to hours” Darren Sherrell, SBC beamline scientist APS Sector 19 4 structures available in PDB – Scientific paper forthcoming ALCF + APS capabilities were used to determine the room temperature structure of >4 viral surface proteins Next steps: Develop Nature Methods paper, continue running flow, provide DOE highlights ALCF Data Services in the DOE COVID19 Fight
  • 28. Example: Rapid Training of Deep Neural Networks using Remote Resources • DNN at the edge for fast processing, filtering, QC • Requires tight coupling with simulation and training with real-time data • Globus Flow: – Globus to rapidly move data for training – funcX for simulation and model training – Globus to move models to the edge – (Future) funcX for inference at the edge Zhengchun Liu, Jana Thayar, et al.
  • 29. Ptychography Automated flows leveraging ThetaGPU for 2D and 3D reconstructions - Total size: 1.32 TB, 3082 scans - 100 iterations: 199 GB, 1602 scans - 500 iterations: 502 GB, 383 scans - 1000 iterations: 616 GB, 1097 scans - Inverse problem, ML iterations for reconstruction to converge run concurrently and faster processing sent to scientist immediately - Size depends on output frequency - Scans are reconstructed during ALCF reservation, however, additional data are acquired after reservation expires - Opportunistic reconstruction via backfill and standard queues - 3082 workflows executed - Single workflow: 3 transfers + 1 funcX call to run reconstruction on ThetaGPU
  • 30. ALCF High-Energy X-ray Diffraction Microscopy Requirement to select where MIDAS analyses are executed: APS Orthros, ALCF ThetaGPU, or ALCF Cooley ● MIDAS: tomography reconstruction, near-field and far-field diffraction analysis Flow: ● Globus to transfer input data to destination ● funcX dynamically provisions resources and runs analysis at scale ● Deploy containers with MIDAS software to perform tasks ● Results assembled and returned to APS Extending the system to allow users to run analysis at home institute Hemant Sharma, et al.
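The ptychography totals can be cross-checked from the per-batch figures on the slide. The short calculation below uses only those numbers; the variable names are illustrative, and the derived transfer and funcX call counts assume every one of the 3082 workflows followed the "3 transfers + 1 funcX call" shape.

```python
# iterations -> (output size in GB, number of scans), as listed on the slide
batches = {
    100: (199, 1602),
    500: (502, 383),
    1000: (616, 1097),
}

total_gb = sum(size for size, _ in batches.values())
total_scans = sum(scans for _, scans in batches.values())
total_tb = round(total_gb / 1000, 2)  # decimal terabytes

# One workflow per scan, each with 3 transfers and 1 funcX call (assumed uniform).
total_transfers = 3 * total_scans
total_funcx_calls = 1 * total_scans
```

The batch sizes sum to 1317 GB, which matches the stated 1.32 TB, and the scan counts sum to the stated 3082.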
  • 31. Experiment integration with pyEPICS Data Acquisition Gladier resources can in the future influence experiments by directly controlling local experiments Return results Transfer input data Analyze images Experiment control Return results Search Viz Decision Searchable Portal
  • 32. A framework for multiplication • Each new Gladier repository lowers the entry barrier of the next one. • New experiments leveraging this work will allow us to scale the capabilities across the APS and other facilities. • Expertise and capabilities from this project allowed the team to play key roles in DOE’s COVID-19 response, and the new National Virtual Biotechnology Lab (NVBL) • Newly-funded DOE ASCR project (Braid) will allow new modular capabilities to be developed (e.g., a rule-based engine to support continuum computing concepts) and added to Gladier that allow scientists to more easily access ALCF resources • Gladier can integrate with any experimental capability that is Python-based, e.g. pyEPICS, bluesky, automation.
  • 33. Real life Gladier XPCS in real-time