Rachana Ananthakrishnan
ranantha@uchicago.edu
February 28, 2024
Instrument Data Automation: Life of a Flow
Instrument data management needs
Cryo EM
Lightsheet
Sequencer
ALS/APS
….
Local system
download
Remote analysis, visualization
• Reliable, near-real-time data access
• Self-service access control, management
• Grant data access to collaborators
• Compute on data across storage classes
• Do it all at SCALE
Local policy store
--/cohort045
--/cohort096
--/cohort127
What is needed for such automation…?
Data capture
Image analysis
QA check
Threshold
Analysis
Visualize
Metadata extraction
Publish to search index
• Bridge across different facility resources
• Network as an instrument
• Use a variety of resources
• Human input
• Credentials for automation
Proceed or discard sample?
XPCS: X-ray Photon Correlation Spectroscopy
ALCF Data Portal
Argonne Leadership Computing Facility
APS
[Diagram labels: Publication 5; Lab Server 1; Acquisition 2; Imaging 1; Plot results 4; XPCS-Eigen 3; Science! 6]
● Automate flows stage data to ALCF for on-demand analysis and publication
● Metadata and plots are dynamically extracted and published into a search catalog
● Scientists can select datasets and initiate flows to perform batch analysis tasks
Suresh Narayanan, Nicholas Schwarz
Eagle Storage
Globus Flows
End-to-end Automation: XPCS
Data capture
Data publication
Transfer: Transfer IMM
Transfer: Move results to repo
Compute: Run Corr
Compute: Plot results
Compute: Gather metadata
Share: Set access controls
Search: Ingest to index
Transfer: Transfer HDF5 files
XPCS flow: definition
XPCS Flow
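The slide showing the flow definition did not survive as text. As a rough illustration only, the sketch below chains the XPCS step labels from the previous slide into a linear state machine in the JSON style that Globus Flows definitions use (`StartAt`, `States`, `Type`, `ActionUrl`, `Parameters`, `ResultPath`, `Next`, `End`). The step order, the state names, the parameter shape, and the placeholder URL are all assumptions made for this sketch; it is not the XPCS team's actual definition.

```python
# Hypothetical placeholder; a real definition would use the URL that each
# action provider publishes.
PLACEHOLDER_URL = "https://example.org/actions/{provider}"

# (state name, provider) pairs derived from the slide labels. The ordering
# is an assumption; the slide text does not state it.
XPCS_STEPS = [
    ("TransferIMM", "transfer"),
    ("TransferHDF5Files", "transfer"),
    ("RunCorr", "compute"),
    ("PlotResults", "compute"),
    ("GatherMetadata", "compute"),
    ("MoveResultsToRepo", "transfer"),
    ("SetAccessControls", "share"),
    ("IngestToIndex", "search"),
]


def build_flow_definition(steps):
    """Chain the steps into a linear state machine, one Action state each."""
    states = {}
    for i, (name, provider) in enumerate(steps):
        state = {
            "Type": "Action",
            "ActionUrl": PLACEHOLDER_URL.format(provider=provider),
            # Illustrative only: each state reads its parameters from the
            # run input under its own name.
            "Parameters": {"input.$": f"$.input.{name}"},
            "ResultPath": f"$.{name}Result",
        }
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]  # hand off to the following step
        else:
            state["End"] = True  # the last state terminates the flow
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


flow_definition = build_flow_definition(XPCS_STEPS)
print(" -> ".join(flow_definition["States"]))
```

A real flow would typically also carry an input schema and per-provider parameters (collection IDs, paths, function IDs), which are omitted here.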
XPCS: Integrating experiment and compute facility
Reprocessing of data
Experiment-time processing of data
Argonne: Ian Foster, Mike Papka, Tom Uram, Christine Simpson, Bill Allcock, Benoit Cote, Ryan Chard
APS: Suresh Narayanan, Miaoqi Chu, Hannah Parraga, Nicholas Schwarz, Laurent Chapon
UChicago: Rachana Ananthakrishnan, Kyle Chard, Nickolaus Saint, Ben Blaiszik
One-time configuration per beamline
APS
APS DM
import …
def …
APS Beamline service account
Compute function
Automating for experiment-time processing
• Create a Globus application credential for the software at the instrument facility
• Register the compute function(s) needed for analysis
• Configure the flow so that the service account can run it
• Create a guest collection on Globus Connect Personal (Windows machine), with read permissions for the service account
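The "import … def …" box on this slide stands for an analysis function registered with Globus Compute. The sketch below shows the general shape of such a function. The function body is a stand-in written for this example (a plain intensity autocorrelation, g2(lag) = <I(t) I(t + lag)> / <I>^2), not the XPCS-Eigen code the beamline runs. The `register` helper assumes the `globus_compute_sdk` package is installed and credentials are available, so it is defined but not called.

```python
def g2_autocorrelation(intensities, max_lag):
    """Return g2(lag) = <I(t) * I(t + lag)> / <I>^2 for lag = 1..max_lag.

    Requires max_lag < len(intensities).
    """
    # The function is serialized and run on a remote endpoint, so imports
    # live inside the body to keep it self-contained there.
    from statistics import fmean

    mean_squared = fmean(intensities) ** 2
    result = []
    for lag in range(1, max_lag + 1):
        products = [a * b for a, b in zip(intensities, intensities[lag:])]
        result.append(fmean(products) / mean_squared)
    return result


def register(function):
    """Register the function with Globus Compute and return its ID.

    Not called in this sketch: it needs the SDK plus a login or a
    configured application credential.
    """
    from globus_compute_sdk import Client

    return Client().register_function(function)


# A constant signal is perfectly correlated at every lag.
print(g2_autocorrelation([2.0, 2.0, 2.0, 2.0], max_lag=2))  # prints [1.0, 1.0]
```

The ID returned by registration is what a flow's compute step would reference, together with the ID of the endpoint that should run it.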
XPCS flow: permissions
XPCS flow permission
One-time configuration per beamline
ALCF
Automating for experiment-time processing
Authorized APS admins with an ALCF account are allowed to manage the endpoint and analysis code
• Create a local account at the compute facility to allow automated processing
• Install the Globus Compute endpoint in the local account, using the Globus service account
• Set appropriate local account policy to manage the compute endpoint deployment
Automating for experiment-time processing
One-time configuration per beamline
Automated workflow during experiments
[Diagram labels: Beamline and experiment ID; Data acquisition; APS; APS DM; ALCF; import …; def …; Compute function; APS Beamline service account]
Authorized APS admins with an ALCF account are allowed to manage the endpoint and analysis code
Automating for experiment-time processing
One-time configuration per beamline
Automated workflow during experiments
[Diagram labels: Beamline and experiment ID; Data acquisition; APS; APS DM; ALCF; Polaris; Eagle Compute endpoint; Beamline account; env/; $> …; import …; def …; Compute function; APS Beamline service account]
Authorized APS admins with an ALCF account are allowed to manage the endpoint and analysis code
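During an experiment, the diagram shows the beamline and experiment ID being handed to the automated workflow. One way to picture that hand-off is as a small input document built per acquired file, as in the sketch below. Every key name, directory prefix, and ID here is hypothetical and chosen only for illustration; the slides do not show the real input format.

```python
from pathlib import PurePosixPath


def build_run_input(beamline, experiment_id, filename,
                    source_collection, destination_collection):
    """Build the input document for one flow run (all key names hypothetical)."""
    # Keep data from different beamlines and experiments in separate folders.
    relative = PurePosixPath(beamline) / experiment_id / filename
    return {
        "input": {
            "beamline": beamline,
            "experiment_id": experiment_id,
            "source": {
                "collection": source_collection,
                "path": str(PurePosixPath("/data") / relative),
            },
            "destination": {
                "collection": destination_collection,
                "path": str(PurePosixPath("/staging") / relative),
            },
        }
    }


run_input = build_run_input(
    "beamline-x", "exp-0001", "scan_001.h5",
    "SOURCE-COLLECTION-ID", "DESTINATION-COLLECTION-ID",
)
print(run_input["input"]["destination"]["path"])
```

The beamline service account would submit a document like this each time it starts a run, so the same flow definition can serve every experiment on the beamline.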
XPCS – Reprocessing of data
• Flow triggered by the user via portal
• A separate application credential is used to run the flow
• Data shared with researcher(s) using Globus
XPCS portal
XPCS portal
Scaling to several beamlines
Flows can be used beyond instruments...
CityCOVID
• Integrated COVID-19 pandemic monitoring, modeling, and analysis capability
• CityCOVID is a city-scale agent-based model
• Steps:
– Scrape daily Chicago reports
– Perform simulations at Argonne Leadership Computing Facility
– Postprocess data at Lab Computing Resource Center
Jonathan Ozik, Nick Collier, and Charles Macal
CityCOVID
funcX: Analyze
Transfer: Publish
Auth: Get credentials
funcX: Scrape
funcX: Simulate
Transfer: Transfer data
Materials Data Facility
> 40 TB of data
> 320 published authors
> 400 datasets
• Accept data from many locations with flexible interfaces
• Index dataset contents in science-aware ways
• Dispatch data to the community
• Using Automate to simplify building composable flows of services
MDF Data Publication Automation
Ingest: Bulk Ingest
Auth: Get Credentials
Automate
Transfer: Transfer Dataset
XTract: Extract Metadata
Share: Set permissions
Transfer: Move metadata
Transfer: Transfer Dataset
Transfers: Transfer Dataset
Identifier: Mint DOI
Web form: Metadata
Notify: Notify Curator
Web form: Curation
Notify: Notify user
Support resources
• Globus documentation: docs.globus.org
• YouTube channel: youtube.com/GlobusOnline
• Helpdesk: support@globus.org
• Mailing Lists: globus.org/mailing-lists
• Customer engagement team (office hours)
• Professional services team (advisory, custom work)
