SlideShare a Scribd company logo
1 of 21
Download to read offline
Rachana Ananthakrishnan
ranantha@uchicago.edu
February 28, 2024
Instrument Data Automation: Life of a
Flow
Instrument data management needs
Cryo EM
Lightsheet
Sequencer
ALS/APS
….
Local system
download
Remote analysis,
visualization
• Reliable, near-real time
data access
• Self-service access
control, management
• Grant data access to
collaborators
• Compute on data
across storage classes
• Do it all at SCALE
Local
policy
store
--/cohort045
--/cohort096
--/cohort127
What is needed for such automation…?
Data
capture
Image analysis
QA check
Threshold
Analysis
Visualize
Metadata
extraction
Publish to
search index
• Bridge across
different facility
resources
• Network as an
instrument
• Use variety of
resources
• Human input
• Credentials for
automation
Proceed or
discard
sample?
XPCS: X-ray Photon Correlation Spectroscopy
ALCF Data
Portal
Argonne Leadership
Computing Facility
APS
Publication
5
Lab Server 1
Acquisition
2
Imaging
1
Plot results
4
XPCS-Eigen
3
Science!
6
● Automate flows stage
data to ALCF for on-
demand analysis and
publication
● Metadata and plots
dynamically extracted,
and published into a
search catalog
● Scientists can select
datasets and initiate
flows to perform batch
analysis tasks
Suresh Narayanan, Nicholas Schwarz
Eagle Storage
Globus
Flows
End-to-end Automation: XPCS
Data capture
Data publication
Transfer
Transfer
IMM
Transfer
Move results
to repo
Compute
Run Corr
Compute
Plot results
Compute
Gather
metadata
Share
Set access
controls
Search
Ingest to
index
Transfer
Transfer
HDF5 files
XPCS flow: definition
7
XPCS Flow
XPCS: Integrating experiment and compute facility
8
Reprocessing of data
Experiment-time processing of data
Argonne: Ian Foster, Mike Papka,
Tom Uram, Christine Simpson, Bill
Allcock, Benoit Cote, Ryan Chard
APS: Suresh Narayanan, Miaoqi
Chu, Hannah Parraga, Nicholas
Schwarz, Laurent Chapon
UChicago: Rachana
Ananthakrishnan, Kyle Chard,
Nickolaus Saint, Ben Blaiszik
One-time configuration per beamline
APS
APS DM
import …
def …
APS Beamline
service account
Compute function
Automating for experiment-time processing
• Create Globus application credential
for the software at the instrument
facility
• Register the compute function(s)
needed for analysis
• Configure the flow such that service
account can run the flow
• Guest collection on Globus Connect
Personal (Windows machine), with
read permissions for service account
to read data
XPCS flow: permissions
10
XPCS flow
permission
One-time configuration per beamline
ALCF
Automating for experiment-time processing
Authorized APS admins with
ALCF account allowed to
manage the endpoint, and
analysis code
• Create a local account at the
compute facility to allow
automated processing
• Install Globus Compute endpoint
in the local account, using the
Globus service account
• Set appropriate local account
policy to manage the compute
endpoint deployment
Beamline and
experiment ID
One-time configuration per beamline
Automated workflow during experiments
Data acquisition
ALCF
APS
ALCF
APS
APS DM
APS DM
import …
def …
APS Beamline
service account
Compute function
Automating for experiment-time processing
Authorized APS admins with
ALCF account allowed to
manage the endpoint, and
analysis code
Beamline and
experiment ID
One-time configuration per beamline
Automated workflow during experiments
env/
$> …
Beamline account
Data acquisition
Eagle Compute endpoint
ALCF
APS
ALCF
APS
Polaris
APS DM
APS DM
import …
def …
APS Beamline
service account
Compute function
Automating for experiment-time processing
Authorized APS admins with
ALCF account allowed to
manage the endpoint, and
analysis code
XPCS – Reprocessing of data
14
• Flow triggered by
the user via portal
• A separate
application
credential is used
to run the flow
• Data shared with
researcher(s) using
Globus
XPCS portal
15
XPCS portal
Scaling to several beamlines
Flows can be used
beyond instruments..
17
CityCOVID
• Integrated COVID-19 pandemic
monitoring, modeling, and analysis
capability
• CityCOVID is a city-scale agent-
based model
• Steps:
– Scrape daily Chicago reports
– Perform simulations at Argonne
Leadership Computing Facility
– Postprocess data at Lab Computing
Resource Center
Jonathan Ozik, Nick Collier, and
Charles Macal
CityCOVID
funcX
Analyze
Transfer
Publish
Auth
Get
credentials
funcX
Scrape
funcX
Simulate
Transfer
Transfer
data
Materials Data Facility
> 40 TB of data
> 320 published
authors
> 400 datasets
• Accept data from many
locations with flexible
interfaces
• Index dataset contents in
science-aware ways
• Dispatch data to the
community
• Using Automate to
simplify building
composable flows of
services
MDF Data Publication Automation
Ingest
Bulk
Ingest
Auth
Get
Credentials
Automate
Transfer
Transfer
Dataset
XTract
Extract
Metadata
Share
Set
permissions
Transfer
Move
metadata
Transfer
Transfer
Dataset
Transfers
Transfer
Dataset
Identifier
Mint DOI
Web form
Metadata
Notify
Notify
Curator
Web form
Curation
Notify
Notify
user
Support resources
• Globus documentation: docs.globus.org
• YouTube channel: youtube.com/GlobusOnline
• Helpdesk: support@globus.org
• Mailing Lists: globus.org/mailing-lists
• Customer engagement team (office hours)
• Professional services team (advisory, custom work)

More Related Content

Similar to Instrument Data Automation: The Life of a Flow

BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesAmazon Web Services
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAmazon Web Services
 
Army's Cyber Defense Operations: Building the Right Solutions for the Data Su...
Army's Cyber Defense Operations: Building the Right Solutions for the Data Su...Army's Cyber Defense Operations: Building the Right Solutions for the Data Su...
Army's Cyber Defense Operations: Building the Right Solutions for the Data Su...Amazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
A Pluggable Autoscaling System @ UCC
A Pluggable Autoscaling System @ UCCA Pluggable Autoscaling System @ UCC
A Pluggable Autoscaling System @ UCCChris Bunch
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015Amazon Web Services Korea
 
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar SeriesLog Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar SeriesAmazon Web Services
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkFabian Hueske
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...confluent
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureDatabricks
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisBrendan Gregg
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsSriram Krishnan
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringManageEngine, Zoho Corporation
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataApache Apex
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasDataWorks Summit
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingStreamNative
 

Similar to Instrument Data Automation: The Life of a Flow (20)

BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
AWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon KinesisAWS Webcast - Introduction to Amazon Kinesis
AWS Webcast - Introduction to Amazon Kinesis
 
Army's Cyber Defense Operations: Building the Right Solutions for the Data Su...
Army's Cyber Defense Operations: Building the Right Solutions for the Data Su...Army's Cyber Defense Operations: Building the Right Solutions for the Data Su...
Army's Cyber Defense Operations: Building the Right Solutions for the Data Su...
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Cloud Migration
Cloud MigrationCloud Migration
Cloud Migration
 
A Pluggable Autoscaling System @ UCC
A Pluggable Autoscaling System @ UCCA Pluggable Autoscaling System @ UCC
A Pluggable Autoscaling System @ UCC
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar SeriesLog Analytics with Amazon Elasticsearch Service - September Webinar Series
Log Analytics with Amazon Elasticsearch Service - September Webinar Series
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices Architecture
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Monitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance AnalysisMonitorama 2015 Netflix Instance Analysis
Monitorama 2015 Netflix Instance Analysis
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific Applications
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoring
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
End-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and AtlasEnd-to-end Data Governance with Apache Avro and Atlas
End-to-end Data Governance with Apache Avro and Atlas
 
Apache edgent
Apache edgentApache edgent
Apache edgent
 
A Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and ProcessingA Unified Platform for Real-time Storage and Processing
A Unified Platform for Real-time Storage and Processing
 

More from Globus

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration TopicsGlobus
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaSGlobus
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesGlobus
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusGlobus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for ResearchersGlobus
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with GlobusGlobus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsGlobus
 
Globus Automation
Globus AutomationGlobus Automation
Globus AutomationGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to GlobusGlobus
 

More from Globus (20)

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
 

Recently uploaded

eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapIshara Amarasekera
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 

Recently uploaded (20)

eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery Roadmap
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 

Instrument Data Automation: The Life of a Flow

  • 1. Rachana Ananthakrishnan ranantha@uchicago.edu February 28, 2024 Instrument Data Automation: Life of a Flow
  • 2. Instrument data management needs Cryo EM Lightsheet Sequencer ALS/APS …. Local system download Remote analysis, visualization • Reliable, near-real time data access • Self-service access control, management • Grant data access to collaborators • Compute on data across storage classes • Do it all at SCALE Local policy store --/cohort045 --/cohort096 --/cohort127
  • 3. What is needed for such automation…? Data capture Image analysis QA check Threshold Analysis Visualize Metadata extraction Publish to search index • Bridge across different facility resources • Network as an instrument • Use variety of resources • Human input • Credentials for automation Proceed or discard sample?
  • 4. XPCS: X-ray Photon Correlation Spectroscopy ALCF Data Portal Argonne Leadership Computing Facility APS Publication 5 Lab Server 1 Acquisition 2 Imaging 1 Plot results 4 XPCS-Eigen 3 Science! 6 ● Automate flows stage data to ALCF for on- demand analysis and publication ● Metadata and plots dynamically extracted, and published into a search catalog ● Scientists can select datasets and initiate flows to perform batch analysis tasks Suresh Narayanan, Nicholas Schwarz Eagle Storage
  • 5. Globus Flows End-to-end Automation: XPCS Data capture Data publication Transfer Transfer IMM Transfer Move results to repo Compute Run Corr Compute Plot results Compute Gather metadata Share Set access controls Search Ingest to index Transfer Transfer HDF5 files
  • 7. XPCS: Integrating experiment and compute facility 8 Reprocessing of data Experiment-time processing of data Argonne: Ian Foster, Mike Papka, Tom Uram, Christine Simpson, Bill Allcock, Benoit Cote, Ryan Chard APS: Suresh Narayanan, Miaoqi Chu, Hannah Parraga, Nicholas Schwarz, Laurent Chapon UChicago: Rachana Ananthakrishnan, Kyle Chard, Nickolaus Saint, Ben Blaiszik
  • 8. One-time configuration per beamline APS APS DM import … def … APS Beamline service account Compute function Automating for experiment-time processing • Create Globus application credential for the software at the instrument facility • Register the compute function(s) needed for analysis • Configure the flow such that service account can run the flow • Guest collection on Globus Connect Personal (Windows machine), with read permissions for service account to read data
  • 10. One-time configuration per beamline ALCF Automating for experiment-time processing Authorized APS admins with ALCF account allowed to manage the endpoint, and analysis code • Create a local account at the compute facility to allow automated processing • Install Globus Compute endpoint in the local account, using the Globus service account • Set appropriate local account policy to manage the compute endpoint deployment
  • 11. Beamline and experiment ID One-time configuration per beamline Automated workflow during experiments Data acquisition ALCF APS ALCF APS APS DM APS DM import … def … APS Beamline service account Compute function Automating for experiment-time processing Authorized APS admins with ALCF account allowed to manage the endpoint, and analysis code
  • 12. Beamline and experiment ID One-time configuration per beamline Automated workflow during experiments env/ $> … Beamline account Data acquisition Eagle Compute endpoint ALCF APS ALCF APS Polaris APS DM APS DM import … def … APS Beamline service account Compute function Automating for experiment-time processing Authorized APS admins with ALCF account allowed to manage the endpoint, and analysis code
  • 13. XPCS – Reprocessing of data 14 • Flow triggered by the user via portal • A separate application credential is used to run the flow • Data shared with researcher(s) using Globus
  • 15. Scaling to several beamlines
  • 16. Flows can be used beyond instruments.. 17
  • 17. CityCOVID • Integrated COVID-19 pandemic monitoring, modeling, and analysis capability • CityCOVID is a city-scale agent- based model • Steps: – Scrape daily Chicago reports – Perform simulations at Argonne Leadership Computing Facility – Postprocess data at Lab Computing Resource Center Jonathan Ozik, Nick Collier, and Charles Macal
  • 19. Materials Data Facility > 40 TB of data > 320 published authors > 400 datasets • Accept data from many locations with flexible interfaces • Index dataset contents in science-aware ways • Dispatch data to the community • Using Automate to simplify building composable flows of services
  • 20. MDF Data Publication Automation Ingest Bulk Ingest Auth Get Credentials Automate Transfer Transfer Dataset XTract Extract Metadata Share Set permissions Transfer Move metadata Transfer Transfer Dataset Transfers Transfer Dataset Identifier Mint DOI Web form Metadata Notify Notify Curator Web form Curation Notify Notify user
  • 21. Support resources • Globus documentation: docs.globus.org • YouTube channel: youtube.com/GlobusOnline • Helpdesk: support@globus.org • Mailing Lists: globus.org/mailing-lists • Customer engagement team (office hours) • Professional services team (advisory, custom work)