globusonline
Globus Online for Managing
Tomography Data at APS
Rachana Ananthakrishnan
Francesco De Carlo
Argonne National Lab
We started with reliable, secure,
high-performance file transfer …
Data Source → Data Destination
1. User initiates transfer request
2. Globus Online moves and syncs files
3. Globus Online notifies user
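One way to picture the "moves and syncs" step is a copy that skips files already present and identical at the destination. The sketch below is a local stand-in only: it assumes checksum comparison between two directories, the function names are hypothetical, and it is not the Globus Online implementation or API, which works between remote endpoints.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_directory(source: Path, destination: Path) -> list[str]:
    """Copy files that are missing or changed; return the relative paths copied."""
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        if dst_file.exists() and file_checksum(dst_file) == file_checksum(src_file):
            continue  # destination copy is already identical, skip it
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(relative.as_posix())
    return copied
```

Running the function a second time with no changes at the source copies nothing, which is the property that makes a sync safe to repeat.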
… and then made it simple to share
big data off existing storage systems
Data Source
1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
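The key idea in the sharing flow is that only the grant is recorded; the data stays on the source storage. A minimal sketch of that bookkeeping follows, assuming a grant on a directory covers everything beneath it. The class and method names are hypothetical and do not describe the actual sharing service.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class ShareRegistry:
    """Tracks which principals may access which paths on existing storage."""
    # Maps (path, principal) -> set of permissions, e.g. {"read"}.
    grants: dict = field(default_factory=dict)

    def share(self, path: str, principal: str, permissions: set) -> None:
        """Record a grant; no file is moved or copied."""
        key = (PurePosixPath(path), principal)
        self.grants.setdefault(key, set()).update(permissions)

    def can_access(self, path: str, principal: str, permission: str) -> bool:
        """Check whether a user or group holds the permission on the path."""
        target = PurePosixPath(path)
        for (shared_path, who), perms in self.grants.items():
            if who != principal or permission not in perms:
                continue
            # A grant on a directory covers everything beneath it.
            if target == shared_path or shared_path in target.parents:
                return True
        return False
```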
Transforming data acquisition
Current
• Experimental parameters
optimized manually
• Collected data combined
with visual inspection to
confirm optimal condition
• Data reconstructed and sent
to users via external drive
• User team starts data
reduction at home institution
Transforming data acquisition
Envisaged
• Experimental parameters
optimized automatically
• Collected data available to
optimization programs
• Data are automatically
reconstructed, reduced, and
shared with local and
remote participants
• User team leaves the APS
with reduced data
Globus Online as enabler
Facility data acquisition
Globus Online transfer service
Reduced data
Analysis/Sharing
Globus Online sharing service
Globus Online dataset service*
* In development
Credit: Kerstin Kleese-van Dam
Erin Miller (PNNL) collects data at Advanced Photon Source, renders at PNNL, and views at ANL
Looking at how researchers use data
• A single research question often requires the
integration of many data elements that are:
– In different locations
– In different formats (Excel, text, CDF, HDF, …)
– Described in different ways
• Best grouping can vary during investigation
– Longitudinal, vertical, cross-cutting
• But always needs to be operated on as a unit
– Share, annotate, process, copy, archive, …
How do we manage data today?
• Often, a curious mix of ad hoc methods
– Organize in directories using file and directory
naming conventions
– Capture status in README
files, spreadsheets, notebooks
– Even PowerPoint!
• Time-consuming, complex, error-prone
Why can’t we manage our data like we
manage our pictures and music?
Introducing the dataset
• Group data based on use, not location
– Logical grouping to organize, reorganize, search, and
describe usage
• Tag with characteristics that reflect content …
– Capture as much existing information as we can
• … or to reflect current status in investigation
– Stage of processing, provenance, validation, …
• Share datasets for collaboration
– Control access to data and metadata
• Operate on datasets as units
– Copy, export, analyze, tag, archive, …
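The dataset idea above can be sketched as a small data structure: members are grouped by use rather than by location, tags carry content and status, and an operation is applied to the whole group at once. This is an illustration only; the class, its fields, and the example locations are hypothetical and are not the dataset service's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A logical group of data items, independent of where they are stored."""
    name: str
    members: set = field(default_factory=set)   # locations (paths or URIs) of items
    tags: dict = field(default_factory=dict)    # tag name -> value

    def add(self, *locations: str) -> None:
        """Add items from any location to the group."""
        self.members.update(locations)

    def tag(self, name: str, value) -> None:
        """Attach or update a tag, e.g. content type or processing stage."""
        self.tags[name] = value

    def apply(self, operation):
        """Run one operation over every member, treating the dataset as a unit."""
        return {location: operation(location) for location in sorted(self.members)}
```

For example, a dataset could hold a raw file at one site and a rendering at another, carry a `stage` tag, and be passed as a whole to a copy or archive function through `apply`.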
Expanding Globus Online services
• Ingest and publication
– Imagine a Dropbox that not only replicates, but
also extracts metadata, catalogs, converts
• Cataloging
– Virtual views of data based on user-defined
and/or automatically extracted metadata
• Integration with computation
– Associate computational
procedures, orchestrate application, catalog
results, record provenance
Builds on catalog as a service
Approach
• Hosted user-defined
catalogs
• Based on tag model
<subject, name, value>
• Optional schema
constraints
• Integrated with other
Globus services
Three REST APIs
/query/
• Retrieve subjects
/tags/
• Create, delete, retrieve
tags
/tagdef/
• Create, delete, retrieve
tag definitions
Builds on USC Tagfiler project (C. Kesselman et al.)
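To make the tag model concrete, here is an in-memory stand-in for a catalog of <subject, name, value> tags with the three operations the slide lists: tag definitions, tags, and queries. It is a sketch under stated assumptions: the class and method names are hypothetical, the "name=value" predicate form is borrowed from the example query URL in the editor's notes, and nothing here reflects the real REST request or response formats.

```python
from dataclasses import dataclass, field


@dataclass
class TagCatalog:
    """In-memory stand-in for a catalog of <subject, name, value> tags."""
    tagdefs: set = field(default_factory=set)    # defined tag names (/tagdef/)
    tags: dict = field(default_factory=dict)     # subject -> {name: value} (/tags/)

    def create_tagdef(self, name: str) -> None:
        self.tagdefs.add(name)

    def create_tag(self, subject: str, name: str, value: str) -> None:
        # Assumed schema constraint: only defined tag names are accepted.
        if name not in self.tagdefs:
            raise KeyError(f"undefined tag: {name}")
        self.tags.setdefault(subject, {})[name] = value

    def delete_tag(self, subject: str, name: str) -> None:
        self.tags.get(subject, {}).pop(name, None)

    def query(self, predicate: str) -> list:
        """Return subjects matching a 'name=value' predicate, sorted (/query/)."""
        name, _, value = predicate.partition("=")
        return sorted(s for s, t in self.tags.items() if t.get(name) == value)
```

Keeping tag definitions separate from tags is what allows schema constraints to stay optional: a catalog can accept any tag name, or only the names that were defined first, as this sketch does.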
Exemplar: APS Beamlines 32-ID & 2-BM
X-ray imaging, tomography, ~few µm to 30 nm resolution
Currently can generate up to 100 TB per day
< 1 GB/s data rate; ~3-5 GB/s in 5-10 years
Storage
Image processing (normalization, etc.)
Tomographic reconstruction
Visual inspection
Selection
Beamline 2-BM: ~1.5 µm resolution
Beamline 32-ID-C: 20-50 nm resolution
Image processing (alignment, etc.)
Tomographic reconstruction
Visual inspection
Selection
Selection
Multi-scale image fusion
Visual inspection
Up to 100 fps, 2K x 2K, 16 bits, 11 GB raw data
1,500 fps, 2K x 2K, 16 bits, 1 min readout, 11 GB raw data
Multi-scale 3D imaging data fusion at APS
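The frame sizes and frame rates listed above translate into raw data rates with simple arithmetic. The sketch below works them out, assuming "2K" means 2048 pixels and that 16-bit pixels are stored uncompressed in 2 bytes; both are assumptions, not statements from the slides.

```python
FRAME_WIDTH = 2048        # "2K" taken as 2048 pixels (assumption)
FRAME_HEIGHT = 2048
BYTES_PER_PIXEL = 2       # 16 bits, uncompressed (assumption)


def frame_bytes() -> int:
    """Size of one detector frame in bytes."""
    return FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL


def rate_gb_per_s(frames_per_second: float) -> float:
    """Raw detector output in decimal gigabytes per second."""
    return frame_bytes() * frames_per_second / 1e9


def frames_in(raw_gigabytes: float) -> int:
    """Whole frames that fit in a raw dataset of the given size."""
    return int(raw_gigabytes * 1e9 // frame_bytes())


print(f"frame size:   {frame_bytes() / 1e6:.1f} MB")
print(f"at 100 fps:   {rate_gb_per_s(100):.2f} GB/s")
print(f"at 1,500 fps: {rate_gb_per_s(1500):.2f} GB/s")
print(f"11 GB holds:  {frames_in(11)} frames")
```

Under these assumptions a frame is about 8.4 MB, so 100 fps comes to roughly 0.84 GB/s, in line with the "< 1 GB/s" figure quoted for the beamlines, and an 11 GB raw dataset holds about 1,300 frames. At 1,500 fps the instantaneous rate would be roughly 12.6 GB/s, which may be why a separate readout time is listed for that detector.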
APS Imaging Group
APS Software Service Group
Mathematics & Computer Science/Computation Institute
Multi-scale image fusion
Infrastructure LDRD
System integration
Instrument & Data Collection
Data Management Services
Mathematics & Computer Science
Results: Google Earth-style zoom-in data navigation
Tao of Fusion LDRD
Argonne Collaborations
Timelines
• July:
– Alpha service available
• August:
– Pilot with two groups at APS
• Fall of this year:
– Pilot with a few other groups at APS
– Early beta
Thank You
• Interested in working with us on the dataset
service:
– Email: ranantha@mcs.anl.gov
• Contact: support@globusonline.org
• Website: www.globusonline.org

2013 06-21-computing-for-light-sources

Editor's Notes

  1. This image shows a 3D rendering of a Shewanella biofilm grown on a flat plastic substrate in a Constant Depth bioFilm Fermenter (CDFF).  The image was generated using x-ray microtomography at the Advanced Photon Source, Argonne National Laboratory.   
  2. http://datasets.globus.org/carl-catalog/query/propertyA=value1