Science for the Future: Strategies for Moving and Sharing Data
1. Globus Online
Science for the Future: Strategies for distributing and sharing data
www.globusonline.org
Ian Foster
foster@anl.gov
2. Big science data should be easy
[Figure: data lifecycle diagram with Registry, Staging, Ingest, Analysis, Community, and Archive/Mirror stages, each with a Store]
3. … but it’s hard and frustrating!
[Figure: the same data lifecycle diagram, annotated with failure messages: "Quota exceeded!", "Expired credentials!", "Network failed. Retry.", "Permission denied!"]
4. Excerpts from ESnet reports
• “Transfers often take longer than expected
based on available network capacities”
• “Lack of an easy to use interface to some of the
high-performance tools”
• “Tools [are] too difficult to install and use”
• “Time and interruption to other work required to
supervise large data transfers”
• “Need data transfer tools that are easy to use,
well-supported, and permitted by site and facility
cybersecurity organizations”
5. We envisage a world where data …
… flows rapidly, reliably, and securely
among:
experimental facilities,
online and archival storage,
computing facilities, and
remote institutions
6. We envisage a world where data …
… is easily integrated into dynamic
datasets that also include metadata
and programs necessary to understand
and regenerate it
7. We envisage a world where data …
… is readily discoverable and
accessible to collaborators, regardless
of where they or the data are located
8. We believe a new approach is
needed to deliver data
management infrastructure
Frictionless
Affordable
Sustainable
Like … but for science!
9. Focusing on “frictionless”, we’ve started to do
this with the Globus Online service …
Transfer and sharing of
large data sets …
… with Dropbox-like
characteristics …
… directly from your own
storage systems
10. We started with reliable, secure,
high-performance file transfer …
Data Source → Data Destination
1. User initiates transfer request
2. Globus Online moves and syncs files
3. Globus Online notifies user
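The "moves and syncs files" step can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not how Globus Online works internally: it assumes both endpoints are visible as local directories and compares files by size and SHA-256 checksum, copying only those that are missing or differ. The names `file_checksum` and `sync_tree` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_tree(source: Path, destination: Path) -> list[str]:
    """Copy files from source to destination when missing or different.

    Hypothetical helper: returns the relative paths that were copied.
    Files already present with identical size and checksum are skipped.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue  # directories are created on demand below
        rel = src.relative_to(source)
        dst = destination / rel
        in_sync = (
            dst.exists()
            and dst.stat().st_size == src.stat().st_size
            and file_checksum(dst) == file_checksum(src)
        )
        if in_sync:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy contents and metadata such as mtime
        copied.append(rel.as_posix())
    return copied
```

Because files that are already in sync are skipped, re-running `sync_tree` after an interruption only moves what is still missing or changed.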
11. … and then made it simple to share
big data off existing storage systems
Data Source
1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
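The sharing model on this slide (share in place, grant permissions to a user or a group) can be sketched as a simple access-control check. This is a hypothetical data structure for illustration only, not the Globus Online implementation; the `SharedPath` class, the `"user:<name>"` / `"group:<name>"` principal format, and the `"r"`/`"w"` permission letters are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SharedPath:
    """A path on an existing storage system, shared in place rather than copied."""
    path: str
    owner: str
    # Hypothetical format: principal ("user:<name>" or "group:<name>") -> permissions.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def share(self, principal: str, perms: set[str]) -> None:
        """Grant (or replace) permissions for a user or group."""
        self.grants[principal] = set(perms)

    def allowed(self, user: str, groups: dict[str, set[str]], perm: str) -> bool:
        """Check whether `user` holds `perm`, directly or through a group.

        `groups` maps group names to the set of member user names.
        """
        if user == self.owner:
            return True  # the owner always retains full access
        for principal, perms in self.grants.items():
            if perm not in perms:
                continue
            kind, _, name = principal.partition(":")
            if kind == "user" and name == user:
                return True
            if kind == "group" and user in groups.get(name, set()):
                return True
        return False
```

For example, User A might share `/data/run42/` read-only with a group and read-write with one collaborator; User B, a group member, can then read but not write.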
17. Exemplar: APS Beamline 2-BM
X-ray imaging and tomography, ~few µm to 30 nm resolution
Currently can generate >100 TB per day
<1 GB/s data rate; ~3-5 GB/s in 5-10 years
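Rates in GB/s and volumes in TB per day are easier to compare on a common scale. The following is a minimal conversion sketch, assuming decimal units (1 TB = 1000 GB) and, by default, continuous acquisition; `duty_cycle` is a hypothetical parameter for the fraction of the day spent collecting.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(rate_gb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Daily volume in TB for a sustained rate in GB/s (decimal units)."""
    return rate_gb_per_s * SECONDS_PER_DAY * duty_cycle / 1000


def required_rate_gb_per_s(volume_tb_per_day: float) -> float:
    """Average rate in GB/s needed to move a given daily volume within 24 hours."""
    return volume_tb_per_day * 1000 / SECONDS_PER_DAY
```

A sustained 1 GB/s over a full day is about 86.4 TB; the projected 3 to 5 GB/s corresponds to roughly 259 to 432 TB per day; and averaging 100 TB per day requires about 1.16 GB/s. A duty cycle below 1 lowers the daily total proportionally.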
18. Transforming data acquisition
Current
• Experimental parameters
optimized manually
• Collected data combined
with visual inspection to
confirm optimal conditions
• Data reconstructed and sent
to users via external drive
• User team starts data
reduction at home institution
19. Transforming data acquisition
Envisaged
• Experimental parameters
optimized automatically
• Collected data available to
optimization programs
• Data are automatically
reconstructed, reduced,
and shared with local and
remote participants
• User team leaves the APS
with reduced data
20. Globus Online as enabler
Facility data acquisition → Globus Online transfer service → Reduced data → Analysis/Sharing (Globus Online sharing service; Globus Online dataset service*)
* In development
21. Credit: Kerstin Kleese-van Dam
Erin Miller (PNNL) collects data at the Advanced Photon Source, renders at PNNL, and views at ANL
22. We believe a new approach is
needed to deliver data
management infrastructure
Frictionless
Affordable
Sustainable
23. We’ve got a handle on “frictionless”
• Web interface, REST API, command line
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Integration with ESnet “Science DMZs”
• One-click “Globus Connect” install
• 5-minute Globus Connect Multiuser install
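"Reliability via transfer retries" generally means that transient failures (a dropped connection, a timed-out server) are retried automatically instead of being handed back to the user. The following is a minimal generic sketch of capped exponential backoff with jitter; it is not Globus Online's actual retry policy, and `TransientError`, `with_retries`, and the default parameter values are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a dropped connection)."""


def with_retries(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(), retrying TransientError with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

The injectable `sleep` argument makes the policy testable without real waiting; in a long-running transfer service the same idea would typically apply per file or per chunk, so one failure does not restart the whole transfer.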
24. “Affordable” and “sustainable”?
The common expectation is either:
– High-priced commercial software
(with generally higher levels of quality)
Or:
– Free, open source software
(with generally lower levels of quality)
We aim to offer the best of both worlds!
25. We are a non-profit service
provider to the non-profit
research community
27. Globus Online Provider Plans
Starting at $20k per year
• Managed endpoints with sharing
• Multiple GridFTP servers per endpoint
• Branded web sites
• Alternate identity provider
• Usage reporting
• Mass storage system (MSS) optimizations
• Operations monitoring and management
• Input into and access to product roadmap
28. Provider Plan not required to get started
Use Globus Connect Multiuser to easily
connect your resources with Globus
Go to: globusonline.org/gcmu
[Figure: data lifecycle diagram with Registry, Staging, Ingest, Analysis, Community, and Archive/Mirror stages, each with a Store]
30. Providers are also using Globus
Online as a platform
[Figure: Globus Online platform components: Globus Nexus (Identity, Group, Profile), …, Sharing Service, Transfer Service, Dataset Services, Globus Online APIs, Globus Connect, Globus Toolkit]
This image shows a 3D rendering of a Shewanella biofilm grown on a flat plastic substrate in a Constant Depth Film Fermenter (CDFF). The image was generated using x-ray microtomography at the Advanced Photon Source, Argonne National Laboratory.
Mention that plans start from $20k and that we are talking to NET+ about campus plans with separate pricing?