A presentation at the First National Research Platform workshop. "The purpose of this workshop is to bring together representatives from interested institutions to discuss implementation strategies for deployment of interoperable Science DMZs at a national scale." I present eight desirable properties for a software infrastructure for such a platform, and describe our experience realizing these properties in the Globus system.
Software Infrastructure for a National Research Platform
1. Ian Foster
The University of Chicago
Argonne National Laboratory
Talk at 1st National Research Platform Workshop
Aug 7-8, 2017
Bozeman, Montana
2. globus.org
Congratulations, you have a Science DMZ!
[Figure: Science DMZ architecture. A border router connects the WAN to the Science DMZ switch/router and, through a firewall, to the enterprise network. perfSONAR hosts, DTNs, and API DTNs (data access governed by portal) attach over 10GE links; the DTNs front a filesystem (data store). A portal server runs the portal applications: web server, search, database, authentication. Legend: data transfer path; portal query/browse path.]
Credit: Eli Dart
3. globus.org
What you really want is a science accelerator
Software
Infrastructure
Software transmutes silicon into discoveries
High-speed data ingest
Secure data sharing
Data publication
Smart instruments
Ultra-scale collaboration
4. globus.org
A strong software infrastructure is…
Accessible — trivially usable by all
Ubiquitous — it goes where you need it
Performant — fast end to end
Secure — all resources are protected
Reliable — you can count on it
Programmable — you can build on it
Manageable — it supports sys admins, too
Sustainable — it will be there tomorrow
5. globus.org
Accessible means trivially usable by all
1. Researcher initiates transfer request; or requested automatically by script, science gateway
2. Globus transfers files reliably, securely
3. Researcher selects files to share, selects user or group, and sets access permissions
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
6. Researcher assembles dataset; describes it with Dublin Core & domain-specific metadata
7. Curator reviews and approves; dataset published on campus or other system
8. Peers, collaborators search and discover datasets; transfer and share using Globus
Diagram labels: Instrument, Compute Facility, Personal Computer, Publication Repository
Stages: Transfer, Share, Publish, Discover
• Access via web browser, command line, or REST API
• Use any storage
• Use existing identity
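The sharing steps above (select files, select a user or group, set access permissions, then have access checked on existing storage) can be pictured with a small data structure. This is a minimal sketch under stated assumptions, not how Globus implements sharing: the `SharedFolder` class, its methods, the `"r"`/`"w"` permission letters, and the example identities are all hypothetical, and paths are assumed to be already normalized.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class SharedFolder:
    """Hypothetical model of a folder shared in place on existing storage."""

    root: str
    # Maps an identity (user or group name) to a set of permission letters.
    rules: dict = field(default_factory=dict)

    def grant(self, identity, permissions):
        """Record permissions such as "r" or "rw" for one identity."""
        self.rules[identity] = set(permissions)

    def allows(self, identity, path, permission):
        """Return True if identity holds permission on a path under root."""
        target = PurePosixPath(path)
        root = PurePosixPath(self.root)
        inside = target == root or root in target.parents
        return inside and permission in self.rules.get(identity, set())


share = SharedFolder("/projects/run42")
share.grant("collaborator@example.edu", "r")
```

In this model the collaborator needs no local account on the storage system; the check depends only on the identity they logged in with and the rules the researcher set.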
6. globus.org
Ubiquitous means it goes where you need it
10,000+ active endpoints
Native packages
Installs in seconds
Linux, Windows, MacOS
GPFS, Lustre, OrangeFS, …
AWS S3, Ceph RadosGW
Spectra Logic BlackPearl
Google Drive, HPSS
Amazon Glacier
7. globus.org
Performant means fast end to end
Specialized protocols
Auto-configuration
Parallel DTNs
File system optimizations
Tape system optimizations
1 PB in 1.002 days, Argonne to NCSA
R. Kettimuthu et al.
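For a sense of scale, the 1 PB in 1.002 days figure can be converted to an average end-to-end rate. The short calculation below is a sketch that assumes a decimal petabyte (10^15 bytes); the slide does not say whether decimal or binary units were meant, and the function name is illustrative.

```python
PETABYTE = 10**15          # assumption: decimal petabyte
SECONDS_PER_DAY = 86_400


def average_rate_gbps(num_bytes, days):
    """Average throughput in gigabits per second over the whole transfer."""
    seconds = days * SECONDS_PER_DAY
    return num_bytes * 8 / seconds / 1e9


rate = average_rate_gbps(PETABYTE, 1.002)
print(f"{rate:.1f} Gb/s")  # prints 92.4 Gb/s
```

Under that assumption the sustained average is a little over 92 Gb/s.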
8. globus.org
Secure means all resources are protected
Globus service is itself highly secure
Best-practice cloud security
Third-party security reviews
Globus platform ensures your services are secure
Accept credentials from 300+ identity providers
Control proxy credential lifetimes
Industry-standard OAuth 2.0 and OpenID Connect (OIDC) protocols
Data encryption
Build secure services with controlled delegation
9. globus.org
Reliable means you can count on it
Each transfer is monitored, retried upon failure
Protocols support restart
Fail over on multiple DTNs
Service is cloud hosted, with replication, dynamic failover, monitoring
99.5% uptime over past three years
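The combination of "retried upon failure" and "protocols support restart" can be illustrated with a generic retry loop that resumes from the last confirmed offset instead of starting over. This is a minimal sketch of the general technique, not the Globus implementation: `TransferInterrupted`, `transfer_with_retries`, and the `send_from` callback are hypothetical names, and the exponential backoff schedule is an assumption.

```python
import time


class TransferInterrupted(Exception):
    """Hypothetical error that carries how many bytes were confirmed."""

    def __init__(self, bytes_done):
        super().__init__(f"interrupted after {bytes_done} bytes")
        self.bytes_done = bytes_done


def transfer_with_retries(send_from, max_attempts=5, base_delay=1.0,
                          sleep=time.sleep):
    """Call send_from(offset) until it succeeds; return the attempt count.

    send_from must send everything from offset onward and raise
    TransferInterrupted if the connection drops part way through.
    """
    offset = 0
    for attempt in range(1, max_attempts + 1):
        try:
            send_from(offset)
            return attempt
        except TransferInterrupted as exc:
            offset = exc.bytes_done  # restart marker: resume, do not redo
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.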
10. globus.org
Programmable means you can build on it
Globus Auth API: use institutional ID systems in external web applications
…
Globus Transfer API: integrate file transfer and sharing capabilities into scientific web apps, portals, gateways, etc.
Globus Connect
Data Publication & Discovery
File Sharing
File Transfer & Replication
GET /endpoint/go%23ep1
PUT /endpoint/vas#my_endpt
200 OK
X-Transfer-API-Version: 0.10
Content-Type: application/json
…
Web, command line, REST API
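The GET line in the request excerpt shows an endpoint name containing `#` written as `%23`. A raw `#` in a URL starts the fragment, so a client building such paths would normally percent-encode the name first. The sketch below uses only the Python standard library; the helper name `endpoint_path` is hypothetical, and the `/endpoint/` prefix is taken from the excerpt rather than from API documentation.

```python
from urllib.parse import quote


def endpoint_path(canonical_name):
    """Build a request path with the endpoint name percent-encoded."""
    # safe="" forces every reserved character, including "#", to be encoded.
    return "/endpoint/" + quote(canonical_name, safe="")


print(endpoint_path("go#ep1"))  # prints /endpoint/go%23ep1
```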
11. globus.org
Programmable means you can build on it
Python SDK
Jupyter Notebooks
12. Programmable means automation
Recurring transfers with sync option
Copy /ingest daily @ 3:30am
Data distribution (shared endpoint)
.../my_share
--/cohort045
--/cohort096
--/cohort127
Staging area cleanup (shared endpoint)
1. Check if successful transfer
2. Delete data from staging area
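The two cleanup steps above can be sketched as a small script: delete staged data only after the transfer is confirmed successful. This is an illustrative sketch, not the script from the talk. The function name `cleanup_staging` is hypothetical, the `"SUCCEEDED"` status string is an assumption about how the caller reports task state, and obtaining that status from the transfer service is left out.

```python
import shutil
from pathlib import Path


def cleanup_staging(staging_dir, transfer_status):
    """Empty staging_dir only if the transfer succeeded; return True if emptied."""
    # Step 1: check if the transfer was successful (assumed status string).
    if transfer_status != "SUCCEEDED":
        return False
    # Step 2: delete data from the staging area, keeping the directory itself.
    for entry in Path(staging_dir).iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return True
```

A scheduler entry such as the cron expression `30 3 * * *` would run a job like this daily at 3:30am, matching the recurring-transfer example.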
14. globus.org
Manageable means it helps sys admins, too
Low admin costs
Priority support
Usage reporting
Management console
Alternative identity provider
Training materials
Constant innovation
15. globus.org
Sustainable means it will be there tomorrow
Operated by professionals at the University of Chicago
Supported by subscriptions from >65 institutions
16. globus.org
Raising the bar on research software quality
5 major services
13 national labs use Globus
290 PB transferred
10,000 active endpoints
50 Bn files processed
70,000 registered users
99.5% uptime
65+ institutional subscribers
1 PB largest single transfer to date
3 months longest continuously managed transfer
300+ federated campus identities
12,000 active users/year