Ian Foster
The University of Chicago
Argonne National Laboratory
Talk at 1st National Research Platform Workshop
Aug 7-8, 2017
Bozeman, Montana
Software infrastructure for a
National Research Platform
globus.org
Congratulations, you have a Science DMZ!
[Figure: Science DMZ reference design. A border router connects the WAN to the Science DMZ switch/router and, via a firewall, to the enterprise network. perfSONAR hosts, DTNs, API DTNs (data access governed by portal), and a filesystem data store attach over 10GE links. A portal server hosts web server, search, database, and authentication applications. The data transfer path and the portal query/browse path are shown separately. Credit: Eli Dart]
What you really want is a science accelerator
Software infrastructure
Software transmutes silicon into discoveries
High-speed data ingest
Secure data sharing
Data publication
Smart instruments
Ultra-scale collaboration
A strong software infrastructure is…
Accessible — trivially usable by all
Ubiquitous — it goes where you need it
Performant — fast end to end
Secure — all resources are protected
Reliable — you can count on it
Programmable — you can build on it
Manageable — it supports sys admins, too
Sustainable — it will be there tomorrow
Accessible means trivially usable by all
[Figure: research data lifecycle linking an instrument, a compute facility, a personal computer, and a publication repository through four stages: Transfer, Share, Publish, Discover]
1. Researcher initiates a transfer request, or the request is made automatically by a script or science gateway
2. Globus transfers files reliably and securely
3. Researcher selects files to share, selects a user or group, and sets access permissions
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
6. Researcher assembles a dataset and describes it with Dublin Core and domain-specific metadata
7. Curator reviews and approves; dataset is published on a campus or other system
8. Peers and collaborators search for and discover datasets, then transfer and share them using Globus
• Access via web browser, command line, or REST API
• Use any storage
• Use existing identity
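The Transfer and Share stages can also be driven through the REST API. The sketch below only builds the JSON request bodies and makes no network call. The field names follow our reading of the Globus Transfer API v0.10 `transfer` and `access` documents (submitted, we assume, to `POST /transfer` and `POST /endpoint/<endpoint_id>/access`) and should be checked against the current API reference. Endpoint IDs, paths, the submission ID, and the identity ID in the usage example are hypothetical placeholders.

```python
import json
from typing import Dict, List, Tuple


def transfer_document(source_endpoint: str, destination_endpoint: str,
                      items: List[Tuple[str, str, bool]],
                      submission_id: str, label: str = "") -> Dict:
    """Build the JSON body for a transfer request (the "Transfer" stage).

    `items` holds (source_path, destination_path, recursive) tuples.
    `submission_id` is assumed to have been obtained from the service
    beforehand, so that a resubmitted request is not run twice.
    """
    doc = {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {"DATA_TYPE": "transfer_item", "source_path": src,
             "destination_path": dst, "recursive": recursive}
            for src, dst, recursive in items
        ],
    }
    if label:
        doc["label"] = label  # optional human-readable task name
    return doc


def access_rule(principal_id: str, path: str, permissions: str = "r",
                principal_type: str = "identity") -> Dict:
    """Build an access (ACL) rule granting a user or group access (the "Share" stage)."""
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    if principal_type not in ("identity", "group"):
        raise ValueError("principal_type must be 'identity' or 'group'")
    if not path.endswith("/"):
        path += "/"  # directory rules are written with a trailing slash
    return {"DATA_TYPE": "access", "principal_type": principal_type,
            "principal": principal_id, "path": path, "permissions": permissions}


if __name__ == "__main__":
    # All IDs below are hypothetical placeholders.
    doc = transfer_document("SRC-ENDPOINT-ID", "DST-ENDPOINT-ID",
                            [("/ingest/run42/", "/project/run42/", True)],
                            submission_id="SUBMISSION-ID", label="run42")
    print(json.dumps(doc, indent=2))
    print(json.dumps(access_rule("COLLABORATOR-IDENTITY-ID", "/project/run42"), indent=2))
```

Keeping the bodies as plain dictionaries makes it easy to inspect or log exactly what a portal or gateway would send before wiring in authentication.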
Ubiquitous means it goes where you need it
10,000+ active endpoints
Native packages
Installs in seconds
Linux, Windows, MacOS
GPFS, Lustre, OrangeFS, …
AWS S3, Ceph RadosGW
Spectra Logic BlackPearl
Google Drive, HPSS
Amazon Glacier
Performant means fast end to end
• Specialized protocols
• Auto-configuration
• Parallel DTNs
• File system optimizations
• Tape system optimizations
1 PB in 1.002 days, Argonne → NCSA
R. Kettimuthu et al.
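As a rough check on what the 1 PB in 1.002 days result implies, the sketch below converts a data volume and an elapsed time into an average rate. It assumes decimal units (1 PB = 10^15 bytes, 1 Gb/s = 10^9 bits/s); under that assumption the transfer averaged roughly 92 Gb/s end to end.

```python
SECONDS_PER_DAY = 86_400


def average_rate_gbps(num_bytes: float, days: float) -> float:
    """Average throughput in gigabits per second (decimal units: 1 Gb = 1e9 bits)."""
    if days <= 0:
        raise ValueError("elapsed time must be positive")
    bits = num_bytes * 8
    seconds = days * SECONDS_PER_DAY
    return bits / seconds / 1e9


if __name__ == "__main__":
    one_petabyte = 1e15  # assumption: decimal petabyte
    print(f"{average_rate_gbps(one_petabyte, 1.002):.1f} Gb/s")  # 92.4 Gb/s
```

If a binary petabyte (2^50 bytes) was meant instead, the same function gives about 104 Gb/s.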
Secure means all resources are protected
Globus service is itself highly secure
• Best-practice cloud security
• Third-party security reviews
Globus platform ensures your services are secure
• Accept credentials from 300+ identity providers
• Control proxy credential lifetimes
• Industry-standard OAuth 2.0 and OIDC protocols
• Data encryption
• Build secure services with controlled delegation
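To make the OAuth 2.0 point concrete, here is a minimal sketch of the first leg of an authorization-code flow with PKCE (RFC 7636), the pattern a native application or script could use to obtain delegated tokens. The authorize URL and the Transfer scope string reflect our understanding of Globus Auth and should be verified against its documentation; the client ID and redirect URI are hypothetical placeholders. No network calls are made.

```python
import base64
import hashlib
import secrets
from typing import Sequence, Tuple
from urllib.parse import urlencode

# Assumed Globus Auth authorization endpoint and Transfer scope (verify before use).
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def make_pkce_pair() -> Tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method of RFC 7636."""
    # 32 random bytes -> 43-character URL-safe verifier (RFC 7636 allows 43..128).
    verifier = secrets.token_urlsafe(32)
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Challenge = base64url(SHA-256(verifier)) with the '=' padding stripped.
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def authorize_url(client_id: str, redirect_uri: str, code_challenge: str,
                  state: str, scopes: Sequence[str] = (TRANSFER_SCOPE,)) -> str:
    """Build the URL where the user logs in with an existing (e.g. campus) identity."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": " ".join(scopes),
        "code_challenge": code_challenge,
        "code_challenge_method": "S256",
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    verifier, challenge = make_pkce_pair()
    state = secrets.token_urlsafe(16)  # guards against CSRF; compare it on the redirect
    print(authorize_url("HYPOTHETICAL-CLIENT-ID",
                        "https://app.example.org/callback", challenge, state))
    # Later, the app exchanges the returned code plus `verifier` for tokens.
```

PKCE lets an application that cannot keep a client secret still prove that the party redeeming the authorization code is the one that started the login.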
Reliable means you can count on it
Each transfer is monitored and retried upon failure
Protocols support restart
Fail over on multiple DTNs
Service is cloud hosted, with replication, dynamic failover, and monitoring
99.5% uptime over past three years
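The restart and failover behavior listed above can be illustrated with a small simulation. This is not how Globus or its transfer protocols implement it; it only shows the idea. The sender keeps a restart marker (the byte offset confirmed so far), so a failed attempt resumes from that offset rather than from zero, and each retry fails over to the next DTN. The DTN names and the `send_chunk` callback are hypothetical.

```python
import itertools
from typing import Callable, Dict, Sequence


class ChunkError(Exception):
    """Raised by the (simulated) transport when one chunk fails to arrive."""


def resilient_copy(data: bytes, dtns: Sequence[str],
                   send_chunk: Callable[[str, int, bytes], None],
                   chunk_size: int = 4, max_failures: int = 10) -> Dict:
    """Send `data` in chunks, resuming from the last confirmed offset on failure.

    After each failure the copy fails over to the next DTN (round robin) and
    retries the same chunk; it gives up after `max_failures` failures.
    """
    if not dtns:
        raise ValueError("need at least one DTN")
    dtn_cycle = itertools.cycle(dtns)
    dtn = next(dtn_cycle)
    offset = 0  # restart marker: bytes confirmed delivered so far
    failures = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        try:
            send_chunk(dtn, offset, chunk)
        except ChunkError:
            failures += 1
            if failures >= max_failures:
                raise
            dtn = next(dtn_cycle)  # fail over, keep the restart marker
            continue
        offset += len(chunk)  # advance the marker only after a successful send
    return {"bytes": offset, "failures": failures, "last_dtn": dtn}


if __name__ == "__main__":
    received = bytearray()
    flaky_offsets = {8}  # simulate one dropped chunk at byte offset 8

    def send_chunk(dtn: str, offset: int, chunk: bytes) -> None:
        if offset in flaky_offsets:
            flaky_offsets.discard(offset)
            raise ChunkError(f"{dtn} lost the chunk at offset {offset}")
        received[offset:offset + len(chunk)] = chunk

    result = resilient_copy(b"hello, globus world!", ["dtn1", "dtn2"], send_chunk)
    print(result, bytes(received))
```

Only the lost chunk is resent, which is what makes restartable protocols attractive for multi-terabyte transfers.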
Programmable means you can build on it
Globus Auth API
…
Globus Transfer API
Globus Connect
Data Publication & Discovery
File Sharing
File Transfer & Replication
Use institutional ID systems in external web applications
Integrate file transfer and sharing capabilities into scientific web apps, portals, gateways, etc.
GET /endpoint/go%23ep1
PUT /endpoint/vas%23my_endpt
200 OK
X-Transfer-API-Version: 0.10
Content-Type: application/json
…
Web
Command line
REST API
Python SDK
Jupyter Notebooks
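The `%23` in `GET /endpoint/go%23ep1` is a percent-encoded `#`. Endpoint names of the form `user#name` must be encoded before they go into a URL path, because a bare `#` starts a URL fragment and the rest of the name would never reach the server. A minimal standard-library sketch that builds (but does not send) such a request; the base URL and the bearer token are assumptions for illustration.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL for version 0.10 of the Transfer REST API (verify before use).
BASE_URL = "https://transfer.api.globus.org/v0.10"


def endpoint_request(endpoint_name: str, token: str, method: str = "GET") -> Request:
    """Build a request for /endpoint/<name>, percent-encoding characters such as '#'."""
    # safe="" also encodes '/', so the whole name stays a single path segment.
    path = "/endpoint/" + quote(endpoint_name, safe="")
    return Request(BASE_URL + path, method=method,
                   headers={"Authorization": f"Bearer {token}",
                            "Accept": "application/json"})


if __name__ == "__main__":
    req = endpoint_request("go#ep1", token="HYPOTHETICAL-ACCESS-TOKEN")
    print(req.get_method(), req.full_url)
    # GET https://transfer.api.globus.org/v0.10/endpoint/go%23ep1
```

The same helper covers the `PUT` case by passing `method="PUT"` and attaching a JSON body before sending.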
Programmable means automation
• Recurring transfers with sync option: copy /ingest daily @ 3:30am
• Data distribution from a shared endpoint:
  .../my_share
    /cohort045
    /cohort096
    /cohort127
• Staging area cleanup on a shared endpoint:
  1. Check whether the transfer succeeded
  2. Delete data from staging area
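The automation patterns above reduce to a few small decisions that a script has to make. The sketch below shows them with hypothetical helpers: computing the next 3:30am run, deciding whether a file needs copying under a sync level, and deleting staged data only after the transfer reports success. The level names mirror the exists/size/mtime/checksum sync options Globus offers, but the comparison rules here are our simplification, and `SUCCEEDED` is assumed to be the status a completed task reports.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional


def next_daily_run(now: datetime, hour: int = 3, minute: int = 30) -> datetime:
    """Next occurrence of hour:minute strictly after `now` (naive local time, no DST handling)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def needs_copy(src: dict, dst: Optional[dict], level: str = "mtime") -> bool:
    """Decide whether a source file must be copied during a sync.

    `src` and `dst` are hypothetical metadata dicts with 'size', 'mtime', and
    'checksum' keys; `dst` is None when the file is absent at the destination.
    """
    if dst is None:
        return True  # every level copies files that are missing
    if level == "exists":
        return False
    if level == "size":
        return src["size"] != dst["size"]
    if level == "mtime":
        return dst["mtime"] < src["mtime"]
    if level == "checksum":
        return src["checksum"] != dst["checksum"]
    raise ValueError(f"unknown sync level: {level}")


def cleanup_staging(task_status: str, staged_paths: Iterable[str],
                    delete: Callable[[str], None]) -> List[str]:
    """Delete staged files only if the transfer succeeded; return what was deleted."""
    if task_status != "SUCCEEDED":
        return []  # keep the data so a failed or unfinished transfer can be retried
    deleted = []
    for path in staged_paths:
        delete(path)
        deleted.append(path)
    return deleted


if __name__ == "__main__":
    print(next_daily_run(datetime(2017, 8, 7, 14, 0)))  # 2017-08-08 03:30:00
```

Checking the task status before deleting is the key design choice: the staging area is the only copy until the transfer is confirmed.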
Programmable means automation
ARM Climate Research Facility
Manageable means it helps sys admins, too
Low admin costs
Priority support
Usage reporting
Management console
Alternative identity provider
Training materials
Constant innovation
Sustainable means it will be there tomorrow
Operated by professionals at the University of Chicago
Supported by subscriptions from >65 institutions
Raising the bar on research software quality
5 major services
13 national labs use Globus
290 PB transferred
10,000 active endpoints
50 billion files processed
70,000 registered users
99.5% uptime
65+ institutional subscribers
1 PB largest single transfer to date
3 months longest continuously managed transfer
300+ federated campus identities
12,000 active users/year
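To put the 99.5% uptime figure in operational terms, the snippet below converts an availability fraction into the downtime it permits. Assuming 365.25-day years, 99.5% corresponds to about 43.8 hours of downtime per year, or roughly 5.5 days over three years.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average (Julian) year length


def allowed_downtime_hours(availability: float, years: float = 1.0) -> float:
    """Hours of downtime permitted by an availability fraction in (0, 1]."""
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must be in (0, 1]")
    return (1.0 - availability) * HOURS_PER_YEAR * years


if __name__ == "__main__":
    per_year = allowed_downtime_hours(0.995)
    over_three = allowed_downtime_hours(0.995, years=3)
    print(f"{per_year:.1f} h/year, {over_three / 24:.1f} days over three years")
    # 43.8 h/year, 5.5 days over three years
```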
More: users, time, data, storage
Better: collaboration, ideas, innovation
Easier: authentication, transfer, sharing, publication, administration
Software infrastructure for a national research platform
Get more data to more people faster
Software transmutes hardware into discoveries
Thank you to our sponsors!
U.S. Department of Energy
Our subscribers
