Integrating Globus into LRZ's Data Science Storage Service
GlobusWorld 2019 | 2019-05-01 | Stephan Peinkofer
Integrating Globus into LRZ’s Data Science Storage Service | 2019-05-01 | Stephan Peinkofer
3. Leibniz Supercomputing Centre
Bavarian Academy of Sciences and Humanities
IT Service Backbone for the Advancement of Science and Research
Approx. 250 employees
57 years of IT support
Computer Centre for all Munich Universities
Regional Computer Centre for all Bavarian Universities
National Supercomputing Centre (GCS)
European Supercomputing Centre (PRACE)
4. LRZ as an IT Center of Excellence
Operating Cutting-Edge IT Infrastructure
High Performance Computing: SuperMUC-NG, LRZ Linux Cluster
Virtual Reality and Visualisation: V2C (CAVE, Powerwall)
High Speed Networking: Munich Scientific Network
Big Data: Bavarian State Library Digital Archive
Storage, Network, Cloud Computing, Cluster, HPC, Training, Consultancy, Email
5. Data Silos
6. Increasing User Demand
I need to share a 400TB dataset with someone in Canada!
My experiment will generate multiple PBs that have to be analyzed and backed up! How?
I want to build a WebApp that allows users to interactively analyze my 500TB SuperMUC simulation data!
I need to share some data on SuperMUC between multiple projects!
I want to analyze a large dataset, generated on SuperMUC, using some special OS image on the LRZ Cloud!
7. Satisfying User Demands
So basically we need to provide …
A file system that can be shared amongst the complete LRZ HPC Ecosystem
Some kind of external access mechanism for arbitrary entities
A Dropbox-like data management approach
8. LRZ Data Science Storage
Interactive processing on LRZ Compute Cloud
Remote visualisation on LRZ's visualisation systems
External access and sharing via Globus Online
High performance backup and archive of data on LRZ's Backup and Archive System
Batch and interactive processing on dedicated, hosted HPC Cluster at LRZ
High throughput batch processing on LRZ's Linux Cluster or SuperMUC
11. DSS Containers
Huber (LMU User: lmuuser2)
LinuxCluster: Project lxpr2, User lx22bp
SuperMUC: Project smpr2, User sm33sx
Maier (TUM User: tumuser1)
LinuxCluster: Project lxpr1, User lx11xc
SuperMUC: Project smpr1, User sm11bb
DSS POSIX Group in IDM/LDAP: pr45xa-dss-0000
DSS Container → GPFS Independent Fileset: /dss/dssfs01/pr45xa-dss-0000
drwxrws--- root pr45xa-dss-0000
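The permission string on the container directory carries the access model: owner root, group pr45xa-dss-0000, setgid set, no access for others. The following minimal sketch decodes that mode with Python's standard `stat` module. The octal value 2770 is derived from the listing `drwxrws---`; the helper name `explain_mode` is hypothetical and not part of any LRZ tooling.

```python
import stat

# drwxrws--- as a numeric mode: directory type bit plus octal 2770
# (setgid, rwx for the owner, rwx for the group, nothing for others).
CONTAINER_MODE = stat.S_IFDIR | 0o2770


def explain_mode(mode: int) -> dict:
    """Break a file mode into the properties that matter for a shared container."""
    return {
        "listing": stat.filemode(mode),                 # ls-style string
        "is_directory": stat.S_ISDIR(mode),
        "setgid": bool(mode & stat.S_ISGID),            # new entries inherit the group
        "group_can_write": bool(mode & stat.S_IWGRP),
        "others_have_access": bool(mode & stat.S_IRWXO),
    }


if __name__ == "__main__":
    for key, value in explain_mode(CONTAINER_MODE).items():
        print(f"{key}: {value}")
```

On Linux, the setgid bit on a directory makes newly created files and subdirectories inherit the directory's group, so data written by any member of the container group stays accessible to the whole group.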
12. Technical Integration of Globus to LRZ DSS
Goal
Integrate Globus Sharing into the DSSWeb Self-Service Portal.
Allow Data Curators to share DSS Containers with arbitrary external users.
Problem Action
Globus lets us control: Who can share? What can be shared?
We need to control: Who can share what?
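The gap between the two kinds of control can be illustrated with a toy model. This is a sketch only: the class names are hypothetical, it does not reflect how Globus or DSSWeb store their settings, and the assignment of users to containers is invented for the example (the user names and paths are taken from the slides).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndependentPolicy:
    """Two separate allow-lists: who may share, and what may be shared."""
    sharers: set[str] = field(default_factory=set)
    paths: set[str] = field(default_factory=set)

    def may_share(self, user: str, path: str) -> bool:
        # Any allowed user may share any allowed path.
        return user in self.sharers and path in self.paths


@dataclass
class PairPolicy:
    """Explicit (user, path) grants: who may share what."""
    grants: set[tuple[str, str]] = field(default_factory=set)

    def may_share(self, user: str, path: str) -> bool:
        return (user, path) in self.grants


# Assumed example: tumuser1 curates one container, lmuuser2 the other.
A = "/dss/dssfs01/pr45xa-dss-0000"
B = "/dss/dssfs01/dsscontX"

independent = IndependentPolicy(sharers={"tumuser1", "lmuuser2"}, paths={A, B})
paired = PairPolicy(grants={("tumuser1", A), ("lmuuser2", B)})

if __name__ == "__main__":
    # lmuuser2 is not the curator of container A in this example.
    print("independent lists:", independent.may_share("lmuuser2", A))
    print("explicit pairs:   ", paired.may_share("lmuuser2", A))
```

With two independent lists, every permitted user can share every permitted path. Only the pair-based check expresses that a curator may share just the containers they are responsible for.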
13. Technical Integration of Globus to LRZ DSS
Diagram: Data Curator, DSSWeb, RobotUser aka RobotUser@globusid.org, LRZ MyProxy, Globus Online, DSS Globus Endpoint, Shared Endpoint “LRZ DSS Container X”, and LRZ Data Science Storage with DSS Container X (Container Group, DSS Container Directory /dss/dssfs01/dsscontX)
1. Enable Globus Sharing for DSS Container X
2. Login to MyProxy to get Certificate
3. Enable DSS Globus Endpoint
4. Create Shared Endpoint “LRZ DSS Container X”
5. Globus Magic
6. Add RobotUser to Container Access Group
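The numbered steps on this slide can be read as a small orchestration that DSSWeb runs on behalf of the RobotUser once a Data Curator asks for sharing (step 1). The sketch below models steps 2, 3, 4 and 6 purely in memory to show the ordering and the dependency of endpoint activation on the certificate. Every class and function name is hypothetical; nothing here is the Globus or MyProxy API, and step 5 ("Globus Magic") is deliberately left out.

```python
from dataclasses import dataclass, field

ROBOT_USER = "RobotUser"  # robot account named on the slide


@dataclass
class ToyEnvironment:
    """In-memory stand-in for MyProxy, Globus Online and the container group."""
    robot_certificate: bool = False
    endpoint_enabled: bool = False
    shared_endpoints: dict = field(default_factory=dict)  # display name -> directory
    container_groups: dict = field(default_factory=dict)  # directory -> set of members
    steps: list = field(default_factory=list)             # record of executed steps


def myproxy_login(env: ToyEnvironment) -> None:
    """Step 2: log in to MyProxy to get a certificate."""
    env.robot_certificate = True
    env.steps.append("myproxy-login")


def enable_endpoint(env: ToyEnvironment) -> None:
    """Step 3: enable the DSS Globus Endpoint (assumed to need the certificate)."""
    if not env.robot_certificate:
        raise RuntimeError("no certificate: log in to MyProxy first")
    env.endpoint_enabled = True
    env.steps.append("enable-endpoint")


def create_shared_endpoint(env: ToyEnvironment, name: str, directory: str) -> str:
    """Step 4: create a shared endpoint rooted at the container directory."""
    if not env.endpoint_enabled:
        raise RuntimeError("endpoint is not enabled")
    display_name = f"LRZ DSS Container {name}"
    env.shared_endpoints[display_name] = directory
    env.steps.append("create-shared-endpoint")
    return display_name


def add_robot_to_group(env: ToyEnvironment, directory: str) -> None:
    """Step 6: add the robot user to the container access group."""
    env.container_groups.setdefault(directory, set()).add(ROBOT_USER)
    env.steps.append("add-robot-to-group")


def enable_globus_sharing(env: ToyEnvironment, name: str, directory: str) -> str:
    """Run the steps in slide order for one container."""
    myproxy_login(env)
    enable_endpoint(env)
    display_name = create_shared_endpoint(env, name, directory)
    add_robot_to_group(env, directory)
    return display_name


if __name__ == "__main__":
    env = ToyEnvironment()
    print(enable_globus_sharing(env, "X", "/dss/dssfs01/dsscontX"))
    print(env.steps)
```

Adding the robot user to the container group is presumably what lets transfers running under that account reach a directory that is only group-accessible; the slide itself does not state the reason.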
14. Technical Integration of Globus to LRZ DSS
Diagram: Data Curator, DSSWeb, RobotUser aka RobotUser@globusid.org, Globus Online, bop@wherever.com, DSS Globus Endpoint*, Shared Endpoint “LRZ DSS Container X”, and LRZ Data Science Storage with DSS Container X (Container Group, DSS Container Directory /dss/dssfs01/dsscontX)
1. Invite bop@wherever.com to access DSS Container X via Globus
2. Check if identity bop@wherever.com is already known by Globus and if not create it
3. Add Globus ACL for Shared Endpoint LRZ DSS Container X for identity bop@wherever.com
4. Globus Magic
5. Bop is happy
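Steps 2 and 3 of the invitation amount to a "look up or create" on the identity followed by an ACL entry on the shared endpoint. The sketch below shows that pattern with an in-memory stand-in. `ToyGlobus` and its methods are hypothetical names for illustration and do not mirror the real Globus interfaces; the identity ids are made up, and permission levels are omitted because the slide does not mention them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGlobus:
    """In-memory stand-in for identity lookup and shared endpoint ACLs."""
    identities: dict = field(default_factory=dict)  # email -> identity id
    acls: dict = field(default_factory=dict)        # shared endpoint -> set of identity ids

    def lookup_or_create_identity(self, email: str) -> str:
        """Step 2: reuse a known identity, otherwise create one."""
        if email not in self.identities:
            self.identities[email] = f"identity-{len(self.identities) + 1}"
        return self.identities[email]

    def add_acl(self, shared_endpoint: str, identity_id: str) -> None:
        """Step 3: grant the identity access to the shared endpoint."""
        self.acls.setdefault(shared_endpoint, set()).add(identity_id)


def invite(globus: ToyGlobus, shared_endpoint: str, email: str) -> str:
    """Invite an external user; safe to call repeatedly for the same user."""
    identity_id = globus.lookup_or_create_identity(email)
    globus.add_acl(shared_endpoint, identity_id)
    return identity_id


if __name__ == "__main__":
    globus = ToyGlobus()
    endpoint = "LRZ DSS Container X"
    first = invite(globus, endpoint, "bop@wherever.com")
    second = invite(globus, endpoint, "bop@wherever.com")  # repeated invitation
    print(first == second, len(globus.acls[endpoint]))
```

Keeping both operations idempotent means a Data Curator can repeat an invitation without creating duplicate identities or ACL entries.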
15. Legal Integration of Globus to LRZ DSS
Regulation
The European Union began enforcing the EU General Data Protection Regulation (GDPR) on 2018-05-25.
Use or integration of cloud services that process PII requires a formal Controller-Processor Agreement.
Transfer of personal data to third countries requires special safeguards.
HIPAA and NIST BAA to the rescue
HIPAA and NIST require technical and organizational security controls roughly similar to those GDPR requires to protect PII.
Globus agreed to sign a Controller-Processor Agreement that contains the EU Model Clauses.