Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)

  1. Facilitating Collaboration with Globus • Vas Vasiliadis • STFC – January 10, 2019
  2. Data sharing best practices
  3. Ad hoc data sharing • Individual users share data with collaborators • Using a known email or identity for user/group • Make data “publicly” available • Diagram (Compute Facility, Share, Personal Computer): (1) Globus controls access to shared files on existing storage; no need to move files to cloud storage! (2) Researcher selects files to share, selects user or group, and sets access permissions (3) Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
  4. Data from instrument facility • Provide near-real-time access to data • Automated permissions based on site policy • Self-managed by the PI • Federated login to access data • Diagram labels: Raw data store; Personal Computer; Remote visualization/analysis; Local policy store; folders --/cohort045, --/cohort096, --/cohort127
  5. Data from provider/archive • Portal/science gateway to distribute data • Interface to search and gather data of interest • Asynchronous transfer to user’s system or via HTTPS to staged data • Fine-grained authorization enforced • Diagram labels: Search and request data of interest; Transfer data to destination
  6. Core center data processing • Allow user to securely upload data for analysis • Make analysis results available to user • Automate setup and teardown of folders and permissions • Diagram labels: --/123/input (rw); Analysis System; --/123/output (r)
  7. Common solution components • Shared endpoint for “staging” data • Application that manages permissions • Data transfer, to/from shared endpoint
  8. Conceptual architecture: Sharing • Diagram labels: Managed Endpoint; Shared Endpoint; Subscriber Control Domain; Globus Control Domain; External User Control Domain; DATA Channel; CONTROL Channel; User-managed “overlay” permissions; Administrator-managed filesystem restrictions
  9. Data sharing features • Shared endpoint creation requires authentication – Cannot be completely automated – Must be a managed endpoint • Roles for management of endpoint and tasks – Assign role to other users, groups and applications • Access manager role grants others the rights to manage permissions – Others = users, groups, and applications
  10. Data sharing permissions management • Permissions are set per folder, on a shared endpoint • Permissions management can be automated • For a user – Identity: user must log in with this – Email: user gets a code via email; link to their Globus account • For a group – Group UUID: search for group to get UUID – Access governed by membership in the group • For an application – Application identity: (apps are people too!)
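The per-folder permissions described on this slide can be sketched as access-rule documents. The example below is a minimal sketch, assuming the JSON rule format documented for the Globus Transfer API. It uses only the standard library and sends nothing. The helper name `make_access_rule`, the UUIDs, and the folder paths are hypothetical placeholders.

```python
import uuid

# Principal types covered on the slide: a user or application identity, or a group.
VALID_PRINCIPAL_TYPES = {"identity", "group"}
# Read-only or read-write, as in the "r" / "rw" folder labels used in this deck.
VALID_PERMISSIONS = {"r", "rw"}


def make_access_rule(principal_type, principal, path, permissions):
    """Build one access rule document (hypothetical helper, not an SDK call)."""
    if principal_type not in VALID_PRINCIPAL_TYPES:
        raise ValueError(f"unsupported principal_type: {principal_type!r}")
    if permissions not in VALID_PERMISSIONS:
        raise ValueError(f"permissions must be 'r' or 'rw', got {permissions!r}")
    # Rules apply to folders, so the path must start and end with a slash.
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must start and end with '/'")
    # Identities and groups are referenced by UUID; this raises ValueError otherwise.
    uuid.UUID(principal)
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }


# Placeholder UUIDs, for illustration only.
user_rule = make_access_rule(
    "identity", "11111111-2222-3333-4444-555555555555", "/123/input/", "rw"
)
group_rule = make_access_rule(
    "group", "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", "/123/output/", "r"
)
```

An application is granted access the same way as a user: its identity UUID goes in `principal` with `principal_type` set to `identity`.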
  11. Application concepts • Custom app (automatically) manages permissions – Can use Globus CLI (as demonstrated) • Confidential app: use client id and secret – Ensure application is on a secure device – Set up policy for rotation of secret – Identity:
  12. Client credential grant • Actors: Application, Science Gateway, Data Portal (Client); Globus Auth (Authorization Server); Globus Transfer (Resource Server) • Flow: (1) Authenticate with app client id and secret (2) Access tokens (3) Authenticate as app with access tokens to invoke service (on behalf of authorized user, within a given scope)
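The first step of this flow can be sketched as a standard OAuth2 client credentials request. This is a minimal sketch, assuming the Globus Auth token endpoint and the Transfer API scope string given in the Globus documentation. The request is built but not sent, and the function names and credentials are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

# Assumed from Globus Auth documentation; verify against the current docs.
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def build_token_request(client_id, client_secret, scope=TRANSFER_SCOPE):
    """Step 1: authenticate with the app's client id and secret (HTTP Basic)."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def bearer_header(token_response):
    """Step 3: turn the parsed token response into a header for service calls."""
    return {"Authorization": f"Bearer {token_response['access_token']}"}


# Placeholder credentials; a real secret should come from secure storage.
request = build_token_request("my-client-id", "my-client-secret")
```

Passing `request` to `urllib.request.urlopen` would perform step 2. The JSON response carries the `access_token` that `bearer_header` expects.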
  13. Data transfer scenarios • Application moving data of its own accord – App has access to source data and can write to destination – Requires shared endpoints on both sides (why?) – Client credential grant • Application moving data as user – Only user has access to data on source/destination – Authorization code grant – Similar to the data portal example presented earlier
  14. Walkthrough: Data Distribution Application • What: Make select data available to authorized user(s) • How: (1) Creates folder on shared endpoint (2) Moves data to folder (3) Sets permissions on folder for user/group • See example code at: • On your EC2 instance in ~/automation-examples
  15. Application registration • Register the application at – Redirects: – Scopes: • Get client id and secret • Add client id to the app
  16. Shared endpoint configuration • Create at the top-level folder • Set access manager role for app to manage access permissions • Optionally… – Set endpoint administrator role (can change endpoint definition) – Set endpoint manager role (can monitor and manage tasks) – Set endpoint monitor role (can monitor tasks)
  17. Application access • Use client credential grant to authenticate as app – Client id and secret used for obtaining tokens – Identity username is • Create a folder for user (or project) • Set permissions on folder (user/group) • Create transfer task to move data to folder • (Optionally) notify user(s) that data is available
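The sequence on this slide can be sketched as an ordered plan of Transfer API calls. This is a minimal sketch that only builds the calls and sends nothing. The paths are relative to the Transfer API base URL and the document shapes follow the Globus Transfer API documentation. The function name `plan_distribution` and every identifier passed to it are hypothetical.

```python
def plan_distribution(shared_endpoint_id, source_endpoint_id, source_path,
                      folder, user_identity_id, submission_id):
    """Return (method, path, body) tuples in the order the app would send them."""
    mkdir = {"DATA_TYPE": "mkdir", "path": folder}
    access = {
        "DATA_TYPE": "access",
        "principal_type": "identity",
        "principal": user_identity_id,
        "path": folder,
        "permissions": "r",  # recipients only need to read the distributed data
    }
    transfer = {
        "DATA_TYPE": "transfer",
        # A submission id is obtained beforehand with GET /submission_id.
        "submission_id": submission_id,
        "source_endpoint": source_endpoint_id,
        "destination_endpoint": shared_endpoint_id,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": source_path,
                "destination_path": folder,
                "recursive": True,  # copy the whole source folder
            }
        ],
    }
    return [
        ("POST", f"/operation/endpoint/{shared_endpoint_id}/mkdir", mkdir),
        ("POST", f"/endpoint/{shared_endpoint_id}/access", access),
        ("POST", "/transfer", transfer),
    ]


# Placeholder identifiers, for illustration only.
steps = plan_distribution(
    shared_endpoint_id="shared-endpoint-uuid",
    source_endpoint_id="source-endpoint-uuid",
    source_path="/datasets/run42/",
    folder="/project-x/",
    user_identity_id="11111111-2222-3333-4444-555555555555",
    submission_id="submission-uuid",
)
```

Each call would carry the bearer token obtained through the client credential grant. The optional notification step is left out of the sketch.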
  18. Enabling large-scale data intensive science with Jupyter
  19. Modeling stopping power with time-dependent density functional theory (Andre Schleife, UIUC) • 16,000 CPU-hours per simulation • 786,432 CPUs, 10 PFLOPS supercomputer (Argonne Leadership Computing Facility) • Figure labels: Sample; Experimental scattering; Material composition; Simulated structure; Simulated scattering; Sr 40%; Evolutionary optimization
  20. Jupyter notebooks enable rapid iteration/results • Code excerpt label: @python_app • Credit: Logan Ward
  21. But the data are big, distributed… and the science is collaborative • Resources: 2 PB, 80 Gbps store; 3.2M materials data; Cooley: 290 TFLOPS • Steps: (1) Query (2) Transfer (3) Learn (4) Share • Need multi-credential, multi-service authentication and data management
  22. JupyterHub • Multi-user hub • Manages multiple instances of Jupyter notebook server • Configurable HTTP proxy • Goal: Liberate the notebook! – Tokens for remote services – APIs for remote actions, e.g. data management via Globus service • Diagram labels: Browser; Configurable HTTP proxy; Hub; Authenticator; User DB; Spawner; Notebook; routes /hub/, /user/[name]/, /api/auth
  23. Securing JupyterHub with Globus Auth plugin • Existing OAuth framework • Can restrict IdP • Custom scopes • Tokens passed into notebook environment
  24. Securing JupyterHub with Globus Auth
  25. Tokens in Jupyter notebooks • The world is your oyster API… – Globus Transfer – Globus Search – Your app – Data portal – Analysis engine – … • Diagram labels: Browser; login; Configurable HTTP proxy; Hub; Authenticator; User DB; Spawner; Notebook; routes /hub/, /user/[name]/, /api/auth; {"tokens":...; Bearer a45cd...; REST APIs
  26. Automated data analysis/results distribution • Diagram labels: Notebook; Data Repository; Dataset; Shared endpoint; Analyze • Request: POST '/endpoint/a3c345f... /mkdir' with Bearer a45cd… • Response: 200 OK ... X-Transfer-API-Version: 0.10 Content-Type: application/json ...
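The request shown on this slide can be sketched as a plain HTTPS call from a notebook. This is a minimal sketch, assuming the Transfer API base URL and the full `/operation/endpoint/<endpoint_id>/mkdir` path form from the Globus documentation; the slide abbreviates the path. The token, endpoint id, and folder name are hypothetical placeholders, and the request is built without being sent.

```python
import json
import urllib.request

# Assumed base URL; the version matches the X-Transfer-API-Version header on the slide.
TRANSFER_BASE_URL = "https://transfer.api.globus.org/v0.10"


def build_mkdir_request(access_token, endpoint_id, path):
    """Build the POST that creates a folder on a (shared) endpoint."""
    body = json.dumps({"DATA_TYPE": "mkdir", "path": path}).encode("utf-8")
    return urllib.request.Request(
        f"{TRANSFER_BASE_URL}/operation/endpoint/{endpoint_id}/mkdir",
        data=body,
        method="POST",
        headers={
            # The token the hub passed into the notebook environment.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values, for illustration only.
mkdir_request = build_mkdir_request(
    "example-access-token", "example-endpoint-id", "/results/"
)
```

Calling `urllib.request.urlopen(mkdir_request)` from the notebook would send it, given a valid token and endpoint id.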
  27. Experiment with the demo notebook • Log in to our JupyterHub*: • Launch (spawn) a notebook server; get tokens • Access Globus APIs; download some data • “Analyze” data (generate plot) • PUT results (graph) on an HTTPS endpoint *
  28. Pushing the automation envelope
  29. Our (simplistic) data flow thus far… • Adequate for ad hoc sharing (implicit knowledge) • Broader access, reuse requires “formalization” • Leverage Globus data publication services • Diagram (repeated from slide 26): Notebook; Data Repository; Dataset; Shared endpoint; Analyze; POST '/endpoint/a3c345f... /mkdir' with Bearer a45cd…; 200 OK ... X-Transfer-API-Version: 0.10 Content-Type: application/json ...
  30. Globus Data Publication V1 • SaaS publication • BYO Storage & in-place publication • User-managed collections • Arbitrary metadata (with pre-defined schema) • Handle, DOI PIDs • Adoption since 2015: >1800 users, >600 datasets
  31. Publication V2: A platform for automation • Decompose data publication v1 into platform services • Facilitate flexible re-composition, adaptation by customers • Enable extension and enhancement • Initial services – Search, identifiers (and data management) • Future services – Description (metadata), flows
  32. Globus Search • Scalable service → billions of entries • Schema agnostic: use standard (e.g. DataCite) or custom metadata • Fine-grained access control: only returns results that are visible to user • Plain text search: ranked results • Faceted search: facilitates data discovery • Rich query language: ranges, expressions, regex, etc.
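A plain text search can be sketched as a simple query URL. This is a minimal sketch, assuming the `GET /v1/index/<index_id>/search` form with `q` and `limit` parameters described in the Globus Search documentation. The index id and query text are hypothetical, and no request is sent.

```python
import urllib.parse

# Assumed base URL for the Globus Search service.
SEARCH_BASE_URL = "https://search.api.globus.org/v1"


def build_search_url(index_id, query, limit=10):
    """Build the URL for a plain text query against one search index."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return f"{SEARCH_BASE_URL}/index/{index_id}/search?{params}"


# Placeholder index id and query, for illustration only.
search_url = build_search_url("example-index-id", "stopping power", limit=5)
```

Sending the request with a bearer token would return only entries visible to the authenticated user. An unauthenticated request would see public entries only.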
  33. Globus Identifiers • Service for issuing persistent identifiers – DOI, ARK, Handle, Globus – e.g. • Within a namespace, e.g. your DataCite namespace – Control which identities/groups can create identifiers • Each identifier has… – Link to data: one or more https URLs, to file, folder or manifest – Landing page: provided by service, or by user – Visibility: identities, groups that can see identifier – Checksum: of the file or manifest – Metadata: as required by identifier (e.g., DataCite), extensible – Replaces/replaced-by: for versioning
  34. Extending the automation flow • How can we automate a data publication flow using Globus platform services? • Services: Search; Identifier; Describe; Transfer; Auth • Diagram labels: Create folder; Transfer data; Get metadata; Mint persistent identifier; Catalog; Get credentials; Set ACL
  35. Support resources • Globus documentation: • Sample code: • Helpdesk and issue escalation: • Customer engagement team • Globus professional services team – Assist with portal/gateway/app architecture and design – Develop custom applications that leverage the Globus platform – Advise on customized deployment and integration scenarios
  36. Join the Globus community • Access the service: • Create a personal endpoint: • Documentation: • Engage: • Subscribe: • Need help? • Follow us: @globusonline