Globus Integrations (GlobusWorld Tour - UMich)

Presented at the GlobusWorld Tour workshop at the University of Michigan on July 22, 2019.

  1. Globus Integrations. Nickolaus Saint, University of Michigan, July 23, 2019
  2. JupyterHub
  3. Enabling large-scale, data-intensive science with Jupyter
  4. MDF: Advanced materials research; modeling stopping power with time-dependent density functional theory (Andre Schleife, UIUC)
     • 16,000 CPU-hours per simulation
     • 786,432 CPUs, 10 PFLOPS supercomputer, Argonne Leadership Computing Facility
     • [Figure: evolutionary optimization relating a sample's experimental scattering to a simulated structure and its simulated scattering; material composition La 60%, Sr 40%]
  5. Jupyter notebooks enable rapid iteration and results (Logan Ward)
     • [Figure: notebook screenshot showing @python_app]
  6. But the data are big and distributed… and the science is collaborative
     • 2 PB, 80 Gbps store; 3.2M materials data; Cooley: 290 TFLOPS
     • Workflow: 1. Query, 2. Transfer, 3. Learn, 4. Share
     • Need multi-credential, multi-service authentication and data management
  7. JupyterHub. Goal: Liberate the notebook!
     • [Figure: the browser reaches the hub (/hub/) and each user's notebook server (/user/[name]/) through a configurable HTTP proxy; the hub contains the authenticator, user DB and spawner; notebooks call back to the hub at /api/auth]
     • Multi-user hub
     • Manages multiple instances of the Jupyter notebook server
     • Configurable HTTP proxy
     • Tokens for remote services
     • APIs for remote actions, e.g. data management via the Globus service
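The proxy's role in this architecture can be illustrated with a toy routing table. Requests under /hub/ go to the hub, and requests under /user/[name]/ go to that user's notebook server. This is a minimal Python sketch that assumes longest-prefix matching and uses made-up localhost targets. It is not the code of JupyterHub's configurable-http-proxy.

```python
class RoutingTable:
    """Toy proxy routing table: longest-prefix match of request paths (illustrative only)."""

    def __init__(self, default_target: str) -> None:
        self.default_target = default_target  # where unmatched paths go (the hub)
        self.routes: dict[str, str] = {}

    @staticmethod
    def _norm(prefix: str) -> str:
        # Treat "/user/alice" and "/user/alice/" as the same route.
        return prefix.rstrip("/") + "/"

    def add_route(self, prefix: str, target: str) -> None:
        self.routes[self._norm(prefix)] = target

    def remove_route(self, prefix: str) -> None:
        self.routes.pop(self._norm(prefix), None)

    def target_for(self, path: str) -> str:
        # Append "/" so "/user/alice" matches "/user/alice/" but "/user/alicex/" does not.
        probe = path if path.endswith("/") else path + "/"
        matches = [p for p in self.routes if probe.startswith(p)]
        if not matches:
            return self.default_target
        return self.routes[max(matches, key=len)]


# Hypothetical addresses: the hub and one spawned single-user server.
HUB = "http://127.0.0.1:8081"
table = RoutingTable(default_target=HUB)
table.add_route("/hub/", HUB)
# When the spawner starts alice's server, the hub registers a route for it.
table.add_route("/user/alice/", "http://127.0.0.1:50001")
```

When a server stops, the hub would remove its route, and unmatched paths fall back to the hub.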
  8. Securing JupyterHub with the Globus Auth plugin
     • Existing OAuth framework
     • Can restrict the identity provider (IdP)
     • Custom scopes
     • Tokens passed into the notebook environment
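"Tokens passed into the notebook environment" can be pictured as two halves. The hub writes the user's tokens into the spawned server's environment, and the notebook reads them back. In this sketch, the variable name GLOBUS_TOKENS_JSON, the JSON encoding and the per-resource-server layout are all hypothetical. The real Globus Auth plugin may use a different name and format.

```python
import json
import os
from typing import Mapping

# Hypothetical variable name and encoding; check what your authenticator actually sets.
TOKEN_ENV_VAR = "GLOBUS_TOKENS_JSON"


def export_tokens(env: dict[str, str], tokens_by_server: Mapping[str, dict]) -> None:
    """Hub side (sketch): copy the user's tokens into the spawner's environment."""
    env[TOKEN_ENV_VAR] = json.dumps(dict(tokens_by_server))


def access_token_for(resource_server: str, env: Mapping[str, str] = os.environ) -> str:
    """Notebook side (sketch): look up the access token for one resource server."""
    raw = env.get(TOKEN_ENV_VAR)
    if raw is None:
        raise KeyError(f"{TOKEN_ENV_VAR} is not set; was the server spawned with tokens?")
    tokens = json.loads(raw)
    return tokens[resource_server]["access_token"]
```

For example, after `export_tokens(spawner_env, {"transfer.api.globus.org": {"access_token": "..."}})`, the notebook would call `access_token_for("transfer.api.globus.org")`.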
  9. Securing JupyterHub with Globus Auth
  10. Tokens in Jupyter notebooks
      • [Figure: after the user logs in through the browser, the hub passes tokens ({"tokens": ...}) to the notebook, which calls REST APIs with "Bearer a45cd..."]
      • The world is your oyster: any API, e.g. Globus Transfer, Globus Search, your app, a data portal, an analysis engine, …
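Each of those REST calls carries the token in an `Authorization: Bearer ...` header. The following sketch uses only the Python standard library. It builds such a request without sending it, so it can be inspected. The URLs in the usage notes and tests are placeholders, not real Globus endpoints.

```python
import json
import urllib.request
from typing import Any, Optional


def bearer_request(url: str, token: str, method: str = "GET",
                   payload: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) a REST request authorized with an OAuth2 bearer token."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if payload is not None:
        # JSON-encode the body and declare its type.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call_json_api(req: urllib.request.Request) -> Any:
    """Send the request and decode a JSON response (requires network access)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The same helper works for any service in the list, because only the URL and the token change.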
  11. Ad hoc data analysis/results distribution
      • [Figure: a notebook analyzes a dataset from a data repository and publishes results to a shared endpoint, authenticating with "Bearer a45cd…"; sample exchange: POST '/endpoint/a3c345f... /mkdir' → 200 OK ... X-Transfer-API-Version: 0.10, Content-Type: application/json ...]
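The mkdir call in that exchange can be sketched as a plain HTTP request. We assume the base URL `https://transfer.api.globus.org/v0.10`, the route `operation/endpoint/{id}/mkdir` and a JSON body with `DATA_TYPE` and `path`. Confirm all three against the Globus Transfer API reference before relying on them. The endpoint ID used in the tests is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL and route shape; verify against the Transfer API documentation.
TRANSFER_BASE = "https://transfer.api.globus.org/v0.10"


def mkdir_request(endpoint_id: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a Transfer API request that creates `path` on an endpoint."""
    url = f"{TRANSFER_BASE}/operation/endpoint/{urllib.parse.quote(endpoint_id)}/mkdir"
    body = {"DATA_TYPE": "mkdir", "path": path}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

In a notebook, the Globus Python SDK wraps this operation as `TransferClient.operation_mkdir`, which saves building the request by hand.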
  12. Experiment with the demo notebook
      • Log in to our JupyterHub*
      • Launch (spawn) a notebook server; get tokens
      • Using the JupyterHub_Integration.ipynb notebook:
        – Access Globus APIs; download some data
        – "Analyze" the data (generate a plot)
        – PUT the results (graph) on an HTTPS endpoint
        – Share the URL with others so they can access the results
      *
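The last two steps, uploading the plot with an HTTPS PUT and sharing its URL, might look like the sketch below. It assumes the shared endpoint exposes HTTPS access at a base URL you already know and accepts your bearer token there. The base URL and file names are hypothetical, and access for collaborators still depends on the endpoint's sharing permissions.

```python
import urllib.parse
import urllib.request


def share_url(https_base: str, remote_path: str) -> str:
    """URL collaborators can open (subject to the endpoint's sharing ACLs)."""
    return https_base.rstrip("/") + "/" + urllib.parse.quote(remote_path.lstrip("/"))


def put_result_request(https_base: str, remote_path: str, content: bytes,
                       token: str, content_type: str = "image/png") -> urllib.request.Request:
    """Build (but do not send) an HTTPS PUT that uploads one result file."""
    return urllib.request.Request(
        share_url(https_base, remote_path),
        data=content,
        method="PUT",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
```

Sending the request with `urllib.request.urlopen` performs the upload. After that, the string from `share_url` is what you would pass to collaborators.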
  13. Leveraging the next generation of services
  14. Our (simplistic) data flow thus far…
      • Adequate for ad hoc sharing (implicit knowledge)
      • Broader access and reuse require "formalization"
      • Leverage additional Globus platform services
      • [Figure: the same ad hoc analysis flow as slide 11]
  15. Globus Search
      • Scalable service: scales to billions of entries
      • Schema agnostic: use standard (e.g. DataCite) or custom metadata
      • Fine-grained access control: only returns results that are visible to the user
      • Plain-text search: ranked results
      • Faceted search: facilitates data discovery
      • Rich query language: ranges, expressions, regex, etc.
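The access-control, ranking and faceting bullets can be made concrete with a toy in-memory index. This is not how Globus Search is implemented or queried. The class names, the term-count scoring and the "principals" sets are assumptions chosen only to show one point: entries a user cannot see are filtered out before ranking and facet counting, so they leak neither as hits nor as counts.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    subject: str                # identifies the record, e.g. a dataset URL
    content: dict[str, object]  # schema-agnostic metadata (DataCite or custom)
    visible_to: set[str]        # identities/groups allowed to see this entry


@dataclass
class ToyIndex:
    entries: list[Entry] = field(default_factory=list)

    def ingest(self, entry: Entry) -> None:
        self.entries.append(entry)

    def search(self, query: str, principals: set[str], facet_fields: tuple[str, ...] = ()):
        """Return (ranked subjects, facet counts) over entries the caller may see."""
        terms = query.lower().split()
        scored = []
        for entry in self.entries:
            # Access control first: invisible entries never affect results or facets.
            if not entry.visible_to & principals:
                continue
            text = " ".join(str(v) for v in entry.content.values()).lower()
            score = sum(text.count(term) for term in terms)
            if score > 0:
                scored.append((score, entry))
        # Rank by score (descending); break ties by subject for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1].subject))
        facets = {
            name: Counter(str(e.content[name]) for _, e in scored if name in e.content)
            for name in facet_fields
        }
        return [e.subject for _, e in scored], facets
```

A real service replaces term counting with proper relevance scoring. The ordering of the steps (filter, then rank, then facet) is the part this sketch is meant to show.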
  16. Persistent identifiers
      • Developing a service for issuing persistent identifiers
        – DOI, ARK, Handle, Globus
        – e.g.
      • Within a namespace, e.g. your DataCite namespace
        – Control which identities/groups can create identifiers
      • Identifier attributes:
        – Link to data: one or more https URLs, to a file, folder or manifest
        – Landing page: provided by the service, or by the user
        – Visibility: identities and groups that can see the identifier
        – Checksum: of the file or manifest
        – Metadata: as required by the identifier type (e.g. DataCite), extensible
        – Replaces/replaced-by: for versioning
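The identifier attributes map naturally onto a small record type. The following is a hypothetical Python data model, not the schema of the service under development. It also shows two concrete details: computing a SHA-256 checksum of the file or manifest, and linking versions through replaces/replaced-by. The identifiers and URLs in the tests are made up.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IdentifierRecord:
    """Hypothetical record mirroring the attributes listed on the slide."""
    identifier: str                     # e.g. a DOI, ARK, Handle or Globus identifier
    links: list[str]                    # https URLs to a file, folder or manifest
    checksum: str                       # checksum of the file or manifest
    landing_page: Optional[str] = None  # None: let the service provide one
    visible_to: set[str] = field(default_factory=set)          # identities/groups
    metadata: dict[str, object] = field(default_factory=dict)  # e.g. DataCite fields
    replaces: Optional[str] = None
    replaced_by: Optional[str] = None


def sha256_hex(data: bytes) -> str:
    """Checksum of a file's or manifest's bytes (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).hexdigest()


def supersede(old: IdentifierRecord, new: IdentifierRecord) -> None:
    """Record a new version, linking both records in both directions."""
    if old.replaced_by is not None:
        raise ValueError(f"{old.identifier} is already replaced by {old.replaced_by}")
    old.replaced_by = new.identifier
    new.replaces = old.identifier
```

Keeping both directions of the version link lets a landing page for either version point to the other.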
  17. Extending the automation flow
      • How can we enable more structured and robust data discovery using Globus platform services?
      • [Figure: Globus services Auth, Transfer, Describe, Identifier and Search; flow steps: Get credentials, Create folder, Set ACL, Transfer data, Get metadata, Mint persistent identifier, Catalog]
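The flow can be sketched as an ordered list of steps that share one context. Running them in order gives a record of what completed if a step fails. The step ordering is our reading of the slide. Every step function below is a hypothetical stub that only fills in the context; the comments name the Globus service each step would presumably call.

```python
from typing import Callable

Step = Callable[[dict], None]


def run_flow(steps: list[tuple[str, Step]], context: dict) -> list[str]:
    """Run steps in order, sharing one context dict; stop at the first failure."""
    completed: list[str] = []
    for name, step in steps:
        try:
            step(context)
        except Exception as exc:
            raise RuntimeError(f"step {name!r} failed after {completed}") from exc
        completed.append(name)
    return completed


# Hypothetical stubs: real versions would call the service named in each comment.
def get_credentials(ctx: dict) -> None:  # Auth
    ctx["token"] = "stub-token"

def create_folder(ctx: dict) -> None:    # Transfer
    ctx["folder"] = f"/published/{ctx['dataset']}/"

def set_acl(ctx: dict) -> None:          # Transfer (sharing permissions)
    ctx["acl"] = ("group:collaborators", "r", ctx["folder"])

def transfer_data(ctx: dict) -> None:    # Transfer
    ctx["transferred"] = True

def get_metadata(ctx: dict) -> None:     # Describe
    ctx["metadata"] = {"title": ctx["dataset"]}

def mint_identifier(ctx: dict) -> None:  # Identifier
    ctx["identifier"] = "hypothetical-id:" + ctx["dataset"]

def catalog(ctx: dict) -> None:          # Search
    ctx["catalog"] = {ctx["identifier"]: ctx["metadata"]}


FLOW = [
    ("Get credentials", get_credentials),
    ("Create folder", create_folder),
    ("Set ACL", set_acl),
    ("Transfer data", transfer_data),
    ("Get metadata", get_metadata),
    ("Mint persistent identifier", mint_identifier),
    ("Catalog", catalog),
]
```

Calling `run_flow(FLOW, {"dataset": "run42"})` runs all seven steps. A failing step raises an error that lists the steps already completed, which is where a real flow would start cleanup or a retry.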
  18. Other Globus integrations
      • Web app development frameworks (Flask, Django)
      • Content management systems (WordPress, Drupal)
      • Development tools (Confluence, Jira)
      • Scalable cyberinfrastructure (Kubernetes)
      • Genomics analysis (Galaxy)
  19. Example: ALCF Data Discovery Portal
  20. Support resources
      • Globus documentation:
      • Sample code:
      • Helpdesk and issue escalation:
      • Customer engagement team
      • Globus professional services team
        – Assist with portal/gateway/app architecture and design
        – Develop custom applications that leverage the Globus platform
        – Advise on customized deployment and integration scenarios
  21. Join the Globus community
      • Access the service:
      • Create a personal endpoint:
      • Documentation:
      • Engage:
      • Subscribe:
      • Need help?
      • Follow us: @globusonline