Globus Integrations (GlobusWorld Tour - UMich)

Jul. 30, 2019

  1. Globus Integrations: Nickolaus Saint, University of Michigan, July 23, 2019
  2. JupyterHub
  3. Enabling large-scale, data-intensive science with Jupyter
  4. MDF: Advanced materials research
     • Modeling stopping power with time-dependent density functional theory (Andre Schleife, UIUC): 16,000 CPU-hours per simulation
     • Evolutionary optimization: sample, experimental scattering, material composition (La 60%, Sr 40%), simulated structure, simulated scattering
     • 786,432 CPUs, 10 PFLOPS supercomputer, Argonne Leadership Computing Facility
  5. Jupyter notebooks enable rapid iteration and results (Logan Ward; @python_app)
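The `@python_app` decorator comes from Parsl, where it turns a function into an app that runs asynchronously and returns a future. The sketch below is a stdlib stand-in, not Parsl itself: the hypothetical `python_app` decorator submits each call to a `concurrent.futures` thread pool, so a notebook cell can launch many runs and gather results later. Real Parsl apps also require a loaded Parsl configuration, which this sketch omits.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Shared pool standing in for a Parsl executor (assumption: threads suffice for a demo).
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def python_app(func):
    """Hypothetical stand-in for Parsl's @python_app: each call returns a Future."""
    @wraps(func)
    def submit(*args, **kwargs) -> Future:
        return _EXECUTOR.submit(func, *args, **kwargs)
    return submit


@python_app
def simulate(composition: float) -> float:
    # Placeholder "simulation": a cheap function of the input parameter.
    return composition ** 2


# Launch several runs at once, then block on the results only when needed.
futures = [simulate(x) for x in (0.2, 0.4, 0.6)]
results = [f.result() for f in futures]
```

Returning futures is what makes the notebook feel interactive: the cell returns immediately and `.result()` is called only when the values are actually needed.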
  6. But the data are big and distributed… and the science is collaborative
     • 2 PB, 80 Gbps store; 3.2M materials data; Cooley: 290 TFLOPS
     • Steps: (1) Query, (2) Transfer, (3) Learn, (4) Share
     • Need multi-credential, multi-service authentication and data management
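Each step in the Query, Transfer, Learn, Share workflow talks to a different service, so each needs its own credential. Globus Auth issues one access token per resource server, and its token response lists the extra tokens under `other_tokens`. A minimal sketch, assuming that response shape and using fake token strings, that indexes tokens by resource server so each step can pick the right one:

```python
from typing import Dict


def tokens_by_resource_server(token_response: dict) -> Dict[str, str]:
    """Map resource server -> access token from a Globus Auth-style token response.

    Assumption: the primary token is at the top level and additional tokens
    appear under "other_tokens", each carrying its own resource_server.
    """
    entries = [token_response] + list(token_response.get("other_tokens", []))
    return {e["resource_server"]: e["access_token"] for e in entries}


# Illustrative response with fake token strings (not real credentials).
response = {
    "resource_server": "auth.globus.org",
    "access_token": "AUTH-TOKEN",
    "other_tokens": [
        {"resource_server": "transfer.api.globus.org", "access_token": "TRANSFER-TOKEN"},
        {"resource_server": "search.api.globus.org", "access_token": "SEARCH-TOKEN"},
    ],
}

tokens = tokens_by_resource_server(response)
transfer_token = tokens["transfer.api.globus.org"]  # token for the Transfer step
```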
  7. JupyterHub
     Goal: Liberate the notebook!
     • Multi-user hub
     • Manages multiple instances of the Jupyter notebook server
     • Configurable HTTP proxy
     • Tokens for remote services
     • APIs for remote actions, e.g. data management via the Globus service
     Diagram: browser → configurable HTTP proxy, which routes /hub/ to the Hub (authenticator, user DB, spawner) and /user/[name]/ to each user's notebook server; notebook servers authenticate users through the Hub's /api/auth
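The configurable HTTP proxy decides where each request goes by URL prefix: `/hub/` to the Hub and `/user/[name]/` to that user's notebook server. A toy longest-prefix router illustrates the idea; the backend addresses are hypothetical, and the real proxy is a separate Node.js process whose routes the Hub manages through the proxy's REST API.

```python
from typing import Dict, Optional


class ToyProxy:
    """Longest-prefix routing table, loosely modeled on JupyterHub's proxy."""

    def __init__(self) -> None:
        self.routes: Dict[str, str] = {}

    def add_route(self, prefix: str, target: str) -> None:
        self.routes[prefix] = target

    def target_for(self, path: str) -> Optional[str]:
        # Pick the most specific (longest) prefix that matches the path.
        matches = [p for p in self.routes if path.startswith(p)]
        return self.routes[max(matches, key=len)] if matches else None


# Hypothetical backends: the Hub, plus one spawned server for user "alice".
proxy = ToyProxy()
proxy.add_route("/", "http://127.0.0.1:8081")              # default: the Hub
proxy.add_route("/hub/", "http://127.0.0.1:8081")          # Hub login and API
proxy.add_route("/user/alice/", "http://127.0.0.1:50001")  # alice's notebook server

print(proxy.target_for("/user/alice/tree"))  # routed to alice's notebook server
print(proxy.target_for("/hub/login"))        # routed to the Hub
```

When the spawner starts a new notebook server, the Hub adds a `/user/[name]/` route; requests for users without a running server fall through to the Hub.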
  8. Securing JupyterHub with Globus Auth
     • Plugin for JupyterHub's existing OAuth framework
     • Can restrict which identity providers (IdPs) users log in with
     • Custom scopes
     • Tokens passed into the notebook environment
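One way to hand tokens from the Hub to a spawned notebook is an environment variable set at spawn time. This is a hypothetical sketch: the variable name `GLOBUS_TOKENS`, the JSON-plus-base64 encoding, and the domain-based IdP check (with `umich.edu` as an example domain) are illustrative assumptions, not necessarily what the Globus Auth plugin does.

```python
import base64
import json
import os
from typing import Dict

ENV_VAR = "GLOBUS_TOKENS"  # hypothetical variable name


def encode_tokens(tokens: Dict[str, str]) -> str:
    """Hub side: serialize per-service tokens into an environment-safe string."""
    return base64.b64encode(json.dumps(tokens).encode("utf-8")).decode("ascii")


def decode_tokens(environ=os.environ) -> Dict[str, str]:
    """Notebook side: recover the token dict from the environment."""
    return json.loads(base64.b64decode(environ[ENV_VAR]).decode("utf-8"))


def allowed_identity(username: str, allowed_domain: str = "umich.edu") -> bool:
    """Hypothetical IdP restriction: accept only identities from one domain."""
    return username.endswith("@" + allowed_domain)


# Simulate the spawner environment for a user who logged in via Globus Auth.
spawn_env = {ENV_VAR: encode_tokens({"transfer.api.globus.org": "TRANSFER-TOKEN"})}
notebook_tokens = decode_tokens(spawn_env)
```

Because the tokens live in the notebook server's environment, anyone who can run code in that server can read them; keep token lifetimes and scopes as narrow as the analysis allows.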
  9. Securing JupyterHub with Globus Auth
  10. Tokens in Jupyter notebooks
     Diagram: after the browser logs in through the proxy and Hub, the Hub passes tokens ({"tokens": ...}) to the spawned notebook, which calls REST APIs with "Authorization: Bearer a45cd..."
     The world is your oyster! Call any API:
     • Globus Transfer
     • Globus Search
     • Your app
     • Data portal
     • Analysis engine
     • …
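Once tokens are in the notebook, calling any of these REST APIs comes down to adding an `Authorization: Bearer <token>` header. A minimal stdlib sketch that builds (but does not send) such a request; the base URL assumes the Globus Transfer API v0.10 at `https://transfer.api.globus.org/v0.10`, and the token is a placeholder.

```python
import urllib.request

TRANSFER_BASE = "https://transfer.api.globus.org/v0.10"  # assumed base URL


def authorized_request(url: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request carrying an OAuth2 bearer token; nothing is sent here."""
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    return req


req = authorized_request(
    f"{TRANSFER_BASE}/endpoint_search?filter_scope=my-endpoints", "TRANSFER-TOKEN"
)
# With a real token, urllib.request.urlopen(req) would perform the call.
```

The same helper works for Search, a data portal, or your own service: only the URL and the token (issued for that service's resource server) change.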
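  11. Ad hoc data analysis/results distribution
     Diagram: notebook, data repository, dataset, shared endpoint; the notebook authenticates with a bearer token (Bearer a45cd…), analyzes the dataset, and creates a folder on the shared endpoint:
       POST /endpoint/a3c345f.../mkdir
       200 OK ... X-Transfer-API-Version: 0.10, Content-Type: application/json ...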
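The `mkdir` call above can be built by hand. A sketch, assuming the Transfer API's mkdir document format `{"DATA_TYPE": "mkdir", "path": ...}` and the v0.10 base URL; the endpoint ID, path, and token are placeholders (the slide's endpoint ID is truncated and is not reproduced here).

```python
import json
import urllib.request

TRANSFER_BASE = "https://transfer.api.globus.org/v0.10"  # assumed base URL


def mkdir_request(endpoint_id: str, path: str, token: str) -> urllib.request.Request:
    """Build (not send) a Transfer API mkdir call on a (shared) endpoint."""
    body = json.dumps({"DATA_TYPE": "mkdir", "path": path}).encode("utf-8")
    return urllib.request.Request(
        f"{TRANSFER_BASE}/endpoint/{endpoint_id}/mkdir",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# ENDPOINT-ID and the path are placeholders for illustration.
req = mkdir_request("ENDPOINT-ID", "/~/results/", "TRANSFER-TOKEN")
```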
  12. Experiment with the demo notebook
     • Log in to our JupyterHub*
     • Launch (spawn) a notebook server; get tokens
     • Using the JupyterHub_Integration.ipynb notebook:
       – Access Globus APIs; download some data
       – “Analyze” data (generate a plot)
       – PUT results (graph) on an HTTPS endpoint
       – Share the URL with others so they can access the results
     *
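The "PUT results on an HTTPS endpoint" step is an ordinary HTTP upload with a bearer token. A sketch, assuming a hypothetical HTTPS base URL for the endpoint and a token valid for it; the uploaded file's URL is what you would share, and recipients still need access permission on the endpoint.

```python
import urllib.request
from urllib.parse import quote

HTTPS_BASE = "https://endpoint.example.org"  # hypothetical endpoint URL


def upload_request(remote_path: str, payload: bytes, token: str) -> urllib.request.Request:
    """Build (not send) an HTTPS PUT that stores payload at remote_path."""
    url = HTTPS_BASE + quote(remote_path)
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "image/png"},
    )


plot_bytes = b"\x89PNG..."  # stand-in for the generated graph
req = upload_request("/results/plot.png", plot_bytes, "HTTPS-TOKEN")
share_url = req.full_url  # URL to send to collaborators
```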
  13. Leveraging the next generation of services
  14. Our (simplistic) data flow thus far…
     • Adequate for ad hoc sharing (relies on implicit knowledge)
     • Broader access and reuse require “formalization”
     • Leverage additional Globus platform services
     Diagram: same flow as slide 11 (notebook, data repository, dataset, shared endpoint; Bearer a45cd…; POST /endpoint/a3c345f.../mkdir, 200 OK)
  15. Globus Search
     • Scalable service: billions of entries
     • Schema agnostic: use standard (e.g. DataCite) or custom metadata
     • Fine-grained access control: only returns results that are visible to the user
     • Plain-text search: ranked results
     • Faceted search: facilitates data discovery
     • Rich query language: ranges, expressions, regex, etc.
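To make the access-control and faceting bullets concrete, here is a toy in-memory model. It is not the Globus Search service or its query syntax: each hypothetical entry lists who may see it, a search returns only visible matches, an optional range filter narrows them, and facet counts summarize one field over the results.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical catalog entries; "visible_to" holds identities or groups.
ENTRIES = [
    {"subject": "ds1", "text": "experimental scattering", "year": 2018,
     "element": "La", "visible_to": {"public"}},
    {"subject": "ds2", "text": "simulated scattering", "year": 2019,
     "element": "Sr", "visible_to": {"group:mdf"}},
    {"subject": "ds3", "text": "stopping power simulation", "year": 2019,
     "element": "La", "visible_to": {"public"}},
]


def search(q: str, principals: set, year_range: Optional[Tuple[int, int]] = None,
           facet_field: str = "element") -> Tuple[List[str], Dict[str, int]]:
    """Return visible matching subjects plus facet counts for one field."""
    hits = []
    for e in ENTRIES:
        if not (e["visible_to"] & principals):
            continue  # access control: skip entries this user cannot see
        if q.lower() not in e["text"].lower():
            continue  # plain-text match (this toy does no ranking)
        if year_range and not (year_range[0] <= e["year"] <= year_range[1]):
            continue  # range filter
        hits.append(e)
    facets = Counter(e[facet_field] for e in hits)
    return [e["subject"] for e in hits], dict(facets)


anon_hits, anon_facets = search("scattering", {"public"})
member_hits, member_facets = search("scattering", {"public", "group:mdf"})
```

The same query returns different results for different users, which is the behavior the "only returns results that are visible to the user" bullet describes.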
  16. Persistent identifiers
     • Developing a service for issuing persistent identifiers
       – DOI, ARK, Handle, Globus (e.g. …)
     • Within a namespace, e.g. your DataCite namespace
       – Control which identities/groups can create identifiers
     • Identifier attributes:
       – Link to data: one or more https URLs, to a file, folder, or manifest
       – Landing page: provided by the service or by the user
       – Visibility: identities and groups that can see the identifier
       – Checksum: of the file or manifest
       – Metadata: as required by the identifier type (e.g. DataCite), extensible
       – Replaces/replaced-by: for versioning
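The identifier attributes above map naturally onto a small record type. A hypothetical sketch: the field names are illustrative rather than the service's schema, and computing the checksum as SHA-256 over a canonical JSON encoding of the manifest is an assumed scheme.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IdentifierRecord:
    """Hypothetical record mirroring the identifier attributes listed above."""
    identifier: str                      # e.g. a DOI, ARK, Handle, or Globus ID
    links: List[str]                     # one or more https URLs
    landing_page: Optional[str] = None   # None: service-provided landing page
    visible_to: List[str] = field(default_factory=lambda: ["public"])
    checksum: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    replaces: Optional[str] = None       # previous version, for versioning


def manifest_checksum(manifest: List[Dict[str, str]]) -> str:
    """SHA-256 over a canonical JSON encoding of the manifest (assumed scheme)."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder manifest and identifier values for illustration.
manifest = [{"path": "results/plot.png", "bytes": "2048"}]
record = IdentifierRecord(
    identifier="example:placeholder-id",
    links=["https://endpoint.example.org/results/"],
    checksum=manifest_checksum(manifest),
    metadata={"title": "Example dataset"},
)
```

Canonicalizing the JSON (sorted keys, fixed separators) keeps the checksum stable regardless of key order, so a consumer can recompute it from the manifest and compare.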
  17. Extending the automation flow
     • How can we enable more structured, robust data discovery using Globus platform services?
     • Services: Search, Identifier, Describe, Transfer, Auth
     • Flow steps (diagram): create folder, transfer data, get metadata, mint persistent identifier, catalog, get credentials, set ACL
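The automation flow chains one service call per step. A hypothetical sketch with stub functions (no network calls) that threads a shared context through the steps; the ordering shown is one plausible assumption, and the step names and return values are illustrative. Each stub would be replaced by a real Auth, Transfer, Describe, Identifier, or Search call.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]


# Each stub records what a real service call might produce (hypothetical values).
def get_credentials(ctx: Context) -> Context:
    return {**ctx, "tokens": {"transfer": "T", "search": "S"}}  # Auth


def create_folder(ctx: Context) -> Context:
    return {**ctx, "folder": f"/~/{ctx['dataset']}/"}  # Transfer


def transfer_data(ctx: Context) -> Context:
    return {**ctx, "transferred": True}  # Transfer


def set_acl(ctx: Context) -> Context:
    return {**ctx, "acl": ["group:collaborators"]}  # Transfer


def get_metadata(ctx: Context) -> Context:
    return {**ctx, "metadata": {"title": ctx["dataset"]}}  # Describe


def mint_identifier(ctx: Context) -> Context:
    return {**ctx, "identifier": f"example:{ctx['dataset']}"}  # Identifier


def catalog(ctx: Context) -> Context:
    return {**ctx, "indexed": ctx["identifier"]}  # Search


# Assumed order: credentials first, cataloging last.
FLOW: List[Callable[[Context], Context]] = [
    get_credentials, create_folder, transfer_data, set_acl,
    get_metadata, mint_identifier, catalog,
]


def run_flow(ctx: Context) -> Context:
    for step in FLOW:
        ctx = step(ctx)  # each step sees everything earlier steps produced
    return ctx


result = run_flow({"dataset": "run42"})
```

Keeping each step a pure function of the context makes it easy to rerun, reorder, or swap a stub for a real call without touching the others.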
  18. Other Globus integrations
     • Web app development frameworks (Flask, Django)
     • Content management systems (WordPress, Drupal)
     • Development tools (Confluence, Jira)
     • Scalable cyberinfrastructure (Kubernetes)
     • Genomics analysis (Galaxy)
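Web frameworks such as Flask and Django typically integrate with Globus Auth through the standard OAuth2 authorization code flow: the app redirects the browser to Globus Auth, then exchanges the returned code for tokens. A framework-agnostic sketch of the first step, assuming the authorize endpoint `https://auth.globus.org/v2/oauth2/authorize`; the client ID and redirect URI are placeholders you would obtain by registering an app, and the scope list is only an example.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"  # assumed endpoint


def build_login_url(client_id: str, redirect_uri: str, scopes: list) -> tuple:
    """Return (url, state); store state in the session to check on callback."""
    state = secrets.token_urlsafe(16)  # CSRF protection for the callback
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "response_type": "code",
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}", state


url, state = build_login_url(
    "YOUR-CLIENT-ID",                       # placeholder
    "https://portal.example.org/callback",  # placeholder
    ["openid", "profile", "email",
     "urn:globus:auth:scope:transfer.api.globus.org:all"],
)
query = parse_qs(urlparse(url).query)  # what the callback will need to verify
```

In Flask or Django, the view would redirect to `url`, keep `state` in the session, and reject any callback whose `state` parameter does not match.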
  19. Example ALCF Data Discovery Portal
  20. Support resources
     • Globus documentation:
     • Sample code:
     • Helpdesk and issue escalation:
     • Customer engagement team
     • Globus professional services team
       – Assist with portal/gateway/app architecture and design
       – Develop custom applications that leverage the Globus platform
       – Advise on customized deployment and integration scenarios
  21. Join the Globus community
     • Access the service:
     • Create a personal endpoint:
     • Documentation:
     • Engage:
     • Subscribe:
     • Need help?
     • Follow us: @globusonline