Leveraging the Globus Platform (GlobusWorld Tour - Columbia University)

Presented at the GlobusWorld Tour workshop at Columbia University, on April 24, 2019.

Published in: Data & Analytics

  1. 1. Leveraging the Globus Platform Vas Vasiliadis vas@uchicago.edu Columbia University – April 24, 2019
  2. 2. Globus serves as… a platform for building science gateways, web portals, and other applications in support of research and education
  3. 3. Example web apps that leverage Globus
  4. 4. Globus Platform-as-a-Service • Globus Auth (identity and access management) • … • Globus APIs (Transfer, Search, Identifiers, …) • Globus Connect • Data Automation • File Sharing • File Transfer, Sync
  5. 5. Globus Auth addresses security challenges • Make it easy for developers to provide login for their apps (web, mobile, desktop, command line) • …and protect all REST API communications o App → Globus service (MRDP, Jupyter Notebook) o App → non-Globus service (graph service in MRDP) o Service → Service • …while – Not introducing yet another identity – Providing a platform to consolidate existing identities – Providing a least-privilege security model (via consents) – Being web friendly and language/framework agnostic
  6. 6. Based on widely used web standards • OAuth 2.0 Authorization Framework (a.k.a. OAuth2) • OpenID Connect Core 1.0 (a.k.a. OIDC) • Access via OAuth2 and OIDC libraries of your choice – Google OAuth Client Libraries, Apache mod_auth_openidc, etc. – Globus Python SDK • docs.globus.org/api/auth
  7. 7. Authorization Code Grant • Actors: Browser (User), Client (Web Portal, Application, Jupyter), Globus Auth (Authorization Server), Identity Provider, Globus Transfer (Resource Server) • 1. Access portal • 2. Redirects user • 3. User authenticates and consents • 4. Authorization code • 5. Authenticate using client id and secret, send authorization code • 6. Access token(s) • 7. Authenticate with access token(s) to give the client the authority to invoke the transfer service
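Steps 2 and 5 of the flow above can be sketched with the standard library alone. This is a minimal illustration, not the portal's actual code: the client id, client secret, and redirect URI are placeholders, and the helper names `build_authorize_url` and `build_token_request` are hypothetical. The two Globus Auth URLs and the Transfer scope string are the publicly documented ones. Nothing is sent over the network.

```python
import base64
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def build_authorize_url(client_id, redirect_uri, state):
    """Step 2: the URL the client redirects the user's browser to."""
    query = urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",  # ask for an authorization code
        "scope": f"openid {TRANSFER_SCOPE}",
        "state": state,  # echoed back on the callback, guards against CSRF
    })
    return f"{AUTHORIZE_URL}?{query}"


def build_token_request(client_id, client_secret, code, redirect_uri):
    """Step 5: URL, headers, and form body for exchanging the code for tokens."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    headers = {
        "Authorization": f"Basic {basic}",  # client id and secret
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    })
    return TOKEN_URL, headers, body


# Placeholder values, for illustration only.
state = secrets.token_urlsafe(16)
url = build_authorize_url("my-client-id", "https://portal.example.org/callback", state)
print(url)
```

In practice an OAuth2 library or the Globus Python SDK would build these requests; the sketch only makes visible what travels in steps 2 and 5.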
  8. 8. Globus Platform Transfer API
  9. 9. Globus Transfer API • Globus Web App consumes public Transfer API • REST API: Resources/actions named by URL – Query params allow refinement (e.g., filter) • Globus APIs use JSON for documents and resource representations • Requests authorized via Globus Auth-issued OAuth2 access token – Authorization: Bearer asdflkqhafsdafeawk • docs.globus.org/api/transfer
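As a rough illustration of the points above (resource named by URL, refinement via query parameters, Bearer token in the `Authorization` header), the following sketch builds a request without sending it. The helper name `transfer_get` and the token value are placeholders; the base URL assumes version 0.10 of the Transfer API.

```python
from urllib.parse import urlencode
from urllib.request import Request

TRANSFER_BASE = "https://transfer.api.globus.org/v0.10"


def transfer_get(resource, access_token, **params):
    """Build (but do not send) an authorized GET request for a Transfer resource."""
    url = f"{TRANSFER_BASE}/{resource.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)  # query params refine the request
    req = Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {access_token}")
    req.add_header("Accept", "application/json")  # the API speaks JSON
    return req


# Placeholder token; a real one is issued by Globus Auth.
req = transfer_get("endpoint_search", "PLACEHOLDER_TOKEN", filter_fulltext="tutorial")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send the request; it is omitted here
# because it needs a valid access token.
```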
  10. 10. Globus Python SDK • Python client library for the Globus Auth and Transfer REST APIs • globus_sdk.TransferClient class handles connection management, security, framing, marshaling • from globus_sdk import TransferClient; tc = TransferClient() • globus.github.io/globus-sdk-python
  11. 11. TransferClient low-level calls • Thin wrapper around REST API – post(), get(), put(), delete() • get(path, params=None, headers=None, auth=None, response_class=None) o path – path for the request, with or without leading slash o params – dict to be encoded as a query string o headers – dict of HTTP headers to add to the request o response_class – class for the response object, overrides the client’s default_response_class o Returns: GlobusHTTPResponse object
  12. 12. TransferClient higher-level calls • One method for each API resource and HTTP verb • Largely direct mapping to REST API • endpoint_search(filter_fulltext=None, filter_scope=None, num_results=25, **params)
  13. 13. Globus Helper Pages • Globus pages designed for use by your web apps – Browse Endpoint – Activate Endpoint – Select Group – Manage Identities – Manage Consents – Logout • docs.globus.org/api/helper-pages
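The layering described in the two slides above (a low-level `get()` plus one higher-level method per resource) can be sketched as follows. `SketchTransferClient` is a hypothetical stand-in written for this illustration, not `globus_sdk.TransferClient`; the injected `send` callable and the mapping of `num_results` to a `limit` query parameter are assumptions made so the example runs offline.

```python
from urllib.parse import urlencode


class SketchTransferClient:
    """Hypothetical stand-in that mimics the layering, not the real SDK class."""

    base_url = "https://transfer.api.globus.org/v0.10"

    def __init__(self, send):
        # `send` is any callable(method, url); injected so no network is needed.
        self._send = send

    def get(self, path, params=None):
        """Low-level call: path may be given with or without a leading slash."""
        url = f"{self.base_url}/{path.lstrip('/')}"
        if params:
            url += "?" + urlencode(params)  # dict encoded as a query string
        return self._send("GET", url)

    def endpoint_search(self, filter_fulltext=None, filter_scope=None,
                        num_results=25, **params):
        """Higher-level call: maps directly onto GET endpoint_search."""
        query = dict(params)
        if filter_fulltext is not None:
            query["filter_fulltext"] = filter_fulltext
        if filter_scope is not None:
            query["filter_scope"] = filter_scope
        if num_results is not None:
            query["limit"] = num_results  # assumed paging parameter
        return self.get("endpoint_search", params=query)


# Record what would be sent instead of performing HTTP.
sent = []
client = SketchTransferClient(lambda method, url: sent.append((method, url)))
client.endpoint_search(filter_fulltext="tutorial", num_results=5)
print(sent[-1])
```

The design point is that the higher-level method adds only argument handling; everything else is delegated to the same low-level call a developer could make directly.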
  14. 14. Example Modern Research Data Portal https://docs.globus.org/modern-research-data-portal/
  15. 15. API walkthrough using Jupyter Notebook • Use the Globus JupyterHub: jupyter.demo.globus.org – Sign in with Globus and verify the consents – Click “Start My Server” (this will take about a minute) – Navigate to: globus-jupyter-notebooks → GlobusWorldTour – Open Platform_Introduction_Native_App_Auth.ipynb • If you misstep and want to start over… – Close your notebook and navigate back to the root folder – Open and run the NotebookPuller.ipynb notebook • Access Globus notebooks outside of our JupyterHub: github.com/globus/globus-jupyter-notebooks
  16. 16. Endpoint Search • Plain text search for endpoint – Searches owner, display name, keywords, description, organization, department – Full word and prefix match • Limit search to pre-defined scopes – all, my-endpoints, recently-used, in-use, shared-by-me, shared-with-me • Returns: List of endpoint documents
  17. 17. Endpoint Management • Get endpoint (by id) • Update endpoint • Create & delete (shared) endpoints • Manage endpoint servers
  18. 18. Endpoint Activation • Activating endpoint means binding a credential to an endpoint for login • Server endpoints that have MyProxy or MyProxy OAuth identity provider require login via web • Auto-activate – Globus Connect Personal and Shared endpoints use Globus-provided credential – Must auto-activate before any API calls to endpoints
  19. 19. File operations • List directory contents (ls) • Make directory (mkdir) • Rename • Note: – Path encoding & UTF gotchas – Don’t forget to auto-activate first
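One concrete instance of the path encoding and UTF gotchas noted above: a path sent as the `path` query parameter of an `ls` request must be percent-encoded as UTF-8, and two strings that look identical on screen can encode differently if they use different Unicode normal forms. The sketch below only builds query strings; the helper name `ls_query` and the example directory name are illustrative.

```python
import unicodedata
from urllib.parse import parse_qs, urlencode


def ls_query(path):
    """Encode a directory path as the query string for an `ls` request."""
    # urlencode percent-encodes the UTF-8 bytes of each non-ASCII character.
    return urlencode({"path": path})


# The same visible name in two Unicode normal forms.
composed = unicodedata.normalize("NFC", "/~/donn\u00e9es/")    # single code point for the accented e
decomposed = unicodedata.normalize("NFD", "/~/donn\u00e9es/")  # "e" plus a combining accent

print(ls_query(composed))
print(ls_query(decomposed))
# The two query strings differ, so they name different paths on a filesystem
# that stores names byte for byte.
```

Decoding the query string recovers the original path exactly, which is why the form used on the storage system matters more than how the name is displayed.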
  20. 20. Task submission • Asynchronous operations – Transfer o Sync level option – Delete • Get submission_id, followed by submit – Once and only once submission
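The two-step submission above can be sketched by building the JSON document that would be POSTed to the Transfer API's `transfer` resource after a `submission_id` has been fetched. The endpoint ids, paths, and the helper name `build_transfer_document` are placeholders; the field names and numeric sync levels follow the Transfer API documentation as recalled here and should be checked against docs.globus.org/api/transfer.

```python
import json

# Sync levels: copy a file only if it is missing (0), or if size (1),
# modification time (2), or checksum (3) differs at the destination.
SYNC_LEVELS = {"exists": 0, "size": 1, "mtime": 2, "checksum": 3}


def build_transfer_document(submission_id, source_endpoint, destination_endpoint,
                            items, sync_level=None):
    """Build the transfer document; `items` holds (source, destination, recursive) tuples."""
    doc = {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,  # obtained first, from the API
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }
    if sync_level is not None:
        doc["sync_level"] = SYNC_LEVELS[sync_level]
    return doc


# Placeholder ids and paths, for illustration only.
doc = build_transfer_document(
    "SUBMISSION-ID-PLACEHOLDER",
    "SOURCE-ENDPOINT-ID",
    "DEST-ENDPOINT-ID",
    [("/share/godata/", "/~/godata/", True)],
    sync_level="checksum",
)
print(json.dumps(doc, indent=2))
```

Because the `submission_id` is part of the document, a client that times out can resend the identical document, and the service can recognize the repeat instead of starting a second task. That is what makes the submission once and only once.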
  21. 21. Task management • Get task by id • Update task by id (label, deadline) • Get task_list • Cancel task by id • Get event list for task • Get task pause info
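Since tasks run asynchronously, a client typically polls "get task by id" until the task reaches a terminal status. The sketch below shows one way to structure that loop. The `get_status` callable is a hypothetical stand-in for whatever fetches the task's `status` field, and the status values ACTIVE, INACTIVE, SUCCEEDED, and FAILED are assumed from the Transfer API task document.

```python
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_task(get_status, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll `get_status()` until a terminal status or the timeout is reached.

    Returns the last status seen, which may be non-terminal on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            return status  # caller decides what to do with ACTIVE/INACTIVE
        sleep(interval)


# Simulated status sequence standing in for real API responses.
simulated = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
result = wait_for_task(lambda: next(simulated), sleep=lambda seconds: None)
print(result)
```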
  22. 22. Shared endpoints and access rules (ACL) • Shared Endpoint – create / delete / get info / get list • Administrator role required to delegate access managers • Access manager role required to manage permissions/ACL • Operations: – Get list of access rules – Get access rule by id – Create access rule – Update access rule – Delete access rule
  23. 23. Management API • Allow endpoint administrators to monitor and manage all tasks on their endpoints – Task API is essentially the same as for users – Information limited to what they could see locally • Cancel tasks • Pause rules
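"Create access rule" amounts to POSTing a small JSON document to the shared endpoint's `access` resource. The sketch below builds and validates such a document. The helper name `build_access_rule` and the identity id are placeholders, and the field names and constraints (permissions `r` or `rw`, directory path beginning and ending with a slash) reflect the Transfer API documentation as recalled here, so treat them as assumptions to verify.

```python
import json

PRINCIPAL_TYPES = {"identity", "group", "all_authenticated_users", "anonymous"}


def build_access_rule(principal_type, principal, path, permissions):
    """Build an access rule document for a shared endpoint."""
    if principal_type not in PRINCIPAL_TYPES:
        raise ValueError(f"unknown principal_type: {principal_type!r}")
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must be a directory that begins and ends with '/'")
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }


# Placeholder identity id, for illustration only.
rule = build_access_rule("identity", "IDENTITY-UUID-PLACEHOLDER", "/results/", "r")
print(json.dumps(rule, indent=2))
```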
  24. 24. Globus PaaS developer resources Python SDK Sample Application docs.globus.org/api github.com/globus Jupyter Notebook
  25. 25. Enabling large-scale data intensive science with Jupyter
  26. 26. @python_app Logan Ward Jupyter notebooks enable rapid iteration/results
  27. 27. But the data are big, distributed… …and the science is collaborative • petrel.alcf.anl.gov: 2PB, 80Gbps store • materialsdatafacility.org: 3.2M materials data • Cooley: 290 TFLOPS • 1. Query 2. Transfer 3. Learn 4. Share • Need multi-credential, multi-service authentication and data management
  28. 28. Hub Configurable HTTP proxy Authenticator User DB Spawner Notebook /api/auth Browser /hub/ /user/[name]/ • Multi-user hub • Manages multiple instances of Jupyter notebook server • Configurable HTTP proxy JupyterHub Goal: Liberate the notebook! • Tokens for remote services • APIs for remote actions, e.g. data management via Globus service petrel.alcf.anl.gov
  29. 29. Securing JupyterHub with Globus Auth plugin • Existing OAuth framework • Can restrict IdP • Custom scopes • Tokens passed into notebook environment github.com/jupyterhub/oauthenticator
  30. 30. github.com/jupyterhub/oauthenticator#globus-setup Securing JupyterHub with Globus Auth
  31. 31. REST APIs REST APIs REST APIs Bearer a45cd... Hub Configurable HTTP proxy Authenticator User DB Spawner Notebook /api/auth /hub/ /user/[name]/ login Browser {"tokens":... {"tokens":... Tokens in Jupyter notebooks The world is your oyster API… • Globus Transfer • Globus Search • Your app • Data portal • Analysis engine • …
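Once tokens reach the notebook environment, code in the notebook can turn them into Bearer headers for any of the services listed above. The sketch below assumes the tokens arrive as base64-encoded JSON in an environment variable named `GLOBUS_DATA`, keyed by resource server. Both the variable name and the encoding are assumptions for illustration; the real format depends on the authenticator version in use.

```python
import base64
import json
import os


def load_tokens(env_var="GLOBUS_DATA", environ=os.environ):
    """Decode the token bundle from the environment (assumed base64-encoded JSON)."""
    payload = json.loads(base64.b64decode(environ[env_var]))
    return payload["tokens"]


def bearer_header(tokens, resource_server):
    """Build the Authorization header for one resource server."""
    access_token = tokens[resource_server]["access_token"]
    return {"Authorization": f"Bearer {access_token}"}


# Simulated environment standing in for what the hub would inject.
fake_environ = {
    "GLOBUS_DATA": base64.b64encode(json.dumps({
        "tokens": {"transfer.api.globus.org": {"access_token": "a45cd-placeholder"}}
    }).encode()).decode()
}
tokens = load_tokens(environ=fake_environ)
print(bearer_header(tokens, "transfer.api.globus.org"))
```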
  32. 32. Automated data analysis/results distribution Notebook Data Repository Bearer a45cd… Dataset Shared endpoint POST '/endpoint/a3c345f... /mkdir' 200 OK ... X-Transfer-API-Version: 0.10 Content-Type: application/json ... Analyze
  33. 33. Experiment with the demo notebook • Log in to our JupyterHub*: jupyter.demo.globus.org • Launch (spawn) a notebook server; get tokens • Access Globus APIs; download some data • “Analyze” data (generate plot) • PUT results (graph) on an HTTPS endpoint *zero-to-jupyterhub.readthedocs.io
  34. 34. Pushing the automation envelope
  35. 35. Our (simplistic) data flow thus far… • Adequate for ad hoc sharing (implicit knowledge) • Broader access, reuse requires “formalization” • Leverage additional Globus platform services Notebook Data Repository Bearer a45cd… Dataset Shared endpoint POST '/endpoint/a3c345f... /mkdir' 200 OK ... X-Transfer-API-Version: 0.10 Content-Type: application/json ... Analyze
  36. 36. Globus Search • Scalable service → billions of entries • Schema agnostic: use standard (e.g. DataCite) or custom metadata • Fine grained access control: only returns results that are visible to user • Plain text search: ranked results • Faceted search: facilitates data discovery • Rich query language: ranges, expressions, regex, etc. • docs.globus.org/api/search
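A combined plain-text, faceted, and range query of the kind listed above might be expressed as a JSON document POSTed to an index's search resource. The sketch below only builds the document. The helper name `build_search_query` and the metadata field names (`dc.subjects`, `dc.publicationYear`) are hypothetical, and the structure of `facets` and `filters` is recalled from the Search API documentation, so verify it at docs.globus.org/api/search before relying on it.

```python
import json


def build_search_query(q, facet_field=None, range_field=None,
                       range_from=None, range_to=None, limit=10):
    """Build a search request document with optional facet and range filter."""
    doc = {"q": q, "limit": limit}  # plain text query, ranked results
    if facet_field is not None:
        doc["facets"] = [{
            "name": facet_field,
            "type": "terms",  # count results per distinct value
            "field_name": facet_field,
            "size": 10,
        }]
    if range_field is not None:
        doc["filters"] = [{
            "type": "range",
            "field_name": range_field,
            "values": [{"from": range_from, "to": range_to}],
        }]
    return doc


# Hypothetical field names, for illustration only.
query = build_search_query(
    "perovskite",
    facet_field="dc.subjects",
    range_field="dc.publicationYear",
    range_from=2015,
    range_to=2019,
)
print(json.dumps(query, indent=2))
```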
  37. 37. Persistent identifiers • Developing service for issuing persistent identifiers – DOI, ARK, Handle, Globus – e.g. https://identifiers.globus.org/doi:10.1145/2076450.2076468 • Within a namespace, e.g. your DataCite namespace – Control which identities/groups can create identifiers • Identifier attributes: – Link to data: one or more https URLs, to file, folder or manifest – Landing page: provided by service, or by user – Visibility: identities, groups that can see identifier – Checksum: of the file or manifest – Metadata: as required by identifier (e.g., DataCite), extensible – Replaces/replaced-by: for versioning
  38. 38. Extending the automation flow • How can we enable data discovery using Globus platform services? • Services: Search, Identifier, Describe, Transfer, Auth • Steps: Create folder • Transfer data • Get metadata • Mint persistent identifier • Catalog • Get credentials • Set ACL
  39. 39. Other Globus integrations • Web app development frameworks (Flask, Django) • Genomics analysis (Galaxy) – galaxyproject.org/authnz/use/oidc/idps/globus • Content management systems (WordPress, Drupal) • Development tools (Confluence, Jira) • Scalable cyberinfrastructure (Kubernetes) globus-integration-examples.readthedocs.io
  40. 40. Example Data Discovery Portal https://petreldata.net
  41. 41. Support resources • Globus documentation: docs.globus.org • Sample code: github.com/globus • Helpdesk and issue escalation: support@globus.org • Customer engagement team • Globus professional services team – Assist with portal/gateway/app architecture and design – Develop custom applications that leverage the Globus platform – Advise on customized deployment and integration scenarios
  42. 42. Join the Globus community • Access the service: globus.org/login • Create a personal endpoint: globus.org/app/endpoints/create-gcp • Documentation: docs.globus.org • Engage: globus.org/mailing-lists • Subscribe: globus.org/subscriptions • Need help? support@globus.org • Follow us: @globusonline
