Globus Integrations
Vas Vasiliadis
vas@uchicago.edu
May 2, 2019
Enabling large-scale
data intensive science
with Jupyter
2
Andre Schleife, UIUC
16,000 CPU-hours per simulation
Sample'Experimental'
sca0ering'
Material'
composi4on'
Simulated'
structure'
Simulated'
sca0ering'
La'60%'
Sr'40%'
Evolu4onary'op4miza4on'
786,432 CPUs, 10 PFLOPS supercomputer
Argonne Leadership Computing Facility
MDF: Advanced materials research
Modeling stopping power
with time-dependent density
functional theory
@python_app
Logan Ward
Jupyter notebooks enable rapid iteration/results
But the data are big, distributed…
…and the science is collaborative
petrel.alcf.anl.gov
materialsdatafacility.org
2PB, 80Gbps store
3.2M materials data
Cooley: 290 TFLOPS
Query1 Share4
Transfer2
Learn3
Need multi-credential, multi-service authentication and data management
Hub
Configurable HTTP proxy
Authenticator
User DB
Spawner
Notebook
/api/auth
Browser
/hub/
/user/[name]/
• Multi-user hub
• Manages multiple instances
of Jupyter notebook server
• Configurable HTTP proxy
JupyterHub
Goal: Liberate the notebook!
• Tokens for remote services
• APIs for remote actions, e.g. data
management via Globus service
petrel.alcf.anl.gov
Securing JupyterHub with Globus Auth plugin
• Existing OAuth
framework
• Can restrict IdP
• Custom scopes
• Tokens passed into
notebook environment
github.com/jupyterhub/oauthenticator
github.com/jupyterhub/oauthenticator#globus-setup
Securing JupyterHub with Globus Auth
REST APIs
REST APIs
REST APIs
Bearer a45cd...
Hub
Configurable HTTP proxy
Authenticator
User DB
Spawner
Notebook
/api/auth
/hub/
/user/[name]/
login
Browser
{"tokens":...
{"tokens":...
Tokens in Jupyter notebooks
The world is your
oyster API…
• Globus Transfer
• Globus Search
• Your app
• Data portal
• Analysis engine
• …
Ad hoc data analysis/results distribution
Notebook
Data
Repository
Bearer a45cd…
Dataset
Shared
endpoint
POST '/endpoint/a3c345f... /mkdir’
200 OK
...
X-Transfer-API-Version: 0.10
Content-Type: application/json
...
Analyze
Experiment with the demo notebook
• Login into our JupyterHub*: jupyter.demo.globus.org
• Launch (spawn) a notebook server; get tokens
• Using the JupyterHub_Integration.ipynb notebook:
– Access Globus APIs; download some data
– “Analyze” data (generate plot)
– PUT results (graph) on an HTTPS endpoint
– Share the URL with others so they can access the results
*zero-to-jupyterhub.readthedocs.io
Leveraging the next
generation of services
12
Our (simplistic) data flow thus far…
• Adequate for ad hoc sharing (implicit knowledge)
• Broader access, reuse requires “formalization”
• Leverage additional Globus platform services
Notebook
Data
Repository
Bearer a45cd…
Dataset
Shared
endpoint
POST '/endpoint/a3c345f... /mkdir’
200 OK
...
X-Transfer-API-Version: 0.10
Content-Type: application/json
...
Analyze
Globus Search
• Scalable service à billions of entries
• Schema agnostic: use standard (e.g. DataCite) or custom
metadata
• Fine grained access control: only returns results that are
visible to user
• Plain text search: ranked results
• Faceted search: facilitates data discovery
• Rich query language: ranges, expressions, regex, etc.
14
docs.globus.org/api/search
Persistent identifiers
• Developing service for issuing persistent identifiers
– DOI, ARK, Handle, Globus
– e.g. https://identifiers.globus.org/doi:10.1145/2076450.2076468
• Within a namespace, e.g. your DataCite namespace
– Control which identities/groups can create identifiers
• Identifier attributes:
– Link to data: one or more https URLs, to file, folder or manifest
– Landing page: provided by service, or by user
– Visibility: identities, groups that can see identifier
– Checksum: of the file or manifest
– Metadata: as required by identifier (e.g., DataCite), extensible
– Replaces/replaced-by: for versioning
15
SearchIdentifierDescribeTransferAuth
Extending the automation flow
• How can we enable more structured/robust data
discovery using Globus platform services?
Create
folder
Transfer
data
Get
metadata
Mint
persistent
identifier
Catalog
Get
credentials
Set ACL
Other Globus integrations
• Web app development frameworks (Flask, Django)
• Content management systems (WordPress, Drupal)
• Development tools (Confluence, Jira)
• Scalable cyberinfrastructure (Kubernetes)
• Genomics analysis (Galaxy)
– galaxyproject.org/authnz/use/oidc/idps/globus
globus-integration-examples.readthedocs.io
Example
ALCF Data Discovery Portal
https://petreldata.net
Support resources
• Globus documentation: docs.globus.org
• Sample code: github.com/globus
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
Join the Globus community
• Access the service: globus.org/login
• Create a personal endpoint: globus.org/app/endpoints/create-gcp
• Documentation: docs.globus.org
• Engage: globus.org/mailing-lists
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globusonline

Globus Integrations (JupyterHub, Django, ...)

  • 1.
  • 2.
  • 3.
    Andre Schleife, UIUC 16,000CPU-hours per simulation Sample'Experimental' sca0ering' Material' composi4on' Simulated' structure' Simulated' sca0ering' La'60%' Sr'40%' Evolu4onary'op4miza4on' 786,432 CPUs, 10 PFLOPS supercomputer Argonne Leadership Computing Facility MDF: Advanced materials research Modeling stopping power with time-dependent density functional theory
  • 4.
    @python_app Logan Ward Jupyter notebooksenable rapid iteration/results
  • 5.
    But the dataare big, distributed… …and the science is collaborative petrel.alcf.anl.gov materialsdatafacility.org 2PB, 80Gbps store 3.2M materials data Cooley: 290 TFLOPS Query1 Share4 Transfer2 Learn3 Need multi-credential, multi-service authentication and data management
  • 6.
    Hub Configurable HTTP proxy Authenticator UserDB Spawner Notebook /api/auth Browser /hub/ /user/[name]/ • Multi-user hub • Manages multiple instances of Jupyter notebook server • Configurable HTTP proxy JupyterHub Goal: Liberate the notebook! • Tokens for remote services • APIs for remote actions, e.g. data management via Globus service petrel.alcf.anl.gov
  • 7.
    Securing JupyterHub withGlobus Auth plugin • Existing OAuth framework • Can restrict IdP • Custom scopes • Tokens passed into notebook environment github.com/jupyterhub/oauthenticator
  • 8.
  • 9.
    REST APIs REST APIs RESTAPIs Bearer a45cd... Hub Configurable HTTP proxy Authenticator User DB Spawner Notebook /api/auth /hub/ /user/[name]/ login Browser {"tokens":... {"tokens":... Tokens in Jupyter notebooks The world is your oyster API… • Globus Transfer • Globus Search • Your app • Data portal • Analysis engine • …
  • 10.
    Ad hoc dataanalysis/results distribution Notebook Data Repository Bearer a45cd… Dataset Shared endpoint POST '/endpoint/a3c345f... /mkdir’ 200 OK ... X-Transfer-API-Version: 0.10 Content-Type: application/json ... Analyze
  • 11.
    Experiment with thedemo notebook • Login into our JupyterHub*: jupyter.demo.globus.org • Launch (spawn) a notebook server; get tokens • Using the JupyterHub_Integration.ipynb notebook: – Access Globus APIs; download some data – “Analyze” data (generate plot) – PUT results (graph) on an HTTPS endpoint – Share the URL with others so they can access the results *zero-to-jupyterhub.readthedocs.io
  • 12.
  • 13.
    Our (simplistic) dataflow thus far… • Adequate for ad hoc sharing (implicit knowledge) • Broader access, reuse requires “formalization” • Leverage additional Globus platform services Notebook Data Repository Bearer a45cd… Dataset Shared endpoint POST '/endpoint/a3c345f... /mkdir’ 200 OK ... X-Transfer-API-Version: 0.10 Content-Type: application/json ... Analyze
  • 14.
    Globus Search • Scalableservice à billions of entries • Schema agnostic: use standard (e.g. DataCite) or custom metadata • Fine grained access control: only returns results that are visible to user • Plain text search: ranked results • Faceted search: facilitates data discovery • Rich query language: ranges, expressions, regex, etc. 14 docs.globus.org/api/search
  • 15.
    Persistent identifiers • Developingservice for issuing persistent identifiers – DOI, ARK, Handle, Globus – e.g. https://identifiers.globus.org/doi:10.1145/2076450.2076468 • Within a namespace, e.g. your DataCite namespace – Control which identities/groups can create identifiers • Identifier attributes: – Link to data: one or more https URLs, to file, folder or manifest – Landing page: provided by service, or by user – Visibility: identities, groups that can see identifier – Checksum: of the file or manifest – Metadata: as required by identifier (e.g., DataCite), extensible – Replaces/replaced-by: for versioning 15
  • 16.
    SearchIdentifierDescribeTransferAuth Extending the automationflow • How can we enable more structured/robust data discovery using Globus platform services? Create folder Transfer data Get metadata Mint persistent identifier Catalog Get credentials Set ACL
  • 17.
    Other Globus integrations •Web app development frameworks (Flask, Django) • Content management systems (WordPress, Drupal) • Development tools (Confluence, Jira) • Scalable cyberinfrastructure (Kubernetes) • Genomics analysis (Galaxy) – galaxyproject.org/authnz/use/oidc/idps/globus globus-integration-examples.readthedocs.io
  • 18.
    Example ALCF Data DiscoveryPortal https://petreldata.net
  • 19.
    Support resources • Globusdocumentation: docs.globus.org • Sample code: github.com/globus • Helpdesk and issue escalation: support@globus.org • Customer engagement team • Globus professional services team – Assist with portal/gateway/app architecture and design – Develop custom applications that leverage the Globus platform – Advise on customized deployment and integration scenarios
  • 20.
    Join the Globuscommunity • Access the service: globus.org/login • Create a personal endpoint: globus.org/app/endpoints/create-gcp • Documentation: docs.globus.org • Engage: globus.org/mailing-lists • Subscribe: globus.org/subscriptions • Need help? support@globus.org • Follow us: @globusonline