Ian Foster
foster@uchicago.edu
JupyterCon, New York, August 23, 2018
Scaling collaborative
data science
with Globus and Jupyter
Andre Schleife
UIUC
Modeling stopping power
with time-dependent
density functional theory
Hydrogen in Gold, v=2.0
16,000 CPU-hours per simulation
Sample
Experimental scattering
Material composition: La 60%, Sr 40%
Simulated structure
Simulated scattering
Evolutionary optimization
786,432 CPUs, 10 PFLOPS (10^16 flops) supercomputer
Argonne Leadership Computing Facility
@python_app
Logan Ward
But data are big and distributed, and
our science is collaborative
(1) Query
(2) Transfer
(3) Learn
materialsdatafacility.org
petrel.alcf.anl.gov
Cooley: 290 TeraFLOPS
(4) Share
2 PB, 80 Gbps Globus-enabled store
3.2M materials data
We need multi-credential,
multi-service authentication
and big data management
operated by UChicago for researchers worldwide
Automate
globus.org
Globus services
• Multi-user Hub
• Configurable HTTP proxy
• Multiple single-user Jupyter
notebook servers
Recall: JupyterHub components
Hub
Configurable HTTP proxy
Authenticator
User database
Spawner
Notebook
/api/auth
Browser
/hub/
/user/[name]/
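The routing implied by these components can be sketched as a small lookup. This only illustrates the `/hub/` and `/user/[name]/` prefixes listed above; it is not how the configurable HTTP proxy is implemented, and the function name `route` and the returned labels are invented for this example.

```python
def route(path: str) -> str:
    """Return which component would handle a request path (illustrative only)."""
    parts = [p for p in path.split("/") if p]
    # /user/<name>/... is forwarded to that user's single-user notebook server
    if len(parts) >= 2 and parts[0] == "user":
        return f"notebook:{parts[1]}"
    # Everything else, including /hub/..., falls through to the Hub
    return "hub"


if __name__ == "__main__":
    for p in ["/hub/login", "/user/alice/tree", "/"]:
        print(p, "->", route(p))
```

The Hub is the default target, so unknown paths end up at the login and spawn logic rather than at a notebook server.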
We want to grant notebooks
access to the world
• Tokens for remote services
• APIs for remote actions: e.g.,
Globus data management
Hub
Configurable HTTP proxy
Authenticator
User database
Spawner
Notebook
/api/auth
Browser
/hub/
/user/[name]/
Cooley Globus
Petrel
Securing JupyterHub with Globus Auth
We provide a simple
Globus OAuth plugin
• 100s of identity providers
(can restrict which ones)
• 1000 registered clients, apps
• Custom scopes
• Tokens passed into notebook
environment
JupyterHub OAuthenticator
Use within JupyterHub is easy
https://github.com/jupyterhub/oauthenticator#globus-setup
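A configuration along the following lines is one way the setup might look. It is a sketch of a `jupyterhub_config.py` fragment, not the exact configuration used in the talk: the environment variable names are placeholders chosen for this example, and the linked README remains the authoritative reference for the options your oauthenticator version supports.

```python
# jupyterhub_config.py (fragment; JupyterHub provides get_config() when it loads this file)
import os

c = get_config()  # noqa: F821

# Use the Globus authenticator that ships with the oauthenticator package.
c.JupyterHub.authenticator_class = "oauthenticator.globus.GlobusOAuthenticator"

# Values from registering the Hub as an app with Globus Auth.
# The environment variable names below are placeholders for this sketch.
c.GlobusOAuthenticator.oauth_callback_url = os.environ["OAUTH_CALLBACK_URL"]
c.GlobusOAuthenticator.client_id = os.environ["OAUTH_CLIENT_ID"]
c.GlobusOAuthenticator.client_secret = os.environ["OAUTH_CLIENT_SECRET"]

# Persist auth state so tokens can be handed to the spawned notebook server.
# JupyterHub also needs JUPYTERHUB_CRYPT_KEY set to encrypt that state.
c.Authenticator.enable_auth_state = True
```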
Tokens are easily used within notebooks
Login
REST APIs
{“tokens”:…
{“tokens”:…
REST APIs
REST APIs
Bearer a45cd…
Globus Transfer
Globus Search
Globus Publish
Your App
Another App
Hub
Configurable HTTP proxy
Authenticator
User database
Notebook
/hub/
/user/[name]/
Spawner
/api/auth
Browser
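As a rough sketch of how a notebook might turn the tokens passed into its environment into the `Bearer` header shown above: the code assumes the tokens arrive as base64-encoded JSON in an environment variable named `GLOBUS_DATA`, keyed by resource server. The variable name, the encoding, and the payload layout are all assumptions made for illustration and may differ between oauthenticator versions; the token value is fake.

```python
import base64
import json
import os


def load_tokens(env_var: str = "GLOBUS_DATA") -> dict:
    """Decode the token payload from the environment (assumed: base64-encoded JSON)."""
    raw = os.environ[env_var]
    payload = json.loads(base64.b64decode(raw).decode("utf-8"))
    return payload["tokens"]


def bearer_header(tokens: dict, resource_server: str) -> dict:
    """Build the Authorization header for one remote service."""
    access_token = tokens[resource_server]["access_token"]
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    # Fake payload standing in for whatever the Hub would inject.
    fake = {"tokens": {"transfer.api.globus.org": {"access_token": "a45cd-fake"}}}
    os.environ["GLOBUS_DATA"] = base64.b64encode(json.dumps(fake).encode()).decode()
    print(bearer_header(load_tokens(), "transfer.api.globus.org"))
```

Each remote service gets its own token, so the header is built per resource server rather than once for the whole session.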
In particular, you can access Globus services
Globus Transfer
• Uniform access to
distributed storage (Posix,
S3, Ceph, HPSS, Google
Drive, Hadoop, Lustre, …)
• HTTPS; GridFTP for high-
speed, reliable, third-party
transfers
• Shared endpoints: User-
managed access control
• Web, REST, CLI access
• HIPAA compliant
12,000 active Globus Connect endpoints
(including most universities and labs)
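To make the REST access concrete, here is a minimal sketch of the JSON document a client would POST to the Transfer API to request a third-party transfer. It only builds the document and does not contact Globus; the endpoint IDs, paths, and submission ID are placeholders, and the field names should be checked against the current Transfer API documentation before use.

```python
import json


def build_transfer_document(submission_id, source_endpoint, destination_endpoint,
                            items, label=None):
    """Build a transfer submission document from (source_path, destination_path) pairs."""
    doc = {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": False,
            }
            for src, dst in items
        ],
    }
    if label is not None:
        doc["label"] = label
    return doc


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    doc = build_transfer_document(
        "submission-id-placeholder",
        "source-endpoint-uuid",
        "destination-endpoint-uuid",
        [("/data/run1.csv", "/scratch/run1.csv")],
        label="notebook demo",
    )
    print(json.dumps(doc, indent=2))
```

In practice the Globus Python SDK wraps this document and the surrounding HTTP calls, so hand-building it is mainly useful for seeing what travels over the wire.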
In particular, you can access Globus services
Globus Search
• Cloud-hosted, schema agnostic
• Scale to billions of objects
Globus Identifiers
• Digital object ids for your data
• DataCite or other metadata
Globus Publication platform
• Customized publication pipelines
Canadian Federated Research Data Repository: https://frdr.ca/
Demonstration
What we’re going to do:
• Log in to our JupyterCon JupyterHub*
• Launch (spawn) a Notebook Server
• Get tokens
• Access some Globus APIs
• Download some data
• Plot it
• PUT the result on an HTTPS endpoint
*Zero to JupyterHub: Fast JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io
https://jupyter.demo.globus.org/
Login to Start Tutorial
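The last step of the demonstration, putting a result on an HTTPS endpoint, might look roughly like the sketch below. It uses only the standard library and builds the request without sending it; the URL, the token, and the helper name `build_put_request` are placeholders invented for this example, and a real endpoint determines which token and content type are appropriate.

```python
import urllib.request


def build_put_request(url: str, data: bytes, access_token: str,
                      content_type: str = "image/png") -> urllib.request.Request:
    """Prepare an authenticated HTTPS PUT carrying the given bytes."""
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Authorization", f"Bearer {access_token}")
    req.add_header("Content-Type", content_type)
    return req


if __name__ == "__main__":
    # Placeholder URL and token; nothing is sent over the network here.
    req = build_put_request("https://example.org/results/plot.png", b"fake-bytes", "fake-token")
    print(req.get_method(), req.full_url)
    # To actually upload, one would call: urllib.request.urlopen(req)
```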
The story so far …
• Globus APIs enable authentication, data access, data
movement, data sharing, data search
• Can be used in notebooks and in JupyterHub/Lab to
access any data anywhere and to secure any resource
But wait, there’s more!
• Create a containerized data science ecosystem that
encompasses laptops, servers, clouds, HPC
Container
Registry
IAM
container metadata
container recipes
ALCF Petrel
containers
Supercomputer
compute
compute
compute
compute
JupyterHub
Notebook Server
Containers are staged
to local file systems
Users select the container
to execute their custom
Jupyter environment
The same containers can be used for
both Jupyter notebook server and
compute nodes, for consistency
Unified IAM platform
scalable for
distributed projects
Container definitions
are tracked in version
control systems
A registry for container
discovery and referencing
Containers can be used for
other tasks: analysis; ML; etc.
Containers
everywhere
And more …
• Incorporate seamless parallel computing via Parsl
Python parallel library
• Tasks exposed as
functions (Python or bash)
• Python code to glue functions together
• Globus for auth and data movement
(Data) science applications require:
• Interactivity
• Scalability
- Need more than a desktop
• Reproducibility
- Publish code and documentation
Our solution: JupyterHub + Parsl
• Interactive computing environment
• Notebooks for publication
• Can run on dedicated hardware
parsl-project.org
Interactive, scalable, reproducible data analysis
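import numpy as np
from parsl import python_app

@python_app
def compute_features(chunk):
    # Apply each featurizer in turn to this chunk of the dataframe
    for f in featurizers:
        chunk = f.featurize_dataframe(chunk, 'atoms')
    return chunk

# One app invocation per chunk; each call returns a future, not the result
chunks = [compute_features(chunk)
          for chunk in np.array_split(data, chunks)]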
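Calling a function decorated with `@python_app` returns a future rather than the computed value, so the results have to be gathered with `.result()` afterwards. The sketch below shows the same split, submit, gather pattern using the standard library's `concurrent.futures` as a stand-in for Parsl, so that it runs without any Parsl configuration; the toy `compute_features` and the helper names are invented for illustration and do no real featurization.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n):
    """Split a list into n nearly equal contiguous chunks."""
    k, r = divmod(len(data), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(data[start:end])
        start = end
    return out


def compute_features(chunk):
    # Toy stand-in for featurization: square each value
    return [x * x for x in chunk]


def featurize_parallel(data, n_chunks=4):
    """Submit one task per chunk, then gather results in submission order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(compute_features, c) for c in split(data, n_chunks)]
        # .result() blocks until that chunk has finished
        results = [f.result() for f in futures]
    return [x for chunk in results for x in chunk]


if __name__ == "__main__":
    print(featurize_parallel(list(range(10)), n_chunks=3))
```

With Parsl the submit step is the decorated function call itself, and where the work runs (threads, a cluster, a supercomputer) is chosen by the loaded configuration rather than by the calling code.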
And more …
• Jupyter notebooks for rules-based automation
– Notebooks are triggered by events (e.g., new data available)
– Notebooks trigger events (e.g., computation completed)
globus.org
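The two directions of rules-based automation (events trigger notebooks, notebooks trigger events) can be sketched with a tiny in-process rule engine. Everything here is hypothetical: the `RuleEngine` class, the event names, and the stand-in functions are made up to illustrate the idea and do not correspond to a Globus or Jupyter API. A real deployment would execute a notebook where `run_analysis_notebook` appears.

```python
from collections import defaultdict


class RuleEngine:
    """Map event types to actions; actions may emit further events."""

    def __init__(self):
        self._rules = defaultdict(list)

    def on(self, event_type, action):
        """Register an action to run whenever event_type is emitted."""
        self._rules[event_type].append(action)

    def emit(self, event_type, payload):
        """Run every action registered for event_type, in registration order."""
        return [action(self, payload) for action in list(self._rules[event_type])]


log = []


def run_analysis_notebook(engine, payload):
    # Stand-in for executing a notebook on newly arrived data
    log.append(f"analyzing {payload['path']}")
    # The "notebook" reports completion by emitting its own event
    engine.emit("computation_completed", {"path": payload["path"]})


def notify(engine, payload):
    log.append(f"completed {payload['path']}")


if __name__ == "__main__":
    engine = RuleEngine()
    engine.on("new_data", run_analysis_notebook)
    engine.on("computation_completed", notify)
    engine.emit("new_data", {"path": "/data/run42.h5"})
    print(log)
```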
And more …
• Integration with JupyterLab (student summer project)
Juan David Garrido
Browse data on
local storage
Search remote
storage systems
Select files on
remote storage
Transfer data to
local storage
There it is!
Search remote
databases
Select materials
data
Inspect
materials data
Ben Blaiszik, Steve Tuecke, Kyle Chard, Jim Pruyne, Logan Ward,
Rachana Ananthakrishnan, Ryan Chard, Mike Papka, Rick Wagner
I reported on the work of many talented people
And others from the Globus team, the University of Chicago, and Argonne National Laboratory
We are grateful to our sponsors
DLHub Globus
IMaD
Petrel
Argonne Leadership
Computing Facility
At JupyterCon 2018
For more information
“Globus APIs enable authentication, data access, data
movement, data sharing, data search”
→ See https://docs.globus.org
“Can be used in notebooks and in JupyterHub/Lab to
access any data anywhere and to secure any resource”
→ Tutorial: https://jupyter.demo.globus.org
Blog: https://www.globus.org/blog/using-globus-jupyter-notebooks
“But wait, there’s more!”
→ Talk to me, or check back in a few months
foster@uchicago.edu

More Related Content

What's hot

Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundane
Ian Foster
 
Doing Research in the Cloud - NIH Workshop Dennis Gannon
Doing Research in the Cloud - NIH Workshop Dennis GannonDoing Research in the Cloud - NIH Workshop Dennis Gannon
Doing Research in the Cloud - NIH Workshop Dennis Gannon
Microsoft Azure for Research
 
Accelerating your research with Microsoft Azure
Accelerating your research with Microsoft AzureAccelerating your research with Microsoft Azure
Accelerating your research with Microsoft Azure
Microsoft Azure for Research
 
A4 r overview deck_1.7
A4 r overview deck_1.7A4 r overview deck_1.7
A4 r overview deck_1.7
Microsoft Azure for Research
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Ian Foster
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Ian Foster
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
Lewis Crawford
 
Cloud com foster december 2010
Cloud com foster december 2010Cloud com foster december 2010
Cloud com foster december 2010
Ian Foster
 
ieee cloud 2015 keynote talk
ieee cloud 2015 keynote talkieee cloud 2015 keynote talk
ieee cloud 2015 keynote talk
Microsoft Azure for Research
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
Ian Foster
 
Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)
Robert Grossman
 
Introduction NL-HUG (April)
Introduction NL-HUG (April)Introduction NL-HUG (April)
Introduction NL-HUG (April)
Evert Lammerts
 
"A Toolkit for Digital Research" - CNI 2013
"A Toolkit for Digital Research" - CNI 2013"A Toolkit for Digital Research" - CNI 2013
"A Toolkit for Digital Research" - CNI 2013
Kaitlin Thaney
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
Robert Grossman
 
2014 moore-ddd
2014 moore-ddd2014 moore-ddd
2014 moore-ddd
c.titus.brown
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Spark Summit
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
Ian Foster
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
Tanu Malik
 
Research Objects in Wf4Ever
Research Objects in Wf4EverResearch Objects in Wf4Ever
Research Objects in Wf4Ever
Jose Enrique Ruiz
 
OSCON 2015
OSCON 2015OSCON 2015
OSCON 2015
Charles Smith
 

What's hot (20)

Accelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundaneAccelerating data-intensive science by outsourcing the mundane
Accelerating data-intensive science by outsourcing the mundane
 
Doing Research in the Cloud - NIH Workshop Dennis Gannon
Doing Research in the Cloud - NIH Workshop Dennis GannonDoing Research in the Cloud - NIH Workshop Dennis Gannon
Doing Research in the Cloud - NIH Workshop Dennis Gannon
 
Accelerating your research with Microsoft Azure
Accelerating your research with Microsoft AzureAccelerating your research with Microsoft Azure
Accelerating your research with Microsoft Azure
 
A4 r overview deck_1.7
A4 r overview deck_1.7A4 r overview deck_1.7
A4 r overview deck_1.7
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
 
Cloud com foster december 2010
Cloud com foster december 2010Cloud com foster december 2010
Cloud com foster december 2010
 
ieee cloud 2015 keynote talk
ieee cloud 2015 keynote talkieee cloud 2015 keynote talk
ieee cloud 2015 keynote talk
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)Open Science Data Cloud (IEEE Cloud 2011)
Open Science Data Cloud (IEEE Cloud 2011)
 
Introduction NL-HUG (April)
Introduction NL-HUG (April)Introduction NL-HUG (April)
Introduction NL-HUG (April)
 
"A Toolkit for Digital Research" - CNI 2013
"A Toolkit for Digital Research" - CNI 2013"A Toolkit for Digital Research" - CNI 2013
"A Toolkit for Digital Research" - CNI 2013
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
2014 moore-ddd
2014 moore-ddd2014 moore-ddd
2014 moore-ddd
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
 
Research Objects in Wf4Ever
Research Objects in Wf4EverResearch Objects in Wf4Ever
Research Objects in Wf4Ever
 
OSCON 2015
OSCON 2015OSCON 2015
OSCON 2015
 

Similar to Scaling collaborative data science with Globus and Jupyter

Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus Platform
Globus
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next Frontier
Globus
 
Globus Integrations (GlobusWorld Tour - UCSD)
Globus Integrations (GlobusWorld Tour - UCSD)Globus Integrations (GlobusWorld Tour - UCSD)
Globus Integrations (GlobusWorld Tour - UCSD)
Globus
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
Ian Foster
 
Globus Integrations (CHPC 2019 - South Africa)
Globus Integrations (CHPC 2019 - South Africa)Globus Integrations (CHPC 2019 - South Africa)
Globus Integrations (CHPC 2019 - South Africa)
Globus
 
Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)
Globus
 
Automating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusAutomating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with Globus
Globus
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Ian Foster
 
Scalable Parallel Programming in Python with Parsl
Scalable Parallel Programming in Python with ParslScalable Parallel Programming in Python with Parsl
Scalable Parallel Programming in Python with Parsl
Globus
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Globus
 
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Globus
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Databricks
 
Globus Integrations (JupyterHub, Django, ...)
Globus Integrations (JupyterHub, Django, ...)Globus Integrations (JupyterHub, Django, ...)
Globus Integrations (JupyterHub, Django, ...)
Globus
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
Ravi Madduri
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASA
Ian Foster
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Bertram Ludäscher
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
Globus
 
Data munging and analysis
Data munging and analysisData munging and analysis
Data munging and analysis
Raminder Singh
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Cloudian
 
Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17
Mary Bass
 

Similar to Scaling collaborative data science with Globus and Jupyter (20)

Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus Platform
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next Frontier
 
Globus Integrations (GlobusWorld Tour - UCSD)
Globus Integrations (GlobusWorld Tour - UCSD)Globus Integrations (GlobusWorld Tour - UCSD)
Globus Integrations (GlobusWorld Tour - UCSD)
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
Globus Integrations (CHPC 2019 - South Africa)
Globus Integrations (CHPC 2019 - South Africa)Globus Integrations (CHPC 2019 - South Africa)
Globus Integrations (CHPC 2019 - South Africa)
 
Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)Globus Integrations (GlobusWorld Tour - UMich)
Globus Integrations (GlobusWorld Tour - UMich)
 
Automating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with GlobusAutomating Research Data Management at Scale with Globus
Automating Research Data Management at Scale with Globus
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Scalable Parallel Programming in Python with Parsl
Scalable Parallel Programming in Python with ParslScalable Parallel Programming in Python with Parsl
Scalable Parallel Programming in Python with Parsl
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
 
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
Facilitating Collaboration with Globus (GlobusWorld Tour - STFC)
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
 
Globus Integrations (JupyterHub, Django, ...)
Globus Integrations (JupyterHub, Django, ...)Globus Integrations (JupyterHub, Django, ...)
Globus Integrations (JupyterHub, Django, ...)
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASA
 
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
Introducing the Whole Tale Project: Merging Science and Cyberinfrastructure P...
 
Reliable, Remote Computation at All Scales
Reliable, Remote Computation at All ScalesReliable, Remote Computation at All Scales
Reliable, Remote Computation at All Scales
 
Data munging and analysis
Data munging and analysisData munging and analysis
Data munging and analysis
 
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
Case Study: Implementing Hadoop and Elastic Map Reduce on Scale-out Object S...
 
Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17
 

More from Ian Foster

Global Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxGlobal Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptx
Ian Foster
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
Ian Foster
 
Better Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the Continuum
Ian Foster
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart Instruments
Ian Foster
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
Ian Foster
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
Ian Foster
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
Ian Foster
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
Ian Foster
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
Ian Foster
 
Research Automation for Data-Driven Discovery
Research Automation for Data-Driven DiscoveryResearch Automation for Data-Driven Discovery
Research Automation for Data-Driven Discovery
Ian Foster
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon Summary
Ian Foster
 
Thoughts on interoperability
Thoughts on interoperabilityThoughts on interoperability
Thoughts on interoperability
Ian Foster
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
Ian Foster
 
Going Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCFGoing Smart and Deep on Materials at ALCF
Going Smart and Deep on Materials at ALCF
Ian Foster
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Ian Foster
 
Software Infrastructure for a National Research Platform
Software Infrastructure for a National Research PlatformSoftware Infrastructure for a National Research Platform
Software Infrastructure for a National Research Platform
Ian Foster
 
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Accelerating the Experimental Feedback Loop: Data Streams and the Advanced Ph...
Ian Foster
 
Globus Auth: A Research Identity and Access Management Platform
Globus Auth: A Research Identity and Access Management PlatformGlobus Auth: A Research Identity and Access Management Platform
Globus Auth: A Research Identity and Access Management Platform
Ian Foster
 
Streamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer researchStreamlined data sharing and analysis to accelerate cancer research
Streamlined data sharing and analysis to accelerate cancer research
Ian Foster
 
Accelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceAccelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy Science
Ian Foster
 

More from Ian Foster (20)

Global Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptxGlobal Services for Global Science March 2023.pptx
Global Services for Global Science March 2023.pptx
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
Better Information Faster: Programming the Continuum
Better Information Faster: Programming the ContinuumBetter Information Faster: Programming the Continuum
Better Information Faster: Programming the Continuum
 
ESnet6 and Smart Instruments
ESnet6 and Smart InstrumentsESnet6 and Smart Instruments
ESnet6 and Smart Instruments
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
 
Foster CRA March 2022.pptx
Foster CRA March 2022.pptxFoster CRA March 2022.pptx
Foster CRA March 2022.pptx
 

Scaling collaborative data science with Globus and Jupyter

  • 1. Ian Foster foster@uchicago.edu JupyterCon, New York, August 23, 2018 Scaling collaborative data science with Globus and Jupyter
  • 2. Andre Schleife UIUC Modeling stopping power with time-dependent density functional theory Hydrogen in Gold, v=2.0 16,000 CPU-hours per simulation Sample Experimental scattering Material composition Simulated structure Simulated scattering La 60% Sr 40% Evolutionary optimization 786,432 CPUs, 10 PFLOPS (10^16 flops) supercomputer Argonne Leadership Computing Facility
  • 4. But data are big and distributed, and our science is collaborative (1) Query (2) Transfer (3) Learn materialsdatafacility.org petrel.alcf.anl.gov Cooley: 290 TeraFLOPS (4) Share 2 PB, 80 Gbps Globus-enabled store 3.2M materials data We need multi-credential, multi-service authentication and big data management
  • 5. operated by UChicago for researchers worldwide Automate globus.org Globus services
  • 6. • Multi-user Hub • Configurable HTTP proxy • Multiple single-user Jupyter notebook servers Recall: JupyterHub components Hub Configurable HTTP proxy Authenticator User database Spawner Notebook /api/auth Browser /hub/ /user/[name]/
  • 7. • Multi-user Hub • Configurable HTTP proxy • Multiple single-user Jupyter notebook servers Recall: JupyterHub components We want to grant notebooks access to the world • Tokens for remote services • APIs for remote actions: e.g., Globus data management Hub Configurable HTTP proxy Authenticator User database Spawner Notebook /api/auth Browser /hub/ /user/[name]/ Cooley Globus Petrel
  • 8. Securing JupyterHub with Globus Auth We provide a simple Globus OAuth plugin • 100s of identity providers (can restrict which ones) • 1000 registered clients, apps • Custom scopes • Tokens passed into notebook environment JupyterHub OAuthenticator
  • 9. Use within JupyterHub is easy https://github.com/jupyterhub/oauthenticator#globus-setup
  • 10. Tokens are easily used within notebooks Login REST APIs {“tokens”:… {“tokens”:… REST APIs REST APIs Bearer a45cd… Globus Transfer Globus Search Globus Publish Your App Another App Hub Configurable HTTP proxy Authenticator User database Notebook /hub/ /user/[name]/ Spawner /api/auth Browser
  • 11. In particular, you can access Globus services Globus Transfer • Uniform access to distributed storage (Posix, S3, Ceph, HPSS, Google Drive, Hadoop, Lustre, …) • HTTPS; GridFTP for high-speed, reliable, third-party transfers • Shared endpoints: User-managed access control • Web, REST, CLI access • HIPAA compliant 12,000 active Globus Connect endpoints (including most universities and labs)
  • 12. In particular, you can access Globus services Globus Search • Cloud-hosted, schema agnostic • Scale to billions of objects Globus Identifiers • Digital object ids for your data • DataCite or other metadata Globus Publication platform • Customized publication pipelines Canadian Federated Research Data Repository: https://frdr.ca/
  • 13. Demonstration What we’re going to do: • Log in to our JupyterCon JupyterHub* • Launch (spawn) a Notebook Server • Get tokens • Access some Globus APIs • Download some data • Plot it • PUT the result on an HTTPS endpoint *Zero to JupyterHub: Fast JupyterHub on Kubernetes https://zero-to-jupyterhub.readthedocs.io
  • 15. The story so far … • Globus APIs enable authentication, data access, data movement, data sharing, data search • Can be used in notebooks and in JupyterHub/Lab to access any data anywhere and to secure any resource
  • 16. But wait, there’s more! • Globus APIs enable authentication, data access, data movement, data sharing, data search • Can be used in notebooks and in JupyterHub/Lab to access any data anywhere and to secure any resource • Create a containerized data science ecosystem that encompasses laptops, servers, clouds, HPC
  • 17. Container Registry IAM container metadata container recipes ALCF Petrel containers Supercomputer compute compute compute compute JupyterHub Notebook Server Containers are staged to local file systems Users select the container to execute their custom Jupyter environment The same containers can be used for both Jupyter notebook server and compute nodes, for consistency Unified IAM platform scalable for distributed projects Container definitions are tracked in version control systems A registry for container discovery and referencing Containers can be used for other tasks: analysis; ML; etc. Containers everywhere
  • 18. And more … • Globus APIs enable authentication, data access, data movement, data sharing, data search • Can be used in notebooks and in JupyterHub/Lab to access any data anywhere and to secure any resource • Create a containerized data science ecosystem that encompasses laptops, servers, clouds, HPC • Incorporate seamless parallel computing via Parsl
  • 19. Python parallel library • Tasks exposed as functions (Python or bash) • Python code to glue functions together • Globus for auth and data movement (Data) science applications require: • Interactivity • Scalability - Need more than a desktop • Reproducibility - Publish code and documentation Our solution: JupyterHub + Parsl • Interactive computing environment • Notebooks for publication • Can run on dedicated hardware parsl-project.org Interactive, scalable, reproducible data analysis @python_app def compute_features(chunk): for f in featurizers: chunk = f.featurize_dataframe(chunk, 'atoms') return chunk chunks = [compute_features(chunk) for chunk in np.array_split(data, chunks)]
  • 21. And more … • Globus APIs enable authentication, data access, data movement, data sharing, data search • Can be used in notebooks and in JupyterHub/Lab to access any data anywhere and to secure any resource • Create a containerized data science ecosystem that encompasses laptops, servers, clouds, HPC • Incorporate seamless parallel computing via Parsl • Jupyter notebooks for rules-based automation – Notebooks are triggered by events (e.g., new data available) – Notebooks trigger events (e.g., computation completed)
  • 23. And more … • Globus APIs enable authentication, data access, data movement, data sharing, data search • Can be used in notebooks and in JupyterHub/Lab to access any data anywhere and to secure any resource • Create a containerized data science ecosystem that encompasses laptops, servers, clouds, HPC • Incorporate seamless parallel computing via Parsl • Jupyter notebooks for rules-based automation • Integration with JupyterLab (student summer project)
  • 34. I reported on the work of many talented people: Ben Blaiszik, Steve Tuecke, Kyle Chard, Jim Pruyne, Logan Ward, Rachana Ananthakrishnan, Ryan Chard, Mike Papka, Rick Wagner, and others from the Globus team, the University of Chicago, and Argonne National Laboratory. We are grateful to our sponsors DLHub Globus IMaD Petrel Argonne Leadership Computing Facility At JupyterCon 2018
  • 35. For more information “Globus APIs enable authentication, data access, data movement, data sharing, data search”: see https://docs.globus.org “Can be used in notebooks and in JupyterHub/Lab to access any data anywhere and to secure any resource”: Tutorial: https://jupyter.demo.globus.org Blog: https://www.globus.org/blog/using-globus-jupyter-notebooks “But wait, there’s more!”: Talk to me, or check back in a few months foster@uchicago.edu
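
Slide 10 shows tokens obtained at login being presented to REST APIs as a `Bearer` header. The sketch below illustrates that step with the Python standard library only. The payload layout, the resource-server key, the token value, and the URL are all hypothetical placeholders chosen for illustration; they are not the format that the Globus OAuthenticator actually passes into the notebook environment. The request is built but not sent.

```python
import urllib.request


def build_api_request(url, access_token, method="GET", data=None):
    """Return a Request carrying the token as an OAuth2 Bearer header."""
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {access_token}")
    return req


# Hypothetical token payload; the real structure depends on the Hub setup.
login_payload = {
    "tokens": {
        "transfer.api.globus.org": {"access_token": "a45cd-example"},
    }
}

# Pick the token issued for the service the notebook wants to call.
token = login_payload["tokens"]["transfer.api.globus.org"]["access_token"]

# Placeholder URL; substitute the REST endpoint of the target service.
request = build_api_request("https://example.org/api/resource", token)
```

Passing `request` to `urllib.request.urlopen` would send it. A PUT to an HTTPS endpoint, as in the slide 13 demonstration, would use the same helper with `method="PUT"` and a bytes `data` argument.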
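
The snippet on slide 19 wraps a function with `@python_app` so that each call is dispatched as a task and the list comprehension fans the chunks out in parallel. The sketch below reproduces only that calling pattern, with the standard library's `concurrent.futures` swapped in for Parsl. The `python_app` decorator defined here is a hypothetical stand-in, not Parsl's implementation, and the squaring step is a placeholder for the slide's featurizers.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Shared worker pool; Parsl would instead target a configured executor.
_executor = ThreadPoolExecutor(max_workers=4)


def python_app(func):
    """Hypothetical stand-in decorator: calling the function returns a Future."""
    @wraps(func)
    def submit(*args, **kwargs) -> Future:
        return _executor.submit(func, *args, **kwargs)
    return submit


@python_app
def compute_features(chunk):
    # Placeholder work standing in for the slide's featurizer loop.
    return [x * x for x in chunk]


def split(data, n_chunks):
    """Split a list into n_chunks contiguous, nearly equal parts."""
    size, extra = divmod(len(data), n_chunks)
    parts, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


data = list(range(10))
# Each call returns immediately with a Future; the work runs in the pool.
futures = [compute_features(chunk) for chunk in split(data, 3)]
# Block until every task has finished and collect the per-chunk results.
results = [f.result() for f in futures]
```

The point of the pattern is that the glue code stays ordinary Python: the comprehension looks sequential, while the decorator decides where and how each call runs.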
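
Slide 21 describes rules-based automation in which notebooks are triggered by events and in turn trigger events. The talk does not show how this is implemented, so the following is only a minimal in-process sketch of the idea. The `RuleEngine` class, the event names, and the file path are hypothetical, and a plain function stands in for executing a notebook.

```python
from collections import defaultdict


class RuleEngine:
    """Tiny event dispatcher: rules map an event type to actions."""

    def __init__(self):
        self._rules = defaultdict(list)
        self.log = []  # event types in the order they were emitted

    def on(self, event_type):
        """Decorator registering an action for the given event type."""
        def register(action):
            self._rules[event_type].append(action)
            return action
        return register

    def emit(self, event_type, payload):
        """Record the event and run every action registered for it."""
        self.log.append(event_type)
        for action in list(self._rules[event_type]):
            action(self, payload)


engine = RuleEngine()
completed = []


@engine.on("new_data")
def run_analysis(engine, payload):
    # Stand-in for running a notebook against the newly arrived data.
    result = {"source": payload["path"], "n_rows": len(payload["rows"])}
    # The "notebook" then triggers a follow-on event of its own.
    engine.emit("computation_completed", result)


@engine.on("computation_completed")
def record_result(engine, payload):
    completed.append(payload)


engine.emit("new_data", {"path": "/data/run1.csv", "rows": [1, 2, 3]})
```

In a real deployment the events would come from external services such as a data transfer finishing, rather than from a direct `emit` call.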