JupyterCon Meetup: Extended Jupyter Kernel Gateway

IBM Spark Technology Center
Graph Technologies Meetup
Building Enterprise/Cloud Analytics Platform
with Jupyter Notebooks and Apache Spark
Luciano Resende
IBM | Spark Technology Center

About Me
Luciano Resende (lresende@apache.org)
• Architect and community liaison at IBM – Spark Technology Center
• Has been contributing to open source at the ASF for over 10 years
• Currently contributing to the Jupyter Notebook ecosystem, Apache Bahir, Apache
Spark and Apache Toree, among other projects in the Apache Spark ecosystem
@lresende1975 http://lresende.blogspot.com/ https://www.linkedin.com/in/lresende http://slideshare.net/luckbr1975 lresende

IBM Spark Technology Center
Founded in 2015.
Location:
Physical: 505 Howard St., San Francisco CA
Web: http://spark.tc Twitter: @apachespark_tc
Mission:
Contribute intellectual and technical capital to the Apache Spark community.
Make the core technology enterprise- and cloud-ready.
Build data science skills to drive intelligence into business applications — http://bigdatauniversity.com
Key statistics:
About 40 developers, co-located with 25 IBM designers.
Major contributions to Apache Spark http://jiras.spark.tc
Apache SystemML is now a top-level Apache project!
Founding member of UC Berkeley AMPLab and RISE Lab
Member of R Consortium and Scala Center

Jupyter Notebook Platform Architecture Overview
• The Notebook UI runs in the browser
• The Notebook Server serves the ’Notebooks’
• Kernels interpret and execute cell contents
  • They are responsible for code execution
  • They abstract the different languages
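
As a rough illustration of how the Notebook Server hands cell contents to a kernel, the sketch below builds an `execute_request` message in the shape used by the Jupyter messaging protocol. The helper name `make_execute_request` and the user name `alice` are assumptions made for this example; a real client would also sign the message and send it over the kernel's shell channel, which is not shown here.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="alice"):
    """Build an execute_request message (hypothetical helper, not part of Jupyter)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,        # unique ID for this message
            "session": session_id,             # ID of the client session
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.0",                  # messaging protocol version
        },
        "parent_header": {},                   # empty: this message starts a request
        "metadata": {},
        "content": {
            "code": code,                      # the cell contents the kernel should run
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


request = make_execute_request("print(1 + 1)", session_id=uuid.uuid4().hex)
print(json.dumps(request["content"], indent=2))
```

The message carries the code as an opaque string, so the same envelope works for any kernel language; only the kernel that receives it needs to understand the code.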

Enterprise/Cloud Analytics Platform Characteristics
Large pool of shared computing resources
• Enterprise Cloud, Public Cloud or Hybrid
Distributed Consumers
• Notebooks running locally
• Notebooks as services
Different Resource Utilization Patterns
• High number of idle resources

Analytics Platform – Current state of the art
Open Source Jupyter based Notebook Platform
• Single user, with all notebooks sharing the same distributed filesystem and privileges
• Resources are limited to what is available on the single node that runs all kernels
and their associated Spark drivers
• No security: users can see and control each other’s processes using Jupyter’s administration
utilities

Analytics Platform Today – Shared Cluster
Allows Jupyter notebooks running outside the
cluster to run Jupyter kernels inside the cluster,
sharing its resources.
• All Jupyter kernels run under a shared “service” user ID.
• Users can see and control each other’s kernels using
Jupyter’s administration utilities.
• All kernels and their associated Spark drivers run on a
single (configurable) node of the cluster.
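
A minimal sketch of how a client outside the cluster could ask a Jupyter Kernel Gateway to start a kernel and then locate that kernel's WebSocket channel. The `/api/kernels` and `/api/kernels/<id>/channels` paths follow the Jupyter REST API; the host `gateway.example.com`, the kernel ID and the helper names are placeholders invented for this example, and the request is only built, not sent.

```python
import json
from urllib.request import Request


def start_kernel_request(gateway_url, kernel_name):
    """Build (but do not send) the POST request that asks the gateway for a new kernel."""
    body = json.dumps({"name": kernel_name}).encode("utf-8")
    return Request(
        url=f"{gateway_url.rstrip('/')}/api/kernels",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def channels_url(gateway_url, kernel_id):
    """Return the WebSocket URL used to exchange messages with a running kernel."""
    base = gateway_url.rstrip("/")
    if base.startswith("https://"):
        base = "wss://" + base[len("https://"):]
    elif base.startswith("http://"):
        base = "ws://" + base[len("http://"):]
    return f"{base}/api/kernels/{kernel_id}/channels"


# Hypothetical gateway address and kernel ID, for illustration only.
req = start_kernel_request("http://gateway.example.com:8888", "python3")
print(req.get_method(), req.full_url)
print(channels_url("http://gateway.example.com:8888", "abc123"))
```

With NB2KG the Notebook Server makes the equivalent calls on the user's behalf, so the kernel process itself runs inside the cluster next to the gateway.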
[Diagram: Bob’s and Alice’s desktops each run multiple notebooks on a Jupyter Notebook Server (with NB2KG) and connect through a security layer to a Jupyter Kernel Gateway inside the Spark cluster, sandboxed by service user privileges. Each kernel (Spark driver) runs in yarn-client mode as the JNBG service user, with Spark executors on the YARN workers running as the same service user.]

Analytics Platform Today – Single User Cluster
Allows Jupyter notebooks running outside the
cluster to run Jupyter kernels in a cluster created
specifically for that user.
• Expensive, as a cluster is created for every individual
user
[Diagram: Bob and Alice each have a dedicated Spark cluster. Each desktop runs multiple notebooks on a Jupyter Notebook Server (with NB2KG) connected to the Jupyter Kernel Gateway of that user’s cluster. The kernel (Spark driver) runs in yarn-client mode as the JNBG service user, with executors on the YARN workers.]

Extended Jupyter Kernel Gateway
Notebook platform based on the Jupyter stack, aimed at Enterprise/Cloud
requirements and use cases

Extended Jupyter Kernel Gateway – Goals
Optimized Resource Allocation
• Run Spark in YARN cluster mode to make better use of cluster resources.
• Pluggable architecture for additional resource managers
Enhanced Security
• Enable TLS for all socket communications
• Any HTTP communication should be encrypted (SSL)
Multiuser support with user impersonation
• Enhance security and sandboxing by enabling user impersonation when running kernels.
• Individual HDFS home folder for each notebook user.
• Use the same user ID for notebook and batch jobs.
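
To make the cluster-mode and impersonation goals concrete, here is a small sketch that assembles a `spark-submit` command line. The `--master`, `--deploy-mode` and `--proxy-user` options are standard `spark-submit` flags; that the extended gateway launches kernels this way is an assumption of this example, and `launch_kernel.py` is a hypothetical launcher script name.

```python
def spark_submit_args(app, deploy_mode="cluster", proxy_user=None):
    """Assemble a spark-submit argument list for a kernel launcher (illustrative only)."""
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if proxy_user:
        # Ask Spark to run the application on behalf of this user.
        args += ["--proxy-user", proxy_user]
    args.append(app)
    return args


# Shared-cluster setup from the earlier slides: driver on the gateway node, one service user.
print(" ".join(spark_submit_args("launch_kernel.py", deploy_mode="client")))

# Goal described above: driver inside a YARN container, running as the notebook user.
print(" ".join(spark_submit_args("launch_kernel.py", proxy_user="alice")))
```

In client mode every Spark driver competes for the resources of the single gateway node, while in cluster mode YARN decides where each driver runs.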

Extending Jupyter Kernel Gateway
• Enable running kernels remotely in a cluster
• Pluggable kernel lifecycle management
• Enhanced security
• Multiuser support leveraging Kerberos user impersonation
[Diagram: Jupyter Notebook Server connected to the Extended Jupyter Kernel Gateway.]

[Diagram: Alice’s and Bob’s desktops each run multiple notebooks on a Jupyter Notebook Server (with NB2KG) and connect through a security layer to the Spark cluster. The diagram labels four gateway components: Jupyter REST API, User Session Manager, Kernel Lifecycle and Kernel Communication (local/remote). On the YARN workers, each Jupyter kernel and its Spark driver run in their own YARN container, each with its own Spark executors. Impersonation: Alice’s kernel runs under Alice’s user ID.]

Extended Jupyter Kernel Gateway – Resource Managers
Pluggable Resource Management support
• Priority is to support the YARN resource
manager, running kernels in cluster mode
• Easily add support for different resource managers
(e.g. Kubernetes)
[Diagram: class hierarchy with BaseProcessProxy at the top and DistributedProcessProxy, YarnClusterProcessProxy, … beneath it.]
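
One way to picture the pluggable design is a small class hierarchy in which each resource manager gets its own process proxy. Only the class names BaseProcessProxy, DistributedProcessProxy and YarnClusterProcessProxy come from the slide; the methods, the registry and the factory function below are hypothetical, and the launch methods only simulate starting a kernel.

```python
from abc import ABC, abstractmethod


class BaseProcessProxy(ABC):
    """Common interface a gateway could use to manage one kernel process."""

    def __init__(self, kernel_id):
        self.kernel_id = kernel_id
        self.state = "not started"

    @abstractmethod
    def launch(self, argv):
        """Start the kernel and return a description of where it runs."""

    def kill(self):
        self.state = "stopped"


class DistributedProcessProxy(BaseProcessProxy):
    """Simulates starting the kernel on the first host of a fixed list."""

    def __init__(self, kernel_id, hosts):
        super().__init__(kernel_id)
        self.hosts = hosts

    def launch(self, argv):
        self.state = "running"
        return f"{self.kernel_id} on host {self.hosts[0]}: {' '.join(argv)}"


class YarnClusterProcessProxy(BaseProcessProxy):
    """Simulates submitting the kernel to YARN, which would pick the node."""

    def launch(self, argv):
        self.state = "running"
        return f"{self.kernel_id} submitted to YARN: {' '.join(argv)}"


# Hypothetical registry: supporting another resource manager means adding one entry.
PROCESS_PROXIES = {
    "distributed": DistributedProcessProxy,
    "yarn-cluster": YarnClusterProcessProxy,
}


def make_proxy(kind, kernel_id, **options):
    """Look up the proxy class for a resource manager and instantiate it."""
    return PROCESS_PROXIES[kind](kernel_id, **options)


proxy = make_proxy("yarn-cluster", "k1")
print(proxy.launch(["python", "-m", "ipykernel"]))
```

Because the gateway would only call the base-class methods, a Kubernetes proxy could be added later without touching the code that manages kernel lifecycles.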

Stay tuned, we are becoming open source very soon!
If you are considering being an early adopter, please contact me at
lresende@apache.org or lresende@us.ibm.com
