SlideShare a Scribd company logo
1 of 19
Download to read offline
Jupyter
Enterprise Gateway
CODAIT Development Team
August, 2018
© 2018 IBM Corporation 1
Jupyter Notebooks
Overview
5© 2018 IBM Corporation
Jupyter Notebooks
© 2018 IBM Corporation 6
Notebooks are interactive
computational
environments, in which
you can combine code
execution, rich text,
mathematics, plots and
rich media.
Jupyter Notebooks
© 2018 IBM Corporation 7
• Notebook UI runs on the browser
• The Notebook Server serves the
’Notebooks’
– ipynb files
• Kernels interpret/execute cell contents
– Are responsible for code execution
– Abstracts different languages
Building a
Data Science
Analytical Platform
9© 2018 IBM Corporation
Building an Data Science Platform
© 2018 IBM Corporation
Large pool of shared computing resources
• Enterprise Cloud, Public Cloud or Hybrid
• Data in the cloud (Data Lakes/Object Storage)
Distributed Consumers
• Notebooks running local (users laptop)
or as a service (e.g. Jupyter Hub)
Different Resource Utilization Patterns
• High number of idle resources
Limitations of Jupyter Notebook Stack
© 2018 IBM Corporation
Gather
Data
Analyze
Data
Machine
Learning
Deep
Learning
Deploy
Model
Maintain
Model
Python
Data Science
Stack
Fabric for
Deep Learning
(FfDL)
Mleap +
PFA
Scikit-LearnPandas
Apache
Spark
Apache
Spark
Jupyter
Model
Asset
eXchange
Keras +
Tensorflow
11
8 8 8 8
0
10
20
30
40
50
60
70
80
4 Nodes 8 Nodes 12 Nodes 16 NodesMaxKernels(4GBHeap)
Cluster Size (32GB Nodes)
MAXIMUM NUMBER OF
SIMULTANEOUS KERNELS
• Scalability
• Jupyter Kernels running as local process
• Resources are limited by what is available
on the one single node that runs all Kernels
and associated Spark drivers
• Security
• Single user sharing the same privileges
• Users can see and control each other process
using Jupyter administrative utilities
Kernel
Kernel
Kernel
Kernel
Kernel
12© 2018 IBM Corporation
Jupyter Enterprise Gateway
13© 2018 IBM Corporation
Jupyter Enterprise
Gateway
© 2018 IBM Corporation
Jupyter Enterprise Gateway at IBM Code
https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter-incubator/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Supported Kernels
Supported Platforms
14
A lightweight, multi-tenant, scalable
and secure gateway that enables
Jupyter Notebooks to share resources
across an Apache Spark or Kubernetes
cluster for Enterprise/Cloud use cases
Spectrum Conductor
+
Jupyter Enterprise Gateway Features
© 2018 IBM Corporation
Gather
Data
Analyze
Data
Machine
Learning
Deep
Learning
Deploy
Model
Maintain
Model
Python
Data Science
Stack
Fabric for
Deep Learning
(FfDL)
Mleap +
PFA
Scikit-LearnPandas
Apache
Spark
Apache
Spark
Jupyter
Model
Asset
eXchange
Keras +
Tensorflow
15
16
32
48
64
0
10
20
30
40
50
60
70
80
4 Nodes 8 Nodes 12 Nodes 16 NodesMaxKernels(4GBHeap)
Cluster Size (32GB Nodes)
MAXIMUM NUMBER OF
SIMULTANEOUS KERNELS
Optimized Resource Allocation
– Utilize resources on all cluster nodes by running kernels as Spark
applications in YARN Cluster Mode.
– Pluggable architecture to enable support for additional Resource Managers
Enhanced Security
– End-to-End secure communications
• Secure socket communications
• Encrypted HTTP communication using SSL
Multiuser support with user impersonation
– Enhance security and sandboxing by enabling user impersonation when
running kernels (using Kerberos).
– Individual HDFS home folder for each notebook user.
– Use the same user ID for notebook and batch jobs.
Kernel
Kernel
Kernel
Kernel
Kernel
Kernel
Kernel
Kernel
Kernel
Jupyter Enterprise Gateway – YARN
© 2018 IBM Corporation 16
YARN Cluster
YARN
Workers
Gateway Node
Jupyter Enterprise Gateway
• Multitenancy
• Remote kernel lifecycle management via process proxies
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Impersonation:
Alice’s kernel runs
under Alice’s user ID.
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
SecurityLayer
nb2kg
nb2kg
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Bob
Alice
Enterprise Gateway & Kubernetes
© 2018 IBM Corporation
Supported Platforms
FfDL
Before Enterprise Gateway After Enterprise Gateway
Before Jupyter Enterprise Gateway …
• Resources required for all kernels needs to
be allocated during Notebook Server pod
creation
• Resources limited to what is physically
available on the host node that runs all
kernels and associated Spark drivers
After Jupyter Enterprise Gateway …
• Gateway pod very lightweight
• Kernels in their own pod, isolation
• Kernel pods built from community images:
Spark-on-K8s, TensorFlow, Keras, etc.
Jupyter Enterprise Gateway - Kubernetes
© 2018 IBM Corporation 18
Container images defined in kernelspec
Community image
Kernel
Spark on K8
Kernel
Distributed
File
System
Vanilla Kernels
Spark based kernels
Gateway
nb2kg
nb2kg
The Secret Sauce:
Process Proxies mixed with Kernel Launchers
© 2018 IBM Corporation
Process Proxy:
• Abstracts kernel process represented by Jupyter
framework
• Pluggable class definition identified in kernelspec
(kernel.json)
• Manages kernel lifecycle
Kernel Launcher:
• Embeds target kernel
• Listens on gateway communication port
• Conveys interrupt requests (via local signal)
• Could be extended for additional communications
{
"language": "python",
"display_name": "Spark - Python (Kubernetes Mode)",
"process_proxy": {
"class_name":
"enterprise_gateway.services.processproxies.k8s.KubernetesProcessP
roxy",
"config": {
"image_name": "elyra/kubernetes-kernel-py:dev",
"executor_image_name": "elyra/kubernetes-kernel-py:dev”,
"port_range" : "40000..42000"
}
},
"env": {
"SPARK_HOME": "/opt/spark",
"SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST
--deploy-mode cluster --name …",
…
},
"argv": [
"/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run.
sh",
"{connection_file}",
"--RemoteProcessProxy.response-address",
"{response_address}",
"--RemoteProcessProxy.spark-context-initialization-mode",
"lazy"
]
}
Deployment
Utilities
21© 2018 IBM Corporation
Enterprise Gateway Deployment
© 2018 IBM Corporation 22
Ansible deployment scripts
• https://github.com/lresende/spark-cluster-install
One click deployment of the Apache Spark
• Configure your host inventory (see example on git
repository)
• Run the ”setup-ambari.yml” playbook
• $ ansible-playbook --verbose setup-ambari.yml -i
hosts-fyre-ambari -c paramiko
One click deployment of Enterprise Gateway
• Run the ”setup-enterprise-gateway.yml” playbook
• $ ansible-playbook --verbose setup-enterprise-
gateway.yml -i hosts-fyre-ambari -c paramiko
Management Node
Powered by AmbariEG
Enterprise Gateway Deployment
© 2018 IBM Corporation 23
Docker images
• yarn-spark: Basic one node Spark on Yarn configuration
• enterprise-gateway: Adds Anaconda and Jupyter Enterprise Gateway to
the yarn-spark image
• nb2kg: Minimal Jupyter Notebook client configured with hooks to access
the Enterprise Gateway
• https://github.com/jupyter-
incubator/enterprise_gateway/tree/master/etc/docker
Building the latest docker images
• git checkout https://github.com/jupyter-incubator/enterprise_gateway
• make docker-clean docker-images
Note: Make also have individual targets to clean and build individual images
(type make for help)
Spark on YARNEG
Enterprise Gateway Deployment
© 2018 IBM Corporation 24
Connecting to Enterprise Gateway using
Notebook docker image
docker run -t --rm 
-e KG_URL='http://<Enterprise Gateway IP>:8888' 
-p 8888:8888 
-e VALIDATE_KG_CERT='no' 
-e LOG_LEVEL=DEBUG 
-e KG_REQUEST_TIMEOUT=40 
-e KG_CONNECT_TIMEOUT=40 
-v ${HOME}/opensource/jupyter/jupyter-notebooks/:/tmp/notebooks 
-w /tmp/notebooks 
elyra/nb2kg:dev
Spark on YARNEG

More Related Content

What's hot

KubeCon China June 2019 - Survey of Kubernetes related solutions for IoT and ...
KubeCon China June 2019 - Survey of Kubernetes related solutions for IoT and ...KubeCon China June 2019 - Survey of Kubernetes related solutions for IoT and ...
KubeCon China June 2019 - Survey of Kubernetes related solutions for IoT and ...Steve Wong
 
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...Yong Feng
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
Building distribution packages with Docker
Building distribution packages with DockerBuilding distribution packages with Docker
Building distribution packages with DockerBruno Cornec
 
Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016Bruno Cornec
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...Animesh Singh
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceAlison B. Lowndes
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Indrajit Poddar
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDocker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDataWorks Summit
 
Construire une « data fabric » pour les environnements edge
Construire une « data fabric » pour les environnements edgeConstruire une « data fabric » pour les environnements edge
Construire une « data fabric » pour les environnements edgeOpen Source Experience
 
Developing and Deploying Microservices to IBM Cloud Private
Developing and Deploying Microservices to IBM Cloud PrivateDeveloping and Deploying Microservices to IBM Cloud Private
Developing and Deploying Microservices to IBM Cloud PrivateShikha Srivastava
 
20190613 - IBM Cloud Côte d'Azur meetup - "Cloud & Containers"
20190613 - IBM Cloud Côte d'Azur meetup - "Cloud & Containers"20190613 - IBM Cloud Côte d'Azur meetup - "Cloud & Containers"
20190613 - IBM Cloud Côte d'Azur meetup - "Cloud & Containers"IBM France Lab
 
Ibis: Scaling the Python Data Experience
Ibis: Scaling the Python Data ExperienceIbis: Scaling the Python Data Experience
Ibis: Scaling the Python Data ExperienceWes McKinney
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQLEDB
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_indexChester Chen
 

What's hot (20)

KubeCon China June 2019 - Survey of Kubernetes related solutions for IoT and ...
KubeCon China June 2019 - Survey of Kubernetes related solutions for IoT and ...KubeCon China June 2019 - Survey of Kubernetes related solutions for IoT and ...
KubeCon China June 2019 - Survey of Kubernetes related solutions for IoT and ...
 
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
Kubernetes on EGO : Bringing enterprise resource management and scheduling to...
 
Moby KubeCon 2017
Moby KubeCon 2017Moby KubeCon 2017
Moby KubeCon 2017
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Building distribution packages with Docker
Building distribution packages with DockerBuilding distribution packages with Docker
Building distribution packages with Docker
 
Watson on bluemix
Watson on bluemixWatson on bluemix
Watson on bluemix
 
Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016Training Ensimag OpenStack 2016
Training Ensimag OpenStack 2016
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
KarthikSNOW_CV
KarthikSNOW_CVKarthikSNOW_CV
KarthikSNOW_CV
 
Considering Bare Metal
Considering Bare MetalConsidering Bare Metal
Considering Bare Metal
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhereDocker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
 
Construire une « data fabric » pour les environnements edge
Construire une « data fabric » pour les environnements edgeConstruire une « data fabric » pour les environnements edge
Construire une « data fabric » pour les environnements edge
 
Developing and Deploying Microservices to IBM Cloud Private
Developing and Deploying Microservices to IBM Cloud PrivateDeveloping and Deploying Microservices to IBM Cloud Private
Developing and Deploying Microservices to IBM Cloud Private
 
20190613 - IBM Cloud Côte d'Azur meetup - "Cloud & Containers"
20190613 - IBM Cloud Côte d'Azur meetup - "Cloud & Containers"20190613 - IBM Cloud Côte d'Azur meetup - "Cloud & Containers"
20190613 - IBM Cloud Côte d'Azur meetup - "Cloud & Containers"
 
Ibis: Scaling the Python Data Experience
Ibis: Scaling the Python Data ExperienceIbis: Scaling the Python Data Experience
Ibis: Scaling the Python Data Experience
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQL
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 

Similar to Jupyter Enterprise Gateway Overview

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017Luciano Resende
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gatewayLuciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkLuciano Resende
 
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)DataWorks Summit
 
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache ArrowNext-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache ArrowWes McKinney
 
Ceph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph Community
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayLuciano Resende
 
HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataLviv Startup Club
 
Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Lviv Startup Club
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
 
FutureGrid Computing Testbed as a Service
 FutureGrid Computing Testbed as a Service FutureGrid Computing Testbed as a Service
FutureGrid Computing Testbed as a ServiceGeoffrey Fox
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSandeep Patil
 
Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Trishali Nayar
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Timothy Spann
 
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageWebinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageGlusterFS
 
Storage as a service OpenStack
Storage as a service OpenStackStorage as a service OpenStack
Storage as a service OpenStackopenstackindia
 
Deep Learning with Apache MXNet
Deep Learning with Apache MXNetDeep Learning with Apache MXNet
Deep Learning with Apache MXNetJulien SIMON
 

Similar to Jupyter Enterprise Gateway Overview (20)

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
 
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
 
Next-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache ArrowNext-generation Python Big Data Tools, powered by Apache Arrow
Next-generation Python Big Data Tools, powered by Apache Arrow
 
Ceph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-GeneCeph on 64-bit ARM with X-Gene
Ceph on 64-bit ARM with X-Gene
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
HPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big DataHPE Solutions for Challenges in AI and Big Data
HPE Solutions for Challenges in AI and Big Data
 
Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)Saviak lviv ai-2019-e-mail (1)
Saviak lviv ai-2019-e-mail (1)
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
FutureGrid Computing Testbed as a Service
 FutureGrid Computing Testbed as a Service FutureGrid Computing Testbed as a Service
FutureGrid Computing Testbed as a Service
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN Caching
 
Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageWebinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
 
Storage as a service OpenStack
Storage as a service OpenStackStorage as a service OpenStack
Storage as a service OpenStack
 
Deep Learning with Apache MXNet
Deep Learning with Apache MXNetDeep Learning with Apache MXNet
Deep Learning with Apache MXNet
 

More from Luciano Resende

A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfLuciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...Luciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeLuciano Resende
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirLuciano Resende
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirLuciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirLuciano Resende
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine LearningLuciano Resende
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceLuciano Resende
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningLuciano Resende
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overviewLuciano Resende
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitionsLuciano Resende
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceLuciano Resende
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSLuciano Resende
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscanyLuciano Resende
 
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyS314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyLuciano Resende
 

More from Luciano Resende (20)

A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache Bahir
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
 
Asf icfoss-mentoring
Asf icfoss-mentoringAsf icfoss-mentoring
Asf icfoss-mentoring
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overview
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitions
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open source
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RS
 
SCA Reaches the Cloud
SCA Reaches the CloudSCA Reaches the Cloud
SCA Reaches the Cloud
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscany
 
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyS314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
 

Recently uploaded

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 

Recently uploaded (20)

KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 

Jupyter Enterprise Gateway Overview

  • 1. Jupyter Enterprise Gateway CODAIT Development Team August, 2018 © 2018 IBM Corporation 1
  • 3. Jupyter Notebooks © 2018 IBM Corporation 6 Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
  • 4. Jupyter Notebooks © 2018 IBM Corporation 7 • Notebook UI runs on the browser • The Notebook Server serves the ’Notebooks’ – ipynb files • Kernels interpret/execute cell contents – Are responsible for code execution – Abstracts different languages
  • 5. Building a Data Science Analytical Platform 9© 2018 IBM Corporation
  • 6. Building an Data Science Platform © 2018 IBM Corporation Large pool of shared computing resources • Enterprise Cloud, Public Cloud or Hybrid • Data in the cloud (Data Lakes/Object Storage) Distributed Consumers • Notebooks running local (users laptop) or as a service (e.g. Jupyter Hub) Different Resource Utilization Patterns • High number of idle resources
  • 7. Limitations of Jupyter Notebook Stack © 2018 IBM Corporation Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow 11 8 8 8 8 0 10 20 30 40 50 60 70 80 4 Nodes 8 Nodes 12 Nodes 16 NodesMaxKernels(4GBHeap) Cluster Size (32GB Nodes) MAXIMUM NUMBER OF SIMULTANEOUS KERNELS • Scalability • Jupyter Kernels running as local process • Resources are limited by what is available on the one single node that runs all Kernels and associated Spark drivers • Security • Single user sharing the same privileges • Users can see and control each other process using Jupyter administrative utilities Kernel Kernel Kernel Kernel Kernel
  • 8. 12© 2018 IBM Corporation
  • 9. Jupyter Enterprise Gateway 13© 2018 IBM Corporation
  • 10. Jupyter Enterprise Gateway © 2018 IBM Corporation Jupyter Enterprise Gateway at IBM Code https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/ Jupyter Enterprise Gateway source code at GitHub https://github.com/jupyter-incubator/enterprise_gateway Jupyter Enterprise Gateway Documentation http://jupyter-enterprise-gateway.readthedocs.io/en/latest/ Supported Kernels Supported Platforms 14 A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases Spectrum Conductor +
  • 11. Jupyter Enterprise Gateway Features © 2018 IBM Corporation Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow 15 16 32 48 64 0 10 20 30 40 50 60 70 80 4 Nodes 8 Nodes 12 Nodes 16 NodesMaxKernels(4GBHeap) Cluster Size (32GB Nodes) MAXIMUM NUMBER OF SIMULTANEOUS KERNELS Optimized Resource Allocation – Utilize resources on all cluster nodes by running kernels as Spark applications in YARN Cluster Mode. – Pluggable architecture to enable support for additional Resource Managers Enhanced Security – End-to-End secure communications • Secure socket communications • Encrypted HTTP communication using SSL Multiuser support with user impersonation – Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos). – Individual HDFS home folder for each notebook user. – Use the same user ID for notebook and batch jobs. Kernel Kernel Kernel Kernel Kernel Kernel Kernel Kernel Kernel
  • 12. Jupyter Enterprise Gateway – YARN © 2018 IBM Corporation 16 YARN Cluster YARN Workers Gateway Node Jupyter Enterprise Gateway • Multitenancy • Remote kernel lifecycle management via process proxies Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Impersonation: Alice’s kernel runs under Alice’s user ID. Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver SecurityLayer nb2kg nb2kg Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Bob Alice
  • 13. Enterprise Gateway & Kubernetes © 2018 IBM Corporation Supported Platforms FfDL Before Enterprise Gateway After Enterprise Gateway Before Jupyter Enterprise Gateway … • Resources required for all kernels needs to be allocated during Notebook Server pod creation • Resources limited to what is physically available on the host node that runs all kernels and associated Spark drivers After Jupyter Enterprise Gateway … • Gateway pod very lightweight • Kernels in their own pod, isolation • Kernel pods built from community images: Spark-on-K8s, TensorFlow, Keras, etc.
  • 14. Jupyter Enterprise Gateway - Kubernetes © 2018 IBM Corporation 18 Container images defined in kernelspec Community image Kernel Spark on K8 Kernel Distributed File System Vanilla Kernels Spark based kernels Gateway nb2kg nb2kg
  • 15. The Secret Sauce: Process Proxies mixed with Kernel Launchers © 2018 IBM Corporation Process Proxy: • Abstracts kernel process represented by Jupyter framework • Pluggable class definition identified in kernelspec (kernel.json) • Manages kernel lifecycle Kernel Launcher: • Embeds target kernel • Listens on gateway communication port • Conveys interrupt requests (via local signal) • Could be extended for additional communications { "language": "python", "display_name": "Spark - Python (Kubernetes Mode)", "process_proxy": { "class_name": "enterprise_gateway.services.processproxies.k8s.KubernetesProcessP roxy", "config": { "image_name": "elyra/kubernetes-kernel-py:dev", "executor_image_name": "elyra/kubernetes-kernel-py:dev”, "port_range" : "40000..42000" } }, "env": { "SPARK_HOME": "/opt/spark", "SPARK_OPTS": "--master k8s://https://${KUBERNETES_SERVICE_HOST --deploy-mode cluster --name …", … }, "argv": [ "/usr/local/share/jupyter/kernels/spark_python_kubernetes/bin/run. sh", "{connection_file}", "--RemoteProcessProxy.response-address", "{response_address}", "--RemoteProcessProxy.spark-context-initialization-mode", "lazy" ] }
  • 17. Enterprise Gateway Deployment © 2018 IBM Corporation 22 Ansible deployment scripts • https://github.com/lresende/spark-cluster-install One click deployment of the Apache Spark • Configure your host inventory (see example on git repository) • Run the ”setup-ambari.yml” playbook • $ ansible-playbook --verbose setup-ambari.yml -i hosts-fyre-ambari -c paramiko One click deployment of Enterprise Gateway • Run the ”setup-enterprise-gateway.yml” playbook • $ ansible-playbook --verbose setup-enterprise- gateway.yml -i hosts-fyre-ambari -c paramiko Management Node Powered by AmbariEG
  • 18. Enterprise Gateway Deployment © 2018 IBM Corporation 23 Docker images • yarn-spark: Basic one node Spark on Yarn configuration • enterprise-gateway: Adds Anaconda and Jupyter Enterprise Gateway to the yarn-spark image • nb2kg: Minimal Jupyter Notebook client configured with hooks to access the Enterprise Gateway • https://github.com/jupyter- incubator/enterprise_gateway/tree/master/etc/docker Building the latest docker images • git checkout https://github.com/jupyter-incubator/enterprise_gateway • make docker-clean docker-images Note: Make also have individual targets to clean and build individual images (type make for help) Spark on YARNEG
  • 19. Enterprise Gateway Deployment © 2018 IBM Corporation 24 Connecting to Enterprise Gateway using Notebook docker image docker run -t --rm -e KG_URL='http://<Enterprise Gateway IP>:8888' -p 8888:8888 -e VALIDATE_KG_CERT='no' -e LOG_LEVEL=DEBUG -e KG_REQUEST_TIMEOUT=40 -e KG_CONNECT_TIMEOUT=40 -v ${HOME}/opensource/jupyter/jupyter-notebooks/:/tmp/notebooks -w /tmp/notebooks elyra/nb2kg:dev Spark on YARNEG