SlideShare a Scribd company logo
IBM SparkTechnology Center
Graph Technologies Meetup
Building Enterprise/Cloud Analytics Platform
with Jupyter Notebooks and Apache Spark
Luciano Resende
IBM | Spark Technology Center
IBM SparkTechnology Center
About Me
Luciano Resende (lresende@apache.org)
• Architect and community liaison at IBM – Spark Technology Center
• Have been contributing to open source at ASF for over 10 years
• Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache
Spark, Apache Toree among other projects related to Apache Spark ecosystem
2
@lresende1975 http://lresende.blogspot.com/ https://www.linkedin.com/in/lresendehttp://slideshare.net/luckbr1975lresende
IBM SparkTechnology Center
IBM Spark Technology Center
Founded in 2015.
Location:
Physical: 505 Howard St., San Francisco CA
Web: http://spark.tc Twitter: @apachespark_tc
Mission:
Contribute intellectual and technical capital to the Apache Spark community.
Make the core technology enterprise- and cloud-ready.
Build data science skills to drive intelligence into business applications — http://bigdatauniversity.com
Key statistics:
About 40 developers, co-located with 25 IBM designers.
Major contributions to Apache Spark http://jiras.spark.tc
Apache SystemML is now a top level Apache project !
Founding member of UC Berkeley AMPLab and RISE Lab
Member of R Consortium and Scala Center
3
IBM SparkTechnology Center
Jupyter Notebook Platform Architecture Overview
• Notebook UI runs on the browser
• The Notebook Server serves the ’Notebooks’
• Kernels interpret/execute cell contents
• Are responsible for code execution
• Abstracts different languages
4
IBM SparkTechnology Center
Enterprise/Cloud Analytics Platform Characteristics
Large pool of shared computing resources
• Enterprise Cloud, Public Cloud or Hybrid
Distributed Consumers
• Notebooks running local
• Notebooks as services
Different Resource Utilization Patterns
• High number of idle resources
5
IBM SparkTechnology Center
Analytics Platform – Current state of the art
Open Source Jupyter based Notebook Platform
• Single User sharing the same distributed filesystem and privileges
• Resources are limited by what is available on the one single node that runs all Kernels
and associated Spark drivers.
• No security, users can see and control each others process using Jupyter’s administration
utilities.
6
IBM SparkTechnology Center
Analytics Platform Today – Shared Cluster
Allows Jupyter notebooks running outside of the
cluster to run Jupyter kernels inside the cluster
sharing it’s resources.
• All Jupyter kernels run under a shared, “service” user ID.
• Users can see and control each others’ kernels using
Jupyter’s administration utilities.
• All kernels and their associated Spark drivers run on a
single (configurable) node of the cluster.
7
Spark Cluster
Bob’s Desktop
Multiple Notebooks
Alice’s Desktop
Multiple Notebooks
Jupyter Kernel Gateway
(Sandboxed by service user privileges)
Jupyter
Kernel
Gateway
Jupyter
Notebook
Server
(with NB2KG)
Executors
(as Alice)Executors
(as Alice)Spark Executors
(as JNBG Service User)
Executors
(as Alice)Executors
(as Alice)Spark Executors
(as JNBG Service User)
Kernel
[Spark Driver]
(yarn-client
mode as JNBG
Service User)
YARN
Workers
Security
Layer
Jupyter
Notebook
Server
(with NB2KG)
Kernel
[Spark Driver]
(yarn-client
mode as JNBG
Service User)
IBM SparkTechnology Center
Analytics Platform Today – Single User Cluster
Allows Jupyter notebooks running outside of the
cluster to run Jupyter kernels in a cluster created
specially to the user.
• Expensive as clusters are created for every individual
user
8
Spark Cluster
Bob’s Desktop
Multiple Notebooks
Jupyter Kernel Gateway
(Sandboxed by service user privileges)
Jupyter
Kernel
Gateway
Jupyter
Notebook
Server
(with NB2KG)
Executors
(as Alice)Executors
(as Alice)Spark Executors
(as JNBG Service User)
Kernel
[Spark Driver]
(yarn-client
mode as JNBG
Service User)
YARN
Workers
Spark Cluster
Alice’s Desktop
Multiple Notebooks
Jupyter Kernel Gateway
(Sandboxed by service user privileges)
Jupyter
Kernel
Gateway
Executors
(as Alice)Executors
(as Alice)Spark Executors
(as JNBG Service User)
Kernel
[Spark Driver]
(yarn-client
mode as JNBG
Service User)
YARN
Workers
Jupyter
Notebook
Server
(with NB2KG)
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway
Notebook Platform based on Jupyter stack aiming on Enterprise/Cloud
requirements and use cases
9
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway – Goals
Optimized Resource Allocation
•Run Spark in YARN Cluster Mode to better utilize cluster resources.
•Pluggable architecture for additional Resource Managers
Enhanced Security
•Enable TLS for all socket communications
•Any HTTP communication should be encrypted (SSL)
Multiuser support with user impersonation
•Enhance security and sandboxing by enabling user impersonation when running kernels.
•Individual HDFS home folder for each notebook user.
•Use the same user ID for notebook and batch jobs.
10
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway
Extending Jupyter Kernel Gateway
• Enable running kernels remotely in a cluster
• Pluggable kernel lifecycle management
• Enhanced security
• Multiuser leveraging Kerberos
user impersonation
11
Extended Jupyter Kernel
Gateway
Jupyter Kernel Gateway
Jupyter Notebook Server
IBM SparkTechnology Center
Spark Cluster
Extended Jupyter Kernel Gateway
12
Security
Layer
Alice’s Desktop
Multiple Notebooks
Jupyter
Notebook
Server
(with NB2KG)
YARN
Workers
Extended Jupyter Kernel Gateway
Jupyter REST API
User Session Manager
Kernel Lyfecycle
Kernel Communication (local/remote)
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Bob’s Desktop
Multiple Notebooks
Jupyter
Notebook
Server
(with NB2KG)
Impersonation:
Alice’s kernel
runs under
Alice’s user ID.
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway – Resource Managers
Pluggable Resource Management support
•Priority to provide support for Yarn resource
manager running kernels in cluster mode
•Easily add support for different resource manager
(e.g. kubernetes)
13
BaseProcessProxy
DistributedProcessProxy YarnClusterProcessProxy …
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway
Stay tuned, we are becoming open source very soon!!!
Are you considering being an early adopter, please contact me at
lresende@apache.org or lresende@us.ibm.com !!!
14

More Related Content

What's hot

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid Them
Owen O'Malley
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
Sean Roberts
 
Nike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteNike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-ignite
Dani Traphagen
 
OpenStack Introduction
OpenStack IntroductionOpenStack Introduction
OpenStack Introduction
Roy Gilad
 
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containers
BlueData, Inc.
 
An Introduction to OpenStack
An Introduction to OpenStackAn Introduction to OpenStack
An Introduction to OpenStack
Scott Lowe
 
Gregory Touretsky - Intel IT- Open Cloud Journey
Gregory Touretsky - Intel IT- Open Cloud JourneyGregory Touretsky - Intel IT- Open Cloud Journey
Gregory Touretsky - Intel IT- Open Cloud Journey
Cloud Native Day Tel Aviv
 
OpenStack Deployments with Chef
OpenStack Deployments with ChefOpenStack Deployments with Chef
OpenStack Deployments with Chef
Matt Ray
 
Openstack lab environment Virtualbox (English)
Openstack lab environment Virtualbox (English)Openstack lab environment Virtualbox (English)
Openstack lab environment Virtualbox (English)
Abderrahmane TEKFI
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQL
EDB
 
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for CybersecurityCloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera, Inc.
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
Steve Loughran
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks
 
Hadoop-Automation-Tool_RamkishorTak
Hadoop-Automation-Tool_RamkishorTakHadoop-Automation-Tool_RamkishorTak
Hadoop-Automation-Tool_RamkishorTak
Ram Kishor Tak
 
Architecture of massively scalable, distributed systems - InfoShare 2015
Architecture of massively scalable, distributed systems - InfoShare 2015Architecture of massively scalable, distributed systems - InfoShare 2015
Architecture of massively scalable, distributed systems - InfoShare 2015
Tomasz Zen Napierala
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
Sean Roberts
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
DataWorks Summit
 
Introduction to Orchestration and DevOps with OpenStack
Introduction to Orchestration and DevOps with OpenStackIntroduction to Orchestration and DevOps with OpenStack
Introduction to Orchestration and DevOps with OpenStack
Abderrahmane TEKFI
 
Mastering OpenStack - Episode 13 - Network Design
Mastering OpenStack - Episode 13 - Network DesignMastering OpenStack - Episode 13 - Network Design
Mastering OpenStack - Episode 13 - Network Design
Roozbeh Shafiee
 

What's hot (20)

Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid Them
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Nike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteNike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-ignite
 
OpenStack Introduction
OpenStack IntroductionOpenStack Introduction
OpenStack Introduction
 
Apache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CITApache Geode Meetup, Cork, Ireland at CIT
Apache Geode Meetup, Cork, Ireland at CIT
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containers
 
An Introduction to OpenStack
An Introduction to OpenStackAn Introduction to OpenStack
An Introduction to OpenStack
 
Gregory Touretsky - Intel IT- Open Cloud Journey
Gregory Touretsky - Intel IT- Open Cloud JourneyGregory Touretsky - Intel IT- Open Cloud Journey
Gregory Touretsky - Intel IT- Open Cloud Journey
 
OpenStack Deployments with Chef
OpenStack Deployments with ChefOpenStack Deployments with Chef
OpenStack Deployments with Chef
 
Openstack lab environment Virtualbox (English)
Openstack lab environment Virtualbox (English)Openstack lab environment Virtualbox (English)
Openstack lab environment Virtualbox (English)
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQL
 
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for CybersecurityCloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Hadoop-Automation-Tool_RamkishorTak
Hadoop-Automation-Tool_RamkishorTakHadoop-Automation-Tool_RamkishorTak
Hadoop-Automation-Tool_RamkishorTak
 
Architecture of massively scalable, distributed systems - InfoShare 2015
Architecture of massively scalable, distributed systems - InfoShare 2015Architecture of massively scalable, distributed systems - InfoShare 2015
Architecture of massively scalable, distributed systems - InfoShare 2015
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Introduction to Orchestration and DevOps with OpenStack
Introduction to Orchestration and DevOps with OpenStackIntroduction to Orchestration and DevOps with OpenStack
Introduction to Orchestration and DevOps with OpenStack
 
Mastering OpenStack - Episode 13 - Network Design
Mastering OpenStack - Episode 13 - Network DesignMastering OpenStack - Episode 13 - Network Design
Mastering OpenStack - Episode 13 - Network Design
 

Similar to Jupyter con meetup extended jupyter kernel gateway

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
Luciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
Luciano Resende
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway Overview
Luciano Resende
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Luciano Resende
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
Luciano Resende
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Codemotion
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
Luciano Resende
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
Chester Chen
 
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
DataWorks Summit
 
Storage as a service and OpenStack Cinder
Storage as a service and OpenStack CinderStorage as a service and OpenStack Cinder
Storage as a service and OpenStack Cinder
openstackindia
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Sri Ambati
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
DataWorks Summit
 
Openstack – An introduction
Openstack – An introductionOpenstack – An introduction
Openstack – An introduction
Muddassir Nazir
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
Peter Clapham
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
Peter Clapham
 
Improving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch FixImproving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch Fix
Stitch Fix Algorithms
 
SparkFramework
SparkFrameworkSparkFramework
SparkFramework
Sergio Viademonte.
 
Shannon McFarland OpenStack/Cisco Intro
Shannon McFarland OpenStack/Cisco IntroShannon McFarland OpenStack/Cisco Intro
Shannon McFarland OpenStack/Cisco Intro
Shannon McFarland
 
Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
Deepak Garg
 

Similar to Jupyter con meetup extended jupyter kernel gateway (20)

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway Overview
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
Data processing at the speed of 100 Gbps@Apache Crail (Incubating)
 
Storage as a service and OpenStack Cinder
Storage as a service and OpenStack CinderStorage as a service and OpenStack Cinder
Storage as a service and OpenStack Cinder
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
 
Lessons learned from running Spark on Docker
Lessons learned from running Spark on DockerLessons learned from running Spark on Docker
Lessons learned from running Spark on Docker
 
Openstack – An introduction
Openstack – An introductionOpenstack – An introduction
Openstack – An introduction
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Improving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch FixImproving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch Fix
 
SparkFramework
SparkFrameworkSparkFramework
SparkFramework
 
Shannon McFarland OpenStack/Cisco Intro
Shannon McFarland OpenStack/Cisco IntroShannon McFarland OpenStack/Cisco Intro
Shannon McFarland OpenStack/Cisco Intro
 
Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
 

More from Luciano Resende

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Luciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
Luciano Resende
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
Luciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
Luciano Resende
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache Bahir
Luciano Resende
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
Luciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
Luciano Resende
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
Luciano Resende
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
Luciano Resende
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Luciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
Luciano Resende
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
Luciano Resende
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
Luciano Resende
 
Asf icfoss-mentoring
Asf icfoss-mentoringAsf icfoss-mentoring
Asf icfoss-mentoring
Luciano Resende
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overview
Luciano Resende
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitions
Luciano Resende
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open source
Luciano Resende
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RS
Luciano Resende
 
SCA Reaches the Cloud
SCA Reaches the CloudSCA Reaches the Cloud
SCA Reaches the Cloud
Luciano Resende
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscany
Luciano Resende
 

More from Luciano Resende (20)

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache Bahir
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
 
Asf icfoss-mentoring
Asf icfoss-mentoringAsf icfoss-mentoring
Asf icfoss-mentoring
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overview
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitions
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open source
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RS
 
SCA Reaches the Cloud
SCA Reaches the CloudSCA Reaches the Cloud
SCA Reaches the Cloud
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscany
 

Recently uploaded

Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
Timothy Spann
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
osoyvvf
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
dataschool1
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
TeukuEriSyahputra
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
nyvan3
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
ArshadAyub49
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
davidpietrzykowski1
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
uevausa
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
hqfek
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
Vineet
 
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
exukyp
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
Rebecca Bilbro
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
actyx
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
Timothy Spann
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
NABLAS株式会社
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
perranet1
 
Bangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts ServiceBangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts Service
nhero3888
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 

Recently uploaded (20)

Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
 
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
一比一原版(uom毕业证书)曼彻斯特大学毕业证如何办理
 
A gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented GenerationA gentle exploration of Retrieval Augmented Generation
A gentle exploration of Retrieval Augmented Generation
 
Template xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptxTemplate xxxxxxxx ssssssssssss Sertifikat.pptx
Template xxxxxxxx ssssssssssss Sertifikat.pptx
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
 
Salesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - CanariasSalesforce AI + Data Community Tour Slides - Canarias
Salesforce AI + Data Community Tour Slides - Canarias
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
一比一原版爱尔兰都柏林大学毕业证(本硕)ucd学位证书如何办理
 
Senior Software Profiles Backend Sample - Sheet1.pdf
Senior Software Profiles  Backend Sample - Sheet1.pdfSenior Software Profiles  Backend Sample - Sheet1.pdf
Senior Software Profiles Backend Sample - Sheet1.pdf
 
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理一比一原版(UofT毕业证)多伦多大学毕业证如何办理
一比一原版(UofT毕业证)多伦多大学毕业证如何办理
 
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)
 
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
一比一原版斯威本理工大学毕业证(swinburne毕业证)如何办理
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
 
社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .社内勉強会資料_Hallucination of LLMs               .
社内勉強会資料_Hallucination of LLMs               .
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdfreading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
reading_sample_sap_press_operational_data_provisioning_with_sap_bw4hana (1).pdf
 
Bangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts ServiceBangalore ℂall Girl 000000 Bangalore Escorts Service
Bangalore ℂall Girl 000000 Bangalore Escorts Service
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 

Jupyter con meetup extended jupyter kernel gateway

  • 1. IBM SparkTechnology Center Graph Technologies Meetup Building Enterprise/Cloud Analytics Platform with Jupyter Notebooks and Apache Spark Luciano Resende IBM | Spark Technology Center
  • 2. IBM SparkTechnology Center About Me Luciano Resende (lresende@apache.org) • Architect and community liaison at IBM – Spark Technology Center • Have been contributing to open source at ASF for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Spark, Apache Toree among other projects related to Apache Spark ecosystem 2 @lresende1975 http://lresende.blogspot.com/ https://www.linkedin.com/in/lresendehttp://slideshare.net/luckbr1975lresende
  • 3. IBM SparkTechnology Center IBM Spark Technology Center Founded in 2015. Location: Physical: 505 Howard St., San Francisco CA Web: http://spark.tc Twitter: @apachespark_tc Mission: Contribute intellectual and technical capital to the Apache Spark community. Make the core technology enterprise- and cloud-ready. Build data science skills to drive intelligence into business applications — http://bigdatauniversity.com Key statistics: About 40 developers, co-located with 25 IBM designers. Major contributions to Apache Spark http://jiras.spark.tc Apache SystemML is now a top level Apache project ! Founding member of UC Berkeley AMPLab and RISE Lab Member of R Consortium and Scala Center 3
  • 4. IBM SparkTechnology Center Jupyter Notebook Platform Architecture Overview • Notebook UI runs on the browser • The Notebook Server serves the ’Notebooks’ • Kernels interpret/execute cell contents • Are responsible for code execution • Abstracts different languages 4
  • 5. IBM SparkTechnology Center Enterprise/Cloud Analytics Platform Characteristics Large pool of shared computing resources • Enterprise Cloud, Public Cloud or Hybrid Distributed Consumers • Notebooks running local • Notebooks as services Different Resource Utilization Patterns • High number of idle resources 5
  • 6. IBM SparkTechnology Center Analytics Platform – Current state of the art Open Source Jupyter based Notebook Platform • Single User sharing the same distributed filesystem and privileges • Resources are limited by what is available on the one single node that runs all Kernels and associated Spark drivers. • No security, users can see and control each others process using Jupyter’s administration utilities. 6
  • 7. IBM SparkTechnology Center Analytics Platform Today – Shared Cluster Allows Jupyter notebooks running outside of the cluster to run Jupyter kernels inside the cluster sharing it’s resources. • All Jupyter kernels run under a shared, “service” user ID. • Users can see and control each others’ kernels using Jupyter’s administration utilities. • All kernels and their associated Spark drivers run on a single (configurable) node of the cluster. 7 Spark Cluster Bob’s Desktop Multiple Notebooks Alice’s Desktop Multiple Notebooks Jupyter Kernel Gateway (Sandboxed by service user privileges) Jupyter Kernel Gateway Jupyter Notebook Server (with NB2KG) Executors (as Alice)Executors (as Alice)Spark Executors (as JNBG Service User) Executors (as Alice)Executors (as Alice)Spark Executors (as JNBG Service User) Kernel [Spark Driver] (yarn-client mode as JNBG Service User) YARN Workers Security Layer Jupyter Notebook Server (with NB2KG) Kernel [Spark Driver] (yarn-client mode as JNBG Service User)
  • 8. IBM SparkTechnology Center Analytics Platform Today – Single User Cluster Allows Jupyter notebooks running outside of the cluster to run Jupyter kernels in a cluster created specially to the user. • Expensive as clusters are created for every individual user 8 Spark Cluster Bob’s Desktop Multiple Notebooks Jupyter Kernel Gateway (Sandboxed by service user privileges) Jupyter Kernel Gateway Jupyter Notebook Server (with NB2KG) Executors (as Alice)Executors (as Alice)Spark Executors (as JNBG Service User) Kernel [Spark Driver] (yarn-client mode as JNBG Service User) YARN Workers Spark Cluster Alice’s Desktop Multiple Notebooks Jupyter Kernel Gateway (Sandboxed by service user privileges) Jupyter Kernel Gateway Executors (as Alice)Executors (as Alice)Spark Executors (as JNBG Service User) Kernel [Spark Driver] (yarn-client mode as JNBG Service User) YARN Workers Jupyter Notebook Server (with NB2KG)
  • 9. IBM SparkTechnology Center Extended Jupyter Kernel Gateway Notebook Platform based on Jupyter stack aiming on Enterprise/Cloud requirements and use cases 9
  • 10. IBM SparkTechnology Center Extended Jupyter Kernel Gateway – Goals Optimized Resource Allocation •Run Spark in YARN Cluster Mode to better utilize cluster resources. •Pluggable architecture for additional Resource Managers Enhanced Security •Enable TLS for all socket communications •Any HTTP communication should be encrypted (SSL) Multiuser support with user impersonation •Enhance security and sandboxing by enabling user impersonation when running kernels. •Individual HDFS home folder for each notebook user. •Use the same user ID for notebook and batch jobs. 10
  • 11. IBM SparkTechnology Center Extended Jupyter Kernel Gateway Extending Jupyter Kernel Gateway • Enable running kernels remotely in a cluster • Pluggable kernel lifecycle management • Enhanced security • Multiuser leveraging Kerberos user impersonation 11 Extended Jupyter Kernel Gateway Jupyter Kernel Gateway Jupyter Notebook Server
  • 12. IBM SparkTechnology Center Spark Cluster Extended Jupyter Kernel Gateway 12 Security Layer Alice’s Desktop Multiple Notebooks Jupyter Notebook Server (with NB2KG) YARN Workers Extended Jupyter Kernel Gateway Jupyter REST API User Session Manager Kernel Lyfecycle Kernel Communication (local/remote) Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Bob’s Desktop Multiple Notebooks Jupyter Notebook Server (with NB2KG) Impersonation: Alice’s kernel runs under Alice’s user ID.
  • 13. IBM SparkTechnology Center Extended Jupyter Kernel Gateway – Resource Managers Pluggable Resource Management support •Priority to provide support for Yarn resource manager running kernels in cluster mode •Easily add support for different resource manager (e.g. kubernetes) 13 BaseProcessProxy DistributedProcessProxy YarnClusterProcessProxy …
  • 14. IBM SparkTechnology Center Extended Jupyter Kernel Gateway Stay tuned, we are becoming open source very soon!!! Are you considering being an early adopter, please contact me at lresende@apache.org or lresende@us.ibm.com !!! 14