SlideShare a Scribd company logo
1 of 20
Download to read offline
IBM SparkTechnology Center
Big Analytics Meetup
Building Enterprise/Cloud Analytics Platform
with Jupyter Notebooks and Apache Spark
Luciano Resende
IBM | Spark Technology Center
IBM SparkTechnology Center
About Me
Luciano Resende (lresende AT apache DOT org)
• Architect and community liaison at IBM – Spark Technology Center
• Have been contributing to open source at ASF for over 10 years
• Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache
Spark, Apache Toree among other projects related to Apache Spark ecosystem
2
@lresende1975 http://lresende.blogspot.com/ https://www.linkedin.com/in/lresendehttp://slideshare.net/luckbr1975lresende
IBM SparkTechnology Center
IBM and Open Source
Open Source Community Leadership
Spark	Technology	
Center
Founding	Partner 188+	Project	Committers 77+	Projects
Key	Open	source	
steering	committee	
OSS	Advisory	Board
Open	Source
IBM SparkTechnology Center
The Spark Technology Center
IBM SparkTechnology Center
IBM Spark Technology Center
Founded in 2015.
Location:
Physical: 505 Howard St., San Francisco CA
Web: http://spark.tc Twitter: @apachespark_tc
Mission:
Contribute intellectual and technical capital to the Apache Spark community.
Make the core technology enterprise- and cloud-ready.
Build data science skills to drive intelligence into business applications — http://bigdatauniversity.com
Key statistics:
About 40 developers, co-located with 25 IBM designers.
Major contributions to Apache Spark http://jiras.spark.tc
Apache SystemML is now a top level Apache project !
Founding member of UC Berkeley AMPLab and RISE Lab
Member of R Consortium and Scala Center
6
IBM SparkTechnology Center
Focus on meaningful code contributions across
all major Spark projects
863 code contributions (JIRAs) and counting –
Check out http://jiras.spark.tc
Over 422 commits in Spark 2.0 , and
continuing major contributions in 2.x
Contributions by the Spark Technology Center
across almost all components of Spark
— Spark Core, SparkR, SQL, MLlib,
Streaming, PySpark, build and infrastructure,
etc
STC	impact	on	community	
46,385	Spark	LOC
863	Spark	JIRAs
457	SystemML JIRAs
67	Speakers	at	Events
Apache Spark Contributions
IBM SparkTechnology Center
Project Focus Areas
SQL
TPC-DS and Performance
Query Pushdown/Federation
Machine Learning
Spark MLLib
R4ML
Online Retraining
Apache Arrow
SystemML
Deep Learning
Consumability
Reference architectures
Spark Notebook stack
Spark Resource optimization
Spark Web UI
Apache Bahir
RedRock
Immersive Insights
IBM SparkTechnology Center
Spark Notebook Stack
based on Jupyter Notebooks
IBM SparkTechnology Center
Jupyter Notebook Platform Architecture Overview
• Notebook UI runs on the browser
• The Notebook Server serves the ’Notebooks’
• Kernels interpret/execute cell contents
• Are responsible for code execution
• Abstracts different languages
10
IBM SparkTechnology Center
Enterprise/Cloud Analytics Platform Characteristics
Large pool of shared computing resources
• Enterprise Cloud, Public Cloud or Hybrid
• Data in the cloud (Data Lakes/Object Store)
Distributed Consumers
• Notebooks running local (users laptop) or as a service
Different Resource Utilization Patterns
• High number of idle resources
11
IBM SparkTechnology Center
Analytics Platform – Current state of the art
Open Source Jupyter based Notebook Platform
• Single User sharing the same distributed filesystem and privileges
• Jupyter Kernels running as local process
• Resources are limited by what is available on the one single node that runs all Kernels and associated Spark drivers.
• No security, users can see and control each others process using Jupyter’s administration
utilities.
12
IBM SparkTechnology Center
Analytics Platform Today – Shared Cluster
Allows Jupyter notebooks running outside of the
cluster to run Jupyter kernels inside the cluster
sharing it’s resources.
• All Jupyter kernels run under a shared, “service” user ID.
• Users can see and control each others’ kernels using
Jupyter’s administration utilities.
• All kernels and their associated Spark drivers run on a
single (configurable) node of the cluster.
13
Spark Cluster
Bob’s Desktop
Multiple Notebooks
Alice’s Desktop
Multiple Notebooks
Jupyter Kernel Gateway
(Sandboxed by service user privileges)
Jupyter
Kernel
Gateway
Jupyter
Notebook
Server
(with NB2KG)
Executors
(as Alice)Executors
(as Alice)Spark Executors
(as JNBG Service User)
Executors
(as Alice)Executors
(as Alice)Spark Executors
(as JNBG Service User)
Kernel
[Spark Driver]
(yarn-client mode
as JNBG Service
User)
YARN
Workers
Security
Layer
Jupyter
Notebook
Server
(with NB2KG)
Kernel
[Spark Driver]
(yarn-client mode
as JNBG Service
User)
IBM SparkTechnology Center
Analytics Platform Today – Single User Cluster
Allows Jupyter notebooks running outside of the
cluster to run Jupyter kernels in a cluster created
specially to the user.
• Expensive as clusters are created for every individual
user
14
Spark Cluster
Bob’s Desktop
Multiple Notebooks
Jupyter Kernel Gateway
(Sandboxed by service user privileges)
Jupyter
Kernel
Gateway
Jupyter
Notebook
Server
(with NB2KG)
Executors
(as Alice)Executors
(as Alice)Spark Executors
(as JNBG Service User)
Kernel
[Spark Driver]
(yarn-client mode
as JNBG Service
User)
YARN
Workers
Spark Cluster
Alice’s Desktop
Multiple Notebooks
Jupyter Kernel Gateway
(Sandboxed by service user privileges)
Jupyter
Kernel
Gateway
Executors
(as Alice)Executors
(as Alice)Spark Executors
(as JNBG Service User)
Kernel
[Spark Driver]
(yarn-client mode
as JNBG Service
User)
YARN
Workers
Jupyter
Notebook
Server
(with NB2KG)
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway
Notebook Platform based on Jupyter stack aiming on Enterprise/Cloud
requirements and use cases
15
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway – Goals
Optimized Resource Allocation
•Run Spark in YARN Cluster Mode to better utilize cluster resources.
•Pluggable architecture for additional Resource Managers
Enhanced Security
•Enable TLS for all socket communications
•Any HTTP communication should be encrypted (SSL)
Multiuser support with user impersonation
•Enhance security and sandboxing by enabling user impersonation when running kernels.
•Individual HDFS home folder for each notebook user.
•Use the same user ID for notebook and batch jobs.
16
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway
Extending Jupyter Kernel Gateway
• Enable running kernels remotely in a cluster
• Pluggable kernel lifecycle management
• Enhanced security
• Multiuser leveraging user impersonation
17
Extended Jupyter Kernel Gateway
Jupyter Kernel Gateway
Jupyter Notebook Server
IBM SparkTechnology Center
Spark Cluster
Extended Jupyter Kernel Gateway
18
Security
Layer
Alice’s Desktop
Multiple Notebooks
Jupyter
Notebook
Server
(with NB2KG)
YARN
Workers
Extended Jupyter Kernel Gateway
Jupyter REST API
User Session Manager
Kernel Lyfecycle
Kernel Communication (local/remote)
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Bob’s Desktop
Multiple Notebooks
Jupyter
Notebook
Server
(with NB2KG)
Impersonation:
Alice’s kernel
runs under Alice’s
user ID.
IBM SparkTechnology Center
Extended Jupyter Kernel Gateway
Stay tuned, we are becoming open source very soon!!!
Are you considering being an early adopter, please contact me at
lresende AT us DOT ibm DOT com !!!
19
IBM SparkTechnology Center
Other Resources
IBM developerWorks Code IBM developerWorks Journeys
https://developer.ibm.com/code/ https://developer.ibm.com/code/journey/
20

More Related Content

What's hot

Nike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteNike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteDani Traphagen
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Janos Matyas
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemOwen O'Malley
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonGreg Kirchoff
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQLEDB
 
Gregory Touretsky - Intel IT- Open Cloud Journey
Gregory Touretsky - Intel IT- Open Cloud JourneyGregory Touretsky - Intel IT- Open Cloud Journey
Gregory Touretsky - Intel IT- Open Cloud JourneyCloud Native Day Tel Aviv
 
February 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerFebruary 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerYahoo Developer Network
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBlueData, Inc.
 
Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteDenis Magda
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 Andrey Vykhodtsev
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloudSteve Loughran
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
Spark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleSpark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleJudy Nash
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSWJason Hubbard
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017Luciano Resende
 
Designing OpenStack Architectures
Designing OpenStack ArchitecturesDesigning OpenStack Architectures
Designing OpenStack ArchitecturesMirantis
 

What's hot (20)

Nike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteNike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-ignite
 
Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid Them
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with Isilon
 
Cloud Native PostgreSQL
Cloud Native PostgreSQLCloud Native PostgreSQL
Cloud Native PostgreSQL
 
Gregory Touretsky - Intel IT- Open Cloud Journey
Gregory Touretsky - Intel IT- Open Cloud JourneyGregory Touretsky - Intel IT- Open Cloud Journey
Gregory Touretsky - Intel IT- Open Cloud Journey
 
February 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerFebruary 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with Docker
 
Bare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containersBare-metal performance for Big Data workloads on Docker containers
Bare-metal performance for Big Data workloads on Docker containers
 
Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache Ignite
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
20150716 introduction to apache spark v3
20150716 introduction to apache spark v3 20150716 introduction to apache spark v3
20150716 introduction to apache spark v3
 
Farming hadoop in_the_cloud
Farming hadoop in_the_cloudFarming hadoop in_the_cloud
Farming hadoop in_the_cloud
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
Spark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattleSpark on Azure HDInsight - spark meetup seattle
Spark on Azure HDInsight - spark meetup seattle
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
Designing OpenStack Architectures
Designing OpenStack ArchitecturesDesigning OpenStack Architectures
Designing OpenStack Architectures
 

Similar to Big analytics meetup - Extended Jupyter Kernel Gateway

An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkLuciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfLuciano Resende
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsLuciano Resende
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayLuciano Resende
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsLuciano Resende
 
Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May MeetupDeepak Garg
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsLuciano Resende
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewLuciano Resende
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Sri Ambati
 
Shannon McFarland OpenStack/Cisco Intro
Shannon McFarland OpenStack/Cisco IntroShannon McFarland OpenStack/Cisco Intro
Shannon McFarland OpenStack/Cisco IntroShannon McFarland
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Codemotion
 
Storage as a service and OpenStack Cinder
Storage as a service and OpenStack CinderStorage as a service and OpenStack Cinder
Storage as a service and OpenStack Cinderopenstackindia
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_indexChester Chen
 
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Mike Broberg
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Openstack – An introduction
Openstack – An introductionOpenstack – An introduction
Openstack – An introductionMuddassir Nazir
 

Similar to Big analytics meetup - Extended Jupyter Kernel Gateway (20)

An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernels
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
 
Openstack India May Meetup
Openstack India May MeetupOpenstack India May Meetup
Openstack India May Meetup
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway Overview
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
 
Shannon McFarland OpenStack/Cisco Intro
Shannon McFarland OpenStack/Cisco IntroShannon McFarland OpenStack/Cisco Intro
Shannon McFarland OpenStack/Cisco Intro
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
 
Stackato
StackatoStackato
Stackato
 
Storage as a service and OpenStack Cinder
Storage as a service and OpenStack CinderStorage as a service and OpenStack Cinder
Storage as a service and OpenStack Cinder
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
 
Stackato v5
Stackato v5Stackato v5
Stackato v5
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Openstack – An introduction
Openstack – An introductionOpenstack – An introduction
Openstack – An introduction
 

More from Luciano Resende

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...Luciano Resende
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksLuciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeLuciano Resende
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirLuciano Resende
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirLuciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirLuciano Resende
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine LearningLuciano Resende
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceLuciano Resende
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningLuciano Resende
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overviewLuciano Resende
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitionsLuciano Resende
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceLuciano Resende
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSLuciano Resende
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscanyLuciano Resende
 

More from Luciano Resende (20)

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache Bahir
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
 
Asf icfoss-mentoring
Asf icfoss-mentoringAsf icfoss-mentoring
Asf icfoss-mentoring
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overview
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitions
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open source
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RS
 
SCA Reaches the Cloud
SCA Reaches the CloudSCA Reaches the Cloud
SCA Reaches the Cloud
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscany
 

Recently uploaded

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 

Recently uploaded (20)

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 

Big analytics meetup - Extended Jupyter Kernel Gateway

  • 1. IBM SparkTechnology Center Big Analytics Meetup Building Enterprise/Cloud Analytics Platform with Jupyter Notebooks and Apache Spark Luciano Resende IBM | Spark Technology Center
  • 2. IBM SparkTechnology Center About Me Luciano Resende (lresende AT apache DOT org) • Architect and community liaison at IBM – Spark Technology Center • Have been contributing to open source at ASF for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Spark, Apache Toree among other projects related to Apache Spark ecosystem 2 @lresende1975 http://lresende.blogspot.com/ https://www.linkedin.com/in/lresendehttp://slideshare.net/luckbr1975lresende
  • 4. Open Source Community Leadership Spark Technology Center Founding Partner 188+ Project Committers 77+ Projects Key Open source steering committee OSS Advisory Board Open Source
  • 5. IBM SparkTechnology Center The Spark Technology Center
  • 6. IBM SparkTechnology Center IBM Spark Technology Center Founded in 2015. Location: Physical: 505 Howard St., San Francisco CA Web: http://spark.tc Twitter: @apachespark_tc Mission: Contribute intellectual and technical capital to the Apache Spark community. Make the core technology enterprise- and cloud-ready. Build data science skills to drive intelligence into business applications — http://bigdatauniversity.com Key statistics: About 40 developers, co-located with 25 IBM designers. Major contributions to Apache Spark http://jiras.spark.tc Apache SystemML is now a top level Apache project ! Founding member of UC Berkeley AMPLab and RISE Lab Member of R Consortium and Scala Center 6
  • 7. IBM SparkTechnology Center Focus on meaningful code contributions across all major Spark projects 863 code contributions (JIRAs) and counting – Check out http://jiras.spark.tc Over 422 commits in Spark 2.0 , and continuing major contributions in 2.x Contributions by the Spark Technology Center across almost all components of Spark — Spark Core, SparkR, SQL, MLlib, Streaming, PySpark, build and infrastructure, etc STC impact on community 46,385 Spark LOC 863 Spark JIRAs 457 SystemML JIRAs 67 Speakers at Events Apache Spark Contributions
  • 8. IBM SparkTechnology Center Project Focus Areas SQL TPC-DS and Performance Query Pushdown/Federation Machine Learning Spark MLLib R4ML Online Retraining Apache Arrow SystemML Deep Learning Consumability Reference architectures Spark Notebook stack Spark Resource optimization Spark Web UI Apache Bahir RedRock Immersive Insights
  • 9. IBM SparkTechnology Center Spark Notebook Stack based on Jupyter Notebooks
  • 10. IBM SparkTechnology Center Jupyter Notebook Platform Architecture Overview • Notebook UI runs on the browser • The Notebook Server serves the ’Notebooks’ • Kernels interpret/execute cell contents • Are responsible for code execution • Abstracts different languages 10
  • 11. IBM SparkTechnology Center Enterprise/Cloud Analytics Platform Characteristics Large pool of shared computing resources • Enterprise Cloud, Public Cloud or Hybrid • Data in the cloud (Data Lakes/Object Store) Distributed Consumers • Notebooks running local (users laptop) or as a service Different Resource Utilization Patterns • High number of idle resources 11
  • 12. IBM SparkTechnology Center Analytics Platform – Current state of the art Open Source Jupyter based Notebook Platform • Single User sharing the same distributed filesystem and privileges • Jupyter Kernels running as local process • Resources are limited by what is available on the one single node that runs all Kernels and associated Spark drivers. • No security, users can see and control each others process using Jupyter’s administration utilities. 12
  • 13. IBM SparkTechnology Center Analytics Platform Today – Shared Cluster Allows Jupyter notebooks running outside of the cluster to run Jupyter kernels inside the cluster sharing it’s resources. • All Jupyter kernels run under a shared, “service” user ID. • Users can see and control each others’ kernels using Jupyter’s administration utilities. • All kernels and their associated Spark drivers run on a single (configurable) node of the cluster. 13 Spark Cluster Bob’s Desktop Multiple Notebooks Alice’s Desktop Multiple Notebooks Jupyter Kernel Gateway (Sandboxed by service user privileges) Jupyter Kernel Gateway Jupyter Notebook Server (with NB2KG) Executors (as Alice)Executors (as Alice)Spark Executors (as JNBG Service User) Executors (as Alice)Executors (as Alice)Spark Executors (as JNBG Service User) Kernel [Spark Driver] (yarn-client mode as JNBG Service User) YARN Workers Security Layer Jupyter Notebook Server (with NB2KG) Kernel [Spark Driver] (yarn-client mode as JNBG Service User)
  • 14. IBM SparkTechnology Center Analytics Platform Today – Single User Cluster Allows Jupyter notebooks running outside of the cluster to run Jupyter kernels in a cluster created specially to the user. • Expensive as clusters are created for every individual user 14 Spark Cluster Bob’s Desktop Multiple Notebooks Jupyter Kernel Gateway (Sandboxed by service user privileges) Jupyter Kernel Gateway Jupyter Notebook Server (with NB2KG) Executors (as Alice)Executors (as Alice)Spark Executors (as JNBG Service User) Kernel [Spark Driver] (yarn-client mode as JNBG Service User) YARN Workers Spark Cluster Alice’s Desktop Multiple Notebooks Jupyter Kernel Gateway (Sandboxed by service user privileges) Jupyter Kernel Gateway Executors (as Alice)Executors (as Alice)Spark Executors (as JNBG Service User) Kernel [Spark Driver] (yarn-client mode as JNBG Service User) YARN Workers Jupyter Notebook Server (with NB2KG)
  • 15. IBM SparkTechnology Center Extended Jupyter Kernel Gateway Notebook Platform based on Jupyter stack aiming on Enterprise/Cloud requirements and use cases 15
  • 16. IBM SparkTechnology Center Extended Jupyter Kernel Gateway – Goals Optimized Resource Allocation •Run Spark in YARN Cluster Mode to better utilize cluster resources. •Pluggable architecture for additional Resource Managers Enhanced Security •Enable TLS for all socket communications •Any HTTP communication should be encrypted (SSL) Multiuser support with user impersonation •Enhance security and sandboxing by enabling user impersonation when running kernels. •Individual HDFS home folder for each notebook user. •Use the same user ID for notebook and batch jobs. 16
  • 17. IBM SparkTechnology Center Extended Jupyter Kernel Gateway Extending Jupyter Kernel Gateway • Enable running kernels remotely in a cluster • Pluggable kernel lifecycle management • Enhanced security • Multiuser leveraging user impersonation 17 Extended Jupyter Kernel Gateway Jupyter Kernel Gateway Jupyter Notebook Server
  • 18. IBM SparkTechnology Center Spark Cluster Extended Jupyter Kernel Gateway 18 Security Layer Alice’s Desktop Multiple Notebooks Jupyter Notebook Server (with NB2KG) YARN Workers Extended Jupyter Kernel Gateway Jupyter REST API User Session Manager Kernel Lyfecycle Kernel Communication (local/remote) Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Bob’s Desktop Multiple Notebooks Jupyter Notebook Server (with NB2KG) Impersonation: Alice’s kernel runs under Alice’s user ID.
  • 19. IBM SparkTechnology Center Extended Jupyter Kernel Gateway Stay tuned, we are becoming open source very soon!!! Are you considering being an early adopter, please contact me at lresende AT us DOT ibm DOT com !!! 19
  • 20. IBM SparkTechnology Center Other Resources IBM developerWorks Code IBM developerWorks Journeys https://developer.ibm.com/code/ https://developer.ibm.com/code/journey/ 20