SlideShare a Scribd company logo
1 of 30
Download to read offline
Building analytical micro services
powered by Jupyter Kernels
About me - Luciano Resende
Data Science Platform Architect – IBM – CODAIT (formerly Spark Technology Center)
• Have been contributing to open source at ASF for over 10 years
• Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Spark,
Apache Toree among other projects related to Apache Spark ecosystem
lresende@apache.org
http://lresende.blogspot.com/
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
@
About me – Kevin Bates
Sr. Software Engineer – IBM – CODAIT (formerly Spark Technology Center)
• Over 30 years developing enterprise-level software
• Currently working in the Jupyter ecosystem the last 14 months
kbates4@gmail.com
https://www.linkedin.com/in/kevinbatessoftware
@kbates4
https://github.com/kevin-bates
@
Jupyter Notebooks
Notebooks are interactive computational
environments, in which you can
combine code execution, rich text,
mathematics, plots and rich media.
Jupyter Notebook Platform Architecture
• Notebook UI runs on the browser
• The Notebook Server serves the ’Notebooks’
• Kernels interpret/execute cell contents
§ Are responsible for code execution
§ Abstracts different languages
Jupyter Notebooks Architecture
Notebook Server Process
JavaScript
NotebookManagement
Python Process
KernelManagement
IPythonKernel
KernelProxy
Shell
IOPub
stdin
control
heartbeat
UserCode
sklearn
Spark
Tensor
Flow
…
Jupyter Connection Profile
{
"stdin_port": 37934,
"control_port": 42264,
"hb_port": 34727,
"shell_port": 35502,
"iopub_port": 59585,
"transport": "tcp",
"ip": "127.0.0.1",
"signature_scheme": "hmac-sha256",
"key": "4b306a48-98a0715362cbc47aafbc4e5f",
"kernel_name": ""
}
Jupyter Messaging Protocol
Available Sockets:
• Shell (requests, history, info)
• IOPub (status, display, results)
• Stdin (input requests from kernel)
• Control (shutdown, interrupt)
• Heartbeat (poll)
Message flow
Two types of responses
• Results
• Computations that return a result
• 1+1
• val a = 2 + 5
• Stream Content
• Values that are written to output stream
• println(‘Hello World’)
• df.show(10)
Client Program Kernel
Evaluate (msgid=1) ‘1+1’
Busy (msgid=1)
Status (msgid=1) ok/error
Result (msgid=1)
Stream Content (msgid=1)
Idle (msgid=1)
Introducing
Jupyter Enterprise Gateway
Jupyter Notebooks Architecture
Notebook Server Process
JavaScript
NotebookManagement
Python Process
KernelManagement
IPythonKernel
KernelProxy
Shell
IOPub
stdin
control
heartbeat
UserCode
sklearn
Spark
Tensor
Flow
…
The Origins of Jupyter Enterprise Gateway​
• Multiple IBM products embedding Spark on YARN​
• All wanted to add Jupyter notebooks with Spark​
• Usual enterprise requirements (multitenancy, scalability, security, etc.)​
• Attempts at scaling up (one large server) or having a single Notebook server
per user were insufficient
• Jupyter Kernel Gateway introduced a Bring Your Own Notebook model via
Websocket “Personality” and Notebook extension – nb2kg
Initial prototype using Jupyter Kernel Gateway
YARN Cluster
Security
Layer
YARN
Workers
YARN
Resource
Manager
Spark
ExecutorsSpark
ExecutorsSpark
Executors
Spark
ExecutorsSpark
ExecutorsSpark
Executors
Gateway Node
nb2kg
(Proxy)
nb2kg
Jupyter
Kernel
Gateway
Python
Kernel
Spark Driver
Python
Kernel
Spark Driver
Shell
IOPub
stdin
control
heartbeat
Issue #2: All Spark jobs
run as same user ID
Issue #1: All kernels
and Spark drivers run
on a single node
Issue: All kernels run on a single node
8 8 8 8
0
10
20
30
40
50
60
70
80
4 Nodes 8 Nodes 12 Nodes 16 Nodes
MAXKERNELS(4GBHEAP)
CLUSTER SIZE (32GB NODES)
Maximum Number of Simultaneous Kernels
Jupyter Enterprise Gateway: Initial Goals
Optimized Resource Allocation
§ Run Spark in YARN Cluster Mode to better utilize cluster resources
§ Pluggable architecture for additional Resource Managers and Lifecycle Management
§ General framework for remote kernels
Multiuser support with user impersonation
§ Enhance security and sandboxing by enabling user impersonation when running kernels (using
Kerberos)
§ Individual HDFS home folder for each notebook user
§ Enables use of same user ID for both notebook and batch jobs
Enhanced Security
§ Secure socket communications
§ Any network communication should be encrypted
Jupyter Enterprise Gateway
Jupyter Kernel Gateway
Jupyter Notebook
jupyter_client
Jupyter Enterprise Gateway
YARN Cluster
YARN
Workers
Gateway Node
Jupyter Enterprise Gateway
• Multitenancy
• Remote kernel lifecycle management via process proxies
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Impersonation:
Alice’s kernel
runs under
Alice’s user ID.
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Security
Layer
nb2kg
(Proxy)
nb2kg
Scalability Benefits
8 8 8 8
16
32
48
64
0
10
20
30
40
50
60
70
80
4 Nodes 8 Nodes 12 Nodes 16 Nodes
MAXKERNELS(4GBHEAP)
CLUSTER SIZE (32GB NODES)
Maximum Number of Simultaneous Kernels
Before JEG
After JEG
Jupyter Notebooks with Enterprise Gateway
Notebook Server Process
JavaScript
NotebookManagement
KernelManagement
(viaNB2KG)
Kernel Launcher Process
IPythonKernel(embedded)
Shell
IOPub
stdin
control
heartbeat
UserCode
sklearn
Spark
Tensor
Flow
…
Enterprise Gateway
Process
Kernel
Lifecycle
Management
Process
Proxies
Websocket
passthrough
via
Kernel
Gateway
Building an Analytical Micro Service
Use Case – Sentiment Analysis
Utilizing Yelp Dataset from Kaggle
Utilizing AFINN sentiment analysis library in Python
Using PySpark for training and scoring model
Using Jupyter Kernels to integrate between micro service and spark
Yelp Dataset: https://www.kaggle.com/yelp-dataset/yelp-dataset
Use Case – Sentiment Analysis
Integrates analytics to your regular
python application/micro service by
leveraging Jupyter Notebook
Kernels
YARN Cluster
YARN
Workers
Enterprise Gateway Node
Jupyter Enterprise Gateway
Multitenancy
Remote kernels and Kernel Lifecycle management
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Spark Executors
Spark Executors
Spark Executors
Yarn Container
Jupyter Kernel
Spark Driver
Sentiment
Resource
flask
Kernel
Laucher/Client
http
Sentiment
Provider
Use Case – Sentiment Analysis
Loading the Yelp Dataset
business = spark.read
.option('header','true')
.option('inferSchema', 'true')
.csv('file:///opt/data/yelp_business.csv')
city_business = business.filter(business.city == 'Brooklyn')
city_business.write.parquet(path='yelp/business', mode='overwrite')
reviews = spark.read
.option('header', 'true')
.option('inferSchema', 'true')
.csv('file:///opt/data/yelp_review.csv')
city_business_reviews = reviews.join(city_business, city_business.business_id ==
reviews.business_id, 'left_semi')
city_business_reviews.write.parquet(path='yelp/reviews', mode='overwrite')
Application – Sentiment REST API
• Leverage Flask-RESTFul
• Exposes a sentiment REST API
• Request sentiment for a given business
• http://<host>:5000/sentiment/<business_id>
• During Application startup
• Start the kernel
• Perform required data load operations
Application – Sentiment Provider
• Encapsulate all interactions with the
kernel
• Start/Stop the Kernel
• Load necessary tables
• Retrieve business details
• Calculate sentiment for each business review
• Possible enhancements/todos
• Return data as Numpy Arrays
• Provide more flexibility to manipulate
display on the resource side (e.g pretty html)
Application – Kernel Launcher
• Encapsulate kernel lifecycle
• Start/Stop the Kernel
• Instantiate a new kernel object
Application – Kernel Object
• Encapsulate kernel code execution
• Create proper code execution requests
• Monitor kernel responses
• Errors
• Result Responses
• Stream Responses
• Kernel idle status
• Return code execution responses
Application – Live Demo
Further Readings
Connect direct to kernel
• See Jupyter Client https://github.com/jupyter/jupyter_client
• See https://github.com/lresende/toree-gateway/blob/master/python/toree_client.py
Jupyter Kernel Gateway/Enterprise Gateway HTTP Personality
• Starts the Gateway in single notebook mode
• Notebook cells (based on resource identifier comment) become accessible via URLs
# GET /hello/world
print("I'm cell #1")
• http://jupyter-kernel-gateway.readthedocs.io/en/latest/http-mode.html
Resources
Jupyter Enterprise Gateway source code at GitHub
https://github.com/jupyter-incubator/enterprise_gateway
Jupyter Enterprise Gateway Documentation
http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Microservice Demo Application
https://github.com/lresende/eg-sentiment-microservice-demo

More Related Content

What's hot

Build FAST Learning Apps with Docker and OpenPOWER
Build FAST Learning Apps with Docker and OpenPOWERBuild FAST Learning Apps with Docker and OpenPOWER
Build FAST Learning Apps with Docker and OpenPOWERIndrajit Poddar
 
Kokki: Configuration Management Framework
Kokki: Configuration Management FrameworkKokki: Configuration Management Framework
Kokki: Configuration Management FrameworkAleksey Maksimov
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldSean Roberts
 
Cloud Foundry and OpenStack – Marriage Made in Heaven !
Cloud Foundry and OpenStack – Marriage Made in Heaven !Cloud Foundry and OpenStack – Marriage Made in Heaven !
Cloud Foundry and OpenStack – Marriage Made in Heaven ! Animesh Singh
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...Animesh Singh
 
Redfish & python redfish
Redfish & python redfishRedfish & python redfish
Redfish & python redfishRené Ribaud
 
Webinar: OpenStack Benefits for VMware
Webinar: OpenStack Benefits for VMwareWebinar: OpenStack Benefits for VMware
Webinar: OpenStack Benefits for VMwarePlatform9
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
Herding your cattle from dev to ops
Herding your cattle from dev to opsHerding your cattle from dev to ops
Herding your cattle from dev to opsBastiaan Schaap
 
Finding and Organizing a Great Cloud Foundry User Group
Finding and Organizing a Great Cloud Foundry User GroupFinding and Organizing a Great Cloud Foundry User Group
Finding and Organizing a Great Cloud Foundry User GroupDaniel Krook
 
Optimizing Cloud Foundry and OpenStack for large scale deployments
Optimizing Cloud Foundry and OpenStack for large scale deploymentsOptimizing Cloud Foundry and OpenStack for large scale deployments
Optimizing Cloud Foundry and OpenStack for large scale deploymentsAnimesh Singh
 
NoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessNoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessArun Gupta
 
Hadoop Cluster on Docker Containers
Hadoop Cluster on Docker ContainersHadoop Cluster on Docker Containers
Hadoop Cluster on Docker Containerspranav_joshi
 
HP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition DeploymentHP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition DeploymentMarton Kiss
 
Introduction to EasyBuild: Tutorial Part 1
Introduction to EasyBuild: Tutorial Part 1Introduction to EasyBuild: Tutorial Part 1
Introduction to EasyBuild: Tutorial Part 1inside-BigData.com
 
Nike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteNike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteDani Traphagen
 
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn Yahoo Developer Network
 
Cloud foundry integration-with-openstack-and-docker-bangalorecf-meetup
Cloud foundry integration-with-openstack-and-docker-bangalorecf-meetupCloud foundry integration-with-openstack-and-docker-bangalorecf-meetup
Cloud foundry integration-with-openstack-and-docker-bangalorecf-meetupKrishna-Kumar
 

What's hot (19)

Build FAST Learning Apps with Docker and OpenPOWER
Build FAST Learning Apps with Docker and OpenPOWERBuild FAST Learning Apps with Docker and OpenPOWER
Build FAST Learning Apps with Docker and OpenPOWER
 
Kokki: Configuration Management Framework
Kokki: Configuration Management FrameworkKokki: Configuration Management Framework
Kokki: Configuration Management Framework
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
Cloud Foundry and OpenStack – Marriage Made in Heaven !
Cloud Foundry and OpenStack – Marriage Made in Heaven !Cloud Foundry and OpenStack – Marriage Made in Heaven !
Cloud Foundry and OpenStack – Marriage Made in Heaven !
 
How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...How to build an event-driven, polyglot serverless microservices framework on ...
How to build an event-driven, polyglot serverless microservices framework on ...
 
Redfish & python redfish
Redfish & python redfishRedfish & python redfish
Redfish & python redfish
 
Webinar: OpenStack Benefits for VMware
Webinar: OpenStack Benefits for VMwareWebinar: OpenStack Benefits for VMware
Webinar: OpenStack Benefits for VMware
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Herding your cattle from dev to ops
Herding your cattle from dev to opsHerding your cattle from dev to ops
Herding your cattle from dev to ops
 
Finding and Organizing a Great Cloud Foundry User Group
Finding and Organizing a Great Cloud Foundry User GroupFinding and Organizing a Great Cloud Foundry User Group
Finding and Organizing a Great Cloud Foundry User Group
 
Optimizing Cloud Foundry and OpenStack for large scale deployments
Optimizing Cloud Foundry and OpenStack for large scale deploymentsOptimizing Cloud Foundry and OpenStack for large scale deployments
Optimizing Cloud Foundry and OpenStack for large scale deployments
 
NoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessNoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern Success
 
Hadoop Cluster on Docker Containers
Hadoop Cluster on Docker ContainersHadoop Cluster on Docker Containers
Hadoop Cluster on Docker Containers
 
HP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition DeploymentHP Helion OpenStack Community Edition Deployment
HP Helion OpenStack Community Edition Deployment
 
Introduction to EasyBuild: Tutorial Part 1
Introduction to EasyBuild: Tutorial Part 1Introduction to EasyBuild: Tutorial Part 1
Introduction to EasyBuild: Tutorial Part 1
 
Nike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-igniteNike tech-talk-intro-to-apache-ignite
Nike tech-talk-intro-to-apache-ignite
 
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
 
Cloud foundry integration-with-openstack-and-docker-bangalorecf-meetup
Cloud foundry integration-with-openstack-and-docker-bangalorecf-meetupCloud foundry integration-with-openstack-and-docker-bangalorecf-meetup
Cloud foundry integration-with-openstack-and-docker-bangalorecf-meetup
 

Similar to Building analytical microservices with Jupyter Kernels

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017Luciano Resende
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkLuciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayLuciano Resende
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfLuciano Resende
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesDatabricks
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302Timothy Spann
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Codemotion
 
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Mike Broberg
 
Blastn plus jupyter on Docker
Blastn plus jupyter on DockerBlastn plus jupyter on Docker
Blastn plus jupyter on DockerLynn Langit
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Sparkfelixcss
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_indexChester Chen
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to productionGeorg Heiler
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
PPT5: Neuron Introduction
PPT5: Neuron IntroductionPPT5: Neuron Introduction
PPT5: Neuron Introductionakira-ai
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and FutureKeiichiro Ono
 

Similar to Building analytical microservices with Jupyter Kernels (20)

The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Getting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on KubernetesGetting Started with Apache Spark on Kubernetes
Getting Started with Apache Spark on Kubernetes
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
 
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
Luciano Resende - Scaling Big Data Interactive Workloads across Kubernetes Cl...
 
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
 
Blastn plus jupyter on Docker
Blastn plus jupyter on DockerBlastn plus jupyter on Docker
Blastn plus jupyter on Docker
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
 
2018 02 20-jeg_index
2018 02 20-jeg_index2018 02 20-jeg_index
2018 02 20-jeg_index
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
PPT5: Neuron Introduction
PPT5: Neuron IntroductionPPT5: Neuron Introduction
PPT5: Neuron Introduction
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 

More from Luciano Resende

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...Luciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeLuciano Resende
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirLuciano Resende
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirLuciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirLuciano Resende
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine LearningLuciano Resende
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceLuciano Resende
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningLuciano Resende
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overviewLuciano Resende
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitionsLuciano Resende
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceLuciano Resende
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSLuciano Resende
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscanyLuciano Resende
 
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyS314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyLuciano Resende
 

More from Luciano Resende (20)

Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
IoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache BahirIoT Applications and Patterns using Apache Spark & Apache Bahir
IoT Applications and Patterns using Apache Spark & Apache Bahir
 
Getting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache BahirGetting insights from IoT data with Apache Spark and Apache Bahir
Getting insights from IoT data with Apache Spark and Apache Bahir
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
 
Asf icfoss-mentoring
Asf icfoss-mentoringAsf icfoss-mentoring
Asf icfoss-mentoring
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overview
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitions
 
How mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open sourceHow mentoring programs can help newcomers get started with open source
How mentoring programs can help newcomers get started with open source
 
Building RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RSBuilding RESTful services using SCA and JAX-RS
Building RESTful services using SCA and JAX-RS
 
SCA Reaches the Cloud
SCA Reaches the CloudSCA Reaches the Cloud
SCA Reaches the Cloud
 
Building apps with tuscany
Building apps with tuscanyBuilding apps with tuscany
Building apps with tuscany
 
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache TuscanyS314011 - Developing Composite Applications for the Cloud with Apache Tuscany
S314011 - Developing Composite Applications for the Cloud with Apache Tuscany
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Building analytical microservices with Jupyter Kernels

  • 1.
  • 2. Building analytical micro services powered by Jupyter Kernels
  • 3. About me - Luciano Resende Data Science Platform Architect – IBM – CODAIT (formerly Spark Technology Center) • Have been contributing to open source at ASF for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Spark, Apache Toree among other projects related to Apache Spark ecosystem lresende@apache.org http://lresende.blogspot.com/ https://www.linkedin.com/in/lresende @lresende1975 https://github.com/lresende @
  • 4. About me – Kevin Bates Sr. Software Engineer – IBM – CODAIT (formerly Spark Technology Center) • Over 30 years developing enterprise-level software • Currently working in the Jupyter ecosystem the last 14 months kbates4@gmail.com https://www.linkedin.com/in/kevinbatessoftware @kbates4 https://github.com/kevin-bates @
  • 5. Jupyter Notebooks Notebooks are interactive computational environments, in which you can combine code execution, rich text, mathematics, plots and rich media.
  • 6. Jupyter Notebook Platform Architecture • Notebook UI runs on the browser • The Notebook Server serves the ’Notebooks’ • Kernels interpret/execute cell contents § Are responsible for code execution § Abstracts different languages
  • 7. Jupyter Notebooks Architecture Notebook Server Process JavaScript NotebookManagement Python Process KernelManagement IPythonKernel KernelProxy Shell IOPub stdin control heartbeat UserCode sklearn Spark Tensor Flow …
  • 8. Jupyter Connection Profile { "stdin_port": 37934, "control_port": 42264, "hb_port": 34727, "shell_port": 35502, "iopub_port": 59585, "transport": "tcp", "ip": "127.0.0.1", "signature_scheme": "hmac-sha256", "key": "4b306a48-98a0715362cbc47aafbc4e5f", "kernel_name": "" }
  • 9. Jupyter Messaging Protocol Available Sockets: • Shell (requests, history, info) • IOPub (status, display, results) • Stdin (input requests from kernel) • Control (shutdown, interrupt) • Heartbeat (poll)
  • 10. Message flow Two types of responses • Results • Computations that return a result • 1+1 • val a = 2 + 5 • Stream Content • Values that are written to output stream • println(‘Hello World’) • df.show(10) Client Program Kernel Evaluate (msgid=1) ‘1+1’ Busy (msgid=1) Status (msgid=1) ok/error Result (msgid=1) Stream Content (msgid=1) Idle (msgid=1)
  • 12. Jupyter Notebooks Architecture Notebook Server Process JavaScript NotebookManagement Python Process KernelManagement IPythonKernel KernelProxy Shell IOPub stdin control heartbeat UserCode sklearn Spark Tensor Flow …
  • 13. The Origins of Jupyter Enterprise Gateway​ • Multiple IBM products embedding Spark on YARN​ • All wanted to add Jupyter notebooks with Spark​ • Usual enterprise requirements (multitenancy, scalability, security, etc.)​ • Attempts at scaling up (one large server) or having a single Notebook server per user were insufficient • Jupyter Kernel Gateway introduced a Bring Your Own Notebook model via Websocket “Personality” and Notebook extension – nb2kg
  • 14. Initial prototype using Jupyter Kernel Gateway YARN Cluster Security Layer YARN Workers YARN Resource Manager Spark ExecutorsSpark ExecutorsSpark Executors Spark ExecutorsSpark ExecutorsSpark Executors Gateway Node nb2kg (Proxy) nb2kg Jupyter Kernel Gateway Python Kernel Spark Driver Python Kernel Spark Driver Shell IOPub stdin control heartbeat Issue #2: All Spark jobs run as same user ID Issue #1: All kernels and Spark drivers run on a single node
  • 15. Issue: All kernels run on a single node 8 8 8 8 0 10 20 30 40 50 60 70 80 4 Nodes 8 Nodes 12 Nodes 16 Nodes MAXKERNELS(4GBHEAP) CLUSTER SIZE (32GB NODES) Maximum Number of Simultaneous Kernels
  • 16. Jupyter Enterprise Gateway: Initial Goals Optimized Resource Allocation § Run Spark in YARN Cluster Mode to better utilize cluster resources § Pluggable architecture for additional Resource Managers and Lifecycle Management § General framework for remote kernels Multiuser support with user impersonation § Enhance security and sandboxing by enabling user impersonation when running kernels (using Kerberos) § Individual HDFS home folder for each notebook user § Enables use of same user ID for both notebook and batch jobs Enhanced Security § Secure socket communications § Any network communication should be encrypted Jupyter Enterprise Gateway Jupyter Kernel Gateway Jupyter Notebook jupyter_client
  • 17. Jupyter Enterprise Gateway YARN Cluster YARN Workers Gateway Node Jupyter Enterprise Gateway • Multitenancy • Remote kernel lifecycle management via process proxies Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Impersonation: Alice’s kernel runs under Alice’s user ID. Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Security Layer nb2kg (Proxy) nb2kg
  • 18. Scalability Benefits 8 8 8 8 16 32 48 64 0 10 20 30 40 50 60 70 80 4 Nodes 8 Nodes 12 Nodes 16 Nodes MAXKERNELS(4GBHEAP) CLUSTER SIZE (32GB NODES) Maximum Number of Simultaneous Kernels Before JEG After JEG
  • 19. Jupyter Notebooks with Enterprise Gateway Notebook Server Process JavaScript NotebookManagement KernelManagement (viaNB2KG) Kernel Launcher Process IPythonKernel(embedded) Shell IOPub stdin control heartbeat UserCode sklearn Spark Tensor Flow … Enterprise Gateway Process Kernel Lifecycle Management Process Proxies Websocket passthrough via Kernel Gateway
  • 20. Building an Analytical Micro Service
  • 21. Use Case – Sentiment Analysis Utilizing Yelp Dataset from Kaggle Utilizing AFINN sentiment analysis library in Python Using PySpark for training and scoring model Using Jupyter Kernels to integrate between micro service and spark Yelp Dataset: https://www.kaggle.com/yelp-dataset/yelp-dataset
  • 22. Use Case – Sentiment Analysis Integrates analytics to your regular python application/micro service by leveraging Jupyter Notebook Kernels YARN Cluster YARN Workers Enterprise Gateway Node Jupyter Enterprise Gateway Multitenancy Remote kernels and Kernel Lifecycle management Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Spark Executors Spark Executors Spark Executors Yarn Container Jupyter Kernel Spark Driver Sentiment Resource flask Kernel Laucher/Client http Sentiment Provider
  • 23. Use Case – Sentiment Analysis Loading the Yelp Dataset business = spark.read .option('header','true') .option('inferSchema', 'true') .csv('file:///opt/data/yelp_business.csv') city_business = business.filter(business.city == 'Brooklyn') city_business.write.parquet(path='yelp/business', mode='overwrite') reviews = spark.read .option('header', 'true') .option('inferSchema', 'true') .csv('file:///opt/data/yelp_review.csv') city_business_reviews = reviews.join(city_business, city_business.business_id == reviews.business_id, 'left_semi') city_business_reviews.write.parquet(path='yelp/reviews', mode='overwrite')
  • 24. Application – Sentiment REST API • Leverage Flask-RESTFul • Exposes a sentiment REST API • Request sentiment for a given business • http://<host>:5000/sentiment/<business_id> • During Application startup • Start the kernel • Perform required data load operations
  • 25. Application – Sentiment Provider • Encapsulate all interactions with the kernel • Start/Stop the Kernel • Load necessary tables • Retrieve business details • Calculate sentiment for each business review • Possible enhancements/todos • Return data as Numpy Arrays • Provide more flexibility to manipulate display on the resource side (e.g pretty html)
  • 26. Application – Kernel Launcher • Encapsulate kernel lifecycle • Start/Stop the Kernel • Instantiate a new kernel object
  • 27. Application – Kernel Object • Encapsulate kernel code execution • Create proper code execution requests • Monitor kernel responses • Errors • Result Responses • Stream Responses • Kernel idle status • Return code execution responses
  • 29. Further Readings Connect direct to kernel • See Jupyter Client https://github.com/jupyter/jupyter_client • See https://github.com/lresende/toree-gateway/blob/master/python/toree_client.py Jupyter Kernel Gateway/Enterprise Gateway HTTP Personality • Starts the Gateway in single notebook mode • Notebook cells (based on resource identifier comment) become accessible via URLs # GET /hello/world print("I'm cell #1") • http://jupyter-kernel-gateway.readthedocs.io/en/latest/http-mode.html
  • 30. Resources Jupyter Enterprise Gateway source code at GitHub https://github.com/jupyter-incubator/enterprise_gateway Jupyter Enterprise Gateway Documentation http://jupyter-enterprise-gateway.readthedocs.io/en/latest/ Microservice Demo Application https://github.com/lresende/eg-sentiment-microservice-demo