Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter

1
Zeppelin Meetup
Moonsoo Lee / Creator of Zeppelin
moon@zepl.com
@apachezeppelin

2
Agenda
⬢ Demo: Real-time Streaming
⬢ Demo: Zeppelin on Kubernetes
⬢ Zeppelin Roadmap
⬢ Q&A

6
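[Figure: Zeppelin on Kubernetes architecture]
- Ingress → Service → Zeppelin server Pod: zeppelin-server (http 80, rpc 12320), nginx, DNS resolver
- The Zeppelin server calls the Kubernetes ApiServer to create interpreter Pods
- Python interpreter Pod + Service: python-intp (rpc 12321)
- Spark interpreter Pod + Service: spark-intp (rpc 12321), spark-driver (22321), spark-block manager (22322), spark-ui (4040)
- Spark executor Pods ("Spark exec") are created through the Kubernetes ApiServer for the Spark interpreter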
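The port layout in the diagram can be written down as plain data. Below is a minimal sketch. `interpreter_service_manifest` is a hypothetical helper that builds a Kubernetes Service manifest, as a Python dict, for an interpreter Pod. The port numbers come from the diagram. The port names and the `app` label selector are assumptions, not Zeppelin's actual templates.

```python
import json
from typing import Dict, List

# Port numbers as shown in the diagram. Names are illustrative only;
# Kubernetes port names must be at most 15 characters (IANA_SVC_NAME).
PYTHON_INTP_PORTS: Dict[str, int] = {"rpc": 12321}
SPARK_INTP_PORTS: Dict[str, int] = {
    "rpc": 12321,            # interpreter rpc
    "spark-driver": 22321,   # Spark driver
    "blockmanager": 22322,   # Spark block manager
    "spark-ui": 4040,        # Spark UI
}


def interpreter_service_manifest(pod_name: str, ports: Dict[str, int]) -> dict:
    """Hypothetical helper: a Service exposing an interpreter Pod's ports
    inside the cluster, selected by an assumed label app=<pod_name>."""
    port_list: List[dict] = [
        {"name": name, "port": port, "targetPort": port}
        for name, port in ports.items()
    ]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": pod_name},
        "spec": {"selector": {"app": pod_name}, "ports": port_list},
    }


if __name__ == "__main__":
    print(json.dumps(interpreter_service_manifest("spark-intp", SPARK_INTP_PORTS), indent=2))
```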

7
Benefits
MULTI-TENANCY
Each note and/or user has its own container for interpreters
SCALABILITY
A single host no longer has to run all interpreters
SECURITY
Each container is isolated (filesystem, processes, etc.)
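Multi-tenancy depends on mapping each interpreter binding to its own Pod. Zeppelin's interpreter settings can be bound per note and/or per user. This sketch assumes that an isolated binding maps to its own Pod. `interpreter_pod_name` and its naming scheme are hypothetical, not Zeppelin's actual code.

```python
import re
from typing import Optional


def interpreter_pod_name(group: str, note_id: Optional[str] = None,
                         user: Optional[str] = None) -> str:
    """Hypothetical: choose the Pod that serves an interpreter.
    Passing user and/or note_id isolates the interpreter per user/note;
    passing neither means one shared Pod for the interpreter group."""
    parts = [group]
    if user is not None:
        parts.append(user)
    if note_id is not None:
        parts.append(note_id)
    raw = "-".join(parts).lower()
    # Keep to DNS-1123 style names: lowercase alphanumerics and '-'.
    return re.sub(r"[^a-z0-9-]", "-", raw)[:63].strip("-")
```

With per-user isolation, two users running the same note get two different Pods. Their processes and filesystems are then separated by the container boundary.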

8
Usage
Run
$ kubectl apply -f ${ZEPPELIN_HOME}/k8s/zeppelin-server.yaml
Available in 0.9.0-SNAPSHOT: http://zeppelin.apache.org/docs/0.9.0-SNAPSHOT/quickstart/kubernetes.html
* Until 0.9.0 is released, you need to build your own Zeppelin and Spark Docker images:
1. Build the Zeppelin distribution package: mvn package -Pbuild-distr …
2. Build the Zeppelin Docker image: cd scripts/docker/zeppelin/bin; docker build -t …
3. Build the Spark Docker image: <spark-distribution>/bin/docker-image-tool.sh -m -t 2.4.0 build

9
Zeppelin Roadmap
- Zeppelin on Kubernetes
  - Apply network policies to isolate interpreter Pods
  - Schedule notes to run in the background as Kubernetes Jobs
  - Run extra applications, such as a terminal or TensorBoard, the same way the Spark UI works
- Modernize the front-end stack
  - Currently AngularJS
  - Dark theme?
- Visualization
  - Real-time data visualization
  - Pivot on the backend instead of in the front end, which requires transferring all data to the front end
- Sidebar
  - Sidebar with widgets, such as a ToC (table of contents), a list of data, etc.
  - Online widget registry (Helium)
- Collaboration
  - Multi-cursor editing
  - Comments!

11
Join the Apache Zeppelin community.
Mailing list: questions, suggestions, discussions, votes!
- Users: users@zeppelin.apache.org
- Dev: dev@zeppelin.apache.org
JIRA: bug reports, tracking development/release progress
- https://issues.apache.org/jira/projects/ZEPPELIN
GitHub: fixes, improvements, new features
- https://github.com/apache/zeppelin

12
www.zepl.com
Q&A
https://zeppelin.apache.org/
Moonsoo Lee / Creator of Zeppelin
moon@zepl.com
@issuefreaks
Send Mei Long your email address for an Apache Zeppelin Slack invite: mlong@zepl.com
@meitrappist1
@ApacheZeppelin

15
Transformation on browser (current)
[Figure: Interpreter → (thrift) → Zeppelin Server → (http) → Browser]
- The Interpreter returns the entire result table to the Zeppelin Server over thrift, e.g.
    age  balance  job
    21   1030     Student
    34   20331    Engineer
    50   30193    Engineer
    33   12019    Teacher
    23   23211    Engineer
    29   92327    Student
    ...  ...      ...
- The Zeppelin Server stores the whole table in the note and sends it to the Browser over http:
    {
      title: …,
      text: "select job, count(1) from data",
      paragraphs: [
        { results: { code: SUCCESS, msg: [ { type: TABLE, data: (entire result table) } ] } }
      ]
    }
- The Browser transforms (pivots) the table and renders the result:
    job       count
    Student   2
    Engineer  3
    Teacher   1
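The pivot in this example is a group-by-count on the `job` column. The sketch below shows that transformation in Python, using the sample rows from the slide. It illustrates the operation only. It is not Zeppelin's front-end code, which runs in the browser.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Row = Tuple[int, int, str]  # (age, balance, job)

# Sample result table from the slide.
ROWS: List[Row] = [
    (21, 1030, "Student"),
    (34, 20331, "Engineer"),
    (50, 30193, "Engineer"),
    (33, 12019, "Teacher"),
    (23, 23211, "Engineer"),
    (29, 92327, "Student"),
]


def pivot_count(rows: List[Row], key_index: int) -> Dict[Any, int]:
    """Group rows by the column at key_index and count each group."""
    return dict(Counter(row[key_index] for row in rows))


print(pivot_count(ROWS, key_index=2))
```

The output has one entry per job, and the counts match the `job count` table above. The transformation still needs every raw row as input, wherever it runs.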

16
Problem
- The entire result dataset needs to be transferred to the browser, even though not all of it is rendered.
- Browser CPU and memory limit how much data can be transformed and rendered.
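To make the first problem concrete, the sketch compares the JSON size of a full result set with the size of its pivot. The 100,000 synthetic rows are generated with a fixed seed purely for illustration. They are not data from the talk, and exact byte counts will vary with the data.

```python
import json
import random
from collections import Counter

# Synthetic result set: 100,000 rows of [age, balance, job].
random.seed(0)
JOBS = ["Student", "Engineer", "Teacher"]
rows = [[random.randint(20, 60), random.randint(0, 100_000), random.choice(JOBS)]
        for _ in range(100_000)]

# What the browser receives when it must pivot by itself...
full_payload = json.dumps(rows)
# ...versus what it needs in order to render the "job count" chart.
pivot_payload = json.dumps(Counter(r[2] for r in rows))

print(f"full result: {len(full_payload):,} bytes")
print(f"pivoted    : {len(pivot_payload):,} bytes")
```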

17
Transformation on Server
[Figure: Browser ⇄ Zeppelin Server ⇄ (thrift) Interpreter]
- The Browser sends a transform request (pivot) to the Zeppelin Server.
- The Zeppelin Server fetches the result dataset (age, balance, job) from the Interpreter over thrift.
- The Zeppelin Server transforms (pivots) it and sends only the result to the Browser as a note update:
    job       count
    Student   2
    Engineer  3
    Teacher   1
- The Browser renders the pivoted table.
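The server-side flow can be sketched as a request handler. `fetch_result_dataset` stands in for the thrift fetch from the interpreter, and `handle_transform_request` for the server's handler. Both are hypothetical. The update's field names mimic the note JSON on the slides (`results`, `code`, `msg`, `type`, `data`), while `paragraphId` and the list-of-rows `data` format are assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List

Row = List[object]  # [age, balance, job]


def fetch_result_dataset(paragraph_id: str) -> List[Row]:
    """Stand-in for 'Result dataset fetch' over thrift (hypothetical):
    returns the slide's sample table for any paragraph."""
    return [[21, 1030, "Student"], [34, 20331, "Engineer"],
            [50, 30193, "Engineer"], [33, 12019, "Teacher"],
            [23, 23211, "Engineer"], [29, 92327, "Student"]]


def handle_transform_request(paragraph_id: str, key_col: int,
                             fetch: Callable[[str], List[Row]]) -> Dict:
    """Fetch the full dataset, pivot it on the server, and build a note
    update that carries only the pivoted table to the browser."""
    rows = fetch(paragraph_id)  # full dataset crosses the thrift hop
    counts = Counter(row[key_col] for row in rows)
    table = [["job", "count"]] + [[k, v] for k, v in counts.items()]
    return {  # only this small update crosses the http hop
        "paragraphId": paragraph_id,
        "results": {"code": "SUCCESS",
                    "msg": [{"type": "TABLE", "data": table}]},
    }


update = handle_transform_request("p1", key_col=2, fetch=fetch_result_dataset)
print(update["results"]["msg"][0]["data"])
```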

18
Transformation on Interpreter
[Figure: Browser ⇄ Zeppelin Server ⇄ (thrift) Interpreter]
- The Browser sends a transform request (pivot) to the Zeppelin Server, which forwards it to the Interpreter over thrift.
- The Interpreter transforms (pivots) the result dataset (age, balance, job) itself.
- Only the pivoted result travels back to the Zeppelin Server and on to the Browser as a note update:
    job       count
    Student   2
    Engineer  3
    Teacher   1
- The Browser renders it.
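A rough model shows why moving the pivot matters. It counts the result rows crossing each hop when the pivot runs in the browser, on the server, or in the interpreter. This follows the diagrams, with N raw rows and K pivoted rows. It ignores note metadata and protocol overhead, and the function is illustrative only.

```python
from typing import Dict, Tuple


def rows_transferred(strategy: str, n_raw: int, k_groups: int) -> Tuple[int, int]:
    """Return (rows over thrift, rows over http) for a pivot placement."""
    hops: Dict[str, Tuple[int, int]] = {
        "browser": (n_raw, n_raw),            # full table on both hops
        "server": (n_raw, k_groups),          # full fetch, pivot sent on
        "interpreter": (k_groups, k_groups),  # only the pivot leaves
    }
    return hops[strategy]


for s in ("browser", "server", "interpreter"):
    thrift, http = rows_transferred(s, n_raw=1_000_000, k_groups=3)
    print(f"{s:12} thrift={thrift:>9,} http={http:>9,}")
```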

19
Transformation where the data is
[Figure: Browser ⇄ Zeppelin Server ⇄ (thrift) Interpreter, with transform pushdown]
- The Browser sends a transform request (pivot), which reaches the Interpreter through the Zeppelin Server over thrift.
- Instead of pivoting the full result dataset (age, balance, job) itself, the Interpreter pushes the transformation down to where the data is.
- Only the pivoted result comes back, and the Zeppelin Server sends it to the Browser as a note update for rendering:
    job       count
    Student   2
    Engineer  3
    Teacher   1
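Pushdown can be sketched by rewriting the pivot as a `GROUP BY` query, mirroring the slide's `select job, count(1) from data`. An in-memory SQLite database stands in for "where the data is". This is an assumption for illustration: a real interpreter would target its own engine.

```python
import sqlite3

# In-memory SQLite stands in for the data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (age INTEGER, balance INTEGER, job TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [(21, 1030, "Student"), (34, 20331, "Engineer"), (50, 30193, "Engineer"),
     (33, 12019, "Teacher"), (23, 23211, "Engineer"), (29, 92327, "Student")],
)


def pushdown_pivot(conn: sqlite3.Connection, table: str, key: str):
    """Rewrite a 'count by key' pivot as a GROUP BY so the data source does
    the aggregation and returns one row per group. table/key are spliced
    into SQL, so they must come from trusted identifiers, not user input."""
    sql = f"SELECT {key}, COUNT(1) FROM {table} GROUP BY {key} ORDER BY {key}"
    return conn.execute(sql).fetchall()


print(pushdown_pivot(conn, "data", "job"))
```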

20
Related work
- Streaming data updates (without refreshing the notebook)
- Transfer the result dataset to the browser separately from the note
- Partial data fetch for table display
- Extending the TableData API
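Partial data fetch for table display is essentially pagination. The table view asks for one page plus the total row count, instead of the whole dataset. The sketch below illustrates the idea. `fetch_page` is a hypothetical helper, not part of Zeppelin's TableData API.

```python
from typing import List, Sequence, Tuple


def fetch_page(rows: Sequence[tuple], page: int,
               page_size: int) -> Tuple[List[tuple], int]:
    """Return the rows of one page and the total row count, so a table view
    can render the visible page and a pager without receiving every row."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    start = page * page_size
    return list(rows[start:start + page_size]), len(rows)
```

For example, with 25 rows and `page_size=10`, page 2 holds the last five rows, and the returned total (25) lets the UI show three pages.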

23
Sidebar widgets
[Figure: sidebar mockup showing widget groups (Group1, Group2), Sidebar widget #1, Sidebar widget #2, and a "<" sidebar hide button]
- Sidebar widgets can be grouped.

24
TOC (table of contents) widget
[Figure: TOC widget mockup]
  Contents
  1. This is notebook
     a. First
     b. Second
  2. Next
     a. Next
- One of the most popular features in Jupyter; Google Colab also supports it.
- Zeppelin has a Spell for it: see https://www.npmjs.com/package/zeppelin-toc-spell
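A TOC widget mostly needs the Markdown headings in the note's paragraphs, in order. The sketch collects ATX headings (`#` to `######`) from paragraph texts and indents them by level. It assumes Markdown paragraphs carry their headings in the text, and it is not how zeppelin-toc-spell is implemented.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def build_toc(paragraph_texts: List[str]) -> List[Tuple[int, str]]:
    """Collect (level, title) for every Markdown ATX heading in notebook order."""
    toc: List[Tuple[int, str]] = []
    for text in paragraph_texts:
        for line in text.splitlines():
            m = HEADING.match(line)
            if m:
                toc.append((len(m.group(1)), m.group(2)))
    return toc


def render_toc(toc: List[Tuple[int, str]]) -> str:
    """Indent each entry two spaces per level below the top heading level."""
    if not toc:
        return ""
    top = min(level for level, _ in toc)
    return "\n".join("  " * (level - top) + title for level, title in toc)


paragraphs = ["%md\n# This is notebook\n## First\n## Second", "%md\n# Next\n## Next"]
print(render_toc(build_toc(paragraphs)))
```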

25
Table data widget
Displays the list of tables, the schema of each table, and a preview of the data recognized by the Interpreter.
[Figure: table data widget mockup]
  Tables
    Name    Temporary
    table1  no
    bank    yes
  Schema
    Column  Type
    age     INT
    job     TEXT
  Preview
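The three panels (tables, schema, preview) can be filled from a database catalog. This sketch uses SQLite's `sqlite_master` / `sqlite_temp_master` tables and `PRAGMA table_info` as a stand-in for whatever catalog an interpreter recognizes. A Spark interpreter, for instance, might consult Spark's catalog instead. `describe_tables` is a hypothetical helper.

```python
import sqlite3
from typing import Dict, List, Tuple

# (schema, catalog table, value shown in the "Temporary" column)
CATALOGS = [("main", "sqlite_master", "no"), ("temp", "sqlite_temp_master", "yes")]


def describe_tables(conn: sqlite3.Connection, preview_rows: int = 3) -> Dict[str, dict]:
    """For each table: a temporary flag, its (column, type) schema, and the
    first few rows. Names come from the catalog and are quoted, so names
    containing double quotes are not handled."""
    info: Dict[str, dict] = {}
    for schema, catalog, temporary in CATALOGS:
        names: List[str] = [row[0] for row in conn.execute(
            f"SELECT name FROM {catalog} WHERE type = 'table' ORDER BY name")]
        for name in names:
            # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
            columns: List[Tuple[str, str]] = [
                (col[1], col[2])
                for col in conn.execute(f'PRAGMA {schema}.table_info("{name}")')]
            preview = conn.execute(
                f'SELECT * FROM {schema}."{name}" LIMIT ?', (preview_rows,)).fetchall()
            info[name] = {"temporary": temporary, "schema": columns, "preview": preview}
    return info
```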

26
Clipboard
Drag and drop a paragraph onto the clipboard, then drag and drop it from the clipboard into the same notebook or another one.
[Figure: clipboard mockup with a "Drop paragraph here" area holding Paragraph a and Paragraph b]
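A minimal sketch of the clipboard's behaviour: dropping a paragraph stores a copy, and pasting inserts a new paragraph into any note. The `Note`, `Paragraph`, and `Clipboard` classes are hypothetical stand-ins, not Zeppelin's model classes. Giving each paste a fresh id is a design assumption, so the same item can be pasted twice.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    text: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


@dataclass
class Note:
    name: str
    paragraphs: List[Paragraph] = field(default_factory=list)


class Clipboard:
    """Holds copies of dropped paragraphs for pasting into any note."""

    def __init__(self) -> None:
        self.items: List[Paragraph] = []

    def drop(self, paragraph: Paragraph) -> None:
        # Deep copy so later edits to the source paragraph don't leak in.
        self.items.append(copy.deepcopy(paragraph))

    def paste(self, index: int, note: Note, position: int) -> Paragraph:
        # A new Paragraph gets a fresh id; only the text is carried over.
        pasted = Paragraph(text=self.items[index].text)
        note.paragraphs.insert(position, pasted)
        return pasted
```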

28
Thank you!
Please contact Mei Long (mlong@zepl.com) with your email address for an invite to the Apache Zeppelin Slack workspace.
