Apache Zeppelin Helium and Beyond

Apache Zeppelin
Moon soo Lee
moon@nflabs.com
Helium and Beyond

Apache Zeppelin (incubating)
http://zeppelin.incubator.apache.org

Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark

Zeppelin
We thought there was a missing piece in the Hadoop landscape: an analytics environment.
Landscape at the time: Cloudera-ML, ML-base, MRQL, Shark, ?

Zeppelin
2013. 10 Open-sourced the interactive analytics feature as ‘Zeppelin’
2013. 10 2014. 08

Zeppelin
2014. 12 ASF incubation
Incubation Status http://incubator.apache.org/projects/zeppelin.html

Zeppelin
2014. 12 ASF incubation
2016. 03 110 contributors worldwide
1355 stars on the GitHub repo
3 releases
One of the most popular projects in the ASF

Zeppelin
A web-based notebook that enables interactive data analytics. You can
make beautiful data-driven, interactive and collaborative documents with
SQL, Scala and more.

Zeppelin
Interpreter : pluggable layer for language / processing backend integration
20+ interpreters are officially supported, including JDBC, Markdown and Shell
As of 2016. 03, counting interpreters in the Zeppelin source tree; 3rd-party interpreters are not included

Zeppelin
Interpreter : Easy to extend
import java.util.List;

// Signatures as shown on the slide; declared abstract here so the listing compiles.
public abstract class Interpreter {
  // Must have
  public abstract void open();
  public abstract void close();
  public abstract InterpreterResult interpret(String st, InterpreterContext context);

  // Good to have
  public abstract void cancel(InterpreterContext context);
  public abstract int getProgress(InterpreterContext context);
  public abstract List<String> completion(String buf, int cursor);

  // Advanced
  public abstract FormType getFormType();
  public abstract Scheduler getScheduler();
}
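To make the "must have" part concrete, here is a minimal sketch of an interpreter that only implements open, close and interpret. It does not use the real Zeppelin classes: MiniInterpreter and ShoutInterpreter are hypothetical stand-ins, and interpret returns a plain String instead of an InterpreterResult so the example runs without Zeppelin on the classpath.

```java
import java.util.Locale;

// Simplified stand-in for the Interpreter base class (assumption, not Zeppelin's API):
// only the three "must have" methods, returning a plain String.
abstract class MiniInterpreter {
    abstract void open();
    abstract void close();
    abstract String interpret(String st);
}

// Hypothetical interpreter that upper-cases whatever the paragraph contains.
class ShoutInterpreter extends MiniInterpreter {
    private boolean opened = false;

    @Override
    void open() {
        opened = true;   // a real interpreter would connect to its backend here
    }

    @Override
    void close() {
        opened = false;  // and release the backend here
    }

    @Override
    String interpret(String st) {
        if (!opened) {
            throw new IllegalStateException("interpreter is not open");
        }
        return st.toUpperCase(Locale.ROOT);
    }
}

class ShoutInterpreterDemo {
    public static void main(String[] args) {
        ShoutInterpreter interpreter = new ShoutInterpreter();
        interpreter.open();
        System.out.println(interpreter.interpret("hello zeppelin")); // prints HELLO ZEPPELIN
        interpreter.close();
    }
}
```

A real interpreter would extend Zeppelin's Interpreter class and add cancel, getProgress and completion as the backend allows.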

Zeppelin
Notebook Repo : pluggable layer for notebook persistence
4+ notebook repos are officially supported
As of 2016. 03, counting notebook repos in the Zeppelin source tree; 3rd-party notebook repos are not included

Zeppelin
Notebook Repo : Easy to extend
public interface NotebookRepo {
public List<NoteInfo> list() throws IOException;
public Note get(String noteId) throws IOException;
public void save(Note note) throws IOException;
public void remove(String noteId) throws IOException;
public void checkpoint(String noteId, String checkPointName) throws IOException;
public void close();
}
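
As an illustration of how small a notebook repo can be, the sketch below keeps notes in a map and mirrors list, get, save, remove and close from the interface above. MiniNote and InMemoryNotebookRepo are hypothetical stand-ins rather than Zeppelin classes, list returns note ids instead of NoteInfo objects, and checkpoint is left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a note (assumption): just an id and a text body.
class MiniNote {
    final String id;
    final String text;

    MiniNote(String id, String text) {
        this.id = id;
        this.text = text;
    }
}

// Hypothetical repo that persists nothing: notes live in an insertion-ordered map.
class InMemoryNotebookRepo {
    private final Map<String, MiniNote> notes = new LinkedHashMap<>();

    List<String> list() {
        return new ArrayList<>(notes.keySet());
    }

    MiniNote get(String noteId) throws IOException {
        MiniNote note = notes.get(noteId);
        if (note == null) {
            throw new IOException("no such note: " + noteId);
        }
        return note;
    }

    void save(MiniNote note) {
        notes.put(note.id, note);  // overwrites an existing note with the same id
    }

    void remove(String noteId) {
        notes.remove(noteId);
    }

    void close() {
        notes.clear();
    }
}

class InMemoryNotebookRepoDemo {
    public static void main(String[] args) throws IOException {
        InMemoryNotebookRepo repo = new InMemoryNotebookRepo();
        repo.save(new MiniNote("n1", "%md hello"));
        repo.save(new MiniNote("n2", "%sql select 1"));
        System.out.println(repo.list());         // prints [n1, n2]
        System.out.println(repo.get("n1").text); // prints %md hello
        repo.close();
    }
}
```

A file-system, S3 or Git backed repo would replace the map with reads and writes against that storage.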

Zeppelin
Visualizations : 6 built-in visualizations come with pivot
Table, Bar, Pie, Area, Line, Scatter
Free to draw any customized visualization inside a notebook
…

Helium
Platform for data analytics applications that
makes visualization pluggable and more.
Proposal: https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal
Umbrella issue: http://issues.apache.org/jira/browse/ZEPPELIN-533
Pull request: https://github.com/apache/incubator-zeppelin/pull/836
Makes Zeppelin fly!

Helium
ZeppelinServer: RESTful API, Websocket
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Interpreters and Notebook storage are pluggable

Helium
ZeppelinServer
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Visualizations: Map, WordCloud, …
We want visualization to be pluggable

Helium
Application
Visualizations: Map, WordCloud, …
Analytics: ML, …
Resource Pool: SparkContext, Flink Environment, JDBC connection, …, User object
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Extend pluggable visualization to pluggable analytics applications

Helium
A Helium application is an interaction between view, algorithm and resources
Application = View + Algorithm, on top of Zeppelin-provided Resources

Helium
Web browser: View
Zeppelin Server
Interpreter Process: Algorithm, Resource pool
Resource pools (two shown) are connected
A Helium application runs where the resource exists
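
The idea that an application runs where its resource lives can be sketched as a lookup across connected pools. Everything named here is hypothetical (MiniResourcePool, ConnectedPools, whereToRun and the process names); the sketch only assumes that each interpreter process owns a pool of named resources and that the pools can be queried together.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical pool of named resources owned by one process (assumption).
class MiniResourcePool {
    final String processName;
    private final Map<String, Object> resources = new HashMap<>();

    MiniResourcePool(String processName) {
        this.processName = processName;
    }

    void put(String name, Object resource) {
        resources.put(name, resource);
    }

    boolean has(String name) {
        return resources.containsKey(name);
    }
}

// Hypothetical view over the connected pools: finds the process that holds a resource.
class ConnectedPools {
    private final List<MiniResourcePool> pools = new ArrayList<>();

    void connect(MiniResourcePool pool) {
        pools.add(pool);
    }

    // Returns the name of the first process whose pool contains the resource, if any.
    Optional<String> whereToRun(String resourceName) {
        return pools.stream()
                .filter(pool -> pool.has(resourceName))
                .map(pool -> pool.processName)
                .findFirst();
    }
}

class ConnectedPoolsDemo {
    public static void main(String[] args) {
        MiniResourcePool spark = new MiniResourcePool("spark-interpreter");
        spark.put("sparkContext", new Object());
        MiniResourcePool jdbc = new MiniResourcePool("jdbc-interpreter");
        jdbc.put("jdbcConnection", new Object());

        ConnectedPools pools = new ConnectedPools();
        pools.connect(spark);
        pools.connect(jdbc);

        // The algorithm part of an application would be started in this process.
        System.out.println(pools.whereToRun("jdbcConnection").orElse("nowhere")); // prints jdbc-interpreter
    }
}
```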

Helium Application: Easy to extend
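public abstract class Application {
  protected final ApplicationContext context;

  // Constructor body is omitted on the slide; storing the context is an assumption.
  public Application(ApplicationContext context) {
    this.context = context;
  }

  public abstract void run(ResourceSet args);
  public abstract void unload();
}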
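
A minimal sketch of what an application built on this shape could look like: it takes a text resource, counts words (the algorithm) and writes one line per word (the view). MiniContext, MiniApplication and WordCountApp are hypothetical stand-ins, and a plain Map replaces ResourceSet so the example runs without Zeppelin.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for ApplicationContext (assumption): a buffer the view is written to.
class MiniContext {
    final StringBuilder out = new StringBuilder();
}

// Simplified stand-in for the Application base class shown above.
abstract class MiniApplication {
    protected final MiniContext context;

    MiniApplication(MiniContext context) {
        this.context = context;
    }

    abstract void run(Map<String, Object> args);

    abstract void unload();
}

// Hypothetical application: word count over a "text" resource.
class WordCountApp extends MiniApplication {
    WordCountApp(MiniContext context) {
        super(context);
    }

    @Override
    void run(Map<String, Object> args) {
        String text = (String) args.get("text");
        Map<String, Integer> counts = new TreeMap<>();  // sorted by word
        for (String word : text.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        counts.forEach((word, count) ->
                context.out.append(word).append('=').append(count).append('\n'));
    }

    @Override
    void unload() {
        context.out.setLength(0);  // clear whatever the view rendered
    }
}

class WordCountAppDemo {
    public static void main(String[] args) {
        MiniContext context = new MiniContext();
        WordCountApp app = new WordCountApp(context);
        app.run(Collections.singletonMap("text", "spark flink spark"));
        System.out.print(context.out);  // prints flink=1 then spark=2, one per line
        app.unload();
    }
}
```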

Helium
Application
Resource Pool: SparkContext, Flink Environment, JDBC connection, …, User object
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Visualizations: Map, WordCloud, …
Maven
Download and load on the fly
Online repository for pluggable modules

Helium
Registry: zeppelin-packages, my company, + Add
Wordcloud (Visualization): Turn your table output into a word cloud
R Interpreter: R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS
ZeppelinHub Notebook Storage: Save your notebook in ZeppelinHub. You can control access and share your notebook online
Each package has an Install button
Registry for pluggable modules

Helium
Conclusion
Helium is trying to take Zeppelin
from a notebook to an analytics application platform.
You can build and distribute not only your visualizations but also
analytics applications that use cluster resources provided by Zeppelin
interpreters.
The next challenge is to enrich the Helium registry.

Zeppelin
Roadmap
Enterprise Ready: Multi-tenancy, Job scheduler, HA
Usability Improvement: UX improvement, Table data support
Add yours!
More details on https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap

Zeppelin and Friends
Z-Manager
ZeppelinHub
Collaboration/Sharing
Packaging & Deployment
Zeppelin + Full stack on a cloud
We want to grow the community around Zeppelin even more!

Zeppelin
Homepage
http://zeppelin.incubator.apache.org/ 
Mailing list
users@zeppelin.incubator.apache.org
dev@zeppelin.incubator.apache.org
Issue tracker
https://issues.apache.org/jira/browse/ZEPPELIN
Github repository
http://github.com/apache/incubator-zeppelin
Join the community

Thank you
Moon soo Lee
moon@nflabs.com
moon@apache.org
https://twitter.com/issuefreaks
