Apache Zeppelin
Moon soo Lee
moon@nflabs.com
Helium and Beyond
Apache Zeppelin (incubating)
http://zeppelin.incubator.apache.org
Zeppelin
2012. 12 Data analytics solution based on AMP Lab Spark/Shark
Zeppelin
[Diagram: Hadoop landscape showing Cloudera-ML, ML-base, MRQL and Shark, with "?" placeholders]
We thought there was a missing piece in the Hadoop landscape: an analytics environment.
2013. 10 Open-sourced the interactive analytics feature as ‘Zeppelin’
[Figure: 2013. 10 to 2014. 08]
2014. 12 ASF incubation
Incubation Status http://incubator.apache.org/projects/zeppelin.html
2016. 03 110 contributors worldwide
1,355 stars on the GitHub repo
3 releases
One of the most popular projects in the ASF
Zeppelin
A web-based notebook that enables interactive data analytics. You can
make beautiful data-driven, interactive and collaborative documents with
SQL, Scala and more.
Let’s take a look
Zeppelin
[Logos of supported interpreters, including JDBC, Markdown and Shell]
Interpreter: a pluggable layer for language / processing backend integration
20+ interpreters are officially supported
As of 2016. 03: interpreters in the Zeppelin source tree. Does not include 3rd-party interpreters.
Zeppelin
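Interpreter: easy to extend
import java.util.List;

public abstract class Interpreter {
  // Lifecycle hooks, called when the interpreter is started and stopped.
  public abstract void open();
  public abstract void close();
  // Runs the given code and returns its result.
  public abstract InterpreterResult interpret(String st, InterpreterContext context);
  // Cancels a running job and reports its progress.
  public abstract void cancel(InterpreterContext context);
  public abstract int getProgress(InterpreterContext context);
  // Returns completion candidates for the buffer at the cursor position.
  public abstract List<String> completion(String buf, int cursor);
  public abstract FormType getFormType();
  public abstract Scheduler getScheduler();
}
Slide annotations group these methods, from top to bottom, as "Must have", "Good to have" and "Advanced".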
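To give a feel for how small an interpreter can be, here is a minimal sketch of a toy interpreter that returns its input in upper case. It does not depend on Zeppelin: `InterpreterContext`, `InterpreterResult` and the trimmed-down `Interpreter` base class are hypothetical stand-ins declared inside the example, and `UpperCaseInterpreter` is an invented name. A real interpreter would extend Zeppelin's own classes, whose details may differ.

```java
import java.util.Locale;

// Hypothetical, self-contained sketch: none of the nested types are Zeppelin's real classes.
class UpperCaseInterpreterSketch {

  // Stand-in for Zeppelin's InterpreterContext (assumption: it only carries a paragraph id).
  static class InterpreterContext {
    final String paragraphId;
    InterpreterContext(String paragraphId) { this.paragraphId = paragraphId; }
  }

  // Stand-in for Zeppelin's InterpreterResult (assumption: a success flag plus a message).
  static class InterpreterResult {
    final boolean success;
    final String message;
    InterpreterResult(boolean success, String message) {
      this.success = success;
      this.message = message;
    }
  }

  // Trimmed-down version of the abstract class listed on the slide.
  abstract static class Interpreter {
    public abstract void open();
    public abstract void close();
    public abstract InterpreterResult interpret(String st, InterpreterContext context);
  }

  // Toy interpreter: returns the submitted text in upper case.
  static class UpperCaseInterpreter extends Interpreter {
    private boolean opened = false;

    @Override public void open() { opened = true; }
    @Override public void close() { opened = false; }

    @Override
    public InterpreterResult interpret(String st, InterpreterContext context) {
      if (!opened) {
        return new InterpreterResult(false, "interpreter is not open");
      }
      return new InterpreterResult(true, st.toUpperCase(Locale.ROOT));
    }
  }

  public static void main(String[] args) {
    Interpreter interpreter = new UpperCaseInterpreter();
    interpreter.open();
    InterpreterResult result = interpreter.interpret("select 1", new InterpreterContext("p1"));
    System.out.println(result.success + " " + result.message); // prints "true SELECT 1"
    interpreter.close();
  }
}
```

The sketch leaves out cancellation, progress reporting and completion so that only the open/interpret/close cycle is visible.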
Zeppelin
Notebook Repo: a pluggable layer for notebook persistence
4+ notebook repos are officially supported
As of 2016. 03: notebook repos in the Zeppelin source tree. Does not include 3rd-party notebook repos.
Zeppelin
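Notebook Repo: easy to extend
import java.io.IOException;
import java.util.List;

public interface NotebookRepo {
  // Lists summary information for the stored notes.
  public List<NoteInfo> list() throws IOException;
  // Loads, saves and removes a single note.
  public Note get(String noteId) throws IOException;
  public void save(Note note) throws IOException;
  public void remove(String noteId) throws IOException;
  // Records a named checkpoint of the given note.
  public void checkpoint(String noteId, String checkPointName) throws IOException;
  // Releases any resources held by the repo.
  public void close();
}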
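As an illustration of what implementing this interface involves, the following is a minimal in-memory sketch. It is not one of the repos shipped with Zeppelin: `Note` and `NoteInfo` are hypothetical stand-ins with only an id and a text field, `InMemoryNotebookRepo` and `getCheckpoint` are invented names, and the checkpoint key format is an assumption made for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, self-contained sketch: the nested types are not Zeppelin's real classes.
class InMemoryNotebookRepoSketch {

  // Stand-in for Zeppelin's NoteInfo (assumption: only an id).
  static class NoteInfo {
    final String id;
    NoteInfo(String id) { this.id = id; }
  }

  // Stand-in for Zeppelin's Note (assumption: an immutable id and text).
  static class Note {
    final String id;
    final String text;
    Note(String id, String text) { this.id = id; this.text = text; }
  }

  // Same method list as the interface on the slide.
  interface NotebookRepo {
    List<NoteInfo> list() throws IOException;
    Note get(String noteId) throws IOException;
    void save(Note note) throws IOException;
    void remove(String noteId) throws IOException;
    void checkpoint(String noteId, String checkPointName) throws IOException;
    void close();
  }

  // Keeps notes and checkpoints in maps; everything is lost when the process exits.
  static class InMemoryNotebookRepo implements NotebookRepo {
    private final Map<String, Note> notes = new LinkedHashMap<>();
    private final Map<String, Note> checkpoints = new LinkedHashMap<>();

    @Override public List<NoteInfo> list() {
      List<NoteInfo> infos = new ArrayList<>();
      for (String id : notes.keySet()) {
        infos.add(new NoteInfo(id));
      }
      return infos;
    }

    @Override public Note get(String noteId) throws IOException {
      Note note = notes.get(noteId);
      if (note == null) {
        throw new IOException("no such note: " + noteId);
      }
      return note;
    }

    @Override public void save(Note note) { notes.put(note.id, note); }

    @Override public void remove(String noteId) throws IOException {
      if (notes.remove(noteId) == null) {
        throw new IOException("no such note: " + noteId);
      }
    }

    // Stores the current version of the note under "noteId@checkPointName".
    @Override public void checkpoint(String noteId, String checkPointName) throws IOException {
      checkpoints.put(noteId + "@" + checkPointName, get(noteId));
    }

    // Helper for the example only; not part of the interface.
    Note getCheckpoint(String noteId, String checkPointName) {
      return checkpoints.get(noteId + "@" + checkPointName);
    }

    @Override public void close() {
      notes.clear();
      checkpoints.clear();
    }
  }

  public static void main(String[] args) throws IOException {
    InMemoryNotebookRepo repo = new InMemoryNotebookRepo();
    repo.save(new Note("n1", "first draft"));
    repo.checkpoint("n1", "v1");
    repo.save(new Note("n1", "second draft"));
    System.out.println(repo.get("n1").text);
    System.out.println(repo.getCheckpoint("n1", "v1").text);
    repo.close();
  }
}
```

A persistent repo would replace the two maps with a file system, object store or version control backend while keeping the same six methods.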
Zeppelin
Visualizations: 6 built-in visualizations come with pivot
Table, Bar, Pie, Area, Line, Scatter
Free to draw any customized visualization inside a notebook
…
Helium
A platform for data analytics applications that makes visualization pluggable, and more.
Proposal: https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal
Umbrella issue: http://issues.apache.org/jira/browse/ZEPPELIN-533
Pull request: https://github.com/apache/incubator-zeppelin/pull/836
Makes Zeppelin fly!
Helium
ZeppelinServer: RESTful API, Websocket
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Interpreters and notebook storage are pluggable
Helium
ZeppelinServer
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Visualizations: Map, WordCloud, …
We want visualizations to be pluggable
Helium
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Application
  Visualizations: Map, WordCloud, …
  Analytics: ML, …
Resource Pool: SparkContext, Flink Environment, JDBC connection, …, User object
Extend pluggable visualizations to pluggable analytics applications
Helium
A Helium application is the interaction between view, algorithm and resources
Application = View + Algorithm
Zeppelin-provided Resources
Helium
[Diagram: Zeppelin Server, Web browser (View), Interpreter Process (Algorithm), and two resource pools]
Resource pools are connected
A Helium application runs where the resource exists
Helium Application: Easy to extend
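public abstract class Application {
  private final ApplicationContext context;
  public Application(ApplicationContext context) { this.context = context; }
  // Runs the application against the given resources.
  public abstract void run(ResourceSet args);
  // Releases whatever run() set up when the application is unloaded.
  public abstract void unload();
}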
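A minimal sketch of what an application built on this class could look like follows. Everything in it is hypothetical: `ApplicationContext` is reduced to an output buffer, `ResourceSet` is modelled as a simple name-to-object map, and `WordCountApplication` and the resource name `lines` are invented for the example. The real Helium types may look quite different.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, self-contained sketch: the nested types are not Zeppelin's real classes.
class WordCountApplicationSketch {

  // Stand-in for ApplicationContext (assumption: it only exposes an output buffer).
  static class ApplicationContext {
    final StringBuilder out = new StringBuilder();
  }

  // Stand-in for ResourceSet (assumption: resources are looked up by name).
  static class ResourceSet {
    private final Map<String, Object> resources = new HashMap<>();
    void put(String name, Object value) { resources.put(name, value); }
    Object get(String name) { return resources.get(name); }
  }

  // Same shape as the abstract class on the slide.
  abstract static class Application {
    private final ApplicationContext context;
    public Application(ApplicationContext context) { this.context = context; }
    public ApplicationContext context() { return context; }
    public abstract void run(ResourceSet args);
    public abstract void unload();
  }

  // Algorithm: count words in the "lines" resource. View: write the counts to the output.
  static class WordCountApplication extends Application {
    WordCountApplication(ApplicationContext context) { super(context); }

    @Override
    public void run(ResourceSet args) {
      Object lines = args.get("lines");
      if (!(lines instanceof List)) {
        context().out.append("no 'lines' resource");
        return;
      }
      Map<String, Integer> counts = new TreeMap<>();
      for (Object line : (List<?>) lines) {
        for (String word : String.valueOf(line).trim().split("\\s+")) {
          if (!word.isEmpty()) {
            counts.merge(word, 1, Integer::sum);
          }
        }
      }
      context().out.append(counts);
    }

    @Override
    public void unload() {
      // Clear what run() wrote.
      context().out.setLength(0);
    }
  }

  public static void main(String[] args) {
    ApplicationContext context = new ApplicationContext();
    ResourceSet resources = new ResourceSet();
    resources.put("lines", Arrays.asList("to be or", "not to be"));
    Application app = new WordCountApplication(context);
    app.run(resources);
    System.out.println(context.out); // prints "{be=2, not=1, or=1, to=2}"
    app.unload();
  }
}
```

In this sketch the resource is a plain list; on the slides the resources are objects such as a SparkContext or a JDBC connection provided by an interpreter.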
Helium
Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: FileSystem, AmazonS3, Git, …
Application: Map, WordCloud, …
Resource Pool: SparkContext, Flink Environment, JDBC connection, …, User object
Maven
Download and load on the fly
Online repository for pluggable modules
Helium
Registries: zeppelin-packages, my company, + Add
Wordcloud (Visualization): Make your table output into a word cloud. [Install]
R Interpreter: R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. [Install]
ZeppelinHub Notebook Storage: Save your notebook in ZeppelinHub. You can control access and share your notebook online. [Install]
Registry for pluggable modules
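One way to picture such a registry is as a lookup from a package name to an implementing class that is instantiated on demand. The sketch below is purely illustrative and is not Helium's implementation: `PluggableModule`, `PackageInfo`, `Registry` and `WordCloud` are invented names, and instead of downloading an artifact it assumes the class is already on the classpath and loads it reflectively with `Class.forName`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a module registry; none of these types are Zeppelin's real classes.
class HeliumRegistrySketch {

  // Minimal contract every pluggable module fulfils in this sketch (assumption).
  interface PluggableModule {
    String describe();
  }

  // Example module standing in for a word cloud visualization package.
  static class WordCloud implements PluggableModule {
    @Override public String describe() { return "word cloud visualization"; }
  }

  // One registry entry: a display name, a module kind and the implementing class name.
  static class PackageInfo {
    final String name;
    final String kind;
    final String className;
    PackageInfo(String name, String kind, String className) {
      this.name = name;
      this.kind = kind;
      this.className = className;
    }
  }

  static class Registry {
    private final Map<String, PackageInfo> available = new LinkedHashMap<>();
    private final Map<String, PluggableModule> installed = new LinkedHashMap<>();

    void register(PackageInfo info) { available.put(info.name, info); }

    List<String> listAvailable() { return new ArrayList<>(available.keySet()); }

    boolean isInstalled(String name) { return installed.containsKey(name); }

    // "Install": look the entry up and instantiate its class reflectively.
    // A real registry would first fetch the artifact; this sketch skips that step.
    PluggableModule install(String name) throws ReflectiveOperationException {
      PackageInfo info = available.get(name);
      if (info == null) {
        throw new IllegalArgumentException("unknown package: " + name);
      }
      PluggableModule module = (PluggableModule)
          Class.forName(info.className).getDeclaredConstructor().newInstance();
      installed.put(name, module);
      return module;
    }
  }

  public static void main(String[] args) throws ReflectiveOperationException {
    Registry registry = new Registry();
    registry.register(new PackageInfo("Wordcloud", "Visualization", WordCloud.class.getName()));
    PluggableModule module = registry.install("Wordcloud");
    System.out.println(module.describe());
  }
}
```

Storing only the class name in the registry entry is what allows a module to be loaded lazily, at the moment the user chooses to install it.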
Helium
Conclusion
Helium is trying to take Zeppelin from a notebook to an analytics application platform.
You can build and distribute not only your visualizations but also analytics applications that use cluster resources provided by Zeppelin interpreters.
The next challenge is to enrich the Helium registry.
Zeppelin
Roadmap
Enterprise Ready: Multi-tenancy, Job scheduler, HA
Usability Improvement: UX improvement, Table data support
Add yours!
More details on https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap
Zeppelin and Friends
Z-Manager
ZeppelinHub
Collaboration/Sharing
Packaging & Deployment
Zeppelin + Full stack on a cloud
We want to grow the community around Zeppelin even more!
Zeppelin
Join the community
Homepage: http://zeppelin.incubator.apache.org/
Mailing list: users@zeppelin.incubator.apache.org, dev@zeppelin.incubator.apache.org
Issue tracker: https://issues.apache.org/jira/browse/ZEPPELIN
GitHub repository: http://github.com/apache/incubator-zeppelin
Thank you
Moon soo Lee
moon@nflabs.com
moon@apache.org
https://twitter.com/issuefreaks
