Apache Toree

Apache Toree provides the interactive notebook for Spark/Scala. Toree is an IPython/Jupyter kernel. It lets you mix Spark/Scala code with markdown, execute the notebook, and publish it on the web.

Asim will talk about how to install and get started with Apache Toree, how to use it to develop Spark applications interactively in notebooks, and how to publish your notebooks.
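
To make "mixing Spark/Scala code with markdown" concrete, here is a minimal sketch of what such a notebook looks like on disk: a JSON document in notebook format version 4 with one markdown cell and one Scala code cell. The kernel name `apache_toree_scala`, the display name, and the `sc` variable in the Scala cell are assumptions about a default Toree install, not details taken from the talk; the cell contents are made up for illustration.

```python
import json

# Assumption: a default Toree install registers its Scala kernel under this
# name. Check the output of `jupyter kernelspec list` on your machine.
KERNEL_NAME = "apache_toree_scala"


def markdown_cell(text):
    """Build a markdown (documentation) cell."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Build a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,  # no execution count before the first run
        "outputs": [],            # outputs are filled in by the kernel
        "source": source,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {
            "name": KERNEL_NAME,
            "display_name": "Apache Toree - Scala",  # assumed display name
            "language": "scala",
        }
    },
    "cells": [
        markdown_cell("# Word lengths\nCount the characters in a few words with Spark."),
        # Illustrative Scala cell; `sc` is assumed to be the SparkContext
        # that the kernel makes available to the notebook.
        code_cell('val words = sc.parallelize(Seq("apache", "toree"))\n'
                  "words.map(_.length).collect()"),
    ],
}

notebook_json = json.dumps(notebook, indent=1)
print(notebook_json)
```

Saving `notebook_json` to a file with an `.ipynb` extension should give a document that Jupyter can open, provided a kernel with the assumed name is installed.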

Published in: Data & Analytics

  1. APACHE TOREE
     ASIM JALIS
     GALVANIZE
  2. INTRO
  3. ASIM JALIS
     Galvanize/Zipfian, Data Engineering
     Cloudera, Microsoft, Salesforce
     MS in Computer Science from University of Virginia
  4. WHAT IS GALVANIZE’S DATA ENGINEERING IMMERSIVE?
     Immersive Peer Learning Environment
     Master High-Demand Skills and Technologies
     Heart of San Francisco in SOMA
  5. YOU GET TO . . .
     Play with Terabytes of Data
     Spark, Hadoop, Hive, Kafka, Storm, HBase
     Data Science at Scale
     Level UP your Career
  6. FOR MORE INFORMATION
     asim.jalis@galvanize.com
     http://galvanize.com
  7. TALK OVERVIEW
  8. WHAT IS THIS TALK ABOUT?
     What is Apache Toree?
     How can I create IPython/Jupyter notebooks for Spark/Scala?
  9. HOW MANY PEOPLE HERE ARE FAMILIAR WITH IPYTHON/JUPYTER NOTEBOOKS?
  10. HOW MANY PEOPLE HERE ARE FAMILIAR WITH APACHE SPARK?
  11. HOW MANY PEOPLE HERE ARE FAMILIAR WITH SCALA?
  12. LITERATE PROGRAMMING
  13. WHAT IS LITERATE PROGRAMMING?
      Proposed by Don Knuth
      Write programs for humans, not machines
      Programs communicate ideas to others
      Default text is documentation or thoughts
      Code is explicitly marked out
  14. LITERATE PROGRAM
      This program prints hello world.
      <<hello.c>>=
      <<includes>>
      <<main>>
      @
      Some includes.
      <<includes>>=
      #include <stdio.h>
      @
      Print hello world, then exit.
      <<main>>=
      int main(int argc, char *argv[]) {
          printf("Hello World!\n");
          return 0;
      }
      @
  15. WHAT PROBLEM DOES JUPYTER SOLVE?
      Suppose you want to share a programming idea or tutorial
      You write an article and embed code in it
      Now imagine being able to execute the code in the article
  16. HOW IS THIS DIFFERENT FROM CODE COMMENTS?
      Commented code is not technical literature
      It cannot be published or read as an article
      A literate program is an executable article
  17. JUPYTER/IPYTHON
  18. WHAT IS JUPYTER?
      Create executable documents
      Originally for Python
      Supports other systems through kernels
  19. JUPYTER DEMO
      Write Markdown text
      Write Scala Spark code
      Execute
      Repeat
      Tab-completion
  20. JUPYTER ARCH
  21. JUPYTER ARCH
      Jupyter server
      Displays notebook in browser
      Executes code on Python runtime
      Displays output back into notebook
  22. FERNANDO PÉREZ
      IPython/Jupyter inventor
      Particle Physics PhD, University of Colorado Boulder
      Now at UC Berkeley
      Started IPython in 2001
  23. IS IT IPYTHON OR JUPYTER?
      Started out as IPython notebook
      Not specific to Python
      Can work with Scala and other languages
      Jupyter captures its language independence
  24. APACHE TOREE
  25. WHAT IS TOREE?
      Toree is a Jupyter Kernel
      Executes Scala
      Runs Spark Driver/Context
  26. HOW DOES TOREE WORK?
  27. TOREE ARCHITECTURE
      Jupyter Server talks to Toree Kernel
      Toree Kernel talks to Spark Driver
      Spark Driver talks to Spark Executors
  28. HOW IS TOREE DIFFERENT FROM ZEPPELIN?
      Toree is compliant with the Jupyter protocol
      Toree is easy to install and use
      Zeppelin does not use the Jupyter protocol
      Zeppelin wants to be a platform like Jupyter
  29. TOREE COMMITS
      A lot of activity last year
      Stabilizing
  30. WHO WROTE TOREE?
      Top 2 contributors responsible for 50% of commits
      chipsenkbeil has 318 commits
      Lull3rSkat3r has 72 commits
  31. ROBERT “CHIP” SENKBEIL AND COREY STUBBS
  32. WHY IS IT CALLED TOREE?
      "Nothing special about the name. Some people in the group just picked it out. Some facts though, it is a purposeful misspelling of the Japanese word torii. A torii is a traditional gate for Shinto shrines in Japan."
      (Corey Stubbs, personal email)
  33. ACTUAL TORII
  34. TOREE HANDS-ON DEMO
  35. WHAT WE WILL COVER
      How to install Toree
      How to automatically pull Java libraries in your notebook
      How to publish notebooks on http://nbviewer.jupyter.org/
      How to turn notebooks into slide shows
  36. QUICKSTART TUTORIAL
      For details on how to do all these things, see https://github.com/asimjalis/apache-toree-quickstart
  37. CONCLUSION
  38. REFERENCES
      Apache Toree Quickstart Tutorial: https://github.com/asimjalis/apache-toree-quickstart
      Apache Toree on GitHub: https://github.com/apache/incubator-toree
      Apache Toree Home: https://toree.incubator.apache.org/
  39. QUESTIONS
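
The literate program on slide 14 is written in noweb-style notation: `<<name>>=` opens a named code chunk, `@` returns to documentation, and `<<name>>` inside a chunk refers to another chunk. As a rough sketch of how such a document becomes compilable code (the "tangle" step), the following Python expands the `hello.c` chunk. The function names `parse_chunks` and `tangle` are made up for this illustration, and the parser handles only the subset of the notation that appears on the slide.

```python
import re

# Literate source transcribed from slide 14.
SOURCE = """\
This program prints hello world.
<<hello.c>>=
<<includes>>
<<main>>
@
Some includes.
<<includes>>=
#include <stdio.h>
@
Print hello world, then exit.
<<main>>=
int main(int argc, char *argv[]) {
    printf("Hello World!\\n");
    return 0;
}
@
"""

CHUNK_DEF = re.compile(r"^<<(.+)>>=$")      # line that opens a named chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>$")  # line that refers to a chunk


def parse_chunks(text):
    """Collect the code lines of every named chunk; prose lines are skipped."""
    chunks = {}
    current = None
    for line in text.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            current = definition.group(1)
            chunks.setdefault(current, [])
        elif line == "@":
            current = None  # back to documentation text
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name):
    """Expand chunk references recursively and return the code lines."""
    lines = []
    for line in chunks[name]:
        reference = CHUNK_REF.match(line)
        if reference:
            indent, target = reference.groups()
            lines.extend(indent + expanded for expanded in tangle(chunks, target))
        else:
            lines.append(line)
    return lines


hello_c = "\n".join(tangle(parse_chunks(SOURCE), "hello.c"))
print(hello_c)
```

The printed text is the C program with the prose removed and the chunks stitched together in the order that `<<hello.c>>` names them. A notebook takes a different route to the same goal: the prose and code stay in one document, and the kernel executes the code cells in place.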
