Computable Content with Jupyter, Docker, Mesos


Presented at Big Data Spain 2016 and at Strata+HW Singapore 2016.

Project Jupyter is the evolution of IPython notebooks, applied to a range of different programming languages and environments. If you have not worked with Jupyter notebooks yet, here is a quick hands-on introduction. If you have, this tutorial also explores how Jupyter and Docker, used together, provide what Prof. Lorena Barba has called "Computable Content".
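Jupyter reaches other languages through kernels. Each installed kernel is registered with a small `kernel.json` file (a kernelspec) that tells Jupyter how to start the kernel process. The sketch below builds, in Python, a spec like the one for the stock IPython kernel. The exact `argv` depends on the installation, and the connection-file path is a hypothetical example. Jupyter fills in the `{connection_file}` placeholder itself when it launches a kernel. The helper here only imitates that substitution.

```python
import json

# A kernelspec tells Jupyter how to launch a kernel process for one language.
# Values mirror the stock IPython kernel; interpreter names and paths vary by install.
kernelspec = {
    "argv": ["python3", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3",
    "language": "python",
}


def launch_command(spec, connection_file):
    """Substitute the connection-file placeholder, roughly as Jupyter does at launch."""
    return [arg.replace("{connection_file}", connection_file) for arg in spec["argv"]]


# What would be written to .../kernels/python3/kernel.json
print(json.dumps(kernelspec, indent=1))

# Hypothetical connection file; Jupyter generates a real one per kernel session.
print(launch_command(kernelspec, "/tmp/kernel-1234.json"))
```

A kernel for another language ships the same kind of file, with its own `argv`, `display_name`, and `language`.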

We will work through brief exercises that show how to use Jupyter notebooks, based on an example application for natural language processing in Python. We will use Launchbot.io to prepare containers and notebooks locally, that is, to edit on a laptop before working at scale with Mesos or other cluster managers. We will walk through the system architecture used at O'Reilly Media to combine Apache Mesos, Marathon, Docker, and Jupyter. Then we will take an in-depth look at how Jupyter is being used in industry, and consider its impact on data science, software engineering, and science in academia.
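The following sketch shows how Marathon, running on Mesos, can launch a Dockerized Jupyter server. It is not O'Reilly's actual configuration. It builds a Marathon app definition and prepares a POST to Marathon's `/v2/apps` endpoint, but does not send it. Several details are assumptions:

- the Marathon URL, app id, image (`jupyter/minimal-notebook` from the Jupyter Docker stacks), and resource sizes are placeholders;
- the JSON layout follows Marathon 1.x releases of that era, where `network` and `portMappings` sit under `container.docker`. Newer Marathon versions move networking elsewhere, so check the schema for your version.

```python
import json
import urllib.request

# Assumption: placeholder endpoint for your Marathon instance.
MARATHON_URL = "http://marathon.example.com:8080"


def jupyter_app(app_id, image="jupyter/minimal-notebook", cpus=0.5, mem=1024):
    """Build a Marathon app definition running one Jupyter notebook server in Docker.

    Uses the older layout in which network and portMappings live under container.docker.
    """
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem,  # megabytes
        "instances": 1,
        "cmd": "jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser",
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to assign a free port on the agent
                "portMappings": [
                    {"containerPort": 8888, "hostPort": 0, "protocol": "tcp"}
                ],
            },
        },
    }


def build_request(app):
    """Prepare (but do not send) a POST of the app definition to /v2/apps."""
    return urllib.request.Request(
        MARATHON_URL + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical app id for one learner's session container.
app = jupyter_app("/oriole/session-42")
req = build_request(app)
print(req.get_method(), req.full_url)
# To actually submit: urllib.request.urlopen(req)
```

One app per web session matches the slide's "each web session has a Docker container running in the cloud". A production setup would also need routing to the assigned host port and cleanup of idle sessions.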

Published in: Education


  1. Computable Content with Jupyter, Docker, Mesos
     Strata+HW Singapore, 2016-12-07
     Paco Nathan, @pacoid
     Director, Learning Group @ O’Reilly Media
  2. Project Jupyter
  3. Some history…
     Project Jupyter is the evolution of IPython notebooks, applied to a range of different programming languages and environments.
     https://jupyter.org/
     https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  4. Installation and launch using Anaconda
     Download Anaconda: continuum.io/downloads
     Activate the environment needed: source activate py3k
     Launch Jupyter: jupyter notebook
     An example notebook (requires installs; see notes): github.com/ceteri/oriole_jupyterday_atl/blob/master/example.ipynb
  5. Installation and launch using Anaconda (continued)

        # requires: pip install textblob, then python -m textblob.download_corpora
        from textblob import TextBlob

        text = '''
        The titular threat of The Blob has always struck me as the ultimate movie
        monster: an insatiably hungry, amoeba-like mass able to penetrate
        virtually any safeguard, capable of--as a doomed doctor chillingly
        describes it--"assimilating flesh on contact.
        Snide comparisons to gelatin be damned, it's a concept with the most
        devastating of potential consequences, not unlike the grey goo scenario
        proposed by technological theorists fearful of
        artificial intelligence run rampant.
        '''

        blob = TextBlob(text)
        print(blob.tags)          # part-of-speech tags as (word, tag) pairs
        print(blob.noun_phrases)  # noun phrases extracted from the text
  6. A suite of network protocols
     At its core, one can think of Jupyter as a suite of network protocols:
     Jupyter is to the remote semantics of a REPL
     as…
     HTTP is to the remote semantics of a file share
  7. An excellent team
  8. Projects:
     JupyterHub: github.com/jupyterhub/jupyterhub
     Jupyter in Education: groups.google.com/forum/#!forum/jupyter-education
     JupyterLab (alpha preview): github.com/jupyterlab/jupyterlab
     Jupyter Kernels: github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  9. Resources:
     documentation: jupyter.readthedocs.io/en/latest/index.html
     discussions: groups.google.com/forum/#!forum/jupyter and gitter.im/jupyter/jupyter
     events: calendar.google.com/calendar/embed?src=p51j0ac1iccmj44tae12hq4dk0%40group.calendar.google.com
  10. Resources:
     Speaking of upcoming events, stay tuned for … JupyterCon
  11. Computable Content
  12. An observation…
  13. Jupyter @ O’Reilly Media
     Embracing Jupyter Notebooks at O'Reilly (oreilly.com/ideas/jupyter-at-oreilly)
     Learn alongside innovators, thought-by-thought, in context (oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context)
     Oriole Online Tutorials (safaribooksonline.com/oriole/)
     How Do You Learn? (oreilly.com/learning/how-do-you-learn)
  14. For example… Regex Golf by Peter Norvig (oreilly.com/learning/regex-golf-with-peter-norvig)
     • A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
     • Purely browser-based UX; zero installation required
     • Substantially higher engagement metrics
     • Opens the door for live coding in assessments
     • GitHub lists over 300K public Jupyter notebooks
  15. Motivations
     O’Reilly needed a way for authors to use Jupyter notebooks to create professional publications. We also wanted to integrate video narration into the UX. The result is a unique new medium called Oriole:
     • Jupyter notebooks are used in the middleware
     • each viewer gets a 100% HTML experience (no download/install needed)
     • context as a “unit of thought”
     • the code and video are synced together
     • each web session has a Docker container running in the cloud
  16. Motivations (continued)
     Innovators in programming, data science, DevOps, design, etc., tend to be really busy people. Tutorials are now much quicker to publish than “traditional” books and videos. The audience gets direct, hands-on, contextualized experience across a wide variety of programming environments.
  17. A notebook, a container, and ~20 minutes of informal video walk into a bar...
  18. Some history: Literate Programming, Don Knuth
     literateprogramming.com/
     Paraphrased: Instead of telling computers what to do, tell other people what you want the computers to do
  19. Some history: Wolfram Research introduced notebooks in 1988 for working with Mathematica…
  20. More recently: PyCon 2016 Keynote, Lorena Barba
     youtu.be/ckW1xuGVpug?t=35m11s (video)
     figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
     Highly recommended: speech acts (based on Winograd and Flores) as theory for what we’re doing here
  21. Notebook Practice
  22. Tips learned by teaching with Jupyter
     • focus on a concise “unit of thought”
     • invest the time and editorial effort to create a good intro
     • keep your narrative simple and reasonably linear
     • “chunk” the text and code into understandable parts
     • alternate between text, code, output, further links, etc.
     • use markdown for interesting links: background, deep-dive, etc.
     • code cells shouldn’t be long (< 10 lines) and must show output
     • load data+libraries from the container, not the network
     • clear all output then “Run All” – or it didn’t happen
     • video narratives: there’s text, and there’s subtext...
     • pause after each “beat” – smile, breathe, let people follow you
     For the JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth1984)
     BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK!
     (apologies for shouting)
  23. Notebooks: a cure for silos
     Jupyter notebooks + Git repos provide a low-cost, pragmatic way toward the practice of repeatable science – in this case, repeatable Data Science
     • executable documents
     • code + params + results + descriptions
     • shareable insights
  24. “Sharing is caring”
     In data science, we see the benefits to teams for shared insights, storytelling, etc. Meanwhile, domain expertise is generally more important than knowledge about tools.
     There’s value for developers in using notebooks in lieu of IDEs in some cases – what are those cases?
     GitHub now renders notebooks, so they can be used for documentation, reporting, etc.
     Digital Object Identifiers (DOIs) can be assigned through Zenodo, making notebooks citable for academic publication.
  25. Authoring & Scale-Out
  26. Launchbot.io
  27. Achieving scale
     Launchbot allows a notebook author to build a container that includes the required Jupyter kernel, installed libraries, datasets, etc.
     You need to have Docker installed on your laptop.
     The backend uses Git and Docker Hub to manage containers.
     For scale, deploy to DC/OS.
  28. presenter:
     Just Enough Math, O’Reilly (2014): justenoughmath.com
     monthly newsletter for updates, events, conf summaries, etc.: liber118.com/pxn/
     @pacoid
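The deck describes Jupyter as a suite of network protocols for the remote semantics of a REPL. The sketch below makes that concrete using only the standard library. Following the Jupyter messaging specification, it builds an `execute_request` from four JSON parts (header, parent header, metadata, content). It then places those parts behind a `<IDS|MSG>` delimiter and an HMAC-SHA256 signature.

A few assumptions apply:

- the key and username are hypothetical. A real client reads the key from the kernel's connection file and sends these frames over ZeroMQ, usually through `jupyter_client`.
- routing identities that may precede the delimiter are omitted.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIM = b"<IDS|MSG>"  # separates routing identities from the signed message


def make_message(msg_type, content, session, parent_header=None):
    """Build the four dict parts of a Jupyter message."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "user",  # hypothetical
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": msg_type,
        "version": "5.0",
    }
    return [header, parent_header or {}, {}, content]


def _sign(frames, key):
    """HMAC-SHA256 over the concatenated JSON frames, as a hex string in bytes."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    return mac.hexdigest().encode("ascii")


def serialize(parts, key):
    """Encode the parts as JSON frames and prepend the delimiter and signature."""
    frames = [json.dumps(p).encode("utf-8") for p in parts]
    return [DELIM, _sign(frames, key)] + frames


def verify(wire, key):
    """Recompute the signature over the four JSON frames; compare in constant time."""
    i = wire.index(DELIM)
    signature, frames = wire[i + 1], wire[i + 2:i + 6]
    return hmac.compare_digest(signature, _sign(frames, key))


key = b"shared-secret-from-connection-file"  # hypothetical key
content = {
    "code": "1 + 1",
    "silent": False,
    "store_history": True,
    "user_expressions": {},
    "allow_stdin": False,
    "stop_on_error": True,
}
wire = serialize(make_message("execute_request", content, session=uuid.uuid4().hex), key)
print(verify(wire, key))
```

The kernel answers on other channels with messages such as `execute_reply`, carrying the request's header as their `parent_header`. That is how a front end matches results to the cell that asked for them.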

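Several tips from "Notebook Practice" lend themselves to a quick automated check: short code cells, visible output, and loading from the container rather than the network. The sketch below uses only the standard library. It assumes the nbformat 4 JSON layout, with a top-level `cells` list whose code cells carry `source`, `outputs`, and `execution_count`. The 10-line threshold comes from the tips. The network heuristic, which looks for `pip install` or URLs in code, is this sketch's own assumption.

```python
import json

MAX_LINES = 10  # from the tip: code cells should be under 10 lines


def check_notebook(nb):
    """Return (cell index, warning) pairs for code cells that break the tips."""
    warnings = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        n_lines = len(text.splitlines())
        if n_lines >= MAX_LINES:
            warnings.append((i, "long cell: %d lines" % n_lines))
        if not cell.get("outputs"):
            warnings.append((i, "no output"))
        # heuristic: data or libraries fetched at run time instead of baked into the container
        if "pip install" in text or "http://" in text or "https://" in text:
            warnings.append((i, "may load from the network"))
    return warnings


def clear_outputs(nb):
    """Clear outputs and execution counts: the "clear all output" half of the tip."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


# A tiny in-memory notebook for illustration (hypothetical content).
demo = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["import pandas as pd\n", "df = pd.read_csv('https://example.com/data.csv')"],
            "outputs": [],
        },
    ],
}

for idx, msg in check_notebook(demo):
    print(idx, msg)

# To check a real file instead:
# with open("example.ipynb") as f:
#     print(check_notebook(json.load(f)))
```

After `clear_outputs`, the notebook still needs a full re-run before its outputs count. In the notebook UI that is Cell > Run All; from the command line, `jupyter nbconvert --to notebook --execute` does the same.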