The Beaker Notebook is a new open source tool for collaborative data science. Like IPython, Beaker uses a notebook-based metaphor for idea flow. However, Beaker was designed to be polyglot from the ground up: a single notebook may contain cells written in different languages that communicate with one another through a unique feature called autotranslation. You can set a variable in a Python cell and then read that variable in a subsequent R cell, and everything just works, magically. Beaker comes with built-in support for Python, R, Groovy, Julia, and JavaScript. Beaker also supports several kinds of text cells, such as HTML, LaTeX, and Markdown, as well as our own visualization library for plotting large data sets. This talk will motivate the design, review the architecture, and include a live demo of Beaker in action.
2. Beaker is a notebook-style development environment for
working interactively with complex datasets.
Its polyglot architecture allows you to switch between
languages or add new ones with ease.
10. [Architecture diagram: client and server]
Client: Web Browser (HTML5, Angular, MVC), Doc Model
Output Plugins: HTML, Plot, Table, …
Evaluator Plugins: R, Groovy, Python, …, Javascript
Server: nginx, Core Server (Jetty, Jersey, Jackson, Guice)
Language backends: Groovy; Python (IPython/ZMQ); R (Java/Rserve); …
Format exchanged: JSON
33. [Diagram: autotranslation through the Core Server]
Core Server namespace: beaker: { x: 10 }
Groovy cell: beaker.x = 5+5
Python cell: beaker.x + 1
Value shown in the diagram: 10
Problem: the request and the reply are in different threads
34. [Diagram: autotranslation through the Core Server]
Core Server namespace: beaker: { x: 10 }
Groovy cell: beaker.x = 5+5
Python cell: beaker.x + 1
Values shown in the diagram: 10, 10
Problem: the request and the reply are in different threads
Solution: use a java.util.concurrent.SynchronousQueue
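A minimal sketch of the handoff this slide describes, assuming a request thread that must block until a reply arrives on some other thread. The class name NamespaceHandoff and the methods get and lookupInNotebook are hypothetical and are not taken from Beaker's source; only java.util.concurrent.SynchronousQueue itself comes from the slide.

```java
import java.util.Optional;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

class NamespaceHandoff {

    // Hypothetical stand-in for the notebook namespace (here it only knows x = 10).
    static Object lookupInNotebook(String name) {
        return "x".equals(name) ? Integer.valueOf(10) : null;
    }

    // Runs on the request thread and blocks until another thread hands over the reply.
    static Object get(String name, long timeoutMillis) {
        // SynchronousQueue rejects null elements, so the reply is wrapped in an Optional.
        SynchronousQueue<Optional<Object>> handoff = new SynchronousQueue<>();

        // The reply is produced on a different thread, as in the slide's "Problem".
        Thread replyThread = new Thread(() -> {
            try {
                handoff.put(Optional.ofNullable(lookupInNotebook(name)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        replyThread.setDaemon(true); // do not keep the JVM alive if nobody takes the reply
        replyThread.start();

        try {
            // poll() returns null if no reply is handed over within the timeout.
            Optional<Object> reply = handoff.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (reply == null) {
                throw new IllegalStateException("no reply for '" + name + "'");
            }
            return reply.orElse(null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        }
    }

    public static void main(String[] args) {
        int x = (Integer) get("x", 1000);
        System.out.println(x + 1); // prints 11
    }
}
```

A SynchronousQueue has no capacity, so put and poll meet in a rendezvous: the caller gets the value produced for its own request, and nothing is left buffered if the caller gives up.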
Two Sigma is a technology firm focused on investment management. We use data science, machine learning, statistics, and AI to build mathematical and algorithmic investment strategies. Beaker is a tool we built for our own quants, and we’ve decided to open source it and share it with the world for use by scientists of all kinds.
First, let’s talk about notebooks. Here are some scientific notebooks. Notebooks are for the part of science where you don’t know the answer yet. You are documenting experiments for yourself, collecting data, searching for a theory. A bit of a diary. Exploration.
What media types do you see here? (ask audience) prose, tables of numbers, graphs, equations, pictures.
Beaker reimagines this with current technology instead of graph paper and pen. In particular, that means adding code to this mix.
the document then is a series of cells (paragraphs) each of which can be whatever media type, including code. code cells can be executed, and their results appear right in the notebook, below the code that created them.
Beaker is an application that uses a notebook like this as its metaphor. Those who have come before us: Maple, Mathematica, IPython. Closely related is literate programming (Knuth 84).
for contrast, other UI metaphors: the REPL, spreadsheet, IDE (tree of text files).
this raises the big question: what language is the code?
there is no single answer to this question, so we made it a variable.
that way, each part of your notebook can be written in the language which is best for expressing that idea.
or collaboration with someone who prefers another language.
ability to set a variable in one language and read it in another. Works for complex structures not just numbers or strings.
these variables are stored in the notebook, useful for a host of reasons (don’t have to move a bunch of data files with the notebook).
hub and spoke model, just a small connector for each language.
universal format they all connect to is the same as the format for the notebook itself: JSON.
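As a rough sketch of the hub-and-spoke idea, assume each language connector only has to map its native values onto the JSON shapes (null, number, boolean, string, array, object). The Java below is illustrative only: JsonHub, toJson, and quote are hypothetical names rather than Beaker APIs, and a real connector would more likely use a JSON library such as Jackson.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class JsonHub {

    // Encode the value shapes that every spoke agrees on.
    // (NaN and Infinity are not valid JSON and are not handled here.)
    static String toJson(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        if (v instanceof String) return quote((String) v);
        if (v instanceof List) {
            return ((List<?>) v).stream()
                    .map(JsonHub::toJson)
                    .collect(Collectors.joining(",", "[", "]"));
        }
        if (v instanceof Map) {
            return ((Map<?, ?>) v).entrySet().stream()
                    .map(e -> quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()))
                    .collect(Collectors.joining(",", "{", "}"));
        }
        throw new IllegalArgumentException("no JSON mapping for " + v.getClass());
    }

    // Quote a string and escape the characters JSON requires to be escaped.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // A hypothetical notebook namespace holding a number and a nested list.
        Map<String, Object> beaker = new LinkedHashMap<>();
        beaker.put("x", 10);
        beaker.put("tags", Arrays.asList("html", "body"));
        System.out.println(toJson(beaker)); // prints {"x":10,"tags":["html","body"]}
    }
}
```

With JSON as the hub, adding a language means writing one connector to and from JSON instead of one translator per pair of languages.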
Demo #1: Python, HTML, and JavaScript/d3 (interactive force directed graph).
Walk through and evaluate each cell. Show how you can see the data in beaker.graph.
Show how the data is saved in the notebook?
Demo #2: Interactive Charts, Levels of Detail, Sharing
On the price chart, show how you can zoom and interact with the chart
Explain how this can cause problems with many points
Bring up next chart, explain sampling, show how it zooms
Show river/box option, discuss data-loss of sampling
Show one-click sharing to a URL; the shared notebook is still interactive
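The data-loss trade-off mentioned in this demo can be sketched in a few lines. This is an illustration only, not Beaker's plotting code: the class and method names are hypothetical, and summarizing each bucket by its minimum and maximum is just one common way to keep extremes visible when there are too many points to draw.

```java
import java.util.ArrayList;
import java.util.List;

class LevelOfDetail {

    // Keep every k-th point: cheap, but anything between the kept points is lost.
    static List<Double> everyKth(double[] ys, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < ys.length; i += k) {
            out.add(ys[i]);
        }
        return out;
    }

    // Summarize each bucket of k points as {min, max}, so extremes survive.
    static List<double[]> minMaxBuckets(double[] ys, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        List<double[]> out = new ArrayList<>();
        for (int start = 0; start < ys.length; start += k) {
            double lo = ys[start];
            double hi = ys[start];
            int end = Math.min(start + k, ys.length);
            for (int i = start + 1; i < end; i++) {
                lo = Math.min(lo, ys[i]);
                hi = Math.max(hi, ys[i]);
            }
            out.add(new double[] {lo, hi});
        }
        return out;
    }

    public static void main(String[] args) {
        double[] prices = {1, 1, 9, 1, 1, 1, 1, 1}; // one spike at index 2
        System.out.println(everyKth(prices, 4));                // prints [1.0, 1.0]
        System.out.println(minMaxBuckets(prices, 4).get(0)[1]); // prints 9.0
    }
}
```

Plain sampling drops the spike entirely, while the per-bucket summary keeps its height at the cost of drawing a band instead of a single line.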
Demo #3: One more thing….
Since Beaker is polyglot and supports multiple languages, there’s one situation that might be of interest to this audience: running Python 2 and Python 3 in the same notebook. That works, and you can even use autotranslation to pass data between them. In this example, we use the mechanize package, which is Python 2 only, to scrape a web site. We then pass the page to a Python 3 cell, where we can parse the HTML, handle Unicode better, and make a histogram of tags. Finally, we display the data as a bubble chart with d3.
Performance and Reliability
Improve autotranslation of complex types
Application Platform
Forms and Widgets
Move notebook model to server
Collaborative editing
Disconnected execution
Bunsen
Put a namespace in the notebook. These variables are the bus for autotranslation, and they are also handy for eliminating dependencies on files that get separated from the notebook. The notebook is saved and shared with this data.
The challenge in implementing autotranslation is that the backends are in separate processes and the namespace is in the frontend. How do we connect it all?