1
A tool for data science at scale.
Matthias Bussonnier
(UC Berkeley mbussonnier@berkeley.edu)
Slide examples on GitHub:
https://github.com/Carreau/talks/tree/master/labtech-2015
Jupyter
A bit of History
The Notebook Application
The document format & Publication
Multi-User & scaling
The Ecosystem
2
The Lifecycle of a Scientific Idea
1. Individual exploratory work
2. Collaborative development
3. Parallel production runs (HPC, cloud, …)
4. Publication & communication (reproducibly!)
5. Education
6. Goto 1
3
The Lifecycle of a Scientific Idea
1. Individual exploratory work – Matlab command line
2. Collaborative development – email scripts back and forth?
3. Parallel production runs (HPC, cloud, …) – rewrite Fortran/MPI
4. Publication & communication (reproducibly!) – copy-paste into PPT
5. Education – Specific tools
6. Goto 1
4
Can we have a single tool that covers the whole lifecycle of
a scientific idea, from data collection to publication?
5
A bit of History
Fernando Perez, 2001, CU Boulder (instead of
writing a physics dissertation):
Python can replace the collection of Bash,
Perl, and C/C++ scripts. But the Python REPL
can be better.
6
NOVEMBER 2001: "JUST AN AFTERNOON HACK"
259-line Python script (https://gist.github.com/fperez/1579699).
sys.ps1 -> In [N].
sys.displayhook -> Out[N], caches results.
Plotting, Numeric, etc.
2014 (OPENHUB STATS)
19,279 commits
442 contributors
Total Lines: 187,326
Number of Languages : 7 (JS, CSS, HTML, …)
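The two hooks named for the 2001 script can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the original 259-line script: the class names `MiniShell` and `NumberedPrompt` are made up for this example, and the counter here advances only when a value is displayed, which is a simplification.

```python
import builtins
import sys


class MiniShell:
    """Toy stand-in for the numbered-prompt idea (hypothetical name)."""

    def __init__(self):
        self.count = 1   # number of the current In/Out pair
        self.out = {}    # cache of displayed results, keyed by number

    def displayhook(self, value):
        """Replacement for sys.displayhook: print Out[N] and cache the value."""
        if value is None:  # like the default hook, show nothing for None
            return
        self.out[self.count] = value
        builtins._ = value  # keep the usual `_` shortcut working
        print(f"Out[{self.count}]: {value!r}")
        self.count += 1

    def install(self):
        """Activate both hooks in a running interactive interpreter."""
        sys.ps1 = NumberedPrompt(self)
        sys.displayhook = self.displayhook


class NumberedPrompt:
    """sys.ps1 may be any object; the interpreter calls str() on it
    each time it is about to read a new command."""

    def __init__(self, shell):
        self.shell = shell

    def __str__(self):
        return f"In [{self.shell.count}]: "
```

In a plain `python` session, `shell = MiniShell(); shell.install()` switches the prompt to a numbered one and keeps every displayed result in `shell.out`.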
7
Improve over the terminal
❖ The REPL as a network protocol
❖ Kernels
❖ execute code
❖ Clients
❖ Read input
❖ Present output
Simple abstractions enable rich,
sophisticated clients
8
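The kernel/client split above can be sketched as two functions that only exchange JSON messages. This is a toy, assuming a much simplified message layout: the names `toy_kernel` and `toy_client` are hypothetical, everything runs in one process instead of over a network, and captured output is packed into the reply for brevity. The message type names `execute_request` and `execute_reply` are borrowed from the Jupyter protocol, but the real protocol carries more fields and sends output on a separate channel.

```python
import contextlib
import io
import json


def toy_kernel(request_json, namespace):
    """Kernel side: execute the code in a request and return a JSON reply."""
    request = json.loads(request_json)
    if request["msg_type"] != "execute_request":
        raise ValueError("unsupported message type")
    buffer = io.StringIO()
    try:
        # Capture anything the code prints so it can be sent to the client.
        with contextlib.redirect_stdout(buffer):
            exec(request["content"]["code"], namespace)
        status = "ok"
    except Exception as exc:
        status = "error"
        buffer.write(f"{type(exc).__name__}: {exc}")
    reply = {
        "msg_type": "execute_reply",
        "content": {"status": status, "stdout": buffer.getvalue()},
    }
    return json.dumps(reply)


def toy_client(code, namespace):
    """Client side: wrap input in a message, send it, return the reply content."""
    request = json.dumps(
        {"msg_type": "execute_request", "content": {"code": code}}
    )
    reply = json.loads(toy_kernel(request, namespace))
    return reply["content"]


if __name__ == "__main__":
    state = {}  # the kernel's namespace persists between requests
    toy_client("x = 2", state)
    print(toy_client("print(x * 21)", state))
```

Because the client only sees messages, any frontend that can build the request and display the reply can drive the same kernel.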
❖ Rich web client
❖ Text & math
❖ Code
❖ Results
❖ Share, reproduce.
2011: The IPython Notebook
9
The Team
(people who spend a noticeable amount of time on the project; subjective, of course)
Fernando Perez (UC Berkeley LBL)
Brian Granger (Cal Poly)
Oberon Lopez (summer student)
Cameron Oelsen (summer student)
Simon Vurens (summer student)
Ryan Morshed (summer student)
Min Ragan-Kelley (Simula)
Thomas Kluyver (UK)
Matthias Bussonnier (UC Berkeley)
Jon Frederic (Cal Poly)
Jess Hamrick (UC Berkeley)
Kyle Kelley (Rackspace)
Jason Grout (Bloomberg)
Sylvain Corlay (Bloomberg)
Kester Tong (Google)
Nicholas Bollweg
Will Whitney (MIT)
Damián Avila (Continuum)
Steven Silvester (Continuum)
Chris Colbert (Continuum)
David Willmer (Continuum)
Peter Parente (IBM)
Dan Gisolfi (IBM)
Gino Bustelo (IBM)
All 400+ GitHub contributors.
Bold: working full time on IPython/Jupyter; underline: contributing to Jupyter/IPython under a corporate agreement
10
Funding
11
Jupyter vs IPython
Jupyter:
Network protocol for interactive computing
Clients for protocol
Console
Qt Console
Notebook
Notebook file format & tools (nbconvert…)
JupyterHub
Nbviewer
NbGrader
Tmpnb
…
IPython:
Interactive Python shell at the terminal
Kernel for this protocol in Python
Tools for cross-language integration
Tools for interactive parallel computing
The “reference” kernel for Jupyter
12
Why ?
Don’t reinvent the wheel: reimplement one piece, get the rest for
free.
You don’t like the frontend? Write a new one for Python and get
50+ languages that work out of the box with it
(https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages).
You don’t like a language? Write your own kernel and get all the
IDEs and conversion tools.
Etc.
13
The Notebook
Try it on https://try.jupyter.org
Demo
The Notebook app also has a terminal, a text editor, an increasing
number of plugins, and of course supports 50 languages.
14
The notebook
A web application that allows code to produce
rich web representations (images, sound, video,
math, …).
The browser, server, and kernel(s) can be on
separate machines.
The default application to edit `.ipynb` files.
`.ipynb` files are JSON-based files embedding
input and output, so they can be read &
converted without a running kernel.
15
The Notebook File Format (`.ipynb`)
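Since a notebook is a JSON document, it can be inspected with nothing but a JSON parser. The sketch below builds a tiny hand-written notebook in the version 4 layout and reads its cells back without any kernel; the helper names `cell_text` and `summarize` are made up for this example, and only a small part of the format is shown.

```python
import json

# A hand-written, minimal notebook in the version 4 JSON layout.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["1 + 1"],
            # The stored output travels with the file.
            "outputs": [{
                "output_type": "execute_result",
                "execution_count": 1,
                "metadata": {},
                "data": {"text/plain": ["2"]},
            }],
        },
    ],
})


def cell_text(value):
    """Notebook text fields may be a single string or a list of lines."""
    return value if isinstance(value, str) else "".join(value)


def summarize(text):
    """Return (cell_type, source, plain-text outputs) for every cell."""
    notebook = json.loads(text)
    rows = []
    for cell in notebook["cells"]:
        outputs = [
            cell_text(out["data"]["text/plain"])
            for out in cell.get("outputs", [])
            if "text/plain" in out.get("data", {})
        ]
        rows.append((cell["cell_type"], cell_text(cell["source"]), outputs))
    return rows


if __name__ == "__main__":
    for row in summarize(notebook_text):
        print(row)
```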
16
NbViewer
Zero-install reading of
notebooks
Just share a URL
nbviewer.org
Under the hood: get raw URL
and convert to HTML on the fly.
Sharing:
git push, or dropbox sync.
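The "raw URL, then HTML" idea can be sketched with the standard library alone. This is not Nbviewer's code: `github_raw_url` and `notebook_to_html` are hypothetical helpers, the HTML produced is deliberately crude, and the network fetch is left out. The real conversion is done by `nbconvert`.

```python
import html
from urllib.parse import urlsplit


def github_raw_url(url):
    """Map a github.com 'blob' page URL to the matching raw-file URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a GitHub blob URL")
    user, repo, _blob, ref, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, ref, *path])


def notebook_to_html(notebook):
    """Render each cell's source as escaped HTML (a crude stand-in for nbconvert)."""
    blocks = []
    for cell in notebook["cells"]:
        source = cell["source"]
        text = source if isinstance(source, str) else "".join(source)
        tag = "pre" if cell["cell_type"] == "code" else "div"
        blocks.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(blocks)


if __name__ == "__main__":
    # The file name demo.ipynb is made up for the example.
    print(github_raw_url(
        "https://github.com/Carreau/talks/blob/master/labtech-2015/demo.ipynb"
    ))
    print(notebook_to_html({"cells": [{"cell_type": "code", "source": "1 < 2"}]}))
```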
17
Nbviewer on GitHub
Since May, GitHub renders
notebooks
Powered by `nbconvert`,
the library that deals with
`.ipynb` -> *
Over 200,000 notebooks
on GitHub
18
What content works as a notebook?
http://www.nature.com/ismej/journal/v7/n3/full/ismej2012123a.html
http://qiime.org/home_static/nih-cloud-apr2012
Papers with code as AMI/VMs
19
Blogs
Jake VanderPlas @ UW
http://blogs.scientificamerican.com/sa-visual/2014/09/16/visualizing-4-dimensional-asteroids
20
Courses, MOOCs
21
Books
By Cameron Davidson-Pilon, by Matthew Russell, by José Unpingco
You can download and execute the books locally.
22
Check The Gallery
https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks
23
Replicating, simpler for readers
http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261
What if you didn’t
have to install anything?
Docker Container
Just for you.
24
Replicating, simpler for authors
1. Fixed set of notebooks/dependencies
1. Tmpnb, (https://lambdaops.com/ipythonjupyter-tmpnb-debuts/)
2. CodeNeuro, (http://codeneuro.org/)
2. Build on demand
1. Everware, (https://github.com/everware)
2. Binder, (MyBinder.org)
25
https://github.com/binder-project
by Jeremy Freeman
(Demo mybinder.org )
26
The networking architecture
(single user)
[Diagram labels: HTTPS/WebSocket, ZMQ]
27
MULTI-USER
Jupyter Notebook is single-user by design.
Multi-user is enabled through JupyterHub:
- Allows better scalability
- Per-user resource monitoring
- Per-user configuration/version of IPython
- Better integration with existing infrastructure
28
[Diagram labels: Hub, HTTPS/WebSocket proxy, Auth & Security]
29
Everything* is a plugin
• Auth:
• Unix PAM (default), OAuth, LDAP… (i.e., not yet another thing to manage)
• Spawner: starts a single-user server for each user.
• Localhost, Rackspace, EC2, Docker
• Meant for sysadmins
• Deployment is relatively involved
• Recent software stack (Node.js/Python 3)
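The plugin idea above can be sketched as a hub that is handed an authenticator and a spawner and does not care which ones. This is an illustration only, not JupyterHub's actual API: every class name below is hypothetical, the "spawner" merely records a URL instead of starting a server, and passwords are compared in plain text for brevity.

```python
from abc import ABC, abstractmethod


class Authenticator(ABC):
    """Plugin point: decide who a login attempt belongs to."""

    @abstractmethod
    def authenticate(self, username, password):
        """Return the user name on success, None on failure."""


class Spawner(ABC):
    """Plugin point: start a single-user server and return its URL."""

    @abstractmethod
    def start(self, username):
        ...


class DictAuthenticator(Authenticator):
    """Checks credentials against an in-memory table (demo only)."""

    def __init__(self, users):
        self.users = users

    def authenticate(self, username, password):
        return username if self.users.get(username) == password else None


class LocalPortSpawner(Spawner):
    """Pretends to start one local server per user on consecutive ports."""

    def __init__(self, first_port=9000):
        self.next_port = first_port

    def start(self, username):
        url = f"http://127.0.0.1:{self.next_port}/user/{username}"
        self.next_port += 1
        return url


class ToyHub:
    """Authenticates a user, then spawns (or reuses) that user's server."""

    def __init__(self, authenticator, spawner):
        self.authenticator = authenticator
        self.spawner = spawner
        self.servers = {}  # user name -> server URL

    def login(self, username, password):
        user = self.authenticator.authenticate(username, password)
        if user is None:
            raise PermissionError("authentication failed")
        if user not in self.servers:
            self.servers[user] = self.spawner.start(user)
        return self.servers[user]
```

Swapping `DictAuthenticator` for another `Authenticator` subclass, or `LocalPortSpawner` for another `Spawner`, leaves `ToyHub` unchanged, which is the point of the design.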
30
Quick Demo
31
JupyterHub in education
Jess Hamrick @ Cal
K. Kelley (Rackspace)
M. Ragan-Kelley (Simula)
B. Granger (Cal Poly)
https://developer.rackspace.com/blog/deploying-jupyterhub-for-education
❖ Computationally intensive course, ~220 students
❖ Fully hosted environment, zero-install
❖ Integration with autograding.
32
Deploying at a larger scale this fall at UC Berkeley
- Data Science 101
- Everyone with a CalNet account.
New Jupyter in Education Mailing List:
https://groups.google.com/forum/#!forum/jupyter-education
33
Ecosystem
34
Non-Notebook
projects
IDE/Frontends:
Atom Hydrogen
EIN
VIM IPython
Rodeo
PyCharm
Microsoft Visual Studio
35
Non-Notebook
projects
RISE (interactive slideshow)
runipy (notebooks are report templates)
ipymd (store notebook as markdown)
NbGrader (grade assignments)
Jupyter-Drive (store notebooks on Google Drive)
pgcontents (store notebooks in PostgreSQL)
urth (declarative widgets + dashboards from notebooks)
36
Google CoLaboratory
Kayur Patel, Kester Tong, Mark Sanders, Corinna Cortes @ Google
Matt Turk @ NCSA/UIUC
Currently being merged
into Jupyter itself.
37
O’Reilly: authoring and delivering executable books
Atlas, ipymd and Thebe
beta.oreilly.com
38
The Future
39
40
Future work
Interactive Computing
Notebooks as interactive applications
Modular, reusable UI/UX
Software engineering with notebooks
Computational Narratives
nbconvert
Element filtering
Documentation
Collaboration
Real time collaboration
JupyterHub
Sustainability
People
Events
41
Future work
Components and tiled layout
are often-requested features.
Collaboration with
Continuum Analytics.
Plan on adding panels for
text editor, output, variable
inspectors, debuggers, …
Discussions with the Microsoft
PTVS team about a “debugger
protocol”
42
43
Hiring
At UC Berkeley
Two new postdocs
Project manager
Web developer, tech writer (short contracts)
One administrative assistant.
At Cal Poly
Three software engineers (one already hired)
One designer
One administrative assistant.
44
Time for questions?
Thanks
45

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Jupyter, A Platform for Data Science at Scale

  • 1. A tool for data science at scale. Matthias Bussonnier (UC Berkeley, mbussonnier@berkeley.edu). Slide examples on GitHub: https://github.com/Carreau/talks/tree/master/labtech-2015
  • 2. Jupyter: A bit of History; The Notebook Application; The document format & Publication; Multi-User & scaling; The Ecosystem
  • 3. The Lifecycle of a Scientific Idea: 1. Individual exploratory work 2. Collaborative development 3. Parallel production runs (HPC, cloud, …) 4. Publication & communication (reproducibly!) 5. Education 6. Goto 1
  • 4. The Lifecycle of a Scientific Idea: 1. Individual exploratory work – Matlab command line 2. Collaborative development – email scripts back and forth? 3. Parallel production runs (HPC, cloud, …) – rewrite in Fortran/MPI 4. Publication & communication (reproducibly!) – copy-paste into PPT 5. Education – specific tools 6. Goto 1
  • 5. Can we have a single tool that covers the whole lifecycle of a scientific idea, from data collection to publication?
  • 6. A bit of History. Fernando Perez, 2001, CU Boulder (instead of writing a physics dissertation): Python can replace the collection of bash, Perl, and C/C++ scripts. But the Python REPL can be better.
  • 7. NOVEMBER 2001: "JUST AN AFTERNOON HACK". A 259-line Python script (https://gist.github.com/fperez/1579699): sys.ps1 -> In [N]; sys.displayhook -> Out[N], caches results; plotting, Numeric, etc. 2014 (OPENHUB STATS): 19,279 commits; 442 contributors; total lines: 187,326; number of languages: 7 (JS, CSS, HTML, …)
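
The two hooks named on this slide, `sys.ps1` and `sys.displayhook`, can be illustrated in a few lines. This is only a minimal sketch of the idea, not the original 259-line script: the names `Out`, `state`, `NumberedPrompt`, `caching_displayhook`, and `install` are chosen here for illustration, and the counter advances only when a value is displayed, which is a simplification.

```python
import builtins
import sys

Out = {}              # cache of displayed results, keyed by prompt number
state = {"count": 1}  # current prompt number (illustrative global state)


class NumberedPrompt:
    """Usable as sys.ps1: the interactive interpreter calls str() on it."""

    def __str__(self):
        return f"In [{state['count']}]: "


def caching_displayhook(value):
    """Stand-in for sys.displayhook: print Out[N] and remember the value."""
    if value is None:
        return  # like the default hook, show nothing for None
    n = state["count"]
    Out[n] = value
    builtins._ = value  # keep the usual "_" shortcut working
    print(f"Out[{n}]: {value!r}")
    state["count"] = n + 1  # simplification: advance only on displayed values


def install():
    """Activate both hooks in the current interactive session."""
    sys.ps1 = NumberedPrompt()
    sys.displayhook = caching_displayhook
```

Calling `install()` in a plain interactive `python` session should switch the prompt to `In [N]:` and make earlier results available through `Out[N]`.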
  • 8. Improve over the terminal: ❖ The REPL as a network protocol ❖ Kernels: execute code ❖ Clients: read input, present output. Simple abstractions enable rich, sophisticated clients.
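
The kernel/client split on this slide can be sketched with a toy example. This is not the actual Jupyter messaging protocol (which runs over ZMQ with a richer message format); `ToyKernel`, `ToyClient`, and the JSON message fields below are hypothetical and only show that the client never executes code itself.

```python
import contextlib
import io
import json


class ToyKernel:
    """Executes code from request messages; keeps one namespace across requests."""

    def __init__(self):
        self.namespace = {}

    def handle(self, raw_request):
        request = json.loads(raw_request)
        stdout = io.StringIO()
        try:
            # Capture anything the code prints so it can travel back in the reply.
            with contextlib.redirect_stdout(stdout):
                exec(request["code"], self.namespace)
            reply = {"status": "ok", "stdout": stdout.getvalue()}
        except Exception as exc:
            reply = {"status": "error", "ename": type(exc).__name__, "evalue": str(exc)}
        return json.dumps(reply)


class ToyClient:
    """Reads input and presents output; all execution happens in the kernel."""

    def __init__(self, send):
        # `send` is any callable that carries a string to the kernel and back.
        self.send = send

    def run(self, code):
        reply = json.loads(self.send(json.dumps({"code": code})))
        if reply["status"] == "ok":
            return reply["stdout"]
        return f"{reply['ename']}: {reply['evalue']}"


kernel = ToyKernel()
client = ToyClient(kernel.handle)  # in-process transport; a socket could replace it
client.run("x = 6 * 7")
result = client.run("print(x)")
```

Because the client only sees strings, swapping the in-process call for a network transport, or the Python kernel for one in another language, would not change the client code.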
  • 9. 2011: The IPython Notebook ❖ Rich web client ❖ Text & math ❖ Code ❖ Results ❖ Share, reproduce.
  • 10. The Team (people who spend a noticeable amount of time on the project, subjective of course): Fernando Perez (UC Berkeley, LBL); Brian Granger (Cal Poly); Oberon Lopez (summer student); Cameron Oelsen (summer student); Simon Vurens (summer student); Ryan Morshed (summer student); Min Ragan-Kelley (Simula); Thomas Kluyver (UK); Matthias Bussonnier (UC Berkeley); Jon Frederic (Cal Poly); Jess Hamrick (UC Berkeley); Kyle Kelley (Rackspace); Jason Grout (Bloomberg); Sylvain Corlay (Bloomberg); Kester Tong (Google); Nicholas Bollweg; Will Whitney (MIT); Damián Avila (Continuum); Steven Silvester (Continuum); Chris Colbert (Continuum); David Willmer (Continuum); Peter Parente (IBM); Dan Gisolfi (IBM); Gino Bustelo (IBM); all 400+ GitHub contributors. Bold: working full time on IPython/Jupyter; underline: contributes to Jupyter/IPython with corporate agreement.
  • 12. Jupyter vs IPython. Jupyter: network protocol for interactive computing; clients for the protocol (Console, Qt Console, Notebook); Notebook file format & tools (nbconvert…); JupyterHub; Nbviewer; NbGrader; Tmpnb; … IPython: interactive Python shell at the terminal; kernel for this protocol in Python; tools for cross-language integration; tools for interactive parallel computing; the “reference” kernel for Jupyter.
  • 13. Why? Don’t reinvent the wheel: reimplement one piece, get the rest for free. Don’t like the frontend? Write a new one for Python and get 50+ languages that work with it out of the box (https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages). Don’t like a language? Write your own kernel and get all the IDEs, conversion tools, etc.
  • 14. The Notebook. Try it on https://try.jupyter.org. Demo: the Notebook app also has a terminal, a text editor, an increasing number of plugins, and of course supports 50 languages.
  • 15. The Notebook: a web application that allows code to produce rich web representations (images, sound, video, math, …). The browser, server, and kernel(s) can be on separate machines. It is the default application to edit `.ipynb` files. `.ipynb` files are JSON-based files embedding input and output, so they can be read and converted without a running kernel.
  • 16. The Notebook File Format (`.ipynb`)
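
Since `.ipynb` files are JSON, they can be inspected with nothing but a JSON parser. The sketch below builds a small notebook by hand, following the general shape of the version 4 format as an assumption, and reads the code and stored output back without any kernel. The helper `code_cells_with_output` is a name made up for this example.

```python
import json

# A hand-written notebook, assuming the general layout of nbformat version 4.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(6 * 7)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["42\n"]}
            ],
        },
    ],
})


def code_cells_with_output(text):
    """Return (source, printed_output) pairs for every code cell."""
    nb = json.loads(text)
    pairs = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])
        # Only "stream" outputs (stdout/stderr text) are collected in this sketch.
        printed = "".join(
            "".join(out["text"])
            for out in cell["outputs"]
            if out["output_type"] == "stream"
        )
        pairs.append((source, printed))
    return pairs


pairs = code_cells_with_output(notebook_text)
```

The stored output (`"42\n"`) comes straight from the file; nothing is executed while reading it.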
  • 17. NbViewer: zero-install reading of notebooks. Just share a URL: nbviewer.org. Under the hood: get the raw URL and convert to HTML on the fly. Sharing: git push, or Dropbox sync.
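
The "get the raw URL and convert to HTML" step can be sketched as follows. This is not nbviewer's implementation (the real conversion is far richer); `github_blob_to_raw` and `render_html` are hypothetical helpers, the repository in the usage line is a placeholder, and the URL rewrite assumes a branch name without slashes.

```python
import html
from urllib.parse import urlsplit


def github_blob_to_raw(url):
    """Map https://github.com/<user>/<repo>/blob/<ref>/<path> to its raw-content URL."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    user, repo, _, ref, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, ref, *path])


def render_html(nb):
    """Toy renderer: code cells become <pre>, every other cell becomes <p>."""
    chunks = []
    for cell in nb["cells"]:
        text = html.escape("".join(cell["source"]))
        tag = "pre" if cell["cell_type"] == "code" else "p"
        chunks.append(f"<{tag}>{text}</{tag}>")
    return "\n".join(chunks)


# Placeholder repository, used only to show the URL rewrite.
raw_url = github_blob_to_raw(
    "https://github.com/example-user/example-repo/blob/master/demo.ipynb"
)
```

A full pipeline would download `raw_url`, parse the JSON, and pass the result to a renderer; the download is left out here so the sketch runs offline.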
  • 18. Nbviewer on GitHub. Since May, GitHub renders notebooks. Powered by `nbconvert`, the library that handles `.ipynb` -> * conversion. Over 200,000 notebooks on GitHub.
  • 19. What content as notebook? Papers with code as AMI/VMs: http://www.nature.com/ismej/journal/v7/n3/full/ismej2012123a.html http://qiime.org/home_static/nih-cloud-apr2012
  • 20. Blogs. Jake VanderPlas @ UW. http://blogs.scientificamerican.com/sa-visual/2014/09/16/visualizing-4-dimensional-asteroids
  • 22. Books: by Cameron Davidson-Pilon; by Matthew Russell; by José Unpingco. You can download and execute the books locally.
  • 24. Replicating, simpler for readers. http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261 What if you didn’t have to install anything? A Docker container just for you.
  • 25. Replicating, simpler for authors. 1. Fixed set of notebooks/dependencies: Tmpnb (https://lambdaops.com/ipythonjupyter-tmpnb-debuts/), CodeNeuro (http://codeneuro.org/). 2. Build on demand: Everware (https://github.com/everware), Binder (MyBinder.org).
  • 27. The networking architecture (single user): HTTPS/WebSocket between the browser and the notebook server; ZMQ between the server and the kernel.
  • 28. MULTI-USER. Jupyter Notebook is single-user by design. Multi-user operation is enabled through JupyterHub: better scalability; per-user resource monitoring; per-user configuration/version of IPython; better integration with existing infrastructure.
  • 30. Everything* is a plugin. • Auth: Unix PAM (default), OAuth, LDAP… (i.e., not yet another thing to manage) • Spawner: starts a single-user server for each user (localhost, Rackspace, EC2, Docker) • Meant for sysadmins • Deployments are relatively involved • Recent software stack (Node.js/Python 3)
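
The plugin idea on this slide, where authentication and spawning are swappable parts, can be sketched with toy classes. These are not JupyterHub's actual classes or API; every name below (`Authenticator`, `DictAuthenticator`, `Spawner`, `FakeLocalSpawner`, `ToyHub`) is hypothetical, and the "spawner" only hands out a URL instead of starting a real server.

```python
class Authenticator:
    """Plugin interface: return the username on success, None on failure."""

    def authenticate(self, username, password):
        raise NotImplementedError


class DictAuthenticator(Authenticator):
    """Illustrative stand-in for PAM/OAuth/LDAP: checks an in-memory table."""

    def __init__(self, passwords):
        self.passwords = passwords

    def authenticate(self, username, password):
        known = username in self.passwords
        return username if known and self.passwords[username] == password else None


class Spawner:
    """Plugin interface: start a single-user server and return its URL."""

    def start(self, username):
        raise NotImplementedError


class FakeLocalSpawner(Spawner):
    """Pretends to start one local server per user by assigning a port."""

    def __init__(self, first_port=8888):
        self.next_port = first_port

    def start(self, username):
        url = f"http://127.0.0.1:{self.next_port}/user/{username}/"
        self.next_port += 1
        return url


class ToyHub:
    """Combines one authenticator and one spawner; tracks running servers."""

    def __init__(self, authenticator, spawner):
        self.authenticator = authenticator
        self.spawner = spawner
        self.servers = {}

    def login(self, username, password):
        user = self.authenticator.authenticate(username, password)
        if user is None:
            raise PermissionError("authentication failed")
        if user not in self.servers:
            # Spawn on first login only; later logins reuse the same server.
            self.servers[user] = self.spawner.start(user)
        return self.servers[user]


hub = ToyHub(DictAuthenticator({"ada": "s3cret"}), FakeLocalSpawner())
ada_url = hub.login("ada", "s3cret")
```

Replacing `DictAuthenticator` or `FakeLocalSpawner` with another subclass changes how users are checked or where servers run without touching `ToyHub`, which is the point of the plugin design.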
  • 32. JupyterHub in education. Jess Hamrick @ Cal; K. Kelley (Rackspace); M. Ragan-Kelley (Simula); B. Granger (Cal Poly). https://developer.rackspace.com/blog/deploying-jupyterhub-for-education ❖ Computationally intensive course, ~220 students ❖ Fully hosted environment, zero-install ❖ Integration with autograding.
  • 33. Deploying at larger scale this fall at UC Berkeley: Data Science 101; everyone with a CalNet account. New Jupyter in Education mailing list: https://groups.google.com/forum/#!forum/jupyter-education
  • 36. Non-Notebook projects: RISE (interactive slideshow); runipy (notebooks as report templates); ipymd (store notebooks as Markdown); NbGrader (grade assignments); Jupyter-Drive (store notebooks on Google Drive); pgcontents (store notebooks in PostgreSQL); urth (declarative widgets, plus dashboards from notebooks).
  • 37. Google CoLaboratory. Kayur Patel, Kester Tong, Mark Sanders, Corinna Cortes @ Google; Matt Turk @ NCSA/UIUC. Currently being merged into Jupyter itself.
  • 38. O’Reilly: authoring and delivering executable books. Atlas, ipymd and Thebe. beta.oreilly.com
  • 41. Future work. Interactive Computing: notebooks as interactive applications; modular, reusable UI/UX; software engineering with notebooks. Computational Narratives: nbconvert; element filtering; documentation. Collaboration: real-time collaboration; JupyterHub. Sustainability: people; events.
  • 42. Future work. Components and tiled layouts are often-requested features. Collaboration with Continuum Analytics. Plans to add panels for a text editor, output, variable inspectors, debuggers, … Discussion with the Microsoft PTVS team on a “debugger protocol”.
  • 44. Hiring. At UC Berkeley: two new postdocs; a project manager; a web developer and a tech writer (short contracts); one administrative assistant. At Cal Poly: three software engineers (one already hired); one designer; one administrative assistant.
  • 45. Time for questions? Thanks.