JupyterHub for Interactive Data Science Collaboration
CineGrid December 10, 2015
HELLO
CAROL WILLING
➤ Python Software Foundation,
Director
➤ Project Jupyter, Contributor
➤ Fab Lab San Diego, Geek in
Residence
WRITER
MANAGER
AND
ANALYST
ENGINEER
ARTIST
TEACHER
WONDER
AND
CURIOSITY
PROJECT JUPYTER
Just the Facts
JUPYTER NOTEBOOK
The Notebook: “Literate Computing”
Computational Narratives
❖ Computers deal with code and data.
❖ Humans deal with narratives that communicate.
Literate Computing (not Literate Programming)
narratives anchored in a live computation, that
communicate a story based on data and results.
Cf: Mathematica, Maple, MuPad, Sage…
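As a rough illustration of "narratives anchored in a live computation": a notebook file is a JSON document in which narrative (Markdown) cells and code cells sit side by side. The sketch below builds a minimal one using only the standard library. The field names follow version 4 of the notebook format; the helper names and cell contents are made up for this example.

```python
import json


def markdown_cell(text):
    """Narrative cell: prose that explains the computation."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Code cell: not yet executed, so it has no outputs and no count."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_notebook(cells):
    """Wrap cells in a minimal notebook-format-4 document."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


# Hypothetical content: a short story told around one computation.
notebook = make_notebook([
    markdown_cell("# Mean of a small sample\nWe start from four measurements."),
    code_cell("data = [2, 4, 4, 6]\nsum(data) / len(data)"),
    markdown_cell("The mean is the starting point for the rest of the analysis."),
])

# Serialised text; saving it under a name ending in .ipynb gives a notebook file.
notebook_json = json.dumps(notebook, indent=1)
```

Opening the saved file in the Notebook and running the code cell fills in its `outputs`, so the narrative and the live results travel together.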
“Project Jupyter serves not only the
academic and scientific communities
but also a much broader constituency
of data scientists in research,
education, industry and journalism…”
- Fernando Pérez
UC Berkeley
“…we see uses of our tools that range
from high school education in
programming to the nation’s
supercomputing facilities and the
leaders of the tech industry.”
- Fernando Pérez
UC Berkeley
“More than a million people are
currently using Jupyter for everything
from…”
-Prof. Brian Granger
Cal Poly
“…analyzing massive gene sequencing
datasets to processing images from
the Hubble Space Telescope and
developing models of financial
markets.”
-Prof. Brian Granger
Cal Poly
“We are excited by the potential of
Project Jupyter to reach even wider
audiences and to contribute to
increased cross-disciplinary
collaboration in the sciences.”
-Betsy Fader
Helmsley Charitable Trust
“Jupyter Notebook… will enable data
exploration, visualization, and
analysis in a way that encourages
sound science and speeds progress.”
-Chris Mentzel
The Gordon and Betty Moore Foundation
DATA CHALLENGES
Constraints or Opportunities?
SCALE
SPEED
CHOICES
CONNECTIONS
OPPORTUNITIES
Use our strengths
“The purpose of computing is insight,
not numbers”
- Hamming, 1962
The Lifecycle of a Scientific Idea (schematically)
1. Individual exploratory work
2. Collaborative development
3. Parallel production runs (HPC, cloud, ...)
4. Publication & communication (reproducibly!)
5. Education
6. Goto 1.
JUPYTERHUB
and Project Jupyter ecosystem
EDUCATION
nbviewer: seamless notebook sharing
❖ Zero-install reading of
notebooks
❖ Just share a URL
❖ nbviewer.ipython.org
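A minimal sketch of the "just share a URL" idea, assuming the notebook lives in a public GitHub repository and that nbviewer serves it under a `/github/<user>/<repo>/blob/<branch>/<path>` address. The function name, the default branch, and the example user and repository are hypothetical; the host defaults to the one named on the slide.

```python
from urllib.parse import quote


def nbviewer_github_url(user, repo, path, branch="master",
                        host="nbviewer.ipython.org"):
    """Build a shareable nbviewer link for a notebook stored on GitHub."""
    parts = [user, repo, "blob", branch] + path.strip("/").split("/")
    # Percent-encode each path segment so spaces and similar characters survive.
    return "https://{}/github/{}".format(host, "/".join(quote(p) for p in parts))


# Hypothetical repository and notebook path, for illustration only.
link = nbviewer_github_url("someuser", "somerepo", "notebooks/Intro.ipynb")
print(link)
```

The reader of the link needs nothing installed; the page is a static rendering of the notebook.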
Executable books
❖ Springer hardcover book
❖ Chapters: IPython Notebooks
❖ Posted as a blog entry
❖ All available as a GitHub repo
Python for Signal Processing, by José Unpingco
University Courses
These are just some we are aware of!
A collaborative MOOC on OpenEdX
http://lorenabarba.com/news/announcing-practical-numerical-methods-with-python-mooc
❖ Lorena Barba at George Washington
University, USA.
❖ Ian Hawke at Southampton, UK
❖ Carlos Jerez at Pontifical Catholic
University of Chile.
❖ All materials on GitHub.
Changing the scientific culture
http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261
Executable papers: the future?
http://www.nature.com/news/ipython-interactive-demo-7.21492?article=1.16261
Notebook Workflows: The Big Picture
Image credit: Joshua Barratt
Lots more! The IPython Gallery
https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks
GOVERNMENT
JupyterHub at NERSC and OpenMSI
Shreyas Cholia & Oliver Ruebel
NERSC Data & Analytics Services Group
JupyterHub Day, July 17 2015
NERSC is the Production HPC & Data Facility
for DOE Office of Science Research
Largest funder of physical science research in U.S.
❖ Bio Energy, Environment
❖ Computing
❖ Materials, Chemistry, Geophysics
❖ Particle Physics, Astrophysics
❖ Nuclear Physics
❖ Fusion Energy, Plasma Physics
ART
BUSINESS
Quantopian: algorithmic trading
Karen Rubin
Dir. Product Management
at Quantopian
Quantopian Research Post Fortune.com
Microsoft: Python Tools for Visual Studio
Shahrokh Mortazavi, Dino Viehland, Wenming Ye, Dennis Gannon.
Microsoft Azure: Notebooks in the Cloud
Google CoLaboratory
Kayur Patel, Kester Tong, Mark Sanders, Corinna Cortes @ Google
Matt Turk @ NCSA/UIUC
IBM Watson
SCIENCE
JupyterHub: multiuser support
❖ Out of the box
❖ Unix accounts
❖ Local single-user notebooks
❖ Customizable
❖ Authentication: OAuth, LDAP, etc.
❖ Subprocess control: Docker, VMs, etc.
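The customization points above can be sketched as a `jupyterhub_config.py` file. This is an illustrative fragment, not a tested deployment: it assumes the separately installed `oauthenticator` and `dockerspawner` packages, the option names may differ between versions, and the Docker image name is only an example. The `SimpleNamespace` fallback is a stand-in added here so the file can also be run on its own; JupyterHub itself supplies `get_config()` when it loads the file.

```python
# jupyterhub_config.py (illustrative sketch)
from types import SimpleNamespace

try:
    c = get_config()  # injected by JupyterHub when it loads this file
except NameError:
    # Stand-in so the fragment can be executed outside JupyterHub for a quick check.
    c = SimpleNamespace(JupyterHub=SimpleNamespace(),
                        DockerSpawner=SimpleNamespace())

# Authentication: replace the default local Unix accounts with OAuth via GitHub.
c.JupyterHub.authenticator_class = "oauthenticator.github.GitHubOAuthenticator"

# Subprocess control: start each user's notebook server in its own Docker container.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Assumed image; any image that ships a single-user notebook server would do.
c.DockerSpawner.image = "jupyter/base-notebook"
```

Leaving both classes at their defaults gives the "out of the box" behaviour on the slide: Unix accounts and local single-user notebook servers.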
JupyterHub in Education @ Berkeley
https://developer.rackspace.com/blog/deploying-jupyterhub-for-education
❖ Computationally intensive course, ~220 students
❖ Fully hosted environment, zero-install
❖ Homework management and grading (with B. Granger)
Jess Hamrick @ Cal
K. Kelley
Rackspace
M. Ragan-Kelley
Cal
B. Granger
Cal Poly
COLLABORATION
Why?
A ten year journey.
Optimism and hope for the future.
IMAGINE THE POSSIBILITIES
TRY.JUPYTER.ORG
WE’RE OPEN FOR YOU.
THANK YOU
try.jupyter.org
www.jupyter.org
numfocus.org ipython.org
CREDITS AND ATTRIBUTION
➤ Sources
➤ Jupyter website www.jupyter.org [11, 31, 65, 66, 69]
➤ Fernando Pérez [12, 28, 29, 33-40, 48-52, 53-55] http://fperez.org/ BIDS http://bids.berkeley.edu/
➤ Cal Poly and UC Berkeley Press Releases http://calpolynews.calpoly.edu/news_releases/2015/July/jupyter.html, http://bids.berkeley.edu/news/project-jupyter-gets-6m-expand-collaborative-data-science-software [14-19]
➤ JupyterHub at NERSC and OpenMSI, S. Cholia and O. Ruebel, JupyterHub Day presentation, July 17, 2015 [42, 43]
➤ music21 website http://web.mit.edu/music21/ [45]
➤ Jeremy Freeman http://jeremyfreeman.net/ PyData Talk NYC Winter 2015 https://github.com/freeman-lab/talk-nyc-winter-2015 [56, 57, 58]
➤ CodeNeuro website http://codeneuro.org/ [59-60]
➤ Binder website http://mybinder.org/ [61]
➤ Images
➤ [2, 10, 21, 27, 30, 62, 64] Galaxy
➤ [23] Hummingbird https://flic.kr/p/mo5pa1
➤ [25] Netflix Prize Christopher Hefele https://flic.kr/p/6LWT6K
➤ [3-7, 8 (artwork FabLab interns), 9, 20, 22, 24, 26, 42, 43, 46, 57, 63] Carol Willing. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
➤ For additional information
➤ Jupyter www.jupyter.org
➤ Python Software Foundation www.python.org
➤ Carol Willing, willingc@willingconsulting.com, @willingcarol, GitHub: willingc

JupyterHub for Interactive Data Science Collaboration
