
Data analytics in the cloud with Jupyter notebooks.


Jupyter Notebooks provide an interactive computational environment in which you can combine Python code, rich text, mathematics, plots and rich media. They provide a convenient way for data analysts to explore, capture and share their research.
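Under the hood, a notebook is itself just a JSON document that interleaves those cell types. A minimal .ipynb skeleton, trimmed to the essential fields, looks like:

```json
{
  "cells": [
    {"cell_type": "markdown", "metadata": {},
     "source": ["# Analysis\n", "Some *rich text* and $e^{i\\pi}+1=0$."]},
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [],
     "source": ["import math\n", "print(math.pi)"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
```

Executed outputs are stored back into each code cell's "outputs" list, which is why a saved notebook captures the research session as well as the code.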

Numerous options exist for working with Jupyter Notebooks, including running a Jupyter Notebook instance locally or using a Jupyter Notebook hosting service.

This talk will provide a quick tour of some of the better-known options available for running Jupyter Notebooks. It will then look at custom options for hosting Jupyter Notebooks yourself using public or private cloud infrastructure.

An in-depth look at how you can run Jupyter Notebooks in OpenShift will be presented. This will cover how you can directly deploy a Jupyter Notebook server image, as well as how you can use Source-to-Image (S2I) to create a custom application for your requirements by combining an existing Jupyter Notebook server image with your own notebooks, additional code and research data.
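As a sketch of what S2I combines with the server image, the builder would typically install any listed Python packages and copy notebooks and data files into the resulting image. The exact layout depends on the builder image used; the repository name and files below are hypothetical:

```
stats-course/
├── requirements.txt      # extra Python packages installed during the build
├── 01-exploration.ipynb  # notebooks copied into the image
├── 02-modelling.ipynb
└── data/
    └── samples.csv       # research data bundled alongside the notebooks
```

The result is a self-contained image: anyone who deploys it gets the notebooks, the packages they need, and the data, with no manual setup.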

Specific use cases around Jupyter Notebooks which will be explored include individual use, team use within an organisation, and classroom environments for teaching. Other issues which will be covered include importing notebooks and data into an environment, and storing data using persistent volumes and other forms of centralised storage.
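Persistent volumes in OpenShift are claimed with a standard Kubernetes object. A minimal claim that a notebook deployment could mount (the name and size here are illustrative) looks like:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: notebooks-pvc
spec:
  accessModes:
    - ReadWriteOnce      # single-node mount; shared storage needs ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```

A claim like this can then be attached to a notebook deployment, for example with the oc set volume command, so that notebooks and data survive container restarts.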

As an example of the possibilities of combining Jupyter Notebooks with cloud infrastructure, it will be shown how you can easily use OpenShift to set up a distributed parallel computing cluster using ‘ipyparallel’ and use it in conjunction with a Jupyter Notebook.
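The heart of that workflow is ipyparallel's parallel map, where a view over the cluster's engines farms a function out across them. Since ipyparallel itself needs a running cluster, the sketch below uses the standard library's executor as a stand-in to show the same pattern, with the rough ipyparallel equivalent in the comments:

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    """Stand-in for an expensive, independent computation."""
    return seed * seed

# With ipyparallel the same shape would be roughly:
#   import ipyparallel as ipp
#   view = ipp.Client()[:]            # DirectView over all engines
#   results = view.map_sync(simulate, range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, range(8)))

print(results)  # each input processed independently, order preserved
```

Because the work items are independent, scaling up is a matter of adding engines to the cluster rather than changing the notebook code.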

Published in: Data & Analytics

  1. Data analytics in the cloud with Jupyter Notebooks Graham Dumpleton
  2.
  3. Python Data Science Handbook / 04.12-Three-Dimensional-Plotting
  4. Python Data Science Handbook / 04.13-Geographic-Data-With-Basemap
  5.
  6. Who’s Using It? Individuals Collaborators Teachers
  7. Getting Started
     pip3 install jupyter
     jupyter notebook
  8. Empty Workspace
  9. Upload Notebooks
  10. Local File System
      $ ls notebooks/01*.ipynb
      notebooks/01.00-IPython-Beyond-Normal-Python.ipynb
      notebooks/01.01-Help-And-Documentation.ipynb
      notebooks/01.02-Shell-Keyboard-Shortcuts.ipynb
      notebooks/01.03-Magic-Commands.ipynb
      notebooks/01.04-Input-Output-History.ipynb
      notebooks/01.05-IPython-And-Shell-Commands.ipynb
      notebooks/01.06-Errors-and-Debugging.ipynb
      notebooks/01.07-Timing-and-Profiling.ipynb
      notebooks/01.08-More-IPython-Resources.ipynb
  11. Browsing Files
  12. Interacting with a Notebook
  13. Status of Notebooks
  14. Installing Packages
  15. Positives • Save notebooks/data locally. • Python virtual environments. • Select the Python version you want. • Install required Python packages.
  16. Negatives • Operating system differences. • Python distribution differences. • Python version differences. • Package index differences. • PyPI (pip) vs Anaconda (conda). • Effort to set up and maintain.
  17. Docker Images
  18. Running Docker Image
      docker run -it --rm -p 8888:8888 jupyter/minimal-notebook
  19. Positives • Pre-created images. • Bundled operating system packages. • Known Python distribution/vendor. • Bundled Python packages. • Docker images are read only. • Don’t need to maintain the image.
  20. Negatives (1) • More effort to customise the experience. • Build a custom Docker image to extend. • Install extra packages each time you run it. • Images can be very large. • Multiple Python versions. • Packages that you do not need.
  21. Negatives (2) • Access to and saving your notebooks/data. • Need to mount persistent storage volumes. • Ensuring access is done securely.
  22.
  23. Azure Notebooks
  24. Binder Service
  25. Positives • Somebody else looks after everything.
  26. Negatives • Shared resource. • Outside of your control. • Reliability. • Customisation. • Software versions. • Information security.
  27. JupyterHub
  28. Positives • Can customise however you want. • Modify code for the service. • Use custom images.
  29. Negatives • Dedicated infrastructure. • Effort to understand and set it up. • Effort to keep it running.
  30. Many Options to Choose From
  31. OpenShift
  32. Deployments
  33. Docker Image
  34. Image Stream
  35. Notebook Storage
  36. Attaching Storage
  37. Shared Storage
  38. Positives • Use existing features of OpenShift. • No special storage backends required. • No custom provisioning applications. • Cluster can still be used for other applications. • Simply set quotas and let users do what they want.
  39. Source-to-Image
  40. Positives • Easily build custom images. • Pre-populated with required Python packages. • Pre-populated with required Jupyter Notebooks. • Pre-populated with required data files. • Build direct to an application, or just create images.
  41. Service Catalog
  42. Templates (builder)
  43. Templates (cluster)
  44. Templates (notebook)
  45. IPyParallel Cluster
  46. Parallel Computing
  47. Positives • Templates enable complex deployments. • Don’t need something like JupyterHub.
  48. Challenges • Custom base images and builders. • Learning curve for writing templates.
  49. Command Line
      oc new-app stats101-notebook-template --param STUDENT_NUMBER=1 --param CLASS_NUMBER=1234
      oc new-app stats101-notebook-template --param STUDENT_NUMBER=2 --param CLASS_NUMBER=1234
      …
      oc delete all --selector class=1234
  50. REST API
      import powershift.endpoints as endpoints
      client = endpoints.Client()
      projects = client.oapi.v1.projects.get()
      def public_address(route):
          host = route.spec.host
          path = route.spec.path or '/'
          if route.spec.tls:
              return 'https://%s%s' % (host, path)
          return 'http://%s%s' % (host, path)
      routes = client.oapi.v1.namespaces(namespace='stats101').routes.get()
      for route in routes.items:
          print(' route=%r' % public_address(route))
  51. Positives • Easily trigger multiple deployments using the CLI. • REST API also available for custom front ends.
  52. Resources • S2I enabled Jupyter Notebook images • OpenShift versions of Jupyter Project images • Python REST API client for OpenShift
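The per-student commands on slide 49 scale to a whole class with a few lines of scripting. A small sketch (template and class number taken from the slide; the student count is hypothetical) that builds the command lines, ready to hand to subprocess.run against a live cluster:

```python
# Build one 'oc new-app' invocation per student. The commands are printed
# rather than executed here, since running them needs an OpenShift cluster.
CLASS_NUMBER = 1234
TEMPLATE = 'stats101-notebook-template'

def new_app_command(student):
    return ['oc', 'new-app', TEMPLATE,
            '--param', 'STUDENT_NUMBER=%d' % student,
            '--param', 'CLASS_NUMBER=%d' % CLASS_NUMBER]

commands = [new_app_command(n) for n in range(1, 4)]
for cmd in commands:
    print(' '.join(cmd))

# Tearing the whole class down afterwards is a single label selector:
#   oc delete all --selector class=1234
```

Because every deployment carries the class label, cleanup stays a one-liner no matter how many students were provisioned.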