
HDF Kita Lab: JupyterLab + HDF Service



HDF and HDF-EOS Workshop XXI (2018)

Published in: Software


  1. HDF Kita Lab: JupyterLab + HDF Service
     John Readey, 2018 ESIP Summer Meeting
  2. HDF Kita Lab
     • The HDF Group is providing a hosted JupyterLab environment at
     • Open to anyone (you just need to register with The HDF Group)
     • Provides access to HDF Kita Server (aka HSDS): HDF data on S3
     • Comes with sample notebooks, tutorials, and datasets
     • There is a small subscription fee ($10/month)
     • ESIP attendees get a free 90-day trial
     [Diagram: HDF Kita Lab running on Kubernetes]
  3. JupyterLab
     • HDF Kita Lab is based on JupyterLab
     • JupyterLab is the next-generation web-based interface for running Python notebooks
     • It extends the classic IPython environment with:
       • Content browser for documents
       • Uploading/downloading of files
       • Terminal app
     • HDF Kita Lab extends JupyterLab with:
       • Automatic configuration for Kita Server
       • FAQ page on the launcher
       • HDF branding
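The slide does not say how Kita Lab's automatic Kita Server configuration works. As a hedged illustration only: h5pyd clients conventionally read connection settings from a `~/.hscfg` file of `key = value` lines (`hs_endpoint`, `hs_username`, `hs_password`), and, as we understand it, `HS_*` environment variables can override them. The sketch below parses such a file from a string and merges overrides. The endpoint, username, and password values, the function names, and the "environment wins" precedence are assumptions, not the Kita Lab implementation.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed layout of an h5pyd-style ~/.hscfg file; all values are hypothetical.
SAMPLE_HSCFG = """\
# HDF Kita Server connection settings (hypothetical values)
hs_endpoint = http://hsds.example.org
hs_username = jdoe
hs_password = secret
"""


def parse_hscfg(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    cfg: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        cfg[key.strip().lower()] = value.strip()
    return cfg


def resolve_settings(text: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Merge file settings with HS_* environment variables (env wins: an assumption)."""
    env = os.environ if env is None else env
    cfg = parse_hscfg(text)
    for key, value in env.items():
        if key.upper().startswith("HS_"):
            cfg[key.lower()] = value
    return cfg


settings = resolve_settings(SAMPLE_HSCFG, env={"HS_ENDPOINT": "http://kita.internal"})
print(settings["hs_endpoint"], settings["hs_username"])
```

Passing `env` explicitly keeps the example independent of the notebook's real environment; in a live session the default falls back to `os.environ`.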
  4. Simplify your life…
     • No messing with Python, package installs, AWS, etc.
     • Data ready for you
     • Simple means to harness a compute cluster
  5. Features
     • HDF Kita Lab runs on AWS in a Kubernetes cluster
     • The cluster can scale to handle a varying number of users
     • Each user gets:
       • 1 CPU core (2.5 GHz Xeon)
       • 8 GB RAM
       • 10 GB disk
       • 100 GB S3 storage
       • Access to HDF Kita Server
       • Ability to read/write HDF data stored on S3
     • The user environment is configured with Python packages commonly used by HDF users:
       • h5py/h5pyd, pandas, h5netcdf, xarray, bokeh, dask
     • HDF Kita command line tools:
       • hsinfo, hsls, hsget, hsload, etc.
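With 8 GB of RAM per user, some of the hosted data is far larger than memory (the /shared NCEP3 dataset alone is listed at 100 GB), so it helps to estimate a selection's size before reading it. This back-of-the-envelope sketch multiplies shape by bytes per element. Treating 8 GB as 8 GiB, the 50% headroom factor, and the example grid shape are all hypothetical choices for illustration.

```python
from math import prod
from typing import Sequence

# Per-user RAM from the Features slide, treated here as 8 GiB (assumption).
RAM_BYTES = 8 * 1024**3


def selection_bytes(shape: Sequence[int], itemsize: int) -> int:
    """Uncompressed size of a selection: element count times bytes per element."""
    return prod(shape) * itemsize


def fits_in_ram(shape: Sequence[int], itemsize: int, headroom: float = 0.5) -> bool:
    """True if the selection fits in the fraction of RAM we allow one array to use."""
    return selection_bytes(shape, itemsize) <= RAM_BYTES * headroom


# Hypothetical float32 (4-byte) cube: 10,000 time steps on a 720 x 1440 grid.
full_cube = (10_000, 720, 1440)
one_step = (1, 720, 1440)
print(selection_bytes(full_cube, 4) / 1e9, "GB fits:", fits_in_ram(full_cube, 4))
print(selection_bytes(one_step, 4) / 1e6, "MB fits:", fits_in_ram(one_step, 4))
```

Under these assumptions the full cube (about 41 GB) does not fit, while a single time step (about 4 MB) easily does, which is the case where reading slice by slice, or handing the work to dask, pays off.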
  6. Kubernetes Platform
     • JupyterLab and Kita Server both run as sets of Docker containers
     • Kubernetes transparently manages running these containers across multiple machines
     [Diagram: JupyterHub and HDF Kita Server (HSDS) containers running on Kubernetes on AWS]
  7. Architecture
     [Diagram: users connect through spawned user containers with EBS volumes to Kita Server (HSDS), made up of service nodes (SN) and data nodes (DN), which stores data in AWS S3]
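The architecture diagram shows Kita Server split into service nodes (SN), which receive requests, and data nodes (DN), which talk to S3. The sketch below illustrates one plausible way a service node could parallelize a read: find the chunks a selection touches, map each chunk to a data node, and fetch them concurrently. The CRC32-based node assignment, the in-memory `fetch_chunk` stand-in, and all function names are assumptions for illustration; they are not HSDS's actual code or partitioning scheme.

```python
import itertools
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

ChunkIndex = Tuple[int, ...]


def chunks_for_selection(start: Tuple[int, ...], stop: Tuple[int, ...],
                         chunk_shape: Tuple[int, ...]) -> List[ChunkIndex]:
    """Return the index of every chunk that intersects the box [start, stop)."""
    ranges = [range(s // c, (e - 1) // c + 1)
              for s, e, c in zip(start, stop, chunk_shape)]
    return list(itertools.product(*ranges))


def node_for_chunk(chunk: ChunkIndex, num_nodes: int) -> int:
    """Pick a data node by hashing the chunk index (hypothetical, stable across runs)."""
    key = "_".join(map(str, chunk)).encode()
    return zlib.crc32(key) % num_nodes


def fetch_chunk(node: int, chunk: ChunkIndex) -> str:
    """Stand-in for a data node reading one chunk object from S3."""
    return f"dn{node}:chunk{chunk}"


def service_node_read(start: Tuple[int, ...], stop: Tuple[int, ...],
                      chunk_shape: Tuple[int, ...],
                      num_nodes: int = 4) -> Dict[ChunkIndex, str]:
    """Fan chunk reads out to data nodes in parallel and collect the results."""
    chunks = chunks_for_selection(start, stop, chunk_shape)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        futures = {c: pool.submit(fetch_chunk, node_for_chunk(c, num_nodes), c)
                   for c in chunks}
        return {c: f.result() for c, f in futures.items()}


# A 150 x 250 selection over 100 x 100 chunks touches a 2 x 3 grid of chunks.
result = service_node_read((50, 50), (200, 300), (100, 100))
print(sorted(result))
```

Because each chunk is fetched independently, a large selection turns into many small S3 requests that can proceed at the same time, which is the effect the "requests are parallelized" point describes.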
  8. HDF Data on S3
     • The S3 bucket used for storing HDF data provides effectively unlimited capacity
     • Cost effective ($0.02/GB/month vs. $0.10/GB/month for EBS)
     • Built-in redundancy, so no danger of losing data to a disk crash
     • Kita Server is a turbo-booster for accessing data on S3:
       • Requests are parallelized
       • RAM cache
       • Read/write consistency
       • Multi-tenant access control
       • ACLs for folders and files
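Using the prices quoted on the slide (2018 figures; current AWS pricing may differ), the per-user 100 GB S3 allocation from the Features slide can be costed against the same amount of EBS storage. The constants come from the slides; the helper name is hypothetical.

```python
S3_USD_PER_GB_MONTH = 0.02   # S3 price quoted on the slide (2018)
EBS_USD_PER_GB_MONTH = 0.10  # EBS price quoted on the slide (2018)


def monthly_cost(gigabytes: float, usd_per_gb_month: float) -> float:
    """Storage cost per month, rounded to cents."""
    return round(gigabytes * usd_per_gb_month, 2)


quota_gb = 100  # per-user S3 allocation listed under Features
s3 = monthly_cost(quota_gb, S3_USD_PER_GB_MONTH)
ebs = monthly_cost(quota_gb, EBS_USD_PER_GB_MONTH)
print(f"S3: ${s3:.2f}/month  EBS: ${ebs:.2f}/month  ({ebs / s3:.0f}x)")
```

At these rates 100 GB costs $2.00 per month on S3 versus $10.00 on EBS, a factor of five.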
  9. Data Sharing
     • Each EBS volume is an island…
       • You can't directly share your EBS data with others in JupyterLab
     • HDF content in S3 can be shared with any Kita Lab user
     • For each folder or file you can:
       • Make it private (no one else can read or write)
       • Make it publicly readable (anyone can read)
       • Share it with just the users you choose
     • Use the hsacl tool to manage permissions
     • We've seeded the /shared folder with some content to play with:
       • NASA NCEP3 dataset (100 GB)
       • NASA Terra dataset (50 GB)
       • Daily stock market data (150 MB)
       • More coming!
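Permissions on Kita Lab are managed with hsacl, whose options are not covered here. The sketch below only models the three sharing modes listed on the slide (private, publicly readable, shared with chosen users) as a toy ACL with a per-user table and a default entry for everyone else. The class names, the `read`/`update` fields, and the "owner is always allowed" rule are hypothetical, not the interface of hsacl or HSDS.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass(frozen=True)
class Perms:
    """Hypothetical permission flags for one user on one folder or file."""
    read: bool = False
    update: bool = False


@dataclass
class Acl:
    """Toy ACL: an owner, per-user entries, and a default for everyone else."""
    owner: str
    entries: Dict[str, Perms] = field(default_factory=dict)
    default: Perms = field(default_factory=Perms)

    def can(self, user: str, action: str) -> bool:
        if user == self.owner:
            return True  # in this sketch the owner always has full access
        perms = self.entries.get(user, self.default)  # a per-user entry beats the default
        return getattr(perms, action)


def make_private(owner: str) -> Acl:
    return Acl(owner)  # no entries, and the default denies everything


def make_public_read(owner: str) -> Acl:
    return Acl(owner, default=Perms(read=True))


def share_with(owner: str, users: Iterable[str]) -> Acl:
    return Acl(owner, entries={u: Perms(read=True) for u in users})


doc = share_with("alice", ["bob"])
print(doc.can("bob", "read"), doc.can("carol", "read"), doc.can("bob", "update"))
```

Keeping a separate default entry is what lets "publicly readable" and "shared with specific users" coexist on the same object: named users get their own entry, and everyone else falls through to the default.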
  10. Future Directions
     • Additional samples/datasets/tutorials
     • Custom extensions:
       • File browser for Kita Server content
       • HDF Viewer
     • Bring in other JupyterLab extensions as they become stable
     • Collaboration tools
     • GitHub integration