We review experiences with the deployment of a cloud-hosted IPython Notebook service to serve as a collaborative platform for earth observation (EO) data analysis and processing.
OPTIRAD (OPTImisation environment for joint retrieval of multi-sensor RADiances) is an ESA-funded project addressing the challenge of producing consistent EO land surface information products from heterogeneous EO data inputs. The project poses a number of challenges from an infrastructure provisioning perspective. First, the need was identified to provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users. Second, any hosting platform needs sufficient compute, memory and storage capacity to support processing at high spatial and temporal resolutions with computationally expensive algorithms. Finally, the system needs to support the execution and development of existing Python code and the provision of interactive tutorials for new users. To this end, a solution has been developed based on the IPython Notebook hosted on the private cloud provided by the JASMIN / CEMS data analysis facility at STFC Rutherford Appleton Laboratory in the UK.
The IPython Notebook has been gaining traction in recent years as a collaborative tool for scientific computing and data analysis. It provides an interactive Python shell hosted in an intuitive, user-friendly interface together with the ability to save and share sessions. As a web-based application it is readily amenable to hosting on a cloud, which allows resources to be scaled, in this context chiefly the compute capability and memory at the disposal of each user. JASMIN/CEMS uses IPython’s JupyterHub to provide multi-user support, and each user session has access to IPython.parallel, which effectively wraps parallel compute capability behind a simple Python interface. This platform therefore provides a customisable training and processing environment with compute resources beyond the scale available to desktop users.
Further work is under way to broaden and extend the capabilities of the existing system. The JASMIN/CEMS deployment is being trialled in Docker containers, building on recent work done by the IPython community. This will facilitate greater portability between cloud providers. Combined with systems for provenance capture, the use of containers can contribute towards replicable science, with any given algorithm annotated with provenance metadata and its runtime environment effectively encapsulated within a given container.