Australian Ecosystems Science Cloud


TERN's Siddeswara Guru presents on the Australian Ecosystems Science Cloud, which will give the ecosystem science community improved access to shared data, tools, platforms and computing resources.

  1. Australian Ecosystems Science Cloud overview. Presentation by Siddeswara Guru, Director, Data Science. TERN is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy.
  2. Ecosystem science
     • Inter-relationships among the living organisms, physical features, biochemical processes, natural phenomena, and human activities in ecological communities¹
     • Focusing on terrestrial ecosystems
       – Terrestrial Ecosystem Research Network
       – Atlas of Living Australia
     • Data is heterogeneous: a wide variety from different domains
       – Observations (human, in-situ sensors and satellite remote sensing)
       – Variety of scales: spatial and temporal
       – Different data formats used in the community
  3. Data use
     • Conventional data access
       – Need to find data
       – Access via services
       – Copy from source to destination for further use, including large datasets
  4. Storage and compute
     • Advent of NeCTAR and RDS
       – Researchers are moving data and computation to the cloud
       – Building tools (virtual labs, research tools and platforms)
       – However, easy accessibility of data is still an issue
     • Multiple interfaces to search for data
     • No clear access mechanism from different nodes
  5. Goal
     • Offer an open data platform: a harmonised, cloud-enabled data infrastructure for data interoperability with a simplified service model
     • Offer compute next to data to minimise data movement
     • Make data accessible to different research platforms and virtual labs from a common platform
     • Offer a scalable managed computing environment with access to distributed and data-intensive computation technologies
     • Develop a support system for cross-discipline use of data
  6. User stories
     • As an ecosystem science continental-scale gridded data user, I want to query a dataset, perform spatial and temporal sub-setting of data, and access and use that data from a cloud platform as a local file so that I can work on further analyses.
     • As an application developer, I need enough compute and storage for a short period of time to run a distributed, large-scale, data-intensive application so that the output of the analyses is available in a reasonable amount of time.
     • As a regular ecology data user, I need an easily accessible cloud compute platform with common tools (RStudio, Jupyter Python, NetCDF viewer, spatial data viewer, CSV file viewer) attached to the TERN ecology and biophysical data collection so that I can build applications for analysis and synthesis.
     • As a data-intensive application developer, I need a flexible approach to create and access a Hadoop cluster so that I can distribute my computation.
     • As a data user, I want easy access to reference datasets with compute resources so that I can use them in my analysis and research work.
     • As an ecosystem data user, I want a one-stop shop to search, query and access ecosystem data and use it in my analysis so that I don't have to go through multiple portals to access and use data.
     • As an application developer, I want a cloud platform to run my simulation with local access to data so that I don't have to move data around or download it to my desktop.
  7. High-level conceptual architecture
  8. Current status
     • Set up a Technical Advisory Group to advise on the scoping and implementation of the project
     • In the first iteration, reference datasets will be made available
       – Remote sensing reference data (fractional cover)
       – Long-term ecological monitoring data
       – Climate variables
     • Scoping the mediation layer and overall architecture
     • Building a coalition of the willing for partnership and collaboration
  9. Contributions
     • NeCTAR – major project sponsor
     • TERN, ALA – NCRIS domain projects, partners
     • QCIF – implementation partner
     • NCI – collaborator, partner
  10. Contact: Siddeswara Guru, Hamish Holewa
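
The first user story on slide 6 (query a gridded dataset, then perform spatial and temporal sub-setting) can be illustrated with a minimal sketch. It is not the project's implementation: the function name `subset_grid`, the `values[time][lat][lon]` layout, and the toy coordinates are all assumptions made for illustration, and a real service would more likely work on NetCDF files than on nested lists.

```python
from datetime import date


def subset_grid(values, times, lats, lons, time_range, lat_range, lon_range):
    """Return the part of values[t][y][x] that falls inside the inclusive ranges.

    Hypothetical helper: the data layout and argument names are assumptions.
    """
    # Indices of the coordinates that fall inside each requested range.
    t_idx = [i for i, t in enumerate(times) if time_range[0] <= t <= time_range[1]]
    y_idx = [j for j, la in enumerate(lats) if lat_range[0] <= la <= lat_range[1]]
    x_idx = [k for k, lo in enumerate(lons) if lon_range[0] <= lo <= lon_range[1]]

    # Copy only the selected cells, so the rest of the grid never has to move.
    return {
        "times": [times[i] for i in t_idx],
        "lats": [lats[j] for j in y_idx],
        "lons": [lons[k] for k in x_idx],
        "values": [[[values[i][j][k] for k in x_idx] for j in y_idx] for i in t_idx],
    }


# Toy grid (made-up numbers): 4 monthly time steps, 4 latitudes, 5 longitudes.
times = [date(2016, month, 1) for month in (1, 2, 3, 4)]
lats = [-10.0, -20.0, -30.0, -40.0]
lons = [115.0, 125.0, 135.0, 145.0, 155.0]
values = [
    [[t * 100 + y * 10 + x for x in range(len(lons))] for y in range(len(lats))]
    for t in range(len(times))
]

# Temporal subset: February to March 2016. Spatial subset: a lat/lon bounding box.
result = subset_grid(
    values, times, lats, lons,
    time_range=(date(2016, 2, 1), date(2016, 3, 31)),
    lat_range=(-35.0, -15.0),
    lon_range=(120.0, 140.0),
)
print(result["values"])
```

Running the sub-setting where the data lives and returning only the selected cells is the point of the "compute next to data" goal on slide 5: the user receives a small extract instead of copying the whole continental-scale grid.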