TERN is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy.
Australian Ecosystems Science Cloud
overview
Presentation by Siddeswara Guru
Director, Data Science
Ecosystem science
• Inter-relationships among living organisms, physical features, bio-chemical
processes, natural phenomena, and human activities in ecological communities¹
• Focusing on Terrestrial Ecosystem
– Terrestrial Ecosystem Research Network
– Atlas of Living Australia
• Data is heterogeneous: a wide variety from different domains
– Observation (human, in-situ sensors and satellite remote sensing)
– Variety of scale: spatial and temporal
– Different data formats used in the community
Data Use
• Conventional data access
– Need to find data
– Access via services
– Copy from source to destination for further processing, including for
large datasets
Storage and Compute
• Advent of NeCTAR and RDS
– Researchers are moving data and computation to
the cloud
– Building tools (Virtual labs, research tools and
platforms)
– However, easy accessibility of data is still an issue
• Multiple interfaces to search for data
• No clear access mechanism from different nodes
Goal
• Offer an open data platform: a harmonised, cloud-enabled data
infrastructure for data interoperability with a simplified service
model
• Offer compute next to data to minimise data movement
• Make data accessible to different research platforms and virtual
labs from a common platform
• Offer a scalable, managed computing environment with access
to distributed and data-intensive computation technologies
• Develop a support system for cross-discipline use of data
User Stories
• As an ecosystem science continental-scale gridded data user, I want to query a dataset, perform
spatial and temporal sub-setting of the data, and access and use that data from a cloud platform as a local
file so that I can work on further analyses.
• As an application developer, I need enough compute and storage for a short period of time to run a
distributed, large-scale, data-intensive application so that the output of the analyses is available in
a reasonable amount of time.
• As a regular ecology data user, I need an easily accessible cloud compute platform with common
tools (RStudio, Jupyter Python, NetCDF viewer, spatial data viewer, CSV file viewer) attached to
the TERN ecology and biophysical data collection so that I can build applications for analysis and
synthesis.
• As a data-intensive application developer, I need a flexible way to create and access a Hadoop
cluster so that I can distribute my computation.
• As a data user, I want easy access to reference datasets with compute resources so that I can use
them in my analysis and research work.
• As an ecosystem data user, I want a one-stop shop to search, query and access ecosystem data and
use it in my analysis so that I don't have to go through multiple portals to access and use data.
• As an application developer, I want a cloud platform to run my simulation with local access to data
so that I don't have to move data around or download it to my desktop.
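The first user story (spatial and temporal sub-setting of a gridded dataset) can be illustrated with a minimal sketch. This is not the platform's API: the function name `subset_grid`, the nested-list layout `values[time][lat][lon]`, and the toy coordinates are all assumptions made for illustration only.

```python
from datetime import date


def subset_grid(values, times, lats, lons, time_range, lat_range, lon_range):
    """Return the part of values[t][y][x] that falls inside the inclusive ranges.

    Hypothetical helper: a real service would work on NetCDF or similar
    files rather than in-memory lists.
    """
    # Indices of the coordinates that fall inside each requested range.
    t_idx = [i for i, t in enumerate(times) if time_range[0] <= t <= time_range[1]]
    y_idx = [i for i, v in enumerate(lats) if lat_range[0] <= v <= lat_range[1]]
    x_idx = [i for i, v in enumerate(lons) if lon_range[0] <= v <= lon_range[1]]

    # Copy out only the selected cells, keeping the time/lat/lon ordering.
    cube = [[[values[t][y][x] for x in x_idx] for y in y_idx] for t in t_idx]
    return {
        "times": [times[i] for i in t_idx],
        "lats": [lats[i] for i in y_idx],
        "lons": [lons[i] for i in x_idx],
        "values": cube,
    }


# Toy grid (assumed data): 3 monthly time steps, 3 latitudes, 4 longitudes.
times = [date(2016, m, 1) for m in (1, 2, 3)]
lats = [-30.0, -25.0, -20.0]
lons = [140.0, 145.0, 150.0, 155.0]
values = [
    [[t * 100 + y * 10 + x for x in range(len(lons))] for y in range(len(lats))]
    for t in range(len(times))
]

subset = subset_grid(
    values, times, lats, lons,
    time_range=(date(2016, 2, 1), date(2016, 3, 1)),
    lat_range=(-26.0, -19.0),
    lon_range=(144.0, 151.0),
)
print(subset["values"])
```

The point of the sketch is that only the requested sub-cube is materialised, which is the behaviour the user story asks for when data and compute sit on the same platform.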
High-level conceptual Architecture
Current status
• Set up a Technical Advisory Group to advise on the scoping and
implementation of the project.
• In the first iteration: reference datasets will be made available
– Remote sensing reference data (fractional cover)
– Long-term ecological monitoring data
– Climate variables
• Scoping the mediation layer and overall architecture
• Building a coalition of the willing for partnership and collaboration
Contributions
• NeCTAR – Major project sponsor
• TERN, ALA – NCRIS Domain Projects, partners
• QCIF – implementation partner
• NCI – collaborator, partners
Contact
Siddeswara Guru: s.guru@uq.edu.au
Hamish Holewa: hholewa@quadrant.edu.au
