TERN is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy.
Presentation by Siddeswara Guru
Director, Data Science
• Inter-relationship among the living organisms, physical features, bio-chemical
processes, natural phenomena, and human activities in ecological communities
• Focusing on Terrestrial Ecosystem
– Terrestrial Ecosystem Research Network
– Atlas of Living Australia
• Data is heterogeneous: a wide variety from different domains
– Observation (human, in-situ sensors and satellite remote sensing)
– Variety of scale: spatial and temporal
– Different data formats used in the community
• Conventional data access
– Need to find data
– Access via services
– copy from source to destination for further for
Storage and Compute
• Advent of NeCTAR and RDS
– Researchers are moving data and computation to
– Building tools (Virtual labs, research tools and
– However, easy accessibility of data is still an issue
• Multiple interfaces to search for data
• No clear access mechanism from different nodes
• Offer an open data platform: harmonised cloud-enabled data
infrastructure for data interoperability with simplified service
• Offer compute next to data to minimise data movement
• Data accessibility to different research platforms and virtual
labs from a common platform
• Offer a scalable managed computing environment with access
to distributed and data-intensive computation technologies
• Develop a support system for cross-discipline use of data
• As an ecosystem science continental-scale gridded data user, I want to query a dataset, perform
spatial and temporal sub-setting of data, access and use that data from a cloud platform as a local
file so that I can work on further analyses.
• As an application developer, I need enough compute and storage for a short period of time to run a
distributed large-scale data-intensive application so that the output of the analyses is available in
a reasonable amount of time.
• As a regular ecology data user, I need an easily accessible cloud compute platform with common
tools (RStudio, Jupyter Python, NetCDF viewer, spatial data viewer, CSV file viewer) attached to
the TERN ecology and biophysical data collection so that I can build applications for analysis and
• As a data-intensive application developer, I need a flexible approach to create and access a Hadoop
cluster so that I can distribute my computation.
• As a data user, I want easy access to reference datasets with compute resources so that I can use
them in my analysis and research work.
• As an ecosystem data user, I want a one-stop shop to search, query and access ecosystem data and
use it in my analysis so that I don't have to go through multiple portals to access and use data.
• As an application developer, I want a cloud platform to run my simulation with local access to data
so that I don't have to move data around or download it to my desktop.
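The gridded-data user story above (query a dataset, then sub-set it in space and time) can be illustrated with a minimal sketch. It assumes a small synthetic grid held in memory as nested lists; the function name subset_grid, the axis ordering values[time][lat][lon] and the sample coordinates are all hypothetical and are not part of any TERN service.

```python
from datetime import date


def subset_grid(values, times, lats, lons, time_range, lat_range, lon_range):
    """Return the part of values[t][y][x] that lies inside the given ranges.

    Every range is an inclusive (low, high) pair. Hypothetical helper for
    illustration only; a real platform would do this server-side.
    """
    # Keep the indices of each axis whose coordinate falls inside its range.
    t_idx = [i for i, t in enumerate(times) if time_range[0] <= t <= time_range[1]]
    y_idx = [i for i, v in enumerate(lats) if lat_range[0] <= v <= lat_range[1]]
    x_idx = [i for i, v in enumerate(lons) if lon_range[0] <= v <= lon_range[1]]

    # Slice the data cube with the retained indices, preserving axis order.
    cube = [[[values[t][y][x] for x in x_idx] for y in y_idx] for t in t_idx]
    return {
        "times": [times[i] for i in t_idx],
        "lats": [lats[i] for i in y_idx],
        "lons": [lons[i] for i in x_idx],
        "values": cube,
    }


# Synthetic monthly grid (assumed values, not real observations).
times = [date(2015, month, 1) for month in range(1, 5)]
lats = [-10.0, -20.0, -30.0, -40.0]
lons = [115.0, 125.0, 135.0, 145.0, 155.0]
values = [
    [[t * 100 + y * 10 + x for x in range(len(lons))] for y in range(len(lats))]
    for t in range(len(times))
]

subset = subset_grid(
    values, times, lats, lons,
    time_range=(date(2015, 2, 1), date(2015, 3, 1)),
    lat_range=(-30.0, -20.0),
    lon_range=(120.0, 140.0),
)
print(subset["times"], subset["lats"], subset["lons"])
```

Only the requested sub-cube is returned, which is the point of the user story: the analysis receives a small local piece instead of a copy of the whole continental-scale dataset.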
• Set up a Technical Advisory Group to advise on the scoping and
implementation of the project.
• In the first iteration: reference datasets will be made available
– Remote sensing reference data (fractional cover)
– Long-term ecological monitoring data
– Climate variables
• Scoping the mediation layer and overall architecture
• Building a coalition of the willing for partnership and collaboration