SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDEXES). Tim Pugh, ACEAS Grand 2014
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDEXES), Tim Pugh Bureau of Meteorology for ACEAS Grand 2014

  1. SPEDDEXES: An open-source, community-developed approach to enhancing the way ‘Big Data’ is managed, discovered and shared by ecosystem scientists
     Evans, Bradley John*; Guru, Siddeswara; Allen, Stuart; Beckett, Duan; de Wit, Roald; Duursma, Daisy; Erwin, Tim; Evans, Ben; Fuchs, David; Hodge, Jonathan; Ip, Alex; King, Edward; Lewis, Adam; Paget, Matthew; Porter, David; Prentice, Iain Colin; Pugh, Tim; Scarth, Peter; Sixsmith, Joshua; Sun, Yi; Trevithick, Rebecca; Whitley, Rhys
     SPatially Explicit Data Discovery, EXtraction and Evaluation Service
  2. SPatially Explicit Data Discovery, EXtraction and Evaluation Service
     • SPATIALLY and temporally EXPLICIT research data infrastructure to interrogate data streams on the National Computational Infrastructure (NCI)
     • DATA DISCOVERY for TERN or any datasets which can be read by the platform
     • EXTRACTION AND EVALUATION drives advances in ecosystem science, impact assessment and land management
     • Community success story: convolution of ideas from ... Government (Fed and State), CSIRO, NCI, INTERSECT and Universities
  3. Ever growing need
     The Spatially Explicit Data Discovery, Extraction and Evaluation Service (SPEDDEXES) was developed to address the ever-growing need to better manage and access the Big Data available to Australian ecosystem sciences today.
     Coupled Model Intercomparison Project for Climate Experiments
     • CMIP-5 consists of ~23 international models
     • CMIP-5 international data repository is >2PB
     • CMIP-6 contributions are expected to be 10x those of CMIP-5
  4. [Chart: Scientific Data for Research (NCI RDSI node) by 2015. Axis: data volumes (TB), logarithmic scale from 10 to 10000. Values shown: 5419, 1928, 326, 176, 140]
  5. New approach for Big Data
     It is no longer practical, let alone affordable, to continue to do data-intensive ecosystem science in the copy-and-work paradigm; a new approach to working with Big Data is required.
     • Think about network data access, not file downloads …
     • Cross-disciplinary use of file formats and services …
     • Open-source server technology and file formats …
     • Work with big data in a high performance facility
  6. Two key issues
     The SPEDDEXES concept and tools address two key issues.
     • Firstly, create a self-describing data archive which adheres to international standards and community conventions.
     • Secondly, have data providers adopt community standards to enable data catalogue and data access services for easier use, management, and sustainability.
  7. SPEDDEXES architecture
     Connecting data to applications through the use of open-source middleware services and web technologies:
     1. an Open-source Project for a Network Data Access Protocol (OPeNDAP)
     2. the Open Geospatial Consortium (OGC) web services, including the Web Map Service (WMS) protocol
     3. the THREDDS Data Server (TDS), from the Thematic Real-time Environmental Distributed Data Services (THREDDS) project, an implementation of OPeNDAP and WMS
     4. an Environmental Research Division's Data Access Program (ERDDAP) service to aggregate data sources and provide search and data download services
     5. ZOO Web Processing Service (ZOO WPS) for server-side processing
     6. a JavaScript web interface with search, visualization and subset download functionality (a.k.a. SPEDDEXES-UI)
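
As an illustration of the WMS layer in this architecture, the following is a minimal sketch of how a client might assemble an OGC WMS 1.3.0 GetMap request. The endpoint URL, layer name and bounding box are hypothetical placeholders, not SPEDDEXES services; only the query parameter names come from the WMS standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wms_getmap_url(base_url, layer, bbox, width=512, height=512,
                   crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat), which matches the
    longitude/latitude axis order of CRS:84.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url("http://example.org/thredds/wms/example/tmax.nc",
                     "tmax", (112.0, -44.0, 154.0, -10.0))
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(url)
```

The map image is rendered on the server, so the client receives a picture of the requested region rather than the underlying data file.
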
  8. Seeking climatic data
     • ERDDAP Service: Catalogue, File (csv,…), RSS notify, Rich user interface
     • NCAR Data Service: Catalogue, OPeNDAP, WMS
     • NCI Data Service: Catalogue, OPeNDAP, WMS
     • TERN Data Service: Catalogue, OPeNDAP, WMS
  9. THREDDS and Discovery Systems
     Data server: communicate with Discovery Systems
     [Diagram. Components: Metadata Repository; Metadata Harvester; Discovery System; THREDDS Services with data server; Catalog; Metadata Generator; data files (netCDF, HDF, GRIB …). Arrow labels: Reads, References, Writes, Searches]
  10. SCO-R Project overview
  11. Trans-disciplinary science
      • To publish, catalogue and access self-documented data for enhancing trans-disciplinary, big ecosystem-data science within interoperable data services and protocols.
      Integrity of Science
      • Ease of access to data enhances the scientist’s workflow and ensures more accurate and repeatable science that can be conducted with less effort.
      Integrity of Data
      • The data repository services ensure data integrity, digital object identifiers, data discovery and catalogue searches.
  12. For further information:
      Brad Evans, Director ~ TERN e-MAST, bradley.evans@mq.edu.au
      Tim Pugh, Australian Bureau of Meteorology, Centre for Australian Weather and Climate Research, t.pugh@bom.gov.au
  13. Self-describing data
      An open-source geosciences file format is the Network Common Data Form (netCDF) from Unidata (http://www.unidata.ucar.edu).
      NetCDF design goals that support data archives:
      • Portable: byte order neutral
      • Efficient: random access
      • Appendable data arrays
      • Metadata within the file for global and variable attributes
      Metadata conventions provide community standards for …
      • self-describing (CF) metadata conventions
      • data discovery (Unidata ACDD) conventions
      • community specific metadata (e.g. IMOS, TERN AusCover)
      • http://www.auscover.org.au/userdocs/metadata
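
To make the idea of self-describing data concrete, here is a minimal sketch of a metadata check. It assumes the file's global and variable attributes have already been read into plain dictionaries (for example with a netCDF library), and it checks only a small illustrative subset of attributes: the ACDD "highly recommended" global attributes and, for each variable, `units` plus a `standard_name` or `long_name` in the spirit of CF. The function name and the example attribute values are hypothetical; this is not a full conformance checker.

```python
# ACDD global attributes listed as "highly recommended".
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")


def missing_metadata(global_attrs, variables):
    """Return a dict naming the checked attributes that are absent.

    Simplification (assumption of this sketch): every variable is expected
    to carry `units`, although CF allows dimensionless quantities without it.
    """
    report = {}
    missing_global = [k for k in ACDD_HIGHLY_RECOMMENDED
                      if not global_attrs.get(k)]
    if missing_global:
        report["global"] = missing_global
    for name, attrs in variables.items():
        missing = []
        if "units" not in attrs:
            missing.append("units")
        if "standard_name" not in attrs and "long_name" not in attrs:
            missing.append("standard_name or long_name")
        if missing:
            report[name] = missing
    return report


# Hypothetical attribute values, for illustration only.
example_globals = {"title": "Example daily maximum temperature",
                   "Conventions": "CF-1.6, ACDD-1.3"}
example_variables = {
    "tmax": {"units": "K", "standard_name": "air_temperature"},
    "mask": {},
}
report = missing_metadata(example_globals, example_variables)
print(report)
```

An empty report means every checked attribute is present, which is the property that lets catalogue and discovery services harvest the file without outside documentation.
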
  14. Fundamental Objective of OPeNDAP
      The fundamental objective of OPeNDAP and OPeNDAP Inc. is to facilitate internet access to scientific data. This is done by:
      • Providing a protocol (DAP) to access data over the internet,
      • Hiding the format (and organization) in which the data are stored from the user, and
      • Providing subsetting (and other) capabilities for the data at the server
      OPeNDAP is based on a multi-tier architecture.
      OPeNDAP software is open source.
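
The benefit of subsetting at the server can be shown with simple arithmetic. The sketch below assumes DAP2-style `start:stride:stop` index ranges with an inclusive stop, and compares the bytes in a requested subset with the bytes in the whole array. The array shape, the subset and the 4-byte element size are hypothetical values chosen for illustration.

```python
def hyperslab_len(start, stride, stop):
    """Number of indices selected by start:stride:stop (stop inclusive)."""
    return (stop - start) // stride + 1


def subset_bytes(slabs, bytes_per_value=4):
    """Bytes of array data selected by one (start, stride, stop) per dimension."""
    count = 1
    for start, stride, stop in slabs:
        count *= hyperslab_len(start, stride, stop)
    return count * bytes_per_value


# Hypothetical daily grid: 365 time steps x 720 latitudes x 1440 longitudes.
full = subset_bytes([(0, 1, 364), (0, 1, 719), (0, 1, 1439)])
# One month over a 100 x 200 cell region.
part = subset_bytes([(0, 1, 30), (100, 1, 199), (200, 1, 399)])
print(full, part, part / full)
```

In this example the subset is well under one percent of the full array, which is the case for network data access over whole-file downloads.
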
  15. THREDDS Data Server (TDS)
      TDS is the THREDDS Data Server
      • THREDDS is Thematic Real-time Environmental Distributed Data Services
      • Middleware to bridge the gap between data providers and data users
      • THREDDS Data Server (TDS): a web server that provides catalog, metadata, and data access services for scientific datasets
      • The TDS is open source, 100% Java, and runs inside the open source Tomcat Servlet container
      Unidata’s Common Data Model
      • merges the OPeNDAP, netCDF, and HDF5 data models to create a common API for scientific data
      • implemented by the NetCDF Java library
      • reads netCDF, OPeNDAP, HDF5, HDF4, GRIB 1 & 2, BUFR, NEXRAD 2 & 3, GEMPAK, McIDAS, GINI, among others
      • a pluggable framework allows other developers to add readers for their own specialized formats
      • provides standard APIs for geo-referencing coordinate systems, and specialized queries for scientific feature types like Grid, Point, and Radial datasets
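
The catalog service described above can be consumed by a short script. The following is a minimal sketch that parses a THREDDS catalog document and lists OPeNDAP access URLs. The sample catalog, server address and dataset names are hypothetical; the namespace and the `service` and `dataset` elements follow the THREDDS InvCatalog 1.0 schema. The sketch assumes a single flat OPeNDAP service and ignores compound services and nested catalog references.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = {
    "t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
}

# Hypothetical catalog content, for illustration only.
SAMPLE_CATALOG = """<catalog
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    name="Example catalog">
  <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example collection">
    <dataset name="tmax_2010.nc" urlPath="example/tmax_2010.nc"/>
    <dataset name="tmax_2011.nc" urlPath="example/tmax_2011.nc"/>
  </dataset>
</catalog>
"""


def opendap_urls(catalog_xml, server="http://example.org"):
    """Return one OPeNDAP URL per dataset that declares a urlPath."""
    root = ET.fromstring(catalog_xml)
    base = None
    for service in root.iterfind(".//t:service", THREDDS_NS):
        if service.get("serviceType", "").upper() == "OPENDAP":
            base = service.get("base")
            break
    if base is None:
        return []  # the catalog advertises no OPeNDAP service
    return [server + base + ds.get("urlPath")
            for ds in root.iterfind(".//t:dataset[@urlPath]", THREDDS_NS)]


urls = opendap_urls(SAMPLE_CATALOG)
for u in urls:
    print(u)
```

A metadata harvester or discovery system would typically walk catalogs in this way before reading the attributes of each dataset.
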
  16. Spectrum of Use Cases
      Application Data Representation
      • OGC data model: domain specific; geospatial, 1-D, 2-D
      • DAP2 data model: domain neutral; n-D, time series
      • **DAP4 data model: domain neutral; new data types and data structures; streaming, compressed, chunked
      • Common Data Model (CDM): domain specific
      • Future data model: domain neutral??
      Application Types
      • Programmatic / Language API: FORTRAN, C/C++, Java, Python, NetCDF, Java NetCDF
      • Programmatic / Tools: NetCDF, NCO, PyDAP; custom tools: OPeNDAP crawler, ocean_prep
      • Interactive Data Viewer: IDV, Panoply, IDL, MATLAB, iPython (matplotlib), NCL, web browser (metadata)
      • Interactive Analysis: MATLAB, IDL, iPython, NCL; custom application: Inundation Modeller
      • Web Application: Live Access Server; IMOS Data Portal (WMS); custom Java servlet
      Programming
      • DAP2 legacy code: existing tools
      • DAP2 new code: new tools
      • **DAP4 programming: legacy code support
      • **DAP4 programming: new data model and protocols, streaming support
      • **DAP4 programming: asynchronous access modes, server-side processing
      Data Access Protocol
      • Metadata request: das, dds, ddx
      • ASCII/Binary data request: simple data representation; DAP binary object request
      • NcML data request: aggregation, virtual data sets
      • **DAP4: server-side operations, async access mode, new data model, posting
      Syntax
      • Return data set info: file.nc.dds (readable), file.nc.ddx (XML), file.nc.asc (ASCII data return)
      • Select variables: file.nc.dods?var1,var2,var3
      • Subset arrays: file.nc.dods?var1[0:1:10]
      • Return file translations: file.nc.netcdf (NetCDF file)
      • Server-side operations: file.nc?GEOLOC()
      • Async access mode: ??
      Clients
      • Programmatic Access: Tsunami inundation modeller, NetCDF, NCO, PyDAP, PyNetCDF, MATLAB, IDL, …
      • Interactive Access: web browser (catalog), MATLAB, IDL, Python, Panoply, …
      • Data Library & Catalog Service: metadata harvesting, directory listings, remote THREDDS services
      • Web Service: Java servlet, Java applet
      • Geospatial Information Service
      • OPeNDAP data service
      • Analysis Service: Live Access Server
      Service Capabilities
      • DAP2 response: metadata, dods, ASCII / Binary
      • **DAP4 response: async access mode, server-side, streaming, NcML
      • Aggregation service: Virtual Data Set Service
      • Remote Data Access
      • Metadata Conversion and RDF: metadata definitions, translations (-> ISO), semantics, ontology; CF->ISO, CF->WMS, CF->WCS
      • Layered Services: catalogue service; WMS, WCS services; authentication
      • Conformance checks: CF metadata check, ISO metadata check
      **The DAP4 features listed are my estimate and not the official specification.
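
The request syntax listed on this slide can be generated programmatically. The following is a minimal sketch of a helper that appends a DAP2 response suffix and an optional constraint expression to a dataset URL. The helper name and the server address are hypothetical, and array subsets are written in the square-bracket `[start:stride:stop]` form defined by DAP2. Some clients percent-encode the brackets before sending the request; that step is omitted here.

```python
def dap2_url(dataset_url, response="dods", variables=()):
    """Build a DAP2 request URL.

    response  -- suffix such as "dds", "das", "ddx", "asc" or "dods"
    variables -- names, or (name, [(start, stride, stop), ...]) pairs
                 giving one index range per array dimension
    """
    terms = []
    for item in variables:
        if isinstance(item, str):
            terms.append(item)  # whole variable
        else:
            name, slabs = item
            ranges = "".join(f"[{a}:{s}:{b}]" for a, s, b in slabs)
            terms.append(name + ranges)
    url = f"{dataset_url}.{response}"
    if terms:
        url += "?" + ",".join(terms)
    return url


# Hypothetical server address, for illustration only.
base = "http://example.org/opendap/file.nc"
info_url = dap2_url(base, "dds")
select_url = dap2_url(base, "dods", ["var1", "var2", "var3"])
subset_url = dap2_url(base, "dods", [("var1", [(0, 1, 10)])])
print(info_url)
print(select_url)
print(subset_url)
```

Keeping the constraint in the URL means any HTTP client can request a subset, which is what allows the wide range of tools on this slide to share one data service.
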
