Tim Pugh-SPEDDEXES 2014

How OPeNDAP has transformed the way we do science plus snapshots of recent developments and BoM’s operational systems built on this technology


  1. 1. How OPeNDAP has transformed the way we do science plus snapshots of recent developments and BoM's operational systems built on this technology Tim Pugh SPEDDEXES workshop 17-21 March 2014
  2. 2. Evolution • Traditionally… – Scientific research is conducted in a quiet room in isolation utilising unique data, scripts, and code – Scientific collaboration is conducted at conferences with file sharing by FTP or HTTP bulk download • Today – Scientific research is being driven to shared research services and supported infrastructure • To relieve the scientist of laborious developments • To manage more complex machinery • To improve scientific integrity and collaboration • To work within managed and supported infrastructure – Science is moving from file sharing to data sharing collaboration
  3. 3. CAWCR Research Data Server • Location: http://opendap.bom.gov.au:8080/thredds • Unidata THREDDS Data Server v4.2.8 • http://www.unidata.ucar.edu/projects/THREDDS/tech/TDS.html • The THREDDS Data Server (TDS) is a Java servlet packaged as a single WAR file, which makes it easy to install in the Tomcat web server.
  4. 4. OPeNDAP Now Is: • An acronym – “Open-source Project for a Network Data Access Protocol” – Often a synonym for “DAP” • A not-for-profit corp. developing/supporting – “DAPx” - a web-services protocol for data access • Deployed by hundreds of data providers internationally • Employed in many analysis packages (MATLAB, e.g.) • Designated a “Community Standard” by NASA – Server & client implementations* of DAP *Note: there are other implementations
  5. 5. BROAD VISION 1. A world in which a single data access protocol is used for the exchange of data between network-based applications regardless of discipline. 2. A layer above TCP/IP providing for syntactic and semantic consistency not available in existing protocols such as FTP.
  6. 6. Fundamental Objective of OPeNDAP • The fundamental objective of OPeNDAP and OPeNDAP Inc. is to facilitate internet access to scientific data • This is done by: • Providing a protocol (DAP) to access data over the internet, • Hiding the format (and organization) in which the data are stored from the user, and • Providing subsetting (and other) capabilities for the data at the server • OPeNDAP is based on a multi-tier architecture • OPeNDAP software is open source
  7. 7. OPeNDAP Data-Type Philosophy • The OPeNDAP data model has few data types – simplified programming/lowered risk of errors • They are intentionally discipline-neutral – better trans-domain utility & programmer uptake • They nonetheless fill discipline-specific needs – netCDF-like (good in contexts where, e.g., data might represent functions with 4- or 5-D domains) – sequences & selections match DBMS sensibilities
  8. 8. TDS Server • TDS is THREDDS Data Server – THREDDS is Thematic Real-time Environmental Distributed Data Services – Middleware to bridge the gap between data providers and data users – THREDDS Data Server (TDS) is a web server that provides catalog, metadata, and data access services for scientific datasets. – The TDS is open source, 100% Java, and runs inside the open source Tomcat Servlet container. • Unidata's Common Data Model – merges the OPeNDAP, netCDF, and HDF5 data models to create a common API for scientific data – is implemented by the NetCDF Java library – reads netCDF, OPeNDAP, HDF5, HDF4, GRIB 1 & 2, BUFR, NEXRAD 2 & 3, GEMPAK, MCIDAS, GINI, among others – has a pluggable framework that allows other developers to add readers for their own specialized formats – provides standard APIs for geo-referencing coordinate systems, and specialized queries for scientific feature types like Grid, Point, and Radial datasets
  9. 9. Some of the Technology in the TDS 1. THREDDS Dataset Inventory Catalogs provide virtual directories of available data and associated metadata. 2. The NetCDF-Java/CDM library reads NetCDF, OPeNDAP, and HDF5 datasets, as well as other binary formats such as GRIB and NEXRAD, presenting essentially an (extended) netCDF view of the data. 3. TDS can use the NetCDF Markup Language (NcML) to modify and create virtual aggregations of datasets. 4. An integrated server provides OPeNDAP access, a data access method that supports subsetting. 5. An integrated server provides bulk file access through the HTTP protocol. 6. An integrated server provides data access through the Open Geospatial Consortium (OGC) Web Coverage Service (WCS) protocol, for any "gridded" dataset whose coordinate system information is complete. 7. An integrated server provides data access through the Open Geospatial Consortium (OGC) Web Map Service (WMS) protocol, for any "gridded" dataset whose coordinate system information is complete. 8. The integrated ncISO server provides automated metadata analysis and ISO metadata generation.
  10. 10. THREDDS Catalog • The goal is… – to simplify the discovery and use of scientific data and to allow scientific publications and educational materials to reference scientific data. – The initial focus was to allow data users to find datasets that are pertinent to their specific education and research needs, access the data, and use them without necessarily downloading the entire file to their local system. – Catalogs are the heart of the data access services and the central THREDDS concept. Catalogs consist of XML documents that describe on-line datasets. – Catalogs can contain arbitrary metadata; however, we also defined a standard set of metadata to bridge to discovery centers • CF (Climate & Forecast) and Unidata Data Discovery metadata
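
Because catalogs are plain XML documents, a client can harvest dataset references with a standard XML parser. The sketch below is a minimal illustration, not the TDS implementation: the sample catalog, the server name `localhost:8080`, and the helper `list_dap_urls` are all hypothetical, and it assumes a simple catalog with one OPeNDAP service whose `base` applies to every dataset (real catalogs can use compound services and inherited metadata).

```python
import xml.etree.ElementTree as ET

# XML namespace of THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical hand-written catalog, used only for illustration.
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example catalog">
  <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example collection">
    <dataset name="sst_2011.nc" urlPath="example/sst_2011.nc"/>
    <dataset name="sst_2012.nc" urlPath="example/sst_2012.nc"/>
  </dataset>
</catalog>
"""


def list_dap_urls(catalog_xml, server):
    """Return (dataset name, OPeNDAP URL) pairs for datasets with a urlPath."""
    root = ET.fromstring(catalog_xml.strip())
    tag = "{%s}" % THREDDS_NS

    # Simplification: use the base of the first OPeNDAP service found.
    base = None
    for service in root.iter(tag + "service"):
        if (service.get("serviceType") or "").lower() == "opendap":
            base = service.get("base")
            break
    if base is None:
        return []

    urls = []
    for dataset in root.iter(tag + "dataset"):
        url_path = dataset.get("urlPath")
        if url_path:  # container datasets have no urlPath and are skipped
            urls.append((dataset.get("name"), server + base + url_path))
    return urls


for name, url in list_dap_urls(SAMPLE_CATALOG, "http://localhost:8080"):
    print(name, url)
```

A harvester of this kind could store the resulting names and URLs in a database so that a web application does not need to re-read the catalog on every request.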
  11. 11. Spectrum of Use Cases
      • Application Data Representation
        – OGC data model: domain specific; geospatial, 1-D, 2-D
        – DAP2 data model: domain neutral; n-D, time series
        – **DAP4 data model: domain neutral; new data types and data structures; streaming, compressed, chunked
        – Common Data Model (CDM): domain specific
        – Future data model: domain neutral??
      • Application Types
        – Programmatic / Language API: FORTRAN, C/C++, JAVA, Python, NetCDF, Java NetCDF
        – Programmatic / Tools: NetCDF, NCO, PyDAP; Custom Tools: OPeNDAP crawler, ocean_prep
        – Interactive Data Viewer: IDV, Panoply, IDL, MATLAB, iPython (matplotlib), NCL, web browser (metadata)
        – Interactive Analysis: MATLAB, IDL, iPython, NCL; Custom Application: Inundation Modeller
        – Web Application: Live Access Server; IMOS Data Portal (WMS); Custom Java Servlet
      • Programming
        – DAP2 Legacy Code: existing tools
        – DAP2 New Code: new tools
        – **DAP4 programming: legacy code support
        – **DAP4 programming: new data model and protocols, streaming support
        – **DAP4 programming: asynchronous access modes, server-side processing
      • Data Access Protocol
        – Metadata Request: das, dds, ddx
        – ASCII/Binary Data Request: simple data representation; DAP Binary Object Request
        – NcML Data Request: aggregation, virtual data sets
        – **DAP4: server-side operations, async access mode, new data model, posting
      • Syntax
        – Return data set info: file.nc.dds - readable; file.nc.ddx - XML; file.nc.asc - ASCII data return
        – Select variables: file.nc.dods?var1,var2,var3
        – Subset arrays: file.nc.dods?var1[0:1:10]
        – Return file translations: file.nc.netcdf - NetCDF file
        – Server-side operations: file.nc?GEOLOC()
        – Async access mode: ??
      • Clients
        – Programmatic Access: Tsunami inundation modeller, NetCDF, NCO, PyDAP, PyNetCDF, MATLAB, IDL, …
        – Interactive Access: Web browser - Catalog; MATLAB, IDL, Python, Panoply, …
        – Data Library & Catalog Service: metadata harvesting, directory listings, remote THREDDS services
        – Web Service: Java servlet, Java applet
        – Geospatial Information Service: OPeNDAP data service
        – Analysis Service: Live Access Server
      • Service Capabilities
        – DAP2 response: metadata, dods, ASCII / Binary
        – **DAP4 response: async access mode, server-side, streaming
        – NcML Aggregation service: Virtual Data Set Service, Remote Data Access
        – Metadata Conversion and RDF: metadata definitions, translations (-> ISO); semantics, ontology; CF->ISO, CF->WMS, CF->WCS
      • Layered Services
        – Catalogue service
        – WMS, WCS services
        – Authentication
        – Conformance checks: CF metadata check, ISO metadata check
      **The DAP4 features listed are my estimate and not the official specification
  12. 12. Use Case limitations • Time to access data is dependent on the following factors: • Hardware and network performance • Selection of variables and dimensions • Number of data requests to be issued − Latency inherent in the data request • Number of concurrent accesses to the server
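
One way to reason about these factors is a back-of-the-envelope cost model. The sketch below is an assumption made for illustration, not a formula from the slides: it treats total access time as a fixed latency per request plus the payload size divided by the available bandwidth, and ignores server load and concurrency. The function name and all numbers are hypothetical.

```python
def estimated_access_seconds(n_requests, latency_s, total_bytes, bandwidth_bytes_per_s):
    """Rough estimate: per-request latency plus raw transfer time.

    Assumed model (not from the slides): time = n * latency + bytes / bandwidth.
    """
    return n_requests * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical numbers: the same 100 MB fetched as many small or few large requests.
many_small = estimated_access_seconds(1000, 0.05, 100e6, 10e6)
few_large = estimated_access_seconds(10, 0.05, 100e6, 10e6)
print("1000 requests: %.1f s" % many_small)
print("10 requests:   %.1f s" % few_large)
```

Under this model the request count dominates once per-request latency is comparable to the transfer time, which is why selecting variables and index ranges carefully, and issuing fewer, larger requests, tends to pay off.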
  13. 13. DAP-enabled client tools/applications OPeNDAP Clients (partial list) http://opendap.org/whatClients 1. Web browser: returns ASCII data 2. Pydap: a pure Python implementation of DAP2 3. NetCDF: a set of software libraries and self-describing, machine-independent data formats with interfaces to Python, FORTRAN, C/C++, and Java 4. NCO: a dozen standalone, command-line programs that take netCDF files as input 5. MATLAB: a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation 6. Panoply: a cross-platform application that plots geo-gridded arrays from netCDF, HDF and GRIB datasets.
  14. 14. Developments by Bureau and CSIRO • Development of web portals for data access services and information systems in climate and environment – Seasonal Climate Outlook Rebuild (Roald de Wit) – Natural Resource Management (NRM) Climate Change Portal (Tim Erwin) – eReef's Marine Quality Dashboard and data services (Jonathon Hodge) – National Environmental Information Infrastructure (NEII) (Andrew Woolf) – CAWCR research data services (Duan Beckett) • Establish Climate Data Publishing services at NCI – NCI, CSIRO, Bureau of Meteorology, CoE CSS – Earth System Grid (ESG) – Climate and Weather Science Laboratory (CWSLab)
  15. 15. SCO-R Project overview
  16. 16. Project overview • More interactivity and functionality needed • Demand for POAMA multi-week forecast products • Long term view of seamless transition between forecasts • Building upon experiences / technologies from other BoM projects (e.g. MetEye and PASAP/PACCSAP)
  17. 17. SCO-R architecture MapCache BOM.Map / BOM.App Custom WMS Service (Python)
  18. 18. Climate Futures Climate Futures approach to the provision of regional climate projection information CMAR/CLIMATE ADAPTATION FLAGSHIP Tim Erwin Acknowledgements: Penny Whetton, Kevin Hennessy, John Clarke, David Kent 28 October 2013
  19. 19. Climate Data • Processed from climate model data (CMIP3 and CMIP5) • NetCDF file format • 10 variables (temperature, rainfall, humidity...) • 20 year seasonal averages (2030, 2035, ..., 2090) • Base period (1950 – 2005) stored as monthly time span • Catalogued in THREDDS server – Allows DAP access • Django • THREDDS catalogues are parsed and stored – model, variable, dap url, layer name, time span
  20. 20. Architecture
  21. 21. Architecture THREDDS ZOO
  22. 22. ZOO-Project (WPS Server) • Consists of: Kernel, API, Service • Works with Apache through a CGI file and a conf file • Supports several common programming languages: C/C++, Fortran, Python, PHP, Perl, Java, JavaScript • Used to create area averages of gridded data using a non-rectangular mask • Predefined mask • Polygon (GML, KML, GEOM) • Not limited to geographic operations
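
The area-average operation mentioned above can be sketched in a few lines. This is an illustrative sketch only, not the service's actual code: the function name, the tiny grid, and the mask are hypothetical, and the cosine-of-latitude weighting is an assumption (a common approximation of relative cell area on a regular latitude/longitude grid) that the slides do not state.

```python
import math


def masked_area_average(values, mask, lats):
    """Average values[i][j] over cells where mask[i][j] is truthy.

    Each row i is weighted by cos(latitude) to approximate relative cell
    area on a regular lat/lon grid (an assumption made for this sketch).
    """
    total = 0.0
    weight_sum = 0.0
    for row, mask_row, lat in zip(values, mask, lats):
        weight = math.cos(math.radians(lat))
        for value, selected in zip(row, mask_row):
            if selected:
                total += weight * value
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("mask selects no grid cells")
    return total / weight_sum


# Hypothetical 2 x 3 grid with a non-rectangular (L-shaped) mask.
values = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
mask = [[1, 1, 0],
        [0, 1, 0]]
lats = [0.0, 60.0]
print(masked_area_average(values, mask, lats))
```

A polygon supplied as GML or KML would first have to be rasterised onto the data grid to produce such a mask; that step is outside the scope of this sketch.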
  23. 23. OPeNDAP Technology Developments • DAP4 protocol and data model implementation (OPULS) – OPULS (an OPeNDAP-Unidata collaboration) – DAP4 (to supersede DAP2) – Experimental extensions (Async access, UGRID subsets) • DAP2 & DAP4 JSON response type – Improve JavaScript client utilisation of DAP services • ncWMS integration and WMS extensions – contour map types – THREDDS and Hyrax integration of ncWMS • Programmatic Data Access for secure services – RDSI DaSh project to support programmatic data access – Integration within reX Identity and Authorisation Management
  24. 24. DAP4 Experiments • DAP4 provides more complete support for functions including metadata responses (DAP2 does not provide this; a gap in the DAP2 specification) – Experiments with Unstructured Grid (irregular mesh) subsetting – Binning: returns a distribution (as a raster of boolean values on a user-specified grid) of data values satisfying some criteria – Masking: accepts a raster of zero/nonzero values as a query argument, perhaps as a geospatial selection criterion • OPeNDAP is running several experimental mini-projects within this context: – Asynchronous access, data streaming, cloud computing and an expanded, function-based, server-side processing system
  25. 25. thank you – have a great experience Tim F. Pugh HPC and CWSLab Project Lead Melbourne, Victoria, Australia Email: t.pugh@bom.gov.au Office: +61 3 9669 4345
  26. 26. Workshop Use-Cases
      • Application Data Representation
        – DAP2 data model: domain neutral; n-D, time series
      • Application Types
        – Programmatic / Language API: FORTRAN, C/C++, JAVA, Python, NetCDF, Java NetCDF, PyDAP
        – Programmatic / Tools: NetCDF, NCO, PyDAP; Custom Tools: OPeNDAP crawler
        – Interactive Data Viewer: Panoply, MATLAB, NCL, web browser
      • Programming
        – DAP2 Legacy Code: existing tools
        – DAP2 New Code: new tools
      • Data Access Protocol
        – Metadata Request: das, dds, ddx
        – ASCII/Binary Data Request: simple data representation; DAP Binary Object Request
        – NcML Data Request: aggregation
      • Syntax
        – Return metadata info: file.nc.das - readable; file.nc.dds - readable; file.nc.ddx - XML metadata; file.nc.help - help info
        – Select vars and return data: file.nc.asc?var1,var2,var3; file.nc.dods?var1,var2,var3
        – Subset arrays, return data: file.nc.asc?var1[0:1:10]; file.nc.dods?var1[0:1:10]
        – Return file translations: file.nc.netcdf - NetCDF file
        – Server-side operations: file.nc?GEOLOC()
      • Clients
        – Programmatic Access: NetCDF, NCO, PyDAP, PyNetCDF
        – Interactive Access: Web browser - Catalog; Python, MATLAB, Panoply
      • Service Capabilities
        – DAP2 response
        – THREDDS data service
        – Hyrax data service
        – NcML Aggregation service
      • Layered Services
        – Catalog service
        – WMS
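
The request syntax summarised in this slide can be assembled programmatically. The sketch below is a minimal illustration under stated assumptions: the helper `dap2_url` and the dataset URL are hypothetical, and only projections (variable selection plus index ranges) are covered. DAP2 constraint expressions write each index range in square brackets as `[start:stride:stop]`.

```python
def dap2_url(dataset_url, response="dods", variables=(), hyperslabs=None):
    """Build a DAP2 request URL of the form <dataset>.<response>?<projection>.

    response   -- suffix such as "das", "dds", "ddx", "ascii" or "dods"
    variables  -- names of the variables to return (empty means everything)
    hyperslabs -- optional {name: [(start, stride, stop), ...]}, one tuple
                  per array dimension
    """
    hyperslabs = hyperslabs or {}
    terms = []
    for name in variables:
        ranges = "".join(
            "[%d:%d:%d]" % (start, stride, stop)
            for start, stride, stop in hyperslabs.get(name, [])
        )
        terms.append(name + ranges)

    url = "%s.%s" % (dataset_url, response)
    if terms:
        url += "?" + ",".join(terms)
    return url


# Hypothetical dataset URL, for illustration only.
base = "http://localhost:8080/thredds/dodsC/example/file.nc"
print(dap2_url(base, "dds"))
print(dap2_url(base, "ascii", ["var1", "var2"]))
print(dap2_url(base, "dods", ["var1"], {"var1": [(0, 1, 10)]}))
```

Client libraries such as Pydap or netCDF build these URLs internally when an array is sliced, so hand-built URLs are mostly useful for testing a server from a browser or the command line.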
  27. 27. Pydap client
      >>> from pydap.client import open_url
      >>> dataset = open_url('http://test.opendap.org/dap/data/nc/coads_climatology.nc')
      >>> var = dataset['SST']
      >>> var.shape
      (12, 90, 180)
      >>> var.type
      <class 'pydap.model.Float32'>
      >>> print(var[0,10:14,10:14])  # this will download data from the server
      <class 'pydap.model.GridType'>
      with data
      [[ -1.26285708e+00 -9.99999979e+33 -9.99999979e+33 -9.99999979e+33]
       [ -7.69166648e-01 -7.79999971e-01 -6.75454497e-01 -5.95714271e-01]
       [ 1.28333330e-01 -5.00000156e-02 -6.36363626e-02 -1.41666666e-01]
       [ 6.38000011e-01 8.95384610e-01 7.21666634e-01 8.10000002e-01]]
      and axes
      366.0
      [-69. -67. -65. -63.]
      [ 41. 43. 45. 47.]
  28. 28. NetCDF client
      >>> import netCDF4
      >>> url = 'http://test.opendap.org/dap/data/nc/coads_climatology.nc'
      >>> dataset = netCDF4.Dataset(url)
      >>> var = dataset.variables['SST']
      >>> var.shape
      (12, 90, 180)
      >>> print(var[0,10:14,10:14])  # this will download data from the server
      [[-1.26285707951 -- -- --]
       [-0.769166648388 -0.77999997139 -0.675454497337 -0.595714271069]
       [0.128333330154 -0.0500000156462 -0.0636363625526 -0.141666665673]
       [0.638000011444 0.895384609699 0.721666634083 0.810000002384]]
      >>> print(var)
      <type 'netCDF4.Variable'>
      float32 SST('TIME', 'COADSY', 'COADSX')
      …
  29. 29. MATLAB and SNCtools
      % ex_snctools_opendap.m
      % Read the same file from a remote OPeNDAP server
      %
      ncRef = 'http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc'
      nc_dump( ncRef );
      pause
      temp = nc_varget( ncRef, 'analysed_sst');
      lon = nc_varget( ncRef, 'lon');
      lat = nc_varget( ncRef, 'lat');
      % x axis is longitude, y axis is latitude
      imagesc(lon, lat, temp); axis xy
  30. 30. MATLAB and NJTbx demo
      % ex_njtbx.m
      % Read the same file from a remote OPeNDAP server
      %
      ncRef = 'http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc'
      nj_info( ncRef )
      pause
      [temp, grid] = nj_grid_varget(ncRef,'analysed_sst');
      imagesc(grid.lon, grid.lat, temp); axis xy; colorbar
  31. 31. THREDDS services syntax • URL template: http://{server:port}/{contextPath}/{service}/... – {contextPath} = “thredds” (servlet default name) – {service} = “fileServer” | “dodsC” | “wms” | “wcs” • Bulk File Transfer – fileServer = HTTP Server (any file) • Remote access, subsetting CDM files – dodsC = OPeNDAP (any CDM file) – wms = Web Map Server (grids) – wcs = Web Coverage Server (grids) – ncss = NetCDF Subset Service (grids) – admin = Administration/debug interface • Note: each server can change the service name in the XML catalogue. [Diagram: Tomcat/[Apache] hosting the thredds servlet with its Catalogs and the dodsC, fileServer, wms, wcs and ncss services]
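
The URL template in this slide means that one dataset path can be reached through several services just by swapping the `{service}` segment. The sketch below illustrates this under stated assumptions: the helper `thredds_url`, the server `localhost:8080`, and the dataset path are hypothetical, and the default service names are used even though, as noted above, a server can rename them.

```python
def thredds_url(server, service, dataset_path, context_path="thredds"):
    """Fill in http://{server:port}/{contextPath}/{service}/{path}."""
    return "http://%s/%s/%s/%s" % (
        server, context_path, service, dataset_path.lstrip("/"))


# The same hypothetical dataset exposed through three default services.
for service in ("fileServer", "dodsC", "ncss"):
    print(thredds_url("localhost:8080", service, "example/sst_2011.nc"))
```

In practice the service base paths are best read from the server's catalog rather than hard-coded, precisely because they are configurable.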
  32. 32. Hyrax service syntax • URL template: http://{server:port}/{contextPath}/{service}/…, e.g. http://test.opendap.org/opendap/hyrax/... – {contextPath} = “opendap” (servlet default name) – {service} = “hyrax” | “admin” | “docs” – hyrax = catalog interface – admin = administration interface (v1.8+) – docs = documentation (v1.8+) • Note: each server can change the service name within the server configuration file. [Diagram: Tomcat/[Apache] hosting the opendap servlet with the hyrax, admin and docs services]
  33. 33. Hyrax Data Service • DAP2 and DAP3.x as the protocol develops • Other dataset responses* • ASCII & NetCDF renderings of data (not limited to data natively stored in netCDF) • RDF • ISO 19115 and the conformance rubric (Hyrax 1.8) • Other server responses** • THREDDS catalogs • Note: Hyrax and TDS are not mutually exclusive; sites can install both with little extra effort. [Diagram: Tomcat/[Apache] hosting Hyrax, which serves DAP2, DAP3.x, RDF* and Catalogs**]
  34. 34. Data Discovery and Access • Data discovery services • NASA's Global Change Master Directory − http://gcmd.nasa.gov • IMOS eMII portal − http://imosmest.aodn.org.au/geonetwork/srv/en/main.home − Help --> http://emii1.its.utas.edu.au/drupal/?q=node/25 • TERN AusCover portal − http://data.auscover.org.au/ • My Ocean portal − http://www.myocean.eu/web/24-catalogue.php • TPAC Digital Library − http://dl.tpac.org.au • Data access services • Unidata's THREDDS Data Service − http://www.unidata.ucar.edu/projects/THREDDS/ • OPeNDAP's Hyrax Data Service − http://opendap.org/download/hyrax.html • NOAA's ERDDAP Data Service − http://coastwatch.pfeg.noaa.gov/erddap
  35. 35. Some of the Technology in Hyrax 1. THREDDS Dataset Inventory Catalogs provide virtual directories of available data and associated metadata. 2. Supports many formats and data stores: netCDF3, netCDF4, HDF4, HDF5, FreeForm, SQL databases 3. Uses a plug-in based architecture and includes tools to write custom handlers 4. NetCDF Markup Language (NcML) to modify and create virtual aggregations of datasets. 5. OPeNDAP access, a data access method that supports subsetting. 6. Bulk file access through the HTTP protocol. 7. ncISO server provides automated metadata analysis and ISO metadata generation. 8. RDF output: metadata as triples; used with web-based reasoning systems 9. Code that has passed a formal security audit 10. A true multi-system architecture that can fit in a variety of enterprise settings 11. An administrator's interface
  36. 36. DAP Responses • DAP2 defines three response types: • DAS: A text document that contains data set attributes • DDS: A text document that contains data set variable types and names • DODS: A quasi-multipart MIME document that contains the DDS and associated binary values for a data request • DAP3.x defines two additional response types: • DDX: An XML document that combines both variable type and name information along with attributes • DataDDX: A multipart MIME document that combines a DDX with the associated binary values for a data request TDS and Hyrax both support DAP2; Hyrax includes support for DAP3, TDS has support for the DDX
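
To make the DDS response concrete, the sketch below parses a small DDS text into variable names, types, and shapes. It is a minimal illustration under stated assumptions: the sample DDS is hand-written and hypothetical (it is not a captured server response), the helper `parse_flat_dds` is invented for this example, and only flat array declarations are handled, not Grid, Structure, or Sequence types.

```python
import re

# Hypothetical DDS text, written by hand for illustration.
SAMPLE_DDS = """Dataset {
    Float32 lat[lat = 720];
    Float32 lon[lon = 1440];
    Int16 analysed_sst[time = 1][lat = 720][lon = 1440];
} example.nc;
"""

# One declaration per line: <type> <name>[dim = size]...;
DECLARATION = re.compile(r"^\s*(\w+)\s+(\w+)((?:\[\s*\w+\s*=\s*\d+\s*\])*)\s*;\s*$")
DIMENSION = re.compile(r"\[\s*(\w+)\s*=\s*(\d+)\s*\]")


def parse_flat_dds(text):
    """Return {variable name: (DAP type, shape tuple)} for flat arrays."""
    variables = {}
    for line in text.splitlines():
        match = DECLARATION.match(line)
        if match:
            dap_type, name, dims = match.groups()
            shape = tuple(int(size) for _, size in DIMENSION.findall(dims))
            variables[name] = (dap_type, shape)
    return variables


for name, (dap_type, shape) in parse_flat_dds(SAMPLE_DDS).items():
    print(name, dap_type, shape)
```

Knowing the shapes from the DDS lets a client choose index ranges for a subsetting request before any binary data is transferred.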
  37. 37. Some Definitions • DAP = Data Access Protocol – Model used to describe the data; – Request syntax and semantics; and – Response syntax and semantics. – The data structure returned to the user • OPeNDAP – The software that forms the service; – Numerous implementations (Hyrax (reference), THREDDS, …); – Core/libraries for client applications and services. • THREDDS / Hyrax – A service framework (portal) that contains the OPeNDAP service;
  38. 38. Decipher the URL • http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[0:1:1439] • Given the OPeNDAP data request above, decipher the URL. − Request protocol? http − Host name:port? //opendap.bom.gov.au:8080/ − ContextPath? thredds/ − Service? dodsC/ − Unique path to data set? gamssa_4deg/2011/ − Data reference? 20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc − Return type? ascii − Return variables? ?lon − Return variable index range? [0:1:1439] --> [start:stride:end]
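
The same decomposition can be done mechanically with Python's standard `urllib.parse` module. The sketch below is illustrative only: the helper `decipher` is hypothetical and handles just the simple case shown in the slide, with one variable and one index range.

```python
from urllib.parse import urlsplit


def decipher(url):
    """Split a simple OPeNDAP request URL into the parts named in the slide."""
    parts = urlsplit(url)
    context_path, service, *rest = parts.path.lstrip("/").split("/")
    # The last path segment is <data reference>.<return type>.
    data_reference, _, return_type = rest[-1].rpartition(".")
    # The query is <variable>[start:stride:end] (single variable only).
    variable, bracket, remainder = parts.query.partition("[")
    index_range = None
    if bracket:
        start, stride, end = (int(n) for n in remainder.rstrip("]").split(":"))
        index_range = (start, stride, end)
    return {
        "protocol": parts.scheme,
        "host": parts.netloc,
        "context_path": context_path,
        "service": service,
        "dataset_path": "/".join(rest[:-1]),
        "data_reference": data_reference,
        "return_type": return_type,
        "variable": variable,
        "index_range": index_range,
    }


url = ("http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/"
       "20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[0:1:1439]")
for key, value in decipher(url).items():
    print("%-15s %s" % (key, value))
```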
  39. 39. NcML: NetCDF Markup Language NcML can provide two basic features: • Augmenting/modifying data sets with new • Attributes • Values • Combining two or more data sets (i.e., files) in an aggregation Three kinds of aggregation are supported: • Tile files • Join files along an existing axis • Join files along a new axis While very powerful, these aggregations are not applicable to every data set made up of multiple files
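
As an illustration of the "join along an existing axis" case, the sketch below generates a small NcML document with Python's standard XML library. It is a sketch under stated assumptions: the helper `join_existing_ncml`, the dimension name `time`, and the file names are hypothetical; the element and attribute names (`netcdf`, `aggregation`, `dimName`, `type="joinExisting"`, `location`) follow the NcML 2.2 schema.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def join_existing_ncml(dim_name, locations):
    """Return NcML text that joins files along an existing dimension."""
    ET.register_namespace("", NCML_NS)  # serialise NcML as the default namespace
    root = ET.Element("{%s}netcdf" % NCML_NS)
    aggregation = ET.SubElement(
        root, "{%s}aggregation" % NCML_NS,
        {"dimName": dim_name, "type": "joinExisting"})
    for location in locations:
        ET.SubElement(aggregation, "{%s}netcdf" % NCML_NS, {"location": location})
    return ET.tostring(root, encoding="unicode")


# Hypothetical daily files that share a "time" dimension.
print(join_existing_ncml("time", ["sst_20111105.nc", "sst_20111106.nc"]))
```

The server then presents the listed files as one virtual dataset whose `time` axis is the concatenation of the individual files' axes; a join along a new axis uses `type="joinNew"` instead.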
  40. 40. DAP4 Summary • DAP (DAP2 and DAP4) is based on datasets built of variables that share the characteristics of programming languages • Constraints are used to subset data on the server • DAP4 is a REST API • DAP4 specifies 'modern' web services – While DAP2 was a data model only, DAP4 includes specification of the web services • DAP4 provides more complete support for functions
