Edward King SPEDDEXES 2014

Australian Oceans Distributed Active Archive Centre (AODAAC) : Gridded data extraction

Transcript

  • 1. IMOS AODAAC – gridded data access Australian Oceans Data Access & Archive Centre MARINE & ATMOSPHERIC RESEARCH Edward King | IMOS Satellite Remote Sensing Facility Leader with Matt Paget (TERN/AusCover) + Ken Suber + historical others 16 March 2014
  • 2. Outline • Our problem • OPeNDAP as a means to a solution • What we did • Implementation • Lessons Learned • Opportunities
  • 3. National Satellite Data Reception Network • Distributed data archives • Variety of formats • Variety of data managers • Range of sampling types • Big data sets • Resource-poor users • Range of user capabilities • Need to make discovery and access easier, much easier.
  • 4. Rectangular Grids • “Implicit geolocation”: pixel lon, lat can be computed from the grid indices via linear functions • Straightforward case [Figure: grid with Pixel (x) along the Longitude axis and Line (y) along the Latitude axis]
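A minimal sketch of the implicit geolocation described on this slide. The origin and cell-size parameters (`lon0`, `lat0`, `dlon`, `dlat`) and the function names are illustrative assumptions, not part of the AODAAC code.

```python
def index_to_lonlat(pixel, line, lon0, lat0, dlon, dlat):
    """Linear map from grid indices to coordinates.

    lon0, lat0: coordinates of the centre of cell (0, 0) (assumed convention).
    dlon, dlat: signed cell sizes in degrees; dlat is negative when line 0
    is the northernmost row.
    """
    return lon0 + pixel * dlon, lat0 + line * dlat


def lonlat_to_index(lon, lat, lon0, lat0, dlon, dlat):
    """Inverse of index_to_lonlat, rounded to the nearest cell."""
    return round((lon - lon0) / dlon), round((lat - lat0) / dlat)


if __name__ == "__main__":
    # Hypothetical 0.5 degree grid with its first cell at 110E, 10S.
    print(index_to_lonlat(4, 6, 110.0, -10.0, 0.5, -0.5))
    print(lonlat_to_index(112.0, -13.0, 110.0, -10.0, 0.5, -0.5))
```

Because both directions are simple arithmetic, no per-pixel coordinate arrays need to be stored or transferred for this kind of grid.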
  • 5. Swath Data • So-called “satellite projection” • Explicit geolocation: lat/lon are lookup tables • Very important use case for remote sensing • More difficult case: each swath is unique [Figure: swath file contents: Imagery (Channel 1, Channel 2, Cloud Mask, Quality Flags) plus Latitude and Longitude arrays (Lat, Lon); type labels: float, float, integer]
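With explicit geolocation there is no formula to invert, so locating a point means searching the lookup tables. The sketch below is a brute-force illustration only. It assumes the latitude and longitude tables are nested lists of equal shape, compares squared differences in degrees, and ignores date-line wrapping; the function name is hypothetical.

```python
def nearest_swath_pixel(lats, lons, target_lat, target_lon):
    """Return the (line, pixel) whose looked-up position is closest to the target.

    lats, lons: 2-D lookup tables (lists of rows) with identical shape.
    Distance is a plain squared difference in degrees, which is adequate for
    illustration but not for accurate geodesy.
    """
    best_index = None
    best_d2 = float("inf")
    for i, (lat_row, lon_row) in enumerate(zip(lats, lons)):
        for j, (lat, lon) in enumerate(zip(lat_row, lon_row)):
            d2 = (lat - target_lat) ** 2 + (lon - target_lon) ** 2
            if d2 < best_d2:
                best_index, best_d2 = (i, j), d2
    return best_index


if __name__ == "__main__":
    # Tiny made-up swath: rows are scan lines, columns are pixels.
    lats = [[-10.0, -10.1, -10.2], [-11.0, -11.1, -11.2]]
    lons = [[140.0, 141.0, 142.0], [139.8, 140.8, 141.8]]
    print(nearest_swath_pixel(lats, lons, -11.0, 141.0))
```

Every swath has its own tables, so a search of this kind has to be repeated per file, which is one reason the slide calls this the more difficult case.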
  • 6. Non-rectangular projections • “Map-based” higher-level products • Lon/lat are analytic (non-linear) functions of the grid indices • E.g. Mercator projection • Forward transform: (lon, lat) to (x, y) • Inverse transform: (x, y) to (lon, lat)
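As an illustration of the forward and inverse transforms named on this slide, here is a sketch of the standard spherical Mercator formulas. The sphere radius and function names are assumptions made for the example; the slides do not say which Mercator variant or library the system uses.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius, for illustration only


def mercator_forward(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """(lon, lat) in degrees to projected (x, y) in metres."""
    x = radius * math.radians(lon_deg)
    y = radius * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_inverse(x, y, radius=EARTH_RADIUS_M):
    """Projected (x, y) in metres back to (lon, lat) in degrees."""
    lon = math.degrees(x / radius)
    lat = math.degrees(2 * math.atan(math.exp(y / radius)) - math.pi / 2)
    return lon, lat


if __name__ == "__main__":
    x, y = mercator_forward(147.0, -42.0)
    print(x, y)
    print(mercator_inverse(x, y))
```

The x direction stays linear in longitude, while y is non-linear in latitude, so a regular grid in (x, y) is not a regular grid in (lon, lat).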
  • 7. Data Access Protocol • conceived by oceanographers in 1993 (when the www was 4) as the Distributed Oceanographic Data System – DODS, now OPeNDAP. • designed to be as general as possible without being constrained to a particular discipline or world view. • It is a data model - An abstraction for describing data • It is a transport mechanism • Layered over HTTP • Anywhere the web can go, DAP is sure to (be able to) follow • And a browser can be a client • Data servers • Respond to specially formed URLs • Expose data AND metadata • Return requested elements encapsulated within DAP • Hyrax & TDS (THREDDS Data Server) • Clients • Create requests • Unpack and use data that is returned within the DAP
  • 8. Workflow [Diagram: Data File (filesystem access) → mapping to DAP → DAP Server ⇄ requests / DAP responses ⇄ DAP Client → write to netCDF or use in computation] • Formats: netCDF, HDF4/5, GRIB, “Freeform” • DAP objects: Grids, Sequences, Structures • Client libraries: C, Java, Python, Matlab
  • 9. OPeNDAP Transport Layer: A Data Standardisation Bus OPeNDAP Server Local Data Store OPeNDAP Server Local Data Store OPeNDAP Server Local Data Store Reception Station and Product Generation International Data via Internet/Tape Model/Data Synthesis Internet e.g. Curtin U, iVEC, UTAS, CMAR (Canberra) e.g. AIMS, BoM, GA, CMAR (Hobart) e.g. UTAS, Curtin U, CMAR (Hobart)
  • 10. Multi-tiered design – based on TPAC Digital Library Client / User Applications URL Crawler & Metadata+Harvester Spatial Database Web Query Service OPeNDAP Servers OPeNDAP Interface Web Service Interface
  • 11. Complete System (Version 2) • Can be fully distributed [Diagram, with a “(replicated)” label: OPeNDAP Data Servers; URL Crawler and Metadata Harvester (Java apps); Spatial Database (PostGIS); Web Query Service (Tomcat webapp); Aggregator (Java app); WQS Client (Java app); components linked over the Internet via DAP]
  • 12. Complete System (Version 2) • Can be fully distributed [Diagram, with a “(replicated)” label: OPeNDAP Data Servers; URL Crawler and Metadata Harvester (Java apps); Spatial Database (PostGIS); Web Query Service (Tomcat webapp); Aggregator (Java app); WQS Client (Java app); Job Controller (Python); Web Server (Apache); Temporary Data Store; components linked over the Internet via DAP]
  • 13. Aggregator – a system client • Accepts a list of URLs and various metadata, codified as XML, returned by the web query service • Computes the index ranges needed to create DAP constraint URLs • Reads data from each URL and combines (aggregates) the data arrays into one or more arrays or files for output to the user • Writes the output data file (netCDF) • Framework supports post-processing filters on the netCDF file
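The step "computes the index ranges needed to create DAP constraint URLs" can be sketched as follows for a simple lon/lat grid. This is an assumption-laden illustration, not the aggregator's actual code: the helper names, the variable name `sst` and the `example.org` URL are hypothetical, and the coordinate axes are taken to be plain monotonic Python lists.

```python
def index_range(axis, lo, hi):
    """First and last index of a monotonic coordinate axis lying within [lo, hi]."""
    hits = [i for i, value in enumerate(axis) if lo <= value <= hi]
    if not hits:
        raise ValueError("no grid points inside the requested interval")
    return hits[0], hits[-1]


def constraint_url(dataset_url, variable, ranges, suffix=".ascii"):
    """Append a DAP hyperslab constraint, one [start:stride:stop] per dimension."""
    subscripts = "".join(f"[{start}:1:{stop}]" for start, stop in ranges)
    return f"{dataset_url}{suffix}?{variable}{subscripts}"


if __name__ == "__main__":
    # Hypothetical 0.5 degree axes; latitude runs north to south.
    lat_axis = [-10.0 - 0.5 * i for i in range(10)]
    lon_axis = [110.0 + 0.5 * i for i in range(10)]
    lat_range = index_range(lat_axis, -12.0, -11.0)
    lon_range = index_range(lon_axis, 111.0, 112.0)
    print(constraint_url("http://example.org/opendap/file.nc", "sst",
                         [(0, 0), lat_range, lon_range]))
```

The same idea applies per source URL: one constraint is built for each file returned by the web query service, and the responses are then combined.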
  • 15. User Experience – Low level • Initiate a data request via a web call (CGI script) • Returns a JSON fragment with a “handle” (URL) • Use the handle to examine progress and, ultimately, get links to the output netCDF and log files • Status can also be returned as JSON for an easy machine interface
  • 16. User Experience – Higher level • Machine interface supports a simple web front-end • Or a full portal • Or distributed clients in a cluster • NOTE: we do NOT attempt to deliver data in “web-time” (which is not a realistic objective for GB-scale data systems)
  • 23. DAP implications • It does one thing really well: it accesses and delivers subsets of n-dimensional data • Semantically weak: e.g. it has no native support for time, lon, lat etc. (no OGC fluff, so…) http://a..../opendap/file.nc.ascii?lst[0:1:0][160:1:165][200:1:205] • Needs a spatio-temporal information infrastructure around it • We do this with a data model implemented in the database • This does not necessarily mean that DAP is the wrong solution; it just means the hard part of the problem is not data volume but metadata
  • 24. Metadata Harvester • Has to extract spatial bounding boxes from all these different types of file • The harvester needs help to identify geospatial information in files (nominating particular variables as relevant), so each data set needs ‘helper’ config files • These can be maintained by the data provider OR the AODAAC admin • And then you need to be able to take a user ROI and transform it back to the grid [Figure labels: Imagery, Latitude, Longitude]
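The two harvester tasks on this slide, extracting a bounding box and mapping a user ROI back to the grid, can be sketched for the swath case as below. This is a simplified illustration under stated assumptions: the lookup tables are nested lists, boxes are `(min_lon, min_lat, max_lon, max_lat)` tuples, date-line crossing is ignored, and all function names are hypothetical.

```python
def bounding_box(lats, lons):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of 2-D lookup tables."""
    flat_lats = [v for row in lats for v in row]
    flat_lons = [v for row in lons for v in row]
    return min(flat_lons), min(flat_lats), max(flat_lons), max(flat_lats)


def roi_to_index_window(lats, lons, roi):
    """Smallest (line0, line1, pixel0, pixel1) window holding every pixel in roi.

    Returns None when no pixel falls inside the region of interest.
    """
    min_lon, min_lat, max_lon, max_lat = roi
    hits = [
        (i, j)
        for i, (lat_row, lon_row) in enumerate(zip(lats, lons))
        for j, (lat, lon) in enumerate(zip(lat_row, lon_row))
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    ]
    if not hits:
        return None
    lines = [i for i, _ in hits]
    pixels = [j for _, j in hits]
    return min(lines), max(lines), min(pixels), max(pixels)


if __name__ == "__main__":
    lats = [[-10.0, -10.1, -10.2], [-11.0, -11.1, -11.2], [-12.0, -12.1, -12.2]]
    lons = [[140.0, 141.0, 142.0], [139.8, 140.8, 141.8], [139.6, 140.6, 141.6]]
    print(bounding_box(lats, lons))
    print(roi_to_index_window(lats, lons, (140.5, -11.5, 142.5, -10.05)))
```

The bounding box is what would be stored for discovery, while the index window is what a client would need in order to form a DAP subset request for that file.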
  • 25. Geospatial model
  • 34. Lessons Learned • DAP performs well, but you need to build a fair bit of infrastructure to make it general (the V1 system only handled lon/lat grids) • The incredible variety of input data makes this hard very quickly • It is tempting to bloat the system with features (V1 could produce thumbnails for the web, and output HDF and ASCII) • We were on the verge of adding remapping too (!!!) • All these features can be done as a post-filter, à la the Unix philosophy of “do one thing, and do it well”; V2 supports this approach • The web service with Tomcat was a legacy of the TPAC original; it would be simpler as a standalone Java app (using CGI, not WSDL, for example) • Formats with in-file compression (nc4, hdf) help • Robust data serving is essential • Don’t use a giant software project to learn a new language
  • 35. Distribute the computing and modularise • V1 had a lot (more) of the compute in the WQS, which became a bottleneck • In V2 the WQS is just a series of SQL ‘select’s, and the aggregator takes the rest of the load (very necessary for swath data) [Diagram, with a “(replicated)” label and an “XML” step label: OPeNDAP Data Servers; URL Crawler and Metadata Harvester (Java apps); Spatial Database (PostGIS); Web Query Service (Tomcat webapp); Aggregator (Java app); Job Controller (Python); Web Server (Apache); Temporary Data Store; components linked over the Internet via DAP]
  • 36. Opportunities + Shortcomings • This may meet some of your needs • (including warning you what not to do!) • It could be a lot more useful with some back-end filters such as: • Format conversions (e.g. GeoTIFF, CSV) • Shape file cookie cutting • Statistics extraction • Reprojecting + resampling • We need tools for managing our XML config files (admin tools) • Doesn’t handle 4+ dimensions yet (e.g. depth or height) • Want to make it easier to deploy as an “appliance” • V1 has been live for 2+ years; V2 is being integrated into the IMOS portal now • Would be fun to run against the AGDC data set when it goes netCDF
  • 37. Marine & Atmospheric Research Dr Edward King IMOS Satellite Remote Sensing Facility Leader t +61 3 6232 5334 e edward.king@csiro.au w www.csiro.au/cmar MARINE & ATMOSPHERIC RESEARCH Thank you – questions?
  • 38. A netCDF aside… • X = Longitude (deg. East), 813 elts. • Y = Latitude (deg. North), 670 elts. • T = Time (days since xxx), 1 elt. • WRel2 “Relative Soil Moisture (lower layer)”: 1 x 670 x 813, mean in each cell, unitless, 0 <= value <= 1, missing data = -9999
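The valid range and missing-data sentinel listed on this slide matter as soon as a client computes anything from the values. A minimal sketch, assuming the values have already been read into a flat Python list; the function name is hypothetical, and only the sentinel and range come from the slide.

```python
MISSING = -9999  # missing-data value quoted on the slide


def valid_mean(values, missing=MISSING, valid_min=0.0, valid_max=1.0):
    """Mean of the values that are neither missing nor outside the valid range.

    Returns None when nothing valid remains.
    """
    good = [v for v in values if v != missing and valid_min <= v <= valid_max]
    if not good:
        return None
    return sum(good) / len(good)


if __name__ == "__main__":
    sample = [0.25, -9999, 0.75, 0.5]  # made-up cell values
    print(valid_mean(sample))
```

Including the sentinel in the average would drag the result far below zero, which is why the mask has to be applied before any statistics are extracted.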
  • 42. OPeNDAP server – explore via a browser • http://aodaac2-cbr.act.csiro.au:8080/opendap/auscover/awap/run26a/monthly/wrel2/contents.html [Screenshot callouts: File Download Link, DAP Links]
  • 43. Data Structures: DDS link…
  • 44. Data Attributes: DAS link…
  • 45. XML Package: DDX link…
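The DDS, DAS and DDX links on these slides follow the DAP2 convention of appending a suffix to the dataset URL. A small sketch of that convention; the function name and the `example.org` dataset URL are hypothetical, and nothing here contacts a server.

```python
DAP_SUFFIXES = {
    "dds": ".dds",      # data structures: variables, types, dimensions
    "das": ".das",      # data attributes: units, long names, missing values
    "ddx": ".ddx",      # XML package combining structure and attributes
    "ascii": ".ascii",  # data values rendered as text
}


def dap_links(dataset_url):
    """Build the standard DAP2 service URLs for one dataset URL."""
    return {name: dataset_url + suffix for name, suffix in DAP_SUFFIXES.items()}


if __name__ == "__main__":
    for name, url in dap_links("http://example.org/opendap/file.nc").items():
        print(name, url)
```

Since each response is an ordinary HTTP resource, any of these URLs can be opened in a browser, which is what the "explore via a browser" slides rely on.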
  • 48. And finally – data…. http://a..../.../file.nc.ascii?Wrel2[0:1:0][160:1:165][200:1:205] [URL callouts, in order: Format (.ascii), Var (Wrel2), then the Time, Latitude and Longitude index ranges]
  • 49. Note absence of semantics in the exchange • There is nothing in the URL to impart meaning, just a variable name and some subscripts… • http://a..../.../file.nc.ascii?Wrel2[0:1:0][160:1:165][200:1:205] • c.f. Fortran float Wrel2(1,670,813) • e.g. Can’t naturally specify a bounding box (c.f. WMS) • This is both • A weakness: geospatial handling requires extra work • A strength: not limited to geospatial domains • There is always some extra work to do (or assumptions to make) in order to be able to make a meaningful request.
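To make the point about missing semantics concrete, the sketch below parses the constraint from the slide. All that can be recovered is a variable name, a list of `[start:stride:stop]` triples (stop inclusive) and the resulting array shape; nothing says which dimension is time, latitude or longitude. The function name is hypothetical, and only the three-part subscript form with a stride of at least 1 is handled.

```python
import re


def parse_constraint(expr):
    """Split 'Var[a:s:b][c:s:d]...' into (variable, ranges, shape)."""
    match = re.fullmatch(r"(\w+)((?:\[\d+:\d+:\d+\])+)", expr)
    if match is None:
        raise ValueError(f"unsupported constraint expression: {expr!r}")
    variable, subscripts = match.groups()
    ranges = [
        tuple(int(n) for n in triple)
        for triple in re.findall(r"\[(\d+):(\d+):(\d+)\]", subscripts)
    ]
    # Stop indices are inclusive, hence the + 1.
    shape = tuple((stop - start) // stride + 1 for start, stride, stop in ranges)
    return variable, ranges, shape


if __name__ == "__main__":
    print(parse_constraint("Wrel2[0:1:0][160:1:165][200:1:205]"))
```

Turning the two 6-element index ranges into a geographic extent requires the coordinate variables or an external data model, which is the extra work the slide refers to.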