1. IMOS AODAAC – gridded data access
Australian Oceans Data Access & Archive Centre
MARINE & ATMOSPHERIC RESEARCH
Edward King | IMOS Satellite Remote Sensing Facility Leader
with Matt Paget (TERN/AusCover), Ken Suber, and past contributors
16 March 2014
2. Outline
• Our problem
• OPeNDAP as a means to a solution
• What we did
• Implementation
• Lessons Learned
• Opportunities
3. National Satellite Data Reception Network
• Distributed data archives
• Variety of formats
• Variety of data managers
• Range of sampling types
• Big data sets
• Resource-poor users
• Range of user capabilities
• Need to make discovery and access easier – much easier
4. Rectangular Grids
• “Implicit geolocation” – pixel (lon, lat) can be computed from grid indices via linear functions
• Straightforward
[Figure: rectangular grid – pixel index (x) runs along longitude, line index (y) along latitude]
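The linear mapping can be sketched in a few lines of Python. The origin and step values below are illustrative, not taken from any real product:

```python
# Implicit geolocation for a rectangular lon/lat grid: pixel-centre
# coordinates are linear functions of the grid indices, and the inverse
# is equally simple. Origin/step defaults are invented for illustration.

def pixel_lonlat(x, y, lon0=110.0, lat0=-10.0, dlon=0.01, dlat=-0.01):
    """Return (lon, lat) of pixel (x, y) on a regular grid."""
    return lon0 + x * dlon, lat0 + y * dlat

def lonlat_pixel(lon, lat, lon0=110.0, lat0=-10.0, dlon=0.01, dlat=-0.01):
    """Inverse mapping: nearest grid indices for a given (lon, lat)."""
    return round((lon - lon0) / dlon), round((lat - lat0) / dlat)
```

Because both directions are closed-form, subsetting a rectangular grid by geographic region is cheap.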
5. Swath Data
• So-called “satellite projection”
• Explicit geolocation – lat/lon are lookup tables
• A very important use case for remote sensing
• The more difficult case – each swath is unique
[Figure: swath file layout – imagery variables (Channel 1: float, Channel 2: float, Cloud Mask: integer, Quality Flags) alongside Latitude and Longitude lookup arrays]
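With explicit geolocation, mapping a target point back to grid indices means searching the lookup arrays. A brute-force nearest-neighbour sketch on toy data (a real swath would use numpy or a spatial index):

```python
# Explicit geolocation: lat/lon come from per-pixel lookup arrays, so
# finding the grid cell for a target (lon, lat) is a search, not a
# formula. Brute force here for clarity; toy 2x3 swath below.

def nearest_pixel(target_lon, target_lat, lons, lats):
    """Return (line, pixel) of the grid cell nearest the target point."""
    best, best_d2 = None, float("inf")
    for y, (lon_row, lat_row) in enumerate(zip(lons, lats)):
        for x, (lon, lat) in enumerate(zip(lon_row, lat_row)):
            d2 = (lon - target_lon) ** 2 + (lat - target_lat) ** 2
            if d2 < best_d2:
                best, best_d2 = (y, x), d2
    return best

lons = [[150.0, 150.1, 150.2], [150.0, 150.1, 150.2]]
lats = [[-40.0, -40.0, -40.0], [-40.1, -40.1, -40.1]]
print(nearest_pixel(150.1, -40.1, lons, lats))  # → (1, 1)
```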
6. Non-rectangular projections
• “Map-based” higher-level products
• Lon/lat is an analytic (non-linear) function of the grid indices
• E.g. Mercator projection: forward transform (lon, lat) → (x, y); inverse transform (x, y) → (lon, lat)
[Figure: Mercator projection grid (proj_x axis)]
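For concreteness, the spherical Mercator forward and inverse transforms (a standard formulation, not code from the AODAAC system):

```python
import math

# Spherical Mercator: x is linear in longitude, y is a non-linear
# function of latitude; both directions are analytic, so no lookup
# tables are needed. R is the sphere radius in metres.
R = 6371000.0

def mercator_forward(lon_deg, lat_deg):
    """(lon, lat) in degrees -> projected (x, y) in metres."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def mercator_inverse(x, y):
    """Projected (x, y) in metres -> (lon, lat) in degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat
```

The round trip is exact to floating-point precision, which is what makes these grids nearly as easy to subset as plain lon/lat grids.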
7. Data Access Protocol
• Conceived by oceanographers in 1993 (when the web was 4 years old) as the Distributed Oceanographic Data System – DODS, now OPeNDAP
• Designed to be as general as possible, without being constrained to a particular discipline or world view
• It is a data model – an abstraction for describing data
• It is a transport mechanism
• Layered over HTTP
• Anywhere the web can go, DAP is sure to (be able to) follow
• And a browser can be a client
• Data servers
• Respond to specially formed URLs
• Expose data AND metadata
• Return requested elements encapsulated within DAP
• Hyrax & TDS (THREDDS Data Server)
• Clients
• Create requests
• Unpack and use data that is returned within the DAP
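The “specially formed URLs” are the dataset URL plus a constraint expression naming a variable and `[start:stride:stop]` index ranges, one per dimension. A small helper (server name is hypothetical):

```python
# Build a DAP constraint URL for one variable. The suffix (.ascii,
# .dods, ...) selects the response encoding; the constraint expression
# after '?' selects index ranges per dimension.

def dap_url(base, var, ranges, fmt="ascii"):
    """base: dataset URL; ranges: (start, stride, stop) per dimension."""
    ce = "".join(f"[{a}:{b}:{c}]" for a, b, c in ranges)
    return f"{base}.{fmt}?{var}{ce}"

url = dap_url("http://example.org/opendap/file.nc", "lst",
              [(0, 1, 0), (160, 1, 165), (200, 1, 205)])
print(url)
# → http://example.org/opendap/file.nc.ascii?lst[0:1:0][160:1:165][200:1:205]
```

Any HTTP client (including a browser) can fetch such a URL, which is what makes the protocol so portable.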
9. OPeNDAP Transport Layer: A Data Standardisation Bus
[Diagram: three OPeNDAP servers, each fronting a local data store, all exposed over the Internet. Data reaches the stores from reception stations and product generation (e.g. Curtin U, iVEC, UTAS, CMAR Canberra), from international data via Internet/tape (e.g. AIMS, BoM, GA, CMAR Hobart), and from model/data synthesis (e.g. UTAS, Curtin U, CMAR Hobart)]
10. Multi-tiered design – based on TPAC Digital Library
[Diagram: Client/User Applications on top; beneath them a Web Query Service backed by a Spatial Database, which is populated by a URL Crawler & Metadata Harvester; at the base, OPeNDAP servers exposing both an OPeNDAP interface and a web service interface]
12. Complete System (Version 2) – can be fully distributed (components replicated)
[Diagram: the URL Crawler and Metadata Harvester (Java apps) populate the Spatial Database (PostGIS); the Web Query Service (Tomcat webapp) answers queries from the WQS Client (Java app); the Job Controller (Python) drives the WQS Client and the Aggregator (Java app), which pulls subsets from the OPeNDAP data servers over DAP; output lands in a Temporary Data Store served by a Web Server (Apache); all components communicate over the Internet]
13. Aggregator – a system client
• Accepts a list of URLs and various metadata, codified as XML, returned by the web query service
• Computes the necessary index ranges to create DAP constraint URLs
• Reads data from each URL and combines (aggregates) each data array into one (or more) arrays or files for output to the user
• Writes the data output file (netCDF)
• The framework supports post-processing filters on the netCDF file
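The combining step can be sketched as follows (assumed behaviour, not the real Java implementation): per-file subsets, already fetched via DAP constraint URLs, are stacked along the time axis into one output array destined for the netCDF file.

```python
# Toy sketch of the aggregator's combining step: each input is a
# [time][y][x] array from one source file; the output concatenates
# them along the time axis.

def aggregate(subsets):
    """Concatenate a list of [time][y][x] arrays along the time axis."""
    combined = []
    for subset in subsets:
        combined.extend(subset)
    return combined

file1 = [[[1, 2], [3, 4]]]   # one time step of a 2x2 grid
file2 = [[[5, 6], [7, 8]]]
result = aggregate([file1, file2])
print(len(result))  # → 2
```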
15. User Experience – Low level
• Initiate a data request via a web call (CGI script)
• Returns a JSON fragment with a “handle” (URL)
• Use the handle to examine progress and, ultimately, get links to the output netCDF and log files
• Can also return JSON for an easy machine interface
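A client-side sketch of that exchange. The field names and URL here are illustrative; the real AODAAC schema is not shown on the slides:

```python
import json

# The web call returns a JSON fragment containing a "handle" URL; the
# client polls it for progress and, when the job is finished, reads the
# links to the output netCDF and log files. Hypothetical field names.

response = '{"handle": "http://example.org/aodaac/jobs/1234/status"}'
handle = json.loads(response)["handle"]

def job_result(status_json):
    """Parse a polled status document: (finished?, output file links)."""
    status = json.loads(status_json)
    return status.get("state") == "done", status.get("files", [])

done, files = job_result('{"state": "done", "files": ["out.nc", "job.log"]}')
print(done, files)  # → True ['out.nc', 'job.log']
```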
16. User Experience – higher level
• The machine interface supports a simple web front-end
• Or a full portal
• Or distributed clients in a cluster
• NOTE: we do NOT attempt to deliver data in “web-time” (not a realistic objective for GB-scale data systems)
23. DAP implications
• It does one thing really well – accesses and delivers subsets of n-dimensional data
• Semantically weak – e.g. no native support for time, lon, lat etc. (no OGC fluff, so…)
http://a..../opendap/file.nc.ascii?lst[0:1:0][160:1:165][200:1:205]
• Needs a spatio-temporal information infrastructure around it
• We do this with a data model implemented in the database
• This doesn’t mean DAP is the wrong solution; it means the hard part of the problem is not data volume but metadata
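What that surrounding infrastructure supplies, in miniature: grid metadata (origin and step, here invented values of the sort held in the spatial database) lets a user's lon/lat region of interest be turned into the index ranges DAP understands.

```python
# Map a lon/lat bounding box onto [start:1:stop] index ranges for a
# regular north-up grid. lon0/lat0 are the coordinates of index (0, 0);
# dlat is negative because latitude decreases with increasing row index.

def bbox_to_ranges(lon_min, lon_max, lat_min, lat_max,
                   lon0, dlon, lat0, dlat):
    """Return ((y0, 1, y1), (x0, 1, x1)) index ranges for the box."""
    x0 = round((lon_min - lon0) / dlon)
    x1 = round((lon_max - lon0) / dlon)
    y0 = round((lat_max - lat0) / dlat)   # northern edge -> smaller row
    y1 = round((lat_min - lat0) / dlat)
    return (y0, 1, y1), (x0, 1, x1)

yr, xr = bbox_to_ranges(112.0, 112.5, -40.5, -40.0,
                        110.0, 0.01, -35.0, -0.01)
print(yr, xr)  # → (500, 1, 550) (200, 1, 250)
```

These ranges are exactly what gets spliced into a DAP constraint expression.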
24. Metadata Harvester
• Has to extract spatial bounding boxes from all these different types of file
• Need to help the harvester identify geospatial information in files (nominating particular variables as relevant) – each data set needs ‘helper’ config files
• These can be maintained by the data provider OR the AODAAC admin
• And then you need to be able to take a user ROI and transform it back to the grid
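A minimal sketch of what such a ‘helper’ config might look like and how the harvester could read it. The XML layout and element names below are hypothetical; the real AODAAC schema is not shown on the slides:

```python
import xml.etree.ElementTree as ET

# A hypothetical per-dataset helper config nominating which variables
# carry geolocation, plus a reader the harvester could use.
CONFIG = """
<dataset name="example-sst">
  <latitude variable="lat"/>
  <longitude variable="lon"/>
  <time variable="time"/>
</dataset>
"""

def geo_variables(xml_text):
    """Return the variable names nominated for latitude/longitude/time."""
    root = ET.fromstring(xml_text)
    return {tag: root.find(tag).get("variable")
            for tag in ("latitude", "longitude", "time")}

print(geo_variables(CONFIG))
# → {'latitude': 'lat', 'longitude': 'lon', 'time': 'time'}
```

Keeping this mapping outside the harvester code is what lets either the data provider or the AODAAC admin maintain it per data set.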
34. Lessons Learned
• DAP performs well, but you need to build a fair bit of infrastructure to make it general (the V1 system only handled lon/lat grids)
• The incredible variety of input data makes this hard very quickly
• It is tempting to bloat the system with features (V1 could produce thumbnails for the web, and output HDF and ASCII)
• We were on the verge of adding remapping too (!!!)
• All these features can be done as post-filters, à la the Unix philosophy of “do one thing, and do it well”. V2 supports this approach.
• The Tomcat web service was a legacy of the TPAC original; it would be simpler as a standalone Java app (using CGI rather than WSDL, for example)
• Formats with in-file compression (netCDF-4, HDF) help
• Robust data serving is essential
• Don’t use a giant software project to learn a new language
35. Lessons Learned (continued)
• Distribute the computing and modularise – V1 had a lot more of the compute in the WQS, which became a bottleneck. In V2 the WQS is just a series of SQL ‘select’s, and the aggregator takes the rest of the load (very necessary for swath data).
[Diagram: the V2 architecture from slide 12, highlighting the XML job description passed between the WQS Client and the Job Controller]
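The “series of SQL selects” can be illustrated with a sqlite3 stand-in for the PostGIS database. The table layout and URLs are invented for the sketch; PostGIS would use real geometry types and spatial indexes rather than plain columns:

```python
import sqlite3

# A toy granule-metadata table: in V2 the WQS just runs selects like
# this one to find file URLs whose bounding boxes overlap the user's
# region of interest, leaving the heavy lifting to the aggregator.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE granules
              (url TEXT, lon_min REAL, lon_max REAL,
               lat_min REAL, lat_max REAL)""")
db.executemany("INSERT INTO granules VALUES (?,?,?,?,?)", [
    ("http://example.org/opendap/a.nc", 110, 120, -45, -35),
    ("http://example.org/opendap/b.nc", 140, 150, -45, -35),
])

def query(lon_min, lon_max, lat_min, lat_max):
    """URLs of granules overlapping the ROI – a pure SQL select."""
    rows = db.execute(
        """SELECT url FROM granules
           WHERE lon_max >= ? AND lon_min <= ?
             AND lat_max >= ? AND lat_min <= ?""",
        (lon_min, lon_max, lat_min, lat_max)).fetchall()
    return [r[0] for r in rows]

print(query(115, 118, -40, -38))  # → ['http://example.org/opendap/a.nc']
```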
36. Opportunities + Shortcomings
• This may meet some of your needs
• (including warning you what not to do!)
• It could be a lot more useful with some back-end filters, such as:
• Format conversions (e.g. GeoTIFF, CSV)
• Shapefile cookie-cutting
• Statistics extraction
• Reprojecting + resampling
• We need tools for managing our XML config files (admin tools)
• Doesn’t handle 4+ dimensions yet (e.g. depth or height)
• Want to make it easier to deploy as an “appliance”
• V1 has been live for 2+ years; V2 is being integrated into the IMOS portal now
• Would be fun to run against the AGDC data set when it goes netCDF
37. Marine & Atmospheric Research
Dr Edward King
IMOS Satellite Remote Sensing
Facility Leader
t +61 3 6232 5334
e edward.king@csiro.au
w www.csiro.au/cmar
Thank you – questions?
38. A netCDF aside….
• X = Longitude (deg. East), 813 elements
• Y = Latitude (deg. North), 670 elements
• T = Time (days since xxx), 1 element
• WRel2, “Relative Soil Moisture (lower layer)”, dimensioned 1 × 670 × 813
• Mean in each cell; unitless, 0 <= value <= 1; missing data = -9999
48. And finally – data….
http://a..../.../file.nc.ascii?Wrel2[0:1:0][160:1:165][200:1:205]
(format · variable · time indices · latitude indices · longitude indices)
49. Note absence of semantics in the exchange
• There is nothing in the URL to impart meaning, just a variable name and some subscripts…
• http://a..../.../file.nc.ascii?Wrel2[0:1:0][160:1:165][200:1:205]
• c.f. Fortran: float Wrel2(1,670,813)
• E.g. can’t naturally specify a bounding box (c.f. WMS)
• This is both
• A weakness: geospatial handling requires extra work
• A strength: not limited to geospatial domains
• There is always some extra work to do (or assumptions to make) in order to be able to make a meaningful request.