Aggregation and Subsetting
in ERDDAP (a middleman data server)

http://coastwatch.pfeg.noaa.gov/erddap
Bob Simons <bob.sim...

Aggregation and Subsetting in ERDDAP (a middleman data server)
http://coastwatch.pfeg.noaa.gov/erddap
Bob Simons <bob.simons@noaa.gov>
NOAA NMFS SWFSC ERD

Aggregating Gridded Data
• Aggregating time points:
  10,000's of data files: sst[latitude][longitude]
  become one virtual dataset: sst[time][latitude][longitude]
• Aggregating variables:
  Many files with one variable per file become one virtual dataset with all variables.
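
Both kinds of gridded aggregation can be sketched in a few lines of Python. This is not ERDDAP's code; it assumes each "file" has already been read into memory as nested lists (one 2-D sst[latitude][longitude] grid per date) or as a dict holding a single variable. All names here are hypothetical.

```python
from datetime import date

# Hypothetical in-memory stand-in for one file's 2-D grid sst[latitude][longitude].
Grid2D = list[list[float]]

def aggregate_time(files: dict[date, Grid2D]) -> tuple[list[date], list[Grid2D]]:
    """Stack per-time 2-D grids into one virtual sst[time][latitude][longitude]."""
    times = sorted(files)  # the new time axis, in ascending order
    first = files[times[0]]
    shape = (len(first), len(first[0]))
    for t in times:
        # every file must have the same (latitude, longitude) grid shape
        if (len(files[t]), len(files[t][0])) != shape:
            raise ValueError(f"grid for {t} does not match shape {shape}")
    return times, [files[t] for t in times]

def aggregate_variables(files: dict[str, dict]) -> dict:
    """Merge files that each hold one variable into one dataset with all variables."""
    dataset = {}
    for name, contents in files.items():
        for var, data in contents.items():
            if var in dataset:
                raise ValueError(f"{var} appears in more than one file ({name})")
            dataset[var] = data
    return dataset
```

The time axis is built by sorting the file dates, so the order in which files are listed does not matter.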

Subsetting Gridded Data
• OPeNDAP projection constraints (indices):
  sst[57:57][121:2:141][163:2:183]
  ERDDAP also accepts values in domain units:
  sst[(2012-08-12)][(20):2:(40)][(-140):2:(-120)]
• Huge time-saver: the user can request just what she needs (e.g., 1% of the data).
• Aggregated datasets need to be subsettable.
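
To connect the two notations, here is a minimal sketch of translating a value-based (start):stride:(stop) into an OPeNDAP-style [start:stride:stop] by picking the nearest axis value, plus applying the resulting index range. The nearest-value rule is an assumption for illustration, and the 1-degree latitude axis is hypothetical, so the indices it produces are not those of the grid behind the example above. Note that an OPeNDAP stop index is inclusive.

```python
from bisect import bisect_left

def nearest_index(axis: list[float], value: float) -> int:
    """Index of the axis value closest to `value`; `axis` must be sorted ascending."""
    i = bisect_left(axis, value)
    if i == 0:
        return 0
    if i == len(axis):
        return len(axis) - 1
    # choose the closer of the two neighbors (ties go to the lower index)
    return i if axis[i] - value < value - axis[i - 1] else i - 1

def values_to_indices(axis: list[float], start: float, stride: int, stop: float) -> str:
    """Translate a value-based (start):stride:(stop) into an index-based [i:stride:j]."""
    return f"[{nearest_index(axis, start)}:{stride}:{nearest_index(axis, stop)}]"

def project(values: list, start: int, stride: int, stop: int) -> list:
    """Apply an OPeNDAP [start:stride:stop]; note that `stop` is inclusive."""
    return values[start:stop + 1:stride]

# Hypothetical 1-degree latitude axis from -90 to 90.
lat = [float(v) for v in range(-90, 91)]
print(values_to_indices(lat, 20, 2, 40))  # [110:2:130]
print(len(project(lat, 110, 2, 130)))     # 11
```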

Aggregating In-Situ and Tabular Data
• A database-like table with rows and columns.
  E.g., one file has data for one buoy for one month.
  It isn't a multi-dimensional grid; there are no dimensions.
• Aggregating features and time points:
  Features: stations, trajectories, profiles, ...
  Append them into one giant virtual table.
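
Because tabular data has no dimensions, aggregation reduces to appending rows. The sketch below assumes each file has been read into a list of row dicts with the same columns; the file names and buoy IDs are hypothetical, and a real server would read the files lazily rather than copying every row.

```python
# Hypothetical row type: one observation, as a dict of column name -> value.
Row = dict[str, object]

def aggregate_tables(files: dict[str, list[Row]]) -> list[Row]:
    """Append every file's rows into one giant virtual table with a common set of columns."""
    table: list[Row] = []
    columns = None
    for name in sorted(files):  # deterministic order for this sketch
        for row in files[name]:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError(f"{name} has columns {sorted(row)}, expected {sorted(columns)}")
            table.append(row)
    return table
```

The feature (here, the buoy) becomes just another column, so stations, trajectories, and profiles can all share one table.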

Subsetting In-Situ and Tabular Data
• OPeNDAP selection constraints
  (no indices, because there are no multi-dimensional grids):
  longitude,latitude,time,sst&sst>35
  Easy to create. Uses domain units (degC). Very flexible.
  (Based on the database SQL SELECT statement.)
• Huge time-saver: the user can request just what she needs (e.g., 1% of the data).
• Aggregated datasets need to be subsettable.
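
A toy evaluator makes the selection-constraint idea concrete: the part before the first `&` lists the result columns, and each later part is a variable, an operator, and a value. This sketch only handles numeric comparisons against an in-memory list of rows; the real OPeNDAP/ERDDAP syntax supports more (e.g., string and time constraints), and `run_query` is a hypothetical name.

```python
import operator
import re

# Numeric comparison operators only; this sketch omits string and time constraints.
OPS = {"!=": operator.ne, "<=": operator.le, ">=": operator.ge,
       "=": operator.eq, "<": operator.lt, ">": operator.gt}
# variable name, then operator (two-character operators tried first), then value
CONSTRAINT = re.compile(r"^(\w+)(!=|<=|>=|=|<|>)(.+)$")

def run_query(table, query):
    """Evaluate 'col1,col2,...&var>value&...' against a list of row dicts."""
    result_vars, *constraints = query.split("&")
    columns = result_vars.split(",")
    tests = []
    for c in constraints:
        m = CONSTRAINT.match(c)
        if m is None:
            raise ValueError(f"unsupported constraint: {c!r}")
        name, op, value = m.groups()
        tests.append((name, OPS[op], float(value)))
    # keep only rows that pass every constraint, and only the requested columns
    return [{col: row[col] for col in columns}
            for row in table
            if all(fn(row[name], value) for name, fn, value in tests)]
```

Because constraints are expressed in domain units (degC, degrees), the user never needs to know where a row sits in any file.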

Don't Treat In-Situ/Tabular Data Like Gridded Data
• CF DSG stores in-situ data as gridded .nc files.
  Fine for storage, not for subsetting.
• Problem: indices aren't domain units.
  How do you request sst>35 with indices?
• Problem: indices aren't a real-world sequence.
  Grid: lat[] is a sequence, so lat[42:53] has meaning.
  Table: buoy number isn't. &lat>20&lat<40 selects buoys #2, 14, 26, 109, not buoy[42:53].
• Problem: there are 5 CF DSG data structures.
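
The "indices aren't a real-world sequence" problem can be checked directly: on a sorted grid axis, the rows matching a value range always form one contiguous index block, while buoys stored in arbitrary order generally do not. The buoy latitudes below are hypothetical.

```python
def matching_indices(values, lo, hi):
    """Indices i where lo < values[i] < hi, i.e. the selection &lat>lo&lat<hi."""
    return [i for i, v in enumerate(values) if lo < v < hi]

def as_index_range(indices):
    """Return (first, last) if the indices form one contiguous block, else None."""
    if indices and indices == list(range(indices[0], indices[-1] + 1)):
        return indices[0], indices[-1]
    return None

# Grid axis: latitude is sorted, so a value range is always one index range.
grid_lat = [float(v) for v in range(-90, 91)]
# Hypothetical buoy latitudes, in the arbitrary order the buoys happen to be stored.
buoy_lat = [45.3, 21.7, -12.0, 38.9, 55.1, 25.4]

print(as_index_range(matching_indices(grid_lat, 20, 40)))  # (111, 129)
print(matching_indices(buoy_lat, 20, 40))                  # [1, 3, 5]
```

For the buoys, no single index range such as buoy[1:5] returns exactly the matching stations, which is why value-based selection suits tabular data better.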

Option: Treat Gridded Data Like Tabular Data
• Standard request: a time, latitude, longitude bounding box.
  What about unusual requests of gridded data, e.g., SST>35 ("select by value")?
• ERDDAP's EDDTableFromEDDGrid creates a giant virtual table from a gridded dataset.
  Columns: longitude, latitude, time, sst
  Query: e.g., longitude,latitude,time,sst&sst>35
  Response: a table (one data point per row)
• Risk: huge effort for the server.
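
The idea behind a grid-as-table view can be sketched as a generator that yields one row per grid cell, followed by a select-by-value filter. This is an illustration of the concept, not how EDDTableFromEDDGrid is implemented; the function names and the tiny grid are hypothetical.

```python
from itertools import product

def grid_as_table(times, lats, lons, sst):
    """Lazily yield one row per grid cell of sst[time][latitude][longitude]."""
    for (t, time), (y, lat), (x, lon) in product(
            enumerate(times), enumerate(lats), enumerate(lons)):
        yield {"longitude": lon, "latitude": lat, "time": time, "sst": sst[t][y][x]}

def select_by_value(rows, threshold):
    """The equivalent of '&sst>threshold': every cell must be visited to test it."""
    return [row for row in rows if row["sst"] > threshold]
```

The virtual table has len(times) × len(lats) × len(lons) rows, and a value constraint such as sst>35 cannot be turned into an index range, so the server may have to scan every cell. That is where the risk of huge server effort comes from.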

Summary: Huge Advantages of Aggregation and Subsetting
• Users can find and deal with one aggregated dataset.
• Users can make one subset request to one aggregated dataset.
  Grids: use indices to get a temporal and spatial subset.
  Tables (selection constraints): any subset you want.
  (Not one subset request to each unaggregated file or, worse, using FTP to download lots of entire files.)
• Don't treat tabular/in-situ data like gridded data.