HDF-EOS 2/5 to netCDF Converter
Bob Bane, Richard Ullman, Jingli Yang
Data Usability Group
NASA/Goddard Space Flight Center
Introduction
• Status report
• Properties of netCDF and HDF-EOS
• Conversion strategy
Status Report
• Last year - hdfeos52netcdf
– HDF-EOS 5 -> netCDF
– COARDS compatible

• Current
– Uses the he25 interoperability library, so it handles both
HDF-EOS 2 and HDF-EOS 5
– CF compatible
Data Formats and Conventions
• Generic data containers
– HDF, netCDF

• Conventions for domain-specific metadata
– HDF-EOS, COARDS/CF

• HDF -> HDF-EOS
• netCDF -> COARDS/CF
netCDF
• netCDF files contain:
– Variables
• multi-dimensional arrays of basic data types
(character/integer/float)

– Dimensions
• named sizes for dimensions of variables

– Attributes
• named one-dimensional arrays
• properties of variables
netCDF Conventions
• Metadata is stored in attributes
– Conventions for names: “units”

• Coordinate vector
– Variable with the same name as a dimension
– Value is a vector of same size as the dimension
– Maps dimension indices (0, 1, 2, …) to the physical
quantities along that dimension
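
A minimal sketch of the coordinate-vector idea, using a plain Python dictionary as a stand-in for a netCDF file rather than a real netCDF library. The dictionary layout, the helper names, and the latitude values are assumptions made for illustration only.

```python
# Plain-Python stand-in for a netCDF file: dimensions, variables, attributes.
# The names and values below are illustrative, not taken from a real product.
dataset = {
    "dimensions": {"lat": 4},
    "variables": {
        # Coordinate variable: same name as its single dimension.
        "lat": {
            "dims": ("lat",),
            "data": [-45.0, -15.0, 15.0, 45.0],
            "attrs": {"units": "degrees_north"},
        },
    },
}


def is_coordinate_variable(ds, name):
    """True if `name` is a 1-D variable defined on the dimension of the same name."""
    var = ds["variables"].get(name)
    return (
        var is not None
        and var["dims"] == (name,)
        and len(var["data"]) == ds["dimensions"].get(name)
    )


def physical_value(ds, dim, index):
    """Map a 0-based index along `dim` to the physical quantity stored there."""
    if not is_coordinate_variable(ds, dim):
        raise KeyError(f"no coordinate variable for dimension {dim!r}")
    return ds["variables"][dim]["data"][index]
```

Here index 2 along "lat" corresponds to 15.0 degrees north, and the "units" attribute carries the metadata that gives the number its meaning.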
COARDS Conventions
• Cooperative Ocean/Atmospheric Research
Data Service
– Conventions for use of netCDF
• Order of dimensions for variables
• Names of attributes (“units”, “_FillValue”)
• Coordinate variables

– http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
CF Conventions
• Climate and Forecast
– Follow-on to COARDS
– Tighter
• Many attributes optional in COARDS are required in CF

– More capable
• Multi-dimensional geolocation support

– http://www.cgd.ucar.edu/cms/eaton/cfmetadata/
HDF
• Hierarchical Data Format
• HDF files contain:
– Datasets
• multi-dimensional arrays of basic data types

– Dimensions
• Named sizes of dataset dimensions

– Groups
• Named groups of datasets (and groups)

– Attributes
• Named properties of datasets and groups, similar to netCDF
HDF-EOS
• Conventions and API for HDF
• HDF-EOS files contain:
– Fields (datasets)
– Points
• Individually geolocated measurements

– Swaths
• Groups of data and geolocation fields, and mappings
between them

– Grids
• Groups of data fields with rectilinear geolocation
HDF-EOS (cont)
• HDF-EOS 2 over HDF4
• HDF-EOS 5 over HDF5
– HDF5 very different from HDF4
– HDF-EOS 2/5 near identical API
– Our he25 library allows uniform access to
HDF-EOS 2/5, so converter works for both
• Looks/works like HDF-EOS 5
• On HDF-EOS 2 files, translates in/out
Observations
• HDF-EOS is “bigger” than netCDF
– Additional structured metadata (ODL)
– HDF-EOS API calls for geolocation

• netCDF file ~= HDF-EOS Swath/Grid
– Both are groups of related datasets
Conversion Strategies
• One HDF-EOS file -> one netCDF file
– Alternative is one Swath/Grid -> one file

• COARDS/CF - if the original HDF-EOS file followed the
conventions, the converted netCDF file will too
– Most HDF-EOS data producers are aware of
COARDS/CF

• Skip HDF-EOS Point datasets
– Reconsider this if real world Point data emerges
Conversion Strategies (cont)
• Convert data to enable future processing
– Geolocation data, attributes (units)
– Other metadata less important
• Could transfer ODL metadata as a string, but why?

– Can always go back to the original file and use
good HDF-EOS tools
Conversion in General
HDF-EOS
Swath s1
Dimensions(lat,lon,time)
Datafield f1(lat,lon,time)
Geofield f2(lat,lon,time)
Swath s2
Dimensions(lat,lon,time)
Datafield f3(lat,lon,time)
Geofield f4(lat,lon,time)

netCDF
Dimensions(lat,lon,time,s2_time)
Variable s1_f1(lat,lon,time)
Variable s1_GEO_f2(lat,lon,time)
Variable s2_f3(lat,lon,s2_time)
Variable s2_GEO_f4(lat,lon,s2_time)

• Flatten HDF-EOS hierarchy
• Encode names, types in variable names
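
The flattening above can be sketched in plain Python. This is not the converter's C code. It assumes, based only on the example, that a dimension is shared between swaths when its name and size match and is otherwise prefixed with the swath name. The function name, the input layout, and the dimension sizes are hypothetical.

```python
def flatten_swaths(swaths):
    """Flatten {swath: {"dims": {...}, "data": {...}, "geo": {...}}} into
    netCDF-style (dimensions, variables) dictionaries."""
    dimensions = {}   # netCDF dimension name -> size
    variables = {}    # netCDF variable name -> tuple of dimension names

    for swath_name, swath in swaths.items():
        # Decide the netCDF name of each swath dimension.
        renamed = {}
        for dim, size in swath["dims"].items():
            if dimensions.get(dim, size) == size:
                target = dim                      # new, or same name and size: share it
            else:
                target = f"{swath_name}_{dim}"    # clash: prefix with the swath name
            dimensions[target] = size
            renamed[dim] = target

        # Data fields become <swath>_<field>, geolocation fields <swath>_GEO_<field>.
        for kind, prefix in (("data", ""), ("geo", "GEO_")):
            for field, dims in swath[kind].items():
                name = f"{swath_name}_{prefix}{field}"
                variables[name] = tuple(renamed[d] for d in dims)

    return dimensions, variables


# Illustrative sizes: s2 has a different "time" length than s1 (assumed).
swaths = {
    "s1": {"dims": {"lat": 10, "lon": 20, "time": 5},
           "data": {"f1": ("lat", "lon", "time")},
           "geo": {"f2": ("lat", "lon", "time")}},
    "s2": {"dims": {"lat": 10, "lon": 20, "time": 7},
           "data": {"f3": ("lat", "lon", "time")},
           "geo": {"f4": ("lat", "lon", "time")}},
}
dimensions, variables = flatten_swaths(swaths)
```

With these sizes the sketch yields the dimensions lat, lon, time, s2_time and the four variable names shown in the example.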
Swaths
HDF-EOS
Swath s2
Dimensions(lat, glat ,lon, glon, time)
DimensionMap(lat, glat, 0, 1)
DimensionMap(lon, glon, 0, 1)
Datafield f3(lat,lon,time)
Geofield f4(glat,glon,time)

netCDF
Dimensions(lat,glat,lon,glon,time,s2_time)
Attributes:
s2_DimensionMap: “lat/glat, lon/glon”
s2_DMOffsets: (0,0)
s2_DMIncrements: (1,1)
Variable s2_f3(lat,lon,s2_time)
Attributes:
coordinates: “s2_GEO_f4”
Variable s2_GEO_f4(glat,glon,s2_time)

• Swath name, geofield type encoded in
variable names
• Record dimension map in global attributes
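
A small Python sketch of how the dimension maps in this example could be packed into global attributes. The attribute names follow the slide. The function name and the tuple layout are assumptions, and the code does not call the HDF-EOS or netCDF libraries.

```python
def dimension_map_attributes(swath_name, dimension_maps):
    """Encode a swath's dimension maps as three global attributes.

    `dimension_maps` is a list of (first_dim, second_dim, offset, increment)
    tuples, in the argument order shown on the slide.
    """
    pairs = [f"{a}/{b}" for a, b, _, _ in dimension_maps]
    return {
        f"{swath_name}_DimensionMap": ", ".join(pairs),
        f"{swath_name}_DMOffsets": tuple(off for _, _, off, _ in dimension_maps),
        f"{swath_name}_DMIncrements": tuple(inc for _, _, _, inc in dimension_maps),
    }


attrs = dimension_map_attributes(
    "s2", [("lat", "glat", 0, 1), ("lon", "glon", 0, 1)]
)
```

Keeping offsets and increments in parallel tuples, in the same order as the name pairs, lets a reader of the netCDF file reconstruct each map by position.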
Grids
HDF-EOS
Grid g1
Dimensions(lat,lon,time)
Corners(upleft, upright,
lowleft, lowright)
Datafield f1(lat,lon,time)

netCDF
Dimensions(lat,lon,time)
Variable lat(lat)
= (lowright,…upright)
Variable lon(lon)
= (lowleft, … upleft)
Variable g1_f1(lat,lon,time)

• Grid geolocation becomes coordinate
variables
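
One way to picture the step from grid corners to coordinate variables is the Python sketch below. It assumes evenly spaced values with the corner values as the first and last entries, and it ignores map projection and pixel registration, which a real HDF-EOS grid also carries. The corner values, grid sizes, function name, and the choice of which corners bound each axis are all hypothetical.

```python
def coordinate_vector(start, stop, size):
    """Return `size` evenly spaced values from `start` to `stop`, inclusive."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if size == 1:
        return [float(start)]
    step = (stop - start) / (size - 1)
    return [start + i * step for i in range(size)]


# Hypothetical grid: corners given as (lat, lon) pairs in degrees.
corners = {"upleft": (40.0, -100.0), "upright": (40.0, -80.0),
           "lowleft": (20.0, -100.0), "lowright": (20.0, -80.0)}
sizes = {"lat": 5, "lon": 3}

# Latitude runs up the right edge, longitude along the bottom edge (assumed).
lat = coordinate_vector(corners["lowright"][0], corners["upright"][0], sizes["lat"])
lon = coordinate_vector(corners["lowleft"][1], corners["lowright"][1], sizes["lon"])
```

The resulting lists play the role of the lat(lat) and lon(lon) coordinate variables in the netCDF output.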
Converter
• C command-line application
– hdfeos2netcdf HDF_file netCDF_file

• Should be portable to all HDF-EOS 5/netCDF platforms
– Naturally uses all libraries
Where is the Software?
• http://hdfeos.gsfc.nasa.gov
– ‘Tools’ category
– System ‘hdfeos2netcdf’
Big Picture
HDF-EOS
File Attributes
fa1: “fa value”
Swath s1
Attributes:
sa1: “sa value”
Dimensions(lat,lon,time)
Datafield f1(lat,lon,time)
Geofield f2(lat,lon,time)
Swath s2
Dimensions(lat,lon,time)
Datafield f3(lat,lon,time)
Geofield f4(lat,lon,time)

netCDF
File Attributes:
fa1: “fa value”
s1_sa1: “sa value”
Dimensions(lat,lon,time,s2_time)
Variable s1_f1(lat,lon,time)
Variable s1_GEO_f2(lat,lon,time)
Variable s2_f3(lat,lon,s2_time)
Variable s2_GEO_f4(lat,lon,s2_time)
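
The attribute handling in this picture can be sketched in Python as follows: file attributes keep their names, and swath attributes are prefixed with the swath name. The function name and the input dictionaries are assumptions for illustration, not the converter's actual interface.

```python
def flatten_attributes(file_attrs, swath_attrs):
    """Merge file-level and per-swath attributes into one flat namespace."""
    flat = dict(file_attrs)                       # file attributes keep their names
    for swath_name, attrs in swath_attrs.items():
        for key, value in attrs.items():
            flat[f"{swath_name}_{key}"] = value   # e.g. sa1 in s1 -> s1_sa1
    return flat


global_attrs = flatten_attributes(
    {"fa1": "fa value"},
    {"s1": {"sa1": "sa value"}, "s2": {}},
)
```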
