Ensuring Long Term Access to
Remotely Sensed HDF4 Data
with Layout Maps
Ruth Duerr, NSIDC
Christopher Lynnes, GES DISC
The...
Background and basic
concept

Oct. 16 2008

HDF and HDF-EOS Workshop XII

2
I’m Plastic Man!

HDF4 is

EXTENSIBLE

FLEXIBLE
SELFDESCRIBING

But
There’s a cost…

Complexity!

How do we save HDF users
from having to deal with all of
the complexity under the
hood?

Through the HDF software
libraries, either by using the
HDF APIs directly or by using
HDF tools that depend on the
HDF lib...
• There is a risk in depending solely on the HDF
libraries to access HDF-formatted data over the
long term.
• It is possib...
Really smart people and software?
Maybe future
data users and
their computers
will be so smart
that the HDF4
format will b...
Maybe not.

We need an “easy” button

“If only we could read HDF data with an
independent program that does not rely on
the HDF API…
A possible approach [would ...
HDF4 file layout

HDF4 file layout

The project

HDF4 mapping
• Problem
− The complex internal byte layout of HDF files
requires one to use the API to access HDF data.
− T...
HDF4 mapping project activities
1. Assess and categorize HDF4 data held by NASA
− To determine what types of objects to ma...
Project activities (continued)
3. Assess results and plan next steps
− Present results and options for proceeding to the
c...
1. Assess and categorize

How many HDF4 products?
Data Center    HDF4 Products
ASF            0
GES-DISC
GHRC           54
ASDC           63
LP-DAAC        67
NSIDC          47
ORNL-DAAC      ...
Data characteristics
Product Characteristics Examined
• Product Identification
• HDF-EOS version
For point da...
Other results
• Slightly more than half of the HDF4 products are in HDF-EOS 2
format
• Grids are the most common HDF-EOS d...
2. Prototype and proof of
concept

HDF4 mapping prototype workflow
HDF4 File “H4.hdf”
hmap linked with HDF4 library
...
Proof-of-concept results
• The HDF Group created prototype map
generation software and a draft map
specification
• Map gen...
Example map fragment
<?xml version="1.0" encoding="utf-8"?>
<hdf4:HDFMap xmlns:hdf4="http://www.hdfgroup.org/HDF4/HDF4Map"...
Next steps

Effort for full implementation
• Generate maps for existing archives
− GES-DISC approach: append the map XML to the XML
fi...
How you can help
• Consider what it might take to implement this for
your archive - contact Ruth if you’d like support
• R...
For more information
• Wiki page added to Confluence wiki
• Project page at The HDF Group website:
− http://www.hdfgroup.o...
Thank you.
This report is based upon work supported in part
by a Cooperative Agreement with the National
Aeronautics and S...

Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps


A preponderance of data from NASA's Earth Observing System (EOS) are archived in the HDF Version 4 (HDF4) format. The long-term preservation of these data is critical for climate and other scientific studies going many decades into the future.

Its rich structure, platform independence, and Application Programming Interfaces (API) and libraries make HDF4 very effective for working with the large and complex collection of EOS data products. Unfortunately, these features are achieved by employing a complex internal byte layout of HDF4 files, so future readability of HDF4 data depends on the preservation of the software that can interpret that layout. Having a way to access HDF4 data independent of a library could improve its viability as an archive format, and consequently give confidence that HDF4 data will be readily accessible forever, even if the HDF4 API and library are gone.

To address the need to simplify long-term access to EOS data stored in HDF4, a collaborative study between The HDF Group and NASA's Earth Science Data Centers investigated a new approach to accessing data in HDF4 files. The approach is based on creating independent maps that describe the data in HDF4 files, together with tools that use these maps to recover data from those files. With this approach, relatively simple programs could extract the data from an HDF4 file, bypassing the need for the HDF4 library.

This report will describe the HDF4 mapping study, which included an assessment of the range of HDF4 formatted data held by NASA, development of a prototype HDF4 layout mapping language and format, and development of prototype tools to create layout maps and to read HDF4 data using layout maps. The report will also describe future plans to put the layout map approach into practice.

  • Full quote, from proposal:
    Through the HDF software libraries, either by using the HDF APIs directly or by using HDF tools that depend on the HDF libraries.
    However there is a risk in depending solely on the HDF libraries to access HDF-formatted data over the long term.
    It is possible, especially in the distant future, that the libraries may not be as readily available as they are today. To address this risk, it is desirable to have a way to retrieve the data independently.
    At the 10th HDF workshop, Christopher Lynnes of the Goddard Earth Sciences Data and Information Services Center (GES DISC) addressed this need: “If only we could read HDF data with an independent program that does not rely on the HDF API… A possible approach [would be to] extend hdfls to print a hierarchical map of a data file, [and] write ncdump/hdp-like utilities to find, assemble and write out SDSes and vdatas.”
    “Leveraging HDF Utilities,” Christopher Lynnes, 10th HDF Workshop. http://www.hdfeos.org/workshops/ws10/presentations/day3/Leveraging_HDF_Utilities.ppt.
  • An XML-based prototype schema for HDF4 mapping files (XML documents) was created. For a given binary HDF4 file, an associated mapping file contains structural and application metadata for the HDF4 file, as well as the locations of the object data (array element values) in the HDF4 file.
    A tool was written to generate mapping files.
    Other tools were developed that use the mapping files to read HDF4 files without calling the HDF4 library, confirming the approach is viable.
    While the focus of this effort was NASA EOSDIS data stored in HDF4 files, the general methodology is also relevant to other cases where the long-term accessibility of data stored in binary files is of concern.
    In addition, this work demonstrates how binary HDF files can be used to efficiently store large volumes of scientific data that is referenced by text-based XML documents (the mapping files).
  • Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps

    1. 1. Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps Ruth Duerr, NSIDC Christopher Lynnes, GES DISC The HDF Group Oct. 16 2008 HDF and HDF-EOS Workshop XII 1
    2. 2. Background and basic concept Oct. 16 2008 HDF and HDF-EOS Workshop XII 2
    3. 3. I’m Plastic Man! HDF4 is EXTENSIBLE FLEXIBLE SELFDESCRIBING Oct. 16 2008 HDF and HDF-EOS Workshop XII 3
    4. 4. But There’s a cost… Oct. 16 2008 HDF and HDF-EOS Workshop XII 4
    5. 5. Complexity! Oct. 16 2008 HDF and HDF-EOS Workshop XII 5
    6. 6. Oct. 16 2008 HDF and HDF-EOS Workshop XII 6
    7. 7. Oct. 16 2008 HDF and HDF-EOS Workshop XII 7
    8. 8. Oct. 16 2008 HDF and HDF-EOS Workshop XII 8
    9. 9. Oct. 16 2008 HDF and HDF-EOS Workshop XII 9
    10. 10. Oct. 16 2008 HDF and HDF-EOS Workshop XII 10
    11. 11. Oct. 16 2008 HDF and HDF-EOS Workshop XII 11
    12. 12. Oct. 16 2008 HDF and HDF-EOS Workshop XII 12
    13. 13. How do we save HDF users from having to deal with all of the complexity under the hood? Oct. 16 2008 HDF and HDF-EOS Workshop XII 13
    14. 14. Through the HDF software libraries, either by using the HDF APIs directly or by using HDF tools that depend on the HDF libraries. But what about the future… Oct. 16 2008 HDF and HDF-EOS Workshop XII 14
    15. 15. • There is a risk in depending solely on the HDF libraries to access HDF-formatted data over the long term. • It is possible, especially in the distant future, that the libraries may not be available. Oct. 16 2008 HDF and HDF-EOS Workshop XII 15
    16. 16. Really smart people and software? Maybe future data users and their computers will be so smart that the HDF4 format will be a piece of cake. Oct. 16 2008 HDF and HDF-EOS Workshop XII 16
    17. 17. Maybe not. Oct. 16 2008 HDF and HDF-EOS Workshop XII 17
    18. 18. We need an “easy” button Oct. 16 2008 HDF and HDF-EOS Workshop XII 18
    19. 19. “If only we could read HDF data with an independent program that does not rely on the HDF API… A possible approach [would be to] extend hdfls to print a hierarchical map of a data file, [and] write ncdump/hdp-like utilities to find, assemble and write out SDSes and vdatas.” “Leveraging HDF Utilities” Christopher Lynnes HDF Workshop X. Oct. 16 2008 HDF and HDF-EOS Workshop XII 19
    20. 20. Oct. 16 2008 HDF and HDF-EOS Workshop XII 20
    21. 21. HDF4 file layout Oct. 16 2008 HDF and HDF-EOS Workshop XII 21
    22. 22. HDF4 file layout Oct. 16 2008 HDF and HDF-EOS Workshop XII 22
    23. 23. The project Oct. 16 2008 HDF and HDF-EOS Workshop XII 23
    24. 24. HDF4 mapping • Problem − The complex internal byte layout of HDF files requires one to use the API to access HDF data. − This makes long-term readability of HDF data dependent on long-term allocation of resources to support HDF software. • Proposed solution − Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data. Oct. 16 2008 HDF and HDF-EOS Workshop XII 24
    25. 25. HDF4 mapping project activities 1. Assess and categorize HDF4 data held by NASA − To determine what types of objects to map. − To get an idea of the magnitude of the project. 2. Develop prototype for proof of concept − Develop markup-language based layout specification. − Develop tool to produce layout for an HDF4 file. − Develop and test two independent tools to read HDF4 data based solely on the map files. Oct. 16 2008 HDF and HDF-EOS Workshop XII 25
    26. 26. Project activities (continued) 3. Assess results and plan next steps − Present results and options for proceeding to the community. − Assess the likely usefulness of this approach, as well as any desirable modifications − Evaluate the effort required for a full solution that best meets community needs − Submit a proposal for the work needed to provide a full solution Oct. 16 2008 HDF and HDF-EOS Workshop XII 26
    27. 27. 1. Assess and categorize Oct. 16 2008 HDF and HDF-EOS Workshop XII 27
    28. 28. How many HDF4 products?
        Data Center    HDF4 Products
        ASF            0
        GES-DISC       236
        GHRC           54
        ASDC           63
        LP-DAAC        67
        NSIDC          47
        ORNL-DAAC      2
        PO.DAAC        22
        SDAC           0
        MrDC           95
        Total          586
        Oct. 16 2008 HDF and HDF-EOS Workshop XII 28
    29. 29. Data characteristics: Product Characteristics Examined
        • Product Identification
          − Product Name
          − Data Level
          − Archive Location
          − Product Version
        • Whether the product was multi-file
        • HDF Version
        • For HDF-EOS products
          − HDF-EOS version
          − For point data
            • Number of point data sets
            • Maximum number of levels
          − For swath data
            • Number of swaths
            • Maximum number of dimensions
            • Organized by time, space, both, or other
            • Whether dimension maps were used
          − For gridded data
            • Number of grids
            • Max number of dimensions in a grid
            • Number of projections used
            • Whether any grids were indexed
        • For raster data
          − Number of 8-bit rasters
          − Number of 24-bit rasters
          − Number of general rasters
          − Whether any rasters had attributes
          − Whether any rasters were compressed
          − Whether any rasters were chunked
          − Whether there were any palettes
        • For SDS data
          − Number of SDSs
          − Maximum number of dimensions
          − Did any SDS have attributes
          − Was any SDS annotated
          − Were dimension scales used
          − Was compression used and if so what kind
          − Was chunking used
        • For Vdata
          − Number of Vdata structures
          − Did any Vdata have attributes
          − Did any Vdata fields have attributes
          − Was compression used and if so what kind
          − Was chunking used
        Oct. 16 2008 HDF and HDF-EOS Workshop XII 29
    30. 30. Other results • Slightly more than half of the HDF4 products are in HDF-EOS 2 format • Grids are the most common HDF-EOS data structures in use • No products use a combination of grid, swath, and point data structures Oct. 16 2008 HDF and HDF-EOS Workshop XII 30
    31. 31. 2. Prototype and proof of concept Oct. 16 2008 HDF and HDF-EOS Workshop XII 31
    32. 32. HDF4 mapping prototype workflow
        HDF4 File “H4.hdf”
        hmap, linked with HDF4 library
        HDF4 Mapping File (XML document) “H4.hdf.map.xml”: Groups, Data Objects, Structural and Application Metadata; Locations of Object Data
        Object Data
        Reader 1 (C program)
        Reader 2 (Perl Script)
        October 15-18, 2008 HDF and HDF-EOS Workshop XII 32
    33. 33. Proof-of-concept results • The HDF Group created prototype map generation software and a draft map specification • Map generator was tested on a wide variety of data products • GES-DISC and NSIDC independently wrote software that uses maps to read data files in NSIDC’s and GES-DISC’s archives • Summary - the concept is feasible! Oct. 16 2008 HDF and HDF-EOS Workshop XII 33
    34. 34. Example map fragment
        <?xml version="1.0" encoding="utf-8"?>
        <hdf4:HDFMap xmlns:hdf4="http://www.hdfgroup.org/HDF4/HDF4Map">
          <hdf4:RootGroup>
            <hdf4:SDS objName="data1" objPath="/" objID="xid-DFTAG_NDG-2">
              <hdf4:Attribute name="data range" ntDesc="32-bit signed integer">
                0 255
              </hdf4:Attribute>
              <hdf4:Datatype dtypeClass="INT" dtypeSize="4" byteOrder="BE" />
              <hdf4:Dataspace ndims="2">
                10 100
              </hdf4:Dataspace>
              <hdf4:Datablock nblocks="1">
                <hdf4:BlockOffset> 2502 </hdf4:BlockOffset>
                <hdf4:BlockNbytes> 4000 </hdf4:BlockNbytes>
              </hdf4:Datablock>
            </hdf4:SDS>
          </hdf4:RootGroup>
        </hdf4:HDFMap>
        Oct. 16 2008 HDF and HDF-EOS Workshop XII 34
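The map fragment on slide 34 carries enough information to pull the array out of the file with plain byte reads. The following is a minimal Python sketch of such a reader, not the C or Perl readers written for the project. The function name `read_sds` is hypothetical, and the sketch assumes that `dtypeClass="INT"` means a signed integer and that the data sit in a single contiguous block (`nblocks="1"`), as in the fragment. It returns the values as a flat sequence and makes no claim about element ordering within the array.

```python
import struct
import xml.etree.ElementTree as ET

# Namespace URI taken from the example map fragment.
HDF4_NS = "http://www.hdfgroup.org/HDF4/HDF4Map"
NS = {"hdf4": HDF4_NS}


def read_sds(map_path, data_path, sds_name):
    """Return (dims, values) for one SDS using only the map file.

    Hypothetical helper: handles 4-byte integers stored in a single
    contiguous block, which is all the slide's fragment shows.
    """
    root = ET.parse(map_path).getroot()
    for sds in root.iter("{%s}SDS" % HDF4_NS):
        if sds.get("objName") == sds_name:
            break
    else:
        raise KeyError("no SDS named %r in map" % sds_name)

    dtype = sds.find("hdf4:Datatype", NS)
    if dtype.get("dtypeClass") != "INT" or dtype.get("dtypeSize") != "4":
        raise NotImplementedError("sketch handles 4-byte integers only")
    # Assumption: "BE" is big-endian; anything else is read as little-endian.
    endian = ">" if dtype.get("byteOrder") == "BE" else "<"

    dims = [int(d) for d in sds.find("hdf4:Dataspace", NS).text.split()]

    block = sds.find("hdf4:Datablock", NS)
    if block.get("nblocks") != "1":
        raise NotImplementedError("sketch handles one data block only")
    offset = int(block.find("hdf4:BlockOffset", NS).text)
    nbytes = int(block.find("hdf4:BlockNbytes", NS).text)

    # Cross-check the map against itself: 10 * 100 elements * 4 bytes = 4000.
    count = 1
    for d in dims:
        count *= d
    if count * 4 != nbytes:
        raise ValueError("dimensions and block size disagree")

    # Plain seek-and-read on the HDF4 file; no HDF library involved.
    with open(data_path, "rb") as f:
        f.seek(offset)
        raw = f.read(nbytes)
    if len(raw) != nbytes:
        raise ValueError("data file is shorter than the map says")

    # "i" is a signed 4-byte integer (assumed signedness, see lead-in).
    values = struct.unpack("%s%di" % (endian, count), raw)
    return dims, values
```

For the fragment above, a call such as `read_sds("H4.hdf.map.xml", "H4.hdf", "data1")` would seek to byte 2502 and read 4000 bytes as 1000 big-endian integers.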
    35. 35. Next steps Oct. 16 2008 HDF and HDF-EOS Workshop XII 35
    36. 36. Effort for full implementation • Generate maps for existing archives − GES-DISC approach: append the map XML to the XML files already kept for each file in their archive − NSIDC non-ECS data implementation: add an XML file for each data file in the same directory − Other systems TBD • Generate maps for new data − Add map generation as a step in the ingest process using a stand-alone tool − Request product generation systems to use new API calls that generate maps • Develop a production-quality implementation of the mapping tool, and possibly an API. • Possibly do a similar assessment for HDF5 maps. Oct. 16 2008 HDF and HDF-EOS Workshop XII 36
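For the "one XML file per data file in the same directory" option, an archive would need a way to see which files still lack a map. The Python sketch below assumes the naming shown on slide 32, where the map for “H4.hdf” is “H4.hdf.map.xml”. It also assumes data files end in `.hdf`, and the function names are hypothetical. It only inventories files; it does not run the map generator, whose command line is not described here.

```python
from pathlib import Path

# Naming convention taken from slide 32: "H4.hdf" -> "H4.hdf.map.xml".
MAP_SUFFIX = ".map.xml"


def map_path_for(data_file):
    """Return the sibling map path for a data file (hypothetical helper)."""
    data_file = Path(data_file)
    return data_file.with_name(data_file.name + MAP_SUFFIX)


def files_missing_maps(archive_dir, pattern="*.hdf"):
    """List data files under archive_dir that have no map beside them.

    The "*.hdf" pattern is an assumption; map files themselves end in
    ".xml", so they are not mistaken for data files.
    """
    return sorted(
        p for p in Path(archive_dir).rglob(pattern)
        if not map_path_for(p).exists()
    )
```

An ingest step could call `files_missing_maps` on a directory tree and pass each result to whatever map generation tool the archive adopts.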
    37. 37. How you can help • Consider what it might take to implement this for your archive - contact Ruth if you’d like support • Review the materials on the wiki and elsewhere - comment heavily! Oct. 16 2008 HDF and HDF-EOS Workshop XII 37
    38. 38. For more information • Wiki page added to Confluence wiki • Project page at The HDF Group website: − http://www.hdfgroup.org/projects/hdf4mapping/ • Paper at 2008 fall AGU • Paper “Ensuring Long Term Access to Remotely Sensed Data with Layout Maps” in the upcoming TGRSS special issue on archiving and distribution Oct. 16 2008 HDF and HDF-EOS Workshop XII 38
    39. 39. Thank you. This report is based upon work supported in part by a Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA Award NNX06AC83A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration. Oct. 16 2008 HDF and HDF-EOS Workshop XII 39