NASA HDF/HDF-EOS Data Access Challenges
1. The HDF Group
NASA HDF/HDF-EOS Data
Access Challenges
H. Joe Lee (hyokee@hdfgroup.org)
Kent Yang (myang6@hdfgroup.org)
The HDF Group
July 9, 2013
ESIP 2013 Summer Meeting
1
www.hdfgroup.org
2. Hal Varian, Google’s chief economist
“The ability to take data – to
be able to understand it, to
process it, to extract
value from it, to visualize
it, to communicate it –
that’s going to be a hugely
important skill in the next
decades.”
3. For Earth Science Data Users
The ability to take NASA HDF/HDF-EOS data – to
be able to understand it, to process it, to
extract value from it, to visualize it, to
communicate it – that’s a hugely important
skill right now.
4. Is it easy to take NASA HDF data?
No, not for the average Joe data user.
5. Understand
“I'm new to IDL and HDF; and I'm currently working
with MODIS L1B data.
I found your examples very helpful. Is
it possible to show how radiance is
calculated?”
6. Process
“I work in NASA/GSFC GES-DISC on AIRS project. We have
new idl version 8.1. But got a core dump
error when we run EOS function EOS_SW_INQSWATH
to inqure swath name from a AIRS level 2
product file. Need your help. Thanks.”
7. Extract Values
“Hi,I want to use the following TRMM data,
http://mirador.gsfc.nasa.gov/...2A25....
Can you provide me some programs that deal with these
datasets so that I can obtain the daily convective
precipitation in the region 110-180E,040N during 2006?”
8. Visualize
“Can you please make the matlab file for reading
ozone hdf5 files obtained from mls available to the
public. I wanted to obtain ozone distribution
over the world and ozone distributions with height etc. thank
you :)
….
oh can you tell me which function can i use to plot latitude in
the x-axis, pressure in the y-axis and a contour plot
of ozone over it?”
9. Communicate
“Your prog is very helpful to verify my process. I have one more
doubt. I am trying to convert this hdf to
Geotiff using Matlab. Do have any written code to do
the same. Doing it with HEG tool given an
error specifying that 5D are only supported for SOM
projections. Also I am doing all processing with Matlab. So
could you pl. help me.”
10. NASA HDF Users See Challenges
in accessing
satellite-product-specific
(MODIS, AIRS, MLS)
geo-location/time-specific
(lat/lon/height/year)
data with
their favorite software
packages (MATLAB/IDL/ArcGIS).
11. What Makes Access Challenging?
1. Some files use techniques that end users may
not be familiar with, although those techniques may
help store data efficiently.
2. Information from a source outside the files is
required to retrieve the data in a physically
meaningful manner.
3. Attributes do not comply with widely used
conventions.
4. Metadata in the HDF file contains incorrect information.
13. Challenge 1: Unfamiliar Techniques
Users look for Latitude/Longitude datasets that match
variable (e.g., Ozone) datasets.
Some HDF products have
• mismatched lat/lon.
• lat/lon information in metadata attribute.
• duplicate lat/lon information.
14. Swath Dimension Map Example
An HDF-EOS Swath Dimension Map allows geo-location fields
and data fields to have mismatched dimension sizes.
• Latitude[512][512]
• Longitude[512][512]
• Data[1024][1024]
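A dimension map relates each data index to a geo-location index through an offset and an increment. The sketch below shows the idea along one dimension in plain Python. It assumes offset 0 and increment 2 (the ratio implied by 512 versus 1024) and simple linear interpolation; the function names and sample latitudes are hypothetical, and a real product's offset and increment must be read from its dimension map.

```python
def geo_index(data_index, offset=0, increment=2):
    """Fractional geo-location index for a data index under a dimension map."""
    return (data_index - offset) / increment


def interpolate_track(geo_values, n_data, offset=0, increment=2):
    """Linearly interpolate a 1-D geo-location list onto n_data points.

    Data indices that fall outside the geo-location samples reuse the
    nearest sample (an assumption of this sketch, not an HDF-EOS rule).
    """
    out = []
    last = len(geo_values) - 1
    for k in range(n_data):
        x = min(max(geo_index(k, offset, increment), 0.0), float(last))
        lo = int(x)
        hi = min(lo + 1, last)
        frac = x - lo
        out.append(geo_values[lo] * (1.0 - frac) + geo_values[hi] * frac)
    return out


# Hypothetical latitudes along one swath column: 4 samples for 8 data rows.
lat_coarse = [10.0, 12.0, 14.0, 16.0]
lat_full = interpolate_track(lat_coarse, 8)
print(lat_full)
```

For a two-dimensional field such as Latitude[512][512], the same interpolation would be applied along each dimension in turn. Longitude needs extra care near the date line, which this sketch does not handle.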
15. NSIDC AMSR_E NCL Example
; Read the file as an HDF4 file to obtain dataset attributes.
hdf4_file = addfile("AMSR_E_L3_WeeklyOcean_V03_20020616.hdf", "r")
; Read the file as an HDF-EOS2 file to obtain lat and lon.
hdfeos2_file = addfile("AMSR_E_L3_WeeklyOcean_V03_20020616.hdf.he2", "r")
Users must call both the HDF4 and HDF-EOS2 APIs:
• The HDF4 API alone cannot resolve lat/lon.
• The HDF-EOS2 API alone cannot retrieve some attributes
that are added later by HDF4 APIs.
16. Challenge 2: Information Outside HDF
Users must read the data product manual to find
• fill value / valid ranges
• units or discrete key values
• scale / offset equation
• physical description of data
Some products are not self-describing!
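When the fill value and valid range exist only in the manual, the user ends up hard-coding them. A minimal sketch of that situation follows; the constants and the function name are hypothetical stand-ins for numbers copied by hand from a product manual.

```python
import math

# Hypothetical values copied by hand from a product manual,
# because the file itself does not carry them as attributes.
FILL_VALUE = -9999
VALID_RANGE = (0, 500)


def mask_invalid(raw_values, fill_value, valid_min, valid_max):
    """Replace fill values and out-of-range values with NaN."""
    out = []
    for v in raw_values:
        if v == fill_value or not (valid_min <= v <= valid_max):
            out.append(math.nan)
        else:
            out.append(float(v))
    return out


raw = [120, -9999, 305, 700]
clean = mask_invalid(raw, FILL_VALUE, *VALID_RANGE)
valid = [v for v in clean if not math.isnan(v)]
print(sum(valid) / len(valid))  # mean over valid pixels only
```

If the fill value were averaged in by mistake, the mean would be dominated by -9999, which is why this information matters for physically meaningful results.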
19. Challenge 3: The CF Conventions
Following the widely accepted CF conventions is
important for interoperability but some HDF products
• use non-alphanumeric characters.
• use non-CF attribute names and values.
• use non-CF scale / offset rules.
• use a different data type for an attribute (e.g.,
_FillValue) than for its variable.
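The scale / offset mismatch is easy to see with numbers. CF defines the unpacked value as raw * scale_factor + add_offset, while some products document a rule of the form scale * (raw - offset) instead (MODIS L1B radiance is commonly documented this way). The sketch below uses made-up numbers and hypothetical function names to show how far apart the two results can be, and how a non-CF offset could be rewritten as a CF add_offset.

```python
def unpack_cf(raw, scale_factor, add_offset):
    """CF convention: physical = raw * scale_factor + add_offset."""
    return raw * scale_factor + add_offset


def unpack_subtract_first(raw, scale, offset):
    """Non-CF rule used by some products: physical = scale * (raw - offset)."""
    return scale * (raw - offset)


# Made-up numbers for illustration only.
raw, scale, offset = 1000, 0.5, 100.0

wrong = unpack_cf(raw, scale, offset)               # CF rule applied blindly
right = unpack_subtract_first(raw, scale, offset)   # rule from the manual

# The same non-CF rule expressed with a CF-style add_offset.
cf_offset = -scale * offset
converted = unpack_cf(raw, scale, cf_offset)

print(wrong, right, converted)
```

A CF-aware tool that reads the file's attributes as written would silently produce the first number, not the second.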
21. Challenge 4: Incorrect Information
Sometimes, metadata contains incorrect information.
This is rare and such information is usually corrected
immediately by data producers.
22. Incorrect Information Example
An NCL user reported that the same code doesn’t work
for an older MOP02 HDF-EOS5 file.
In the 2008/01/01 file, StructMetadata has the wrong value:
nTime = 250841130416
In the 2008/12/31 file, StructMetadata has the correct value:
nTime = 2
LaRC ASDC fixed this already!
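An error like this can be caught by comparing the dimension sizes declared in StructMetadata with the sizes of the datasets actually stored in the file. The sketch below works on a simplified, hypothetical StructMetadata excerpt (real metadata contains more fields and nesting), and the "actual" size is supplied by hand rather than read from a file.

```python
import re

# Simplified, hypothetical StructMetadata excerpt for illustration.
STRUCT_METADATA = '''
GROUP=Dimension
    OBJECT=Dimension_1
        DimensionName="nTime"
        Size=250841130416
    END_OBJECT=Dimension_1
END_GROUP=Dimension
'''


def read_dimensions(text):
    """Return {dimension name: declared size} from DimensionName/Size pairs."""
    pattern = re.compile(r'DimensionName="([^"]+)"\s+Size=(-?\d+)')
    return {name: int(size) for name, size in pattern.findall(text)}


def find_mismatches(declared, actual):
    """List dimensions whose declared size differs from the stored size."""
    return [name for name, size in declared.items()
            if name in actual and actual[name] != size]


declared = read_dimensions(STRUCT_METADATA)
# Hypothetical actual size, as it would be taken from the dataset's shape.
actual = {"nTime": 2}
print(find_mismatches(declared, actual))
```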
23. Good News
Recent efforts by The HDF Group overcome many of these
challenges:
• HDF4/HDF5 OPeNDAP Handler with EnableCF option
• H4CF Conversion Toolkit with NcML / NCO examples
• HDF-EOS5 Augmentation Tool
• HDF-EOS2 Dumper tool with Comprehensive
Examples for MATLAB/IDL/NCL
The above tools and their examples are available at
HDFEOS.org.
24. Challenge 1: Unfamiliar Techniques
HDF OPeNDAP handlers & H4CF Conversion Toolkit
• provide full geo-location information as explicit datasets.
HDF-EOS5 Augmentation Tool
• provides ways to associate geo-location information with
existing datasets or to supply new ones.
HDF-EOS2 Dumper Tool
• prints out geo-location information in ASCII because
MATLAB/IDL/NCL can read ASCII text data.
25. Challenge 2: Information Outside HDF
HDF OPeNDAP handlers
• provide fill value / valid range information.
• apply CF scale / offset rule.
• calculate latitude and longitude values for some NASA
non-EOS products.
• are tested against ncml_handler so that data centers
can add additional information using NcML.
H4CF Conversion Toolkit (h4tonccf)
• provides NcML and NCO examples to add or edit
attributes for converted NetCDF files.
26. Challenge 3: The CF Conventions
HDF OPeNDAP handlers & H4CF Conversion Toolkit
• flatten group hierarchies.
• change variable & attribute types, names, and values.
• add named dimensions.
• add coordinate information.
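To make the flattening and renaming steps concrete, here is a minimal sketch of one way a group path could be turned into a single CF-style name (letters, digits, and underscores, starting with a letter). This is not the handlers' or the toolkit's actual algorithm; the function name, the prefix, and the sample paths are hypothetical.

```python
import re


def cf_name(path):
    """Flatten a group path into one CF-style variable name."""
    # Replace every character outside [A-Za-z0-9_], including the
    # group separator "/", with an underscore.
    name = re.sub(r'[^A-Za-z0-9_]', '_', path.strip('/'))
    # CF names should start with a letter; the prefix is hypothetical.
    if not name or not name[0].isalpha():
        name = 'v_' + name
    return name


print(cf_name('/Data Fields/Cloud-Top Pressure'))
print(cf_name('/2A25/rain.rate'))
```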
27. Challenge 4: Incorrect Information
HDF OPeNDAP handlers & H4CF Conversion Toolkit
• correct errors for old products temporarily.
• catch errors for new products.
28. Better News
We see fewer and fewer challenges in newer HDF products
thanks to open communication and standardization efforts
among Earth science communities through
meetings, telecons, and mailing lists.
• HDF – DAACs Telecons
• ESDSWG – H5CF Conventions
• ESIP
• CF (satellite) conventions mailing lists
29. Future Challenges
• Data Discovery
• Subsetting and Aggregation
• Sharing Research Data
30. Data Discovery
Some users still don’t know how to search for data or where
to download it.
Spatial search in Reverb doesn’t guarantee that the
matched HDF data files contain valid values at
the specific location that the user is looking for.
Browse images are helpful, but users don’t want to
examine them one by one.
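The gap between "the granule covers the location" and "the granule has a valid value there" can be sketched as a simple check that a user would otherwise have to run on every returned file. The pixel values, the fill value, and the function name below are hypothetical.

```python
def has_valid_value_near(lats, lons, values, fill_value, lat0, lon0, max_deg=0.5):
    """True if any non-fill pixel lies within max_deg of (lat0, lon0)."""
    for lat, lon, v in zip(lats, lons, values):
        near = abs(lat - lat0) <= max_deg and abs(lon - lon0) <= max_deg
        if near and v != fill_value:
            return True
    return False


# Hypothetical flattened pixels from one granule whose bounding box
# covers Seoul: the two pixels near Seoul hold only the fill value.
lats = [37.4, 37.6, 35.0]
lons = [126.9, 127.1, 129.0]
o3 = [-9999.0, -9999.0, 310.0]

print(has_valid_value_near(lats, lons, o3, -9999.0, 37.57, 126.98))
```

A spatial search based on the granule's footprint would match this file, yet the check returns False for the location of interest.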
31. Reverb Browse Image for O3 at Seoul
The returned HDF file has no value at Seoul.
32. Subsetting and Aggregation
Customized on-demand HDF product generation is
desired based on the user’s query. For example,
“Give me all L2 Ozone data at Seoul from 2002 to 2013
and allow me to download it as a single HDF file.”
Most HDF data products are packaged as daily granules
covering large regions. A search returns thousands of HDF
files, and users cannot download them one by one.
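The kind of request quoted above amounts to a subset step per granule followed by an aggregation step across granules. A minimal sketch follows, using hypothetical in-memory granules in place of real HDF files; the record layout and function names are assumptions made for illustration.

```python
from datetime import date


def subset_granule(granule, lat0, lon0, max_deg=0.5):
    """Keep the pixels of one granule that fall near the point of interest."""
    return [
        {"date": granule["date"], "lat": lat, "lon": lon, "value": v}
        for lat, lon, v in granule["pixels"]
        if abs(lat - lat0) <= max_deg and abs(lon - lon0) <= max_deg
    ]


def aggregate(granules, lat0, lon0, start, end):
    """Collect matching pixels from all granules dated within [start, end]."""
    rows = []
    for g in sorted(granules, key=lambda g: g["date"]):
        if start <= g["date"] <= end:
            rows.extend(subset_granule(g, lat0, lon0))
    return rows


# Hypothetical granules; real ones would be read from thousands of HDF files.
granules = [
    {"date": date(2013, 1, 2), "pixels": [(37.5, 127.0, 301.0), (10.0, 20.0, 250.0)]},
    {"date": date(2002, 9, 1), "pixels": [(37.6, 126.9, 295.0)]},
    {"date": date(2001, 1, 1), "pixels": [(37.5, 127.0, 280.0)]},
]
rows = aggregate(granules, 37.57, 126.98, date(2002, 1, 1), date(2013, 12, 31))
print([(r["date"].isoformat(), r["value"]) for r in rows])
```

The resulting rows are small enough to be written out as a single file, which is what the user is asking the data provider to do on the server side.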
33. Reverb Query Result for AIRS at Seoul
Showing 1 to 9 of 5,047 granules
34. Sharing Research Data
How can users easily compose and publish new
research data from the different NASA data product
sources?
“I’d like to combine AIRS Ozone and OMI Ozone data
at Seoul from 2002-2013 and share it with journal
editors.”
Can this be shared as a single URL query to a NASA
data cloud?
36. Acknowledgements
This work was supported by Subcontract number
114820 under Raytheon Contract number
NNG10HP02C, funded by the National Aeronautics
and Space Administration (NASA) and by
cooperative agreement number NNX08AO77A from
NASA. Any opinions, findings, conclusions, or
recommendations expressed in this material are
those of the authors and do not necessarily reflect
the views of Raytheon or the National Aeronautics
and Space Administration.
Editor's Notes
Hal Varian said, “…”
I think his message applies to Earth scientists as well. HDF is the primary data format for distributing and archiving NASA data. So, I’d like to say “”. Scientists cannot wait for another decade.
Unfortunately, the answer is no, from what I have observed over the last 6 years. I’ll show some email exchanges that prove this.
Although we provide numerous code examples through our web site, new users still ask questions like this. Some users have difficulty understanding data.
Some users get errors during processing.
Some users have trouble extracting values for a certain region and date.
Some users have trouble visualizing data.
Some users want to convert the files to a different format, such as GeoTIFF, to share data with other software packages.
My observation is this.
In general, extracting geo-location information is the biggest challenge. Here’s one example.
Here’s another example, for the hybrid case.
For some products, users should read data product manuals carefully. HDF is well known as a self-describing data format, but some products fail to deliver that advantage of HDF.
What is l3m_data? Global attribute “Sea Surface Temperature”.
Also, aggregation from different satellite sources is more challenging and interesting. Pipeline problem. Amazon EC2 accepts the entire disk shipped by FedEx. Kansas City – Google fiber optics. Kent is in China and he gave up downloading data.