NASA HDF/HDF-EOS Data Access Challenges
 

This slide summarizes the common data access challenges that data users and producers may experience in NASA HDF/HDF-EOS data.

  • Hal Varian said, “…”
  • I think his message applies to Earth scientists as well. HDF is the primary data format for distributing and archiving NASA data. So, I’d like to say “”. Scientists cannot wait for another decade.
  • Unfortunately, the answer is no, from what I have observed over the last 6 years. I’ll show some email exchanges that prove this.
  • Although we provide numerous code examples through our web site, new users still ask questions like this. Some users have difficulty understanding data.
  • Some users get errors during processing.
  • Some users have trouble extracting values for a certain region and date.
  • Some users have trouble visualizing data.
  • Some users want to convert the files to a different format like GeoTIFF to share data with other software packages.
  • My observation is this.
  • In general, extracting geo-location information is the biggest challenge. Here’s one example.
  • Here’s another example, for the hybrid case.
  • For some products, users should read data product manuals carefully. HDF is well known as a self-describing data format, but some products fail to deliver that advantage.
  • What is l3m_data? Global attribute: “Sea Surface Temperature”.
  • Aggregation from different satellite sources is also more challenging and interesting. Pipeline problem. Amazon EC2 accepts an entire disk shipped by FedEx. Kansas City: Google fiber optics. Kent is in China and he gave up downloading data.

NASA HDF/HDF-EOS Data Access Challenges: Presentation Transcript

  • NASA HDF/HDF-EOS Data Access Challenges. H. Joe Lee (hyokee@hdfgroup.org) and Kent Yang (myang6@hdfgroup.org), The HDF Group. July 9, 2013, ESIP 2013 Summer Meeting. www.hdfgroup.org
  • Hal Varian, Google’s chief economist: “The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades.”
  • For Earth Science Data Users: The ability to take NASA HDF/HDF-EOS data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s a hugely important skill right now.
  • Is it easy to take NASA HDF data? No, not for the Average Joe data user.
  • Understand: “I'm new to IDL and HDF; and I'm currently working with MODIS L1B data. I found your examples very helpful. Is it possible to show how radiance is calculated?”
  • Process: “I work in NASA/GSFC GES-DISC on AIRS project. We have new idl version 8.1. But got a core dump error when we run EOS function EOS_SW_INQSWATH to inquire swath name from a AIRS level 2 product file. Need your help. Thanks.”
  • Extract Values: “Hi, I want to use the following TRMM data, http://mirador.gsfc.nasa.gov/...2A25.... Can you provide me some programs that deal with these datasets so that I can obtain the daily convective precipitation in the region 110-180E,040N during 2006?”
  • Visualize: “Can you please make the matlab file for reading ozone hdf5 files obtained from mls available to the public. I wanted to obtain ozone distribution over the world and ozone distributions with height etc. thank you :) …. oh can you tell me which function can i use to plot latitude in the x-axis, pressure in the y-axis and a contour plot of ozone over it?”
  • Communicate: “Your prog is very helpful to verify my process. I have one more doubt. I am trying to convert this hdf to Geotiff using Matlab. Do have any written code to do the same. Doing it with HEG tool given an error specifying that 5D are only supported for SOM projections. Also I am doing all processing with Matlab. So could you pl. help me.”
  • NASA HDF Users See Challenges in accessing satellite-product-specific (MODIS, AIRS, MLS), geo-location/time-specific (lat/lon/height/year) data with their favorite software packages (MATLAB/IDL/ArcGIS).
  • What Makes Access Challenging?
      1. Some files use techniques that end users may not be familiar with, although the techniques may help store data efficiently.
      2. Information from a source outside the files is required to retrieve the data in a physically meaningful manner.
      3. Attributes do not comply with widely used conventions.
      4. Metadata in the HDF file has incorrect information.
  • Converted File Size Comparison (chart): 656M NetCDF-3, 128M NetCDF-4, 72M HDF-EOS2. The NetCDF-3 file is about 9X the size of the HDF-EOS2 file.
  • Challenge 1: Unfamiliar Techniques. Users look for Latitude/Longitude datasets that match variable (e.g., Ozone) datasets. Some HDF products have
      • mismatched lat/lon.
      • lat/lon information in a metadata attribute.
      • duplicate lat/lon information.
  • Swath Dimension Map Example. An HDF-EOS Swath Dimension Map allows the geo-location and data dimensions to have mismatched sizes.
      • Latitude[512][512]
      • Longitude[512][512]
      • Data[1024][1024]
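To use a swath like the one above, the 512 x 512 latitude and longitude arrays have to be expanded to the 1024 x 1024 data grid. The Python sketch below shows one way to do that with linear interpolation. It assumes the dimension map relates indices as data_index = offset + increment * geo_index and that the same offset and increment apply to both dimensions; the function names are made up for illustration and are not part of any HDF-EOS library. Plain linear interpolation is also only an approximation, and it breaks for longitudes that cross the date line.

```python
import math

def expand_1d(values, offset, increment, n_data):
    """Linearly interpolate geolocation samples onto all n_data data indices."""
    if len(values) < 2:
        raise ValueError("need at least two geolocation samples")
    out = []
    for j in range(n_data):
        # Assumed mapping: data_index = offset + increment * geo_index.
        pos = (j - offset) / increment
        # Clamp to the first/last interval so edge points are extrapolated.
        i = min(max(math.floor(pos), 0), len(values) - 2)
        out.append(values[i] + (pos - i) * (values[i + 1] - values[i]))
    return out

def expand_2d(grid, offset, increment, n_rows, n_cols):
    """Expand each row to n_cols points, then each column to n_rows points."""
    wide = [expand_1d(row, offset, increment, n_cols) for row in grid]
    tall = [expand_1d(list(col), offset, increment, n_rows) for col in zip(*wide)]
    return [list(row) for row in zip(*tall)]

# Toy 2x2 latitude array expanded to 4x4 (stands in for 512x512 -> 1024x1024).
lat = [[10.0, 12.0], [20.0, 22.0]]
full = expand_2d(lat, offset=0, increment=2, n_rows=4, n_cols=4)
for row in full:
    print(row)
```

The real offset and increment values come from the dimension map stored in the file, so they should be read from there rather than hard-coded as in this toy example.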
  • NSIDC AMSR_E NCL Example
      ; Read the file as an HDF4 file to obtain dataset attributes.
      hdf4_file = addfile("AMSR_E_L3_WeeklyOcean_V03_20020616.hdf", "r")
      ; Read the file as an HDF-EOS2 file to obtain lat and lon.
      hdfeos2_file = addfile("AMSR_E_L3_WeeklyOcean_V03_20020616.hdf.he2", "r")
    Users should call both the HDF4 and HDF-EOS2 APIs:
      • The HDF4 API alone cannot resolve lat/lon.
      • The HDF-EOS2 API alone cannot retrieve some attributes that are added later by HDF4 APIs.
  • Challenge 2: Information Outside HDF. Users must read the data product manual to find
      • fill value / valid ranges
      • units or discrete key values
      • scale / offset equation
      • physical description of data
    Some products are not self-describing!
  • Without Information Outside HDF [figure]
  • With Information Outside HDF [figure]
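The difference between the two slides above comes down to applying values that live only in the product manual. The Python sketch below shows the general pattern: mask the fill value, drop values outside the valid range, then convert the remaining raw integers to physical values. All numbers here are hypothetical, and the equation `raw * scale + offset` is only one possible rule; the manual for a given product decides which equation applies.

```python
def to_physical(raw_values, fill_value, valid_min, valid_max, scale, offset):
    """Convert raw stored values to physical values, using None for missing data."""
    out = []
    for raw in raw_values:
        if raw == fill_value or not (valid_min <= raw <= valid_max):
            out.append(None)                   # fill or out-of-range: no data
        else:
            out.append(raw * scale + offset)   # assumed rule; check the manual
    return out

# Hypothetical values a user would copy from a data product manual.
raw = [-9999, 60, 80, 32000]
physical = to_physical(raw, fill_value=-9999, valid_min=0, valid_max=1000,
                       scale=0.5, offset=-10.0)
print(physical)
```

Without the manual, a plot of `raw` would show the fill value and the out-of-range value as if they were real measurements.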
  • Challenge 3: The CF Conventions. Following the widely accepted CF conventions is important for interoperability, but some HDF products
      • use non-alphanumeric characters in names.
      • use non-CF attribute names and values.
      • use non-CF scale / offset rules.
      • use a different data type for an attribute (e.g., _FillValue) than for the variable.
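To see why a non-CF scale / offset rule matters, compare the CF rule, `raw * scale_factor + add_offset`, with a subtract-first rule of the form `scale * (raw - offset)`, which some product manuals (for example for MODIS radiances) document instead. The Python sketch below uses hypothetical numbers and function names; it shows that CF-aware software gets a different answer from the same attributes, and how a subtract-first rule can be rewritten as equivalent CF attributes.

```python
def unpack_cf(raw, scale_factor, add_offset):
    """CF convention: multiply first, then add the offset."""
    return raw * scale_factor + add_offset

def unpack_subtract_first(raw, scale, offset):
    """Non-CF rule used by some products: subtract the offset, then scale."""
    return scale * (raw - offset)

def cf_equivalent(scale, offset):
    """Rewrite a subtract-first rule as (scale_factor, add_offset) for CF tools."""
    return scale, -scale * offset

# Hypothetical stored value and attributes.
raw, scale, offset = 1000, 0.5, 100.0
wrong = unpack_cf(raw, scale, offset)               # what a CF tool would compute
right = unpack_subtract_first(raw, scale, offset)   # what the manual intends
fixed = unpack_cf(raw, *cf_equivalent(scale, offset))
print(wrong, right, fixed)
```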
  • Attribute Type Mismatch Example
      Wrong:
        Int16  data[180][360]        // Variable
        String valid_range “0,100”   // Attribute (Wrong)
        Byte   _FillValue 255        // Attribute (Wrong)
      Correct:
        Int16  data[180][360]        // Variable
        Int16  valid_range 0,100     // Attribute (Correct)
        Int16  _FillValue 255        // Attribute (Correct)
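A check for the mismatch shown above can be sketched in a few lines of Python. The sketch assumes the variable type and the attributes have already been read from the file into plain Python values; the dictionary layout and function names are hypothetical, not the output of any real HDF reader.

```python
# Attributes whose type is expected to match the variable's type.
TYPE_SENSITIVE = ("_FillValue", "valid_range", "valid_min", "valid_max")

def mismatched_attributes(var_type, attrs):
    """Return names of type-sensitive attributes whose type differs from var_type.

    attrs maps attribute name -> (type name, value); this layout is an assumption.
    """
    return [name for name in TYPE_SENSITIVE
            if name in attrs and attrs[name][0] != var_type]

def parse_range(text):
    """Turn a string-typed range such as '0,100' into a list of integers."""
    return [int(part) for part in text.split(",")]

# The "wrong" case from the slide, expressed in the assumed layout.
wrong_attrs = {"valid_range": ("String", "0,100"), "_FillValue": ("Byte", 255)}
problems = mismatched_attributes("Int16", wrong_attrs)
print(problems, parse_range(wrong_attrs["valid_range"][1]))
```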
  • Challenge 4: Incorrect Information. Sometimes, metadata contains incorrect information. This is rare, and such information is usually corrected immediately by data producers.
  • Incorrect Information Example. An NCL user reported that the same code doesn’t work for an older MOP02 HDF-EOS5 file.
      • In the 2008/01/01 file, StructMetadata has the wrong value: nTime = 250841130416
      • In the 2008/12/31 file, StructMetadata has the correct value: nTime = 2
    LaRC ASDC fixed this already!
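An error like the nTime value above can be caught by comparing the sizes declared in StructMetadata with the sizes of the arrays actually stored in the file. The Python sketch below assumes the metadata text lists each dimension as a `DimensionName="..."` line followed by a `Size=...` line; the sample text is a hand-written fragment for illustration, and the actual sizes are assumed to have been read from the file separately.

```python
import re

def dimension_sizes(struct_metadata):
    """Extract {dimension name: declared size} from StructMetadata text."""
    # Assumed layout: DimensionName="X" followed by Size=N.
    pattern = re.compile(r'DimensionName="([^"]+)"\s+Size=(-?\d+)')
    return {name: int(size) for name, size in pattern.findall(struct_metadata)}

def suspicious_dimensions(struct_metadata, actual_sizes):
    """Return {name: (declared, actual)} where the two sizes disagree."""
    declared = dimension_sizes(struct_metadata)
    return {name: (size, actual_sizes[name])
            for name, size in declared.items()
            if name in actual_sizes and size != actual_sizes[name]}

# Hand-written fragment using the wrong value quoted on the slide.
SAMPLE = '''
OBJECT=Dimension_1
    DimensionName="nTime"
    Size=250841130416
END_OBJECT=Dimension_1
'''
# Hypothetical size of the stored array along nTime.
report = suspicious_dimensions(SAMPLE, {"nTime": 2})
print(report)
```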
  • Good News. Recent efforts from The HDF Group overcome many of these challenges:
      • HDF4/HDF5 OPeNDAP Handler with EnableCF option
      • H4CF Conversion Toolkit with NcML / NCO examples
      • HDF-EOS5 Augmentation Tool
      • HDF-EOS2 Dumper tool with Comprehensive Examples for MATLAB/IDL/NCL
    The above tools and their examples are available at HDFEOS.org.
  • Challenge 1: Unfamiliar Techniques
      • HDF OPeNDAP handlers & H4CF Conversion Toolkit provide full geo-location information as explicit datasets.
      • HDF-EOS5 Augmentation Tool provides ways to associate geo-location information with existing datasets or to supply new ones.
      • HDF-EOS2 Dumper Tool prints out geo-location information in ASCII because MATLAB/IDL/NCL can read ASCII text data.
  • Challenge 2: Information Outside HDF
    HDF OPeNDAP handlers
      • provide fill value / valid range information.
      • apply the CF scale / offset rule.
      • calculate latitude and longitude values for some NASA non-EOS products.
      • are tested against ncml_handler so that data centers can add additional information using NcML.
    H4CF Conversion Toolkit (h4tonccf)
      • provides NcML and NCO examples to add or edit attributes for converted NetCDF files.
  • Challenge 3: The CF Conventions
    HDF OPeNDAP handlers & H4CF Conversion Toolkit
      • flatten group hierarchies.
      • change variable & attribute types, names, and values.
      • add named dimensions.
      • add coordinate information.
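Flattening a group hierarchy and renaming variables, as listed above, can be pictured with a small Python sketch. It joins the parts of a group path with underscores and replaces every character that is not a letter, digit, or underscore. This is only an illustration of the idea; the actual naming rules of the OPeNDAP handlers and the H4CF Conversion Toolkit are not reproduced here, and the example path is hypothetical.

```python
import re

def cf_name(path):
    """Flatten a group path into a single name made of letters, digits, and '_'."""
    parts = [part for part in path.split("/") if part]
    return "_".join(re.sub(r"[^A-Za-z0-9_]", "_", part) for part in parts)

# Hypothetical dataset path with spaces and a dot in the names.
flat = cf_name("/HDFEOS/SWATHS/O3 NearRealTime/Data Fields/O3.Column")
print(flat)
```

A production tool also has to handle name clashes, since two different paths can flatten to the same name under a simple rule like this one.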
  • Challenge 4: Incorrect Information
    HDF OPeNDAP handlers & H4CF Conversion Toolkit
      • correct errors for old products temporarily.
      • catch errors for new products.
  • Better News. We see fewer and fewer challenges in newer HDF products thanks to open communication and standardization efforts among Earth Science communities through meetings, telecons, and mailing lists.
      • HDF – DAACs Telecons
      • ESDSWG – H5CF Conventions
      • ESIP
      • CF (satellite) conventions mailing lists
  • Future Challenges
      • Data Discovery
      • Subsetting and Aggregation
      • Sharing Research Data
  • Data Discovery. Some users still don’t know how to search for data or where to download it. Spatial search in Reverb doesn’t guarantee that the matched HDF data files contain valid values at the specific location that the user is looking for. Browse images are helpful, but users don’t want to examine them one by one.
  • Reverb Browse Image for O3 at Seoul: The returned HDF file has no value at Seoul.
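Checking a granule for valid data at a location, instead of inspecting browse images one by one, can be sketched as below in Python. The sketch assumes the latitude, longitude, and data values have already been read into flat lists of equal length; the function name, the 0.5 degree search radius, and the sample values are hypothetical. Distance is measured in plain degrees, which is crude but enough for a presence check.

```python
def nearest_valid_value(lats, lons, values, fill_value, lat0, lon0, max_deg=0.5):
    """Return the closest non-fill value within max_deg of (lat0, lon0), or None."""
    best = None  # (squared distance, value) of the closest valid pixel so far
    for lat, lon, value in zip(lats, lons, values):
        if value == fill_value:
            continue                       # skip pixels with no data
        dist2 = (lat - lat0) ** 2 + (lon - lon0) ** 2
        if dist2 <= max_deg ** 2 and (best is None or dist2 < best[0]):
            best = (dist2, value)
    return None if best is None else best[1]

SEOUL = (37.57, 126.98)  # approximate latitude and longitude of Seoul

# Hypothetical pixels: one fill value and one valid value near Seoul, one far away.
lats = [37.5, 37.6, 35.0]
lons = [127.0, 127.0, 129.0]
o3 = [-9999.0, 310.5, 295.0]
found = nearest_valid_value(lats, lons, o3, -9999.0, *SEOUL)
missing = nearest_valid_value(lats, lons, [-9999.0, -9999.0, 295.0], -9999.0, *SEOUL)
print(found, missing)
```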
  • Subsetting and Aggregation. Customized on-demand HDF product generation is desired based on the user’s query. For example, “Give me all L2 Ozone data at Seoul from 2002 to 2013 and allow me to download it as a single HDF file.” Most HDF data products are packaged in daily granules for large regions. A search returns thousands of HDF files, and users cannot download them one by one.
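The kind of aggregation requested above can be sketched in Python as a loop that reduces each granule to the values near one location and collects the results into a single table. The sketch assumes each granule has already been read into a list of (lat, lon, value) samples keyed by date; the function names and sample numbers are hypothetical, and a CSV string stands in for the single HDF file mentioned in the query.

```python
import csv
import io

def point_time_series(granules, lat0, lon0, half_width, fill_value):
    """Return [(date, mean value, sample count)] for samples inside a small box."""
    rows = []
    for date, samples in sorted(granules.items()):
        hits = [value for lat, lon, value in samples
                if abs(lat - lat0) <= half_width
                and abs(lon - lon0) <= half_width
                and value != fill_value]
        if hits:                            # skip granules with no valid data
            rows.append((date, sum(hits) / len(hits), len(hits)))
    return rows

def to_csv(rows):
    """Serialize the aggregated rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "mean_value", "n_samples"])
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical granules: two valid samples near Seoul on one day, only fill on the next.
granules = {
    "2002-09-02": [(37.5, 127.0, -9999.0)],
    "2002-09-01": [(37.5, 127.0, 300.0), (37.6, 127.1, 310.0), (10.0, 10.0, 250.0)],
}
rows = point_time_series(granules, 37.57, 126.98, half_width=0.5, fill_value=-9999.0)
print(to_csv(rows))
```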
  • Reverb Query Result for AIRS at Seoul: Showing 1 to 9 of 5,047 granules.
  • Sharing Research Data. How can users easily compose and publish new research data from different NASA data product sources? “I’d like to combine AIRS Ozone and OMI Ozone data at Seoul from 2002-2013 and share it with journal editors.” Can this be shared as a single URL query to a NASA data cloud?
  • Thanks! Questions / Comments? eoshelp@hdfgroup.org
  • Acknowledgements. This work was supported by Subcontract number 114820 under Raytheon Contract number NNG10HP02C, funded by the National Aeronautics and Space Administration (NASA), and by cooperative agreement number NNX08AO77A from NASA. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Raytheon or the National Aeronautics and Space Administration.