
Towards Long-Term Archiving of NASA HDF-EOS and HDF Data
Data Maps and the Use of Mark-Up Language

Ruth Duerr, Mike Folk, Muqun Yang, Chris Lynnes, Peter Cao
Presented at the HDF and HDF-EOS Workshop

Outline
• Background
• Data Mapping Project Description
• Plans and Early Results

A Concern
• The majority of the data from NASA’s Earth Observing System (EOS) have been archived in HDF Version 4 (HDF4) or HDF-EOS 2 format.
• HDF files have a complex internal byte layout, so the HDF API is effectively required to access HDF data (see the sketch below).
• Long-term readability of HDF data therefore depends on long-term allocation of resources to support the API.
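
For context, here is a minimal sketch of what reading these data looks like today: every access goes through the HDF4 library. This uses pyhdf, one Python binding to the HDF4 API; the file name "sample.hdf" and dataset name "Temperature" are placeholders for illustration.

```python
# Minimal sketch: reading an HDF4 SDS through the API (pyhdf binding).
# "sample.hdf" and "Temperature" are placeholder names for illustration.
from pyhdf.SD import SD, SDC

sd = SD("sample.hdf", SDC.READ)   # open the file via the HDF4 library
sds = sd.select("Temperature")    # locate the named SDS
data = sds[:]                     # the library resolves the internal byte
                                  # layout and returns a numpy array
sds.endaccess()
sd.end()
```

Nothing in the file itself tells a future programmer where those bytes live; if the library is no longer supported, that knowledge is lost. That is the gap a data map is meant to fill.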

A Proposal from the Workshop Last Year
• Chris Lynnes noted that
 – What was needed was a map to the contents of an HDF file
 – The output of the HDF4 tools (e.g., hdfls, hdp, etc.) already provides much of the information needed (see the example below)
 – Extending these tools to create a map to the contents of the file might be feasible
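
As a rough illustration of how much of this information the existing tools already expose, the following sketch shells out to HDF4’s hdp utility; it assumes hdp is installed and on the PATH, and "sample.hdf" is a placeholder file name.

```python
# Illustrative only: use HDF4's "hdp" utility to inspect a file.
# Assumes hdp (distributed with the HDF4 library) is on the PATH.
import subprocess

def hdp(*args: str) -> str:
    """Run an hdp subcommand and return its text output."""
    return subprocess.run(["hdp", *args], capture_output=True,
                          text=True, check=True).stdout

print(hdp("list", "sample.hdf"))           # objects (tags/refs) in the file
print(hdp("dumpsds", "-h", "sample.hdf"))  # SDS headers, without data values
```

The listing names the objects and their tags and reference numbers, but not the byte offsets and lengths needed to read the data without the library; that is the extension a map would provide.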

Data Mapping Project Description
• Assess and categorize NASA holdings of HDF4 data
• Investigate methods of mapping HDF4 files
• Develop requirements for tools to create maps of HDF4 files
• Create a prototype tool to create maps
• Test the utility of these maps by developing two independent tools that use the maps to read real data

Data Mapping Project Description (continued)
• Assess the utility of this approach
• Document our findings
• Present results, and options for proceeding, to the user community
• Evaluate the effort required for a full solution that meets community needs
• Submit a proposal for that effort

Assess and Categorize NASA Holdings
• NASA provided a starter list of data sets held
• NASA data centers were requested to provide a list at a project briefing
• Results from each DAAC are being compared to an ECHO assessment of data sets using a .hdf extension

While the volume of NASA data stored in HDF4/HDF-EOS2 format is measured in petabytes, the fraction of the total number of NASA data sets archived in HDF4/HDF-EOS2 is “small”.

Assess and Categorize NASA Holdings (continued)
• Examples of each of the HDF4 data sets have been obtained and examined*
• Information kept is summarized below:
 – Product id/name
 – Data Center
 – Product Version
 – Multi-file product?
 – HDF-EOS info (if any): HDF-EOS version, Point info, Swath info, Grid info
 – HDF info: Version, Raster image info, Palette, SDS info, Vdata info, Annotation

* For the most part

Assess and Categorize NASA Holdings (continued)
• Very preliminary findings
 – Roughly 50/50 split between HDF-EOS and plain HDF
 – Point data is relatively rare and, when found, is not accompanied by swath or grid data
 – No indexes yet
 – While a few products use the image types, there are no palettes yet

Investigate Methods of Mapping HDF4 Files
• NSIDC and GES-DISC have provided THG with sample data files
• Preliminary priorities for capabilities to tackle:
 – Contiguous SDS
 – Contiguous SDS with unlimited dimension
 – Chunked SDS
 – Compressed SDS
 – Chunked and compressed SDS
 – SDS and attributes
 – Vdata and attributes
 – Annotation
 – Vgroup
 – Raster image and attributes

Investigate Methods of Mapping HDF4 Files (continued)
[Same prioritized list as the previous slide; on the original slide, green highlighting indicated how far down the list map generation had been tested so far.]

Develop Requirements for Tools to Create Maps
• Maps will be XML-based
• A draft of a map format specification has been started (an illustrative sketch follows)
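
Since the map specification was only a draft at this point, the following is a hypothetical sketch of what an XML data map and a map-driven reader might look like; the schema (element and attribute names) and all values are invented for illustration and are not the project's actual format.

```python
# Hypothetical sketch of a map-driven reader; the XML schema below is
# invented for illustration and is not the project's draft specification.
import xml.etree.ElementTree as ET
import numpy as np

MAP = """\
<hdf4map file="sample.hdf">
  <dataset name="Temperature" dtype="float32">
    <dims><dim size="180"/><dim size="360"/></dims>
    <block offset="2912" nbytes="259200"/>
  </dataset>
</hdf4map>
"""

def read_dataset(map_xml: str, name: str, path: str) -> np.ndarray:
    """Read one contiguous dataset using only byte offsets from the map."""
    ds = ET.fromstring(map_xml).find(f"dataset[@name='{name}']")
    shape = tuple(int(d.get("size")) for d in ds.find("dims"))
    block = ds.find("block")
    with open(path, "rb") as f:
        f.seek(int(block.get("offset")))
        raw = f.read(int(block.get("nbytes")))
    # HDF4 stores numeric data big-endian by default
    dtype = np.dtype(ds.get("dtype")).newbyteorder(">")
    return np.frombuffer(raw, dtype=dtype).reshape(shape)
```

A chunked or compressed SDS would need one block entry per chunk plus the compression method, which is part of why those cases rank high on the priority list above.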

Create a Prototype Tool to Create Maps
• An iterative process is being used to create the prototype
• Each iteration adds the next capability from the prioritized list shown earlier
• At this point, the tool just creates a text description (a rough analogue is sketched below)
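
The prototype itself is THG's work and its output format is not shown in the slides; as a rough analogue of "a text description of a file's contents", this sketch uses pyhdf to list each SDS with its dimensions and attributes. The file name is a placeholder.

```python
# Rough analogue (not the project's prototype): emit a plain-text
# description of the SDSs in an HDF4 file using pyhdf.
from pyhdf.SD import SD, SDC

def describe(path: str) -> str:
    sd = SD(path, SDC.READ)
    lines = [f"file: {path}"]
    # datasets() maps SDS name -> (dim names, dim sizes, type code, index)
    for name, (dims, sizes, typecode, _) in sd.datasets().items():
        lines.append(f"  SDS {name}: dims={dict(zip(dims, sizes))} type={typecode}")
        for attr, value in sd.select(name).attributes().items():
            lines.append(f"    attr {attr} = {value!r}")
    sd.end()
    return "\n".join(lines)

print(describe("sample.hdf"))  # placeholder file name
```

A real map generator would also need to record byte offsets and lengths, not just structure, since the whole point is to enable reads without the library.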

Communications Plan
• Bi-weekly telecons with our sponsors (may move to monthly)
• A briefing to NASA data center managers has been held; we expect to provide periodic updates
• Brief the community at the HDF Workshop and other relevant meetings (e.g., AGU)
• Submit a paper to the special issue of IEEE Transactions on Geoscience and Remote Sensing devoted to Data Archiving and Distribution
• A public wiki has been established but is not yet populated

Summary
• We’ve started a project to assess and prototype the ability to create maps to the contents of HDF4 files, allowing programmers to develop code that reads the data without using the HDF APIs
• We welcome community involvement



Editor's Notes

  1. Collaborative project participants: Ruth Duerr (data manager at NSIDC and Data Stewardship program lead), the folks at THG, and Chris Lynnes at the GES-DISC.
  2. This project was born out of concerns for the long-term accessibility of HDF4 and HDF-EOS2 data that folks from the HDF Group and I have had for many years. One of the options for dealing with this is to do what Rich Ullman suggested yesterday, that is, to retire the format and migrate all of the data to HDF5/HDF-EOS5. This is likely to be rather expensive and, worse yet, some day when the next new format comes along you will have to do the whole migration over again, potentially every few years for the rest of time.
  3. Mike Folk and I thought that was a great idea, discussed the concept with Chris at the meeting, and decided to see if NASA would be interested in supporting a pilot study. They were willing, and thus was born the HDF mapping project, which started up just this last August.
  4. The HDF4 APIs, like the HDF5 APIs we’ve been hearing about all week, are quite complex, and it is not at all clear that producing a map to the data will work in all cases. The purpose of assessing and categorizing NASA’s holdings of HDF4 products is to determine which sets of capabilities have actually been used and the frequency of use of each. The idea is to guide work on this project towards implementing capabilities with high impact. I’m a fan of the Pareto principle: if 20% of the effort will solve 80% of the problem, then perhaps that’s the effort that should be recommended. As a byproduct of this step, NASA will gain a catalog of their HDF4 data holdings and information about the implementation of each. Investigating methods of mapping HDF4 files is primarily a THG task. Peter Cao is taking the lead on this; we’ll talk about his status later in this presentation. Developing requirements for tools to create these maps is a joint responsibility. I also think that this is an area where the user community, people like you all out there in the audience, could have input. THG will also be responsible for creating a prototype tool to create maps from real data. Towards that end, both NSIDC and GES-DISC have provided them with sample data. NSIDC and GES-DISC will undertake separate implementations of read software that will take a map and use it to read real data files. Towards that end, I’ve hired a student, a naïve user if you will, to both work on the assessment and implement the read software. GES-DISC will assign one of their employees to do likewise. These will be independent implementations, very likely in different languages and using different data.
  5. The previous steps collectively should give us a very good idea of how feasible and useful this idea is. We intend to document the results of each step in this process. For example, we will document the results of the assessment and categorization of the data, and NASA will be provided the catalog of NASA HDF4 data holdings that is developed. We will also document the requirements developed and the results of the independent implementation tests. Our intention is that this would be an open process, that communication with our stakeholder community is important, and that community input on things like requirements and options for proceeding is important. My presentation here is a part of our plans in this area. As one of the last steps in this project, we will attempt to provide an evaluation of the effort needed to do a full-up solution. We haven’t really started talking about exactly what that will consist of yet, but my own preference would be to include information about what it would require for each of the NASA data centers to actually implement a full-up solution. And then, assuming that the results of the project warrant it, we will submit a proposal to do that full-up effort.
  6. This step turns out to be a bit more difficult than you might expect, simply because NASA does not have an up-to-date, definitive list of all of the data sets that are archived by its data centers. NASA did provide a starter list of data sets from the EDGRS metrics-gathering system. This list indicates which data sets are (or, in some cases, were) held by which data centers. At the DAAC management briefing, each participant was given a list of data that theoretically were held by them and asked to indicate which, if any, were in HDF4 format. All of the attendees provided their list within a few days. Email was sent to the other centers, so far unsuccessfully; we will shortly follow up with those folks. As a crude sanity check of the results, we obtained a list from NASA’s ECHO system of all of the data sets that use the .hdf extension.
  7. One thing I should note is that wherever you see the word “info”, it indicates that there are several items being kept under that general category. For example, under swath info we are keeping track of how many swaths there are in the file, how many dimensions the swaths have, how they are organized (for example, by time, space, or both), and whether dimension maps are used. For SDSs, we keep track of how many SDSs there are, what the maximum dimensionality of an SDS is, whether there are attributes or annotation, whether dimension scales are used, whether chunking is used, and what kind (if any) of compression is used. In other words, we keep track of which portions of the APIs are being used.
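
  To make those categories concrete, here is a hypothetical sketch of what one catalog record might track; the field names and all values are invented for illustration.

```python
# Hypothetical catalog record for one product; field names and values
# are invented to illustrate the categories described in the note above.
record = {
    "product_id": "EXAMPLE-PRODUCT",      # placeholder product name
    "data_center": "NSIDC",
    "product_version": "1",
    "multi_file_product": False,
    "hdfeos_info": {                      # present only for HDF-EOS files
        "version": "2.x",
        "swaths": {"count": 1, "max_rank": 3, "dimension_maps": True},
        "grids": {"count": 0},
        "points": {"count": 0},
    },
    "hdf_info": {
        "version": "4.x",
        "sds": {"count": 4, "max_rank": 3, "attributes": True,
                "dimension_scales": False, "chunked": True,
                "compression": "deflate"},
        "raster_images": 0,
        "palettes": 0,
        "vdata": {"count": 2},
        "annotations": True,
    },
}
```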
  8. I seriously debated whether to show any results of our findings at this point, since we haven’t gathered all of the data and the sample that we have examined is in no way statistically representative, but decided that I should at least give some idea of what kind of information will come out of this study. The point is that we should have enough information to determine which constructs are used most frequently and in what combinations.
  9. These priorities were generated before we really had any data. They were based on our combined gut feel for what was out there. We now have some data and probably should revisit the list.
  10. The green indicates roughly how far down the list Peter has had a chance to test his ability to develop a map.
  11. I expect we will use some combination of the THG and NASA EOS email lists to let folks know when there are new materials on the wiki that they might be interested in.