Archive Information Packages for NASA HDF-EOS Data

One of the guiding concepts of the Reference Model for an Open Archival Information System, commonly referred to as the OAIS Reference Model, is the Archive Information Package (AIP). An AIP contains not just the data to be preserved for future access, but also the reference information needed to ensure that the data is understandable by its target audience, and the preservation description information, which records the lineage of the data and ensures that an accurate, unaltered copy can be retrieved at any point in the future. While creating AIPs is simple in principle, it is not obvious that it will be as simple in practice. This talk reports the results of an experiment to develop AIPs for data in NASA's Earth Observing System (EOS) Data and Information System (DIS).

  • Lots of background material that I won’t really discuss – indicated
  • Syntax
    - XFDU
    - DFDL
    - ESML
    Semantics?
  • A couple of interesting and useful things about METS:
    It is deliberately designed to handle objects at a wide variety of scales (single files, complex web sites)
    Rather than attempting to define descriptive and administrative metadata needs for all kinds of objects, its designers built the standard to incorporate a variety of other standards (e.g., FGDC for geospatial metadata)
  • When you talk to a geoscientist or data scientist who deals with geospatial data – these are the standards they know and care about
    GCMD – because it is the oldest and is internationally accepted; because NASA/NOAA/NSF require it for data set descriptions; and because the Global Change Master Directory is the data equivalent of WorldCat
    FGDC – Content Standard for Digital Geospatial Metadata; derived from DIF; mandated for all federally funded data by Executive Order
    ISO 19115 – Most recent standard – replacing FGDC – adopted by NOAA and likely NASA
  • But more than just descriptive metadata is needed
    It is equally important to know what has happened to the data since its creation, that is, to know its provenance
  • The PREMIS entity<->relationship diagram
    Representation - “the set of files needed for a complete and reasonable rendition of an Intellectual Entity”
    File
    Bitstream - “contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes”
    So how does this apply to science data?
  • Keeping track of events in the digital library world for a few years
    Noticed that they’ve come up with standards to deal with a wide variety of information types
    NOAA and USGS were to be the ultimate home of much of NASA’s EOS data
    THG with funding ultimately from National Archives and Records Administration had written a white paper defining an HDF-AIP using a digital library standard
    A standard called METS
  • Primary Schema Extension Schema
    National Digital Geospatial Archive - LOC NDIIPP (National Digital Information Infrastructure and Preservation Program)
    Recommendation by Nancy Hoebelheinrich of Stanford
  • Different data sets are structured differently: some have one file per granule, others have many; some have a browse file for each granule, while in others the mapping is one to many, many to one, or many to many
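    The granule-to-browse relationships described in this note can be sketched as a small two-way index. This is only an illustration, assuming the links are available as (granule, browse file) pairs; the function name and identifiers are hypothetical and do not come from NSIDC or ECS software.

```python
from collections import defaultdict


def index_browse_links(links):
    """Build both directions of a granule <-> browse file mapping.

    `links` is an iterable of (granule_id, browse_file) pairs, so one
    granule may have many browse files and one browse file may cover
    many granules (the many-to-many case).
    """
    browse_by_granule = defaultdict(set)
    granules_by_browse = defaultdict(set)
    for granule_id, browse_file in links:
        browse_by_granule[granule_id].add(browse_file)
        granules_by_browse[browse_file].add(granule_id)
    # Return plain dicts so that missing keys raise instead of appearing silently.
    return dict(browse_by_granule), dict(granules_by_browse)
```

    Keeping both directions makes it easy to ask either "which browse files belong to this granule?" or "which granules does this browse file summarize?", which matters when deciding whether a browse file gets its own AIP.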
  • In ISO 19115 parlance, a dataset is an “identifiable collection of data,” where a dataset may reside in a larger dataset, can be as small as a single feature, and could even be a single map or chart (see ISO 19115:2003(E) page 3). This is in contrast to a data series which is a “collection of datasets sharing the same product specification” where the phrase “product specification” is totally undefined.
    In NASA, NOAA, and NSF parlance a data set is the collection of all of the files for a particular project, from a particular instrument, etc., preferably all of the same type.
    A data set is composed of data files or data granules. In HDF parlance, a Science Data Set is the unit within a file that contains a particular data array.

Transcript

  • 1. Archival Information Packages for NASA HDF-EOS Data
    R. Duerr, Kent Yang, Azhar Sikander
  • 2. Outline
    - What is an Archival Information Package?
      - HDF-AIP
    - Standards? What Standards?
      - METS
      - DIF/FGDC/ISO 19115-2
      - PREMIS
    - Results
    - Next Steps
    Archival Information Packages for NASA HDF-EOS Data, presented 11/4/09 by R. Duerr, HDF and HDF-EOS Workshop XIII
  • 3. OAIS Reference Model [1]: Archive Information Package
    [1] Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book, January 2002.
  • 4. Archival Information Package Contents
    - Content Information
      - The data object to be preserved
      - Information that describes the data object (typically interpreted as the syntax and semantics of the file structure)
    - Preservation Description Information
      - Provenance – origin or source of the data, any changes that have taken place since, and who has had custody of it
      - Fixity – the authentication mechanisms (with keys) needed to ensure that the data object has not been altered in an undocumented manner
      - Reference – identification mechanisms and values
      - Context – relation of the object to its environment
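    As an illustration of the Fixity component, a checksum recorded at ingest can later be recomputed to detect undocumented changes. The sketch below assumes SHA-256 as the digest; the slides do not say which mechanism the project used, and the function names and the file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_digest):
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of_file(path) == recorded_digest


def demo_fixity():
    """Record a digest for a stand-in data file, then alter the file."""
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "granule.h5"  # hypothetical file name
        data_file.write_bytes(b"stand-in for HDF5 content")
        recorded = sha256_of_file(data_file)
        unchanged = verify_fixity(data_file, recorded)
        data_file.write_bytes(b"stand-in for HDF5 content, altered")
        changed = verify_fixity(data_file, recorded)
    return unchanged, changed
```

    In an AIP the recorded digest would live in the preservation metadata alongside the name of the algorithm, so that a future reader knows how to repeat the check.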
  • 5. HDF-Archive Information Packages
    - The HDF Group was funded to investigate and propose a design for a complete archival information package for HDF data files
    - The result was a METS metadata file to accompany the HDF data file
    http://www.hdfgroup.org/projects/hdf5_aip/hdf5_aip_wp.html
  • 6. Metadata Standards - METS
    - Metadata Encoding and Transmission Standard
    - An initiative of the Digital Library Federation
    - Provides the means to convey the metadata necessary for
      - management of digital objects within a repository
      - exchange of objects between repositories (or between repositories and their users)
    - Designed to facilitate
      - shared development of information management tools/services
      - interoperable exchange of digital materials
  • 7. METS - A very brief overview
    - metsHdr: describes the METS document itself, e.g., creator or editor
    - dmdSec: describes the object using some external standard, e.g., MARC, FGDC, Dublin Core
    - amdSec: describes object creation, storage, intellectual property rights, source info, provenance, etc., e.g., PREMIS
    - fileSec: provides an inventory of all of the files that are part of the object described
    - structMap: a physical or logical map of the organization of the materials described
    - structLink: allows specification of hyperlinks between parts of the map (mostly useful when preserving websites)
    - behaviorSec: used to associate executable code with parts of the content
  • 8. Metadata Standards - Descriptive Metadata
    - Discovery, Assessment, and Access Metadata
      - GCMD DIF
      - FGDC CSDGM (derived from DIF)
      - ISO 19115
  • 9. Metadata Standards - ISO 19115:2003
    - The international equivalent of the FGDC standard
    - Most fields can be mapped or generated from FGDC metadata
    - The exception is the Dataset Topic Keywords
    - Allows for national profiles
  • 10. Metadata Standards - ISO 19115:2003
  • 11. Is there a metadata standard for AIP information?
    Archive Information Package
    [1] Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book, January 2002.
  • 12. Preservation Metadata Implementation Strategies (PREMIS)
    - Provides a core preservation metadata set with broad applicability across the digital preservation community
    - Developed by an OCLC and RLG sponsored international working group
      - Representatives from libraries, museums, archives, government, and the private sector
    - Maintained by the Library of Congress
    - Based on the OAIS reference model
  • 13. PREMIS - Entity-Relationship Diagram
    - Intellectual Entities: “a coherent set of content that is reasonably described as a unit”, e.g., a web site, data set, or collection of data sets
    - Objects: “a discrete unit of information in digital form”, for example, a data file
    - Events: “an action that involves at least one object or agent known to the preservation repository”, e.g., created, archived, migrated
    - Agents: “a person, organization, or software program associated with preservation events in the life of an object”, e.g., Dr. Spock donated it
    - Rights: “assertions of one or more rights or permissions pertaining to an object or an agent”, e.g., copyright notice, legal statute, deposit agreement
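    To make the relationships between these entities concrete, the sketch below models Objects, Agents, and Events as plain records and reconstructs the event history of one object. It is a simplified illustration only: the class and field names are hypothetical and are not the semantic unit names defined in the PREMIS Data Dictionary, and Rights and Intellectual Entities are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    identifier: str
    name: str


@dataclass
class PreservationObject:
    identifier: str
    description: str


@dataclass
class Event:
    event_type: str          # e.g. "ingestion", "migration"
    date: str                # ISO 8601 date, so string order is time order
    object_ids: List[str] = field(default_factory=list)
    agent_ids: List[str] = field(default_factory=list)


def provenance_for(object_id, events):
    """Return the events that involve the given object, oldest first."""
    related = [event for event in events if object_id in event.object_ids]
    return sorted(related, key=lambda event: event.date)
```

    Because events carry the links to both objects and agents, the provenance of a data file is simply the ordered list of events that mention it.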
  • 14. Is there a metadata standard for AIP information?
    PREMIS, ISO 19115
    [1] Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book, January 2002.
  • 15. NOAA Data Stewardship Prototype
    - NSIDC and THG demonstrated the feasibility of migrating NASA data to a standard HDF-AIP format
    - Motivation: Technologies change regularly, organizations come and go, but data must survive
    - But preserving data takes more than just preserving the bits; all the components of an AIP are critical
  • 16. Project Goals
    - Prototype development of Archive Information Packages for HDF data:
      - For entire data sets
      - For individual “granules”
    - Test usability of digital library standards with geospatial data
  • 17. Program Plan (Modified)
    [Flow diagram. Labels: NSIDC/ECS Metadata; ECS to METS (Data Set); ECS to METS (Granule); ISO-19115; METS; NSIDC/ECS HDF4-data; H4 to H5; NetCDF4/HDF5-data; CDM/NetCDF4; NetCDF4 / HDF5 Data; HDF5-AIP]
  • 18. HDF5 Granule Level Archive Information Packages
    HDF5 AIP Components
    - Data file: HDF5
    - Metadata file: METS
    Primary Schema         Extension Schema
    <mets>
    |---<dmdSec>-----------<ISO 19115>
    |---<amdSec>---|--<techMD>
    |              |--<rightsMD>      PREMIS
    |              |--<sourceMD>
    |---<fileGrp>
    |---<structMap>
    http://www.hdfgroup.uiuc.edu/papers/papers/AIP/HDF5_AIP_White_Paper.pdf
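    The layout on this slide can be sketched as an empty METS skeleton built with Python's standard library. This is a minimal illustration, not the project's prototype software: the ID values, the MDTYPE choices, the div TYPE, and the function names are assumptions, and the ISO 19115 and PREMIS payloads that would sit inside each mdWrap are omitted. In the METS schema, fileGrp is nested inside a fileSec element, which the slide's tree leaves out.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_granule_mets(data_href, file_id="FILE1"):
    """Build a bare METS tree describing one HDF5 data file."""
    root = ET.Element(q("mets"))

    # Descriptive metadata: a wrapper where an ISO 19115 record would go.
    dmd = ET.SubElement(root, q("dmdSec"), ID="DMD1")
    ET.SubElement(dmd, q("mdWrap"), MDTYPE="OTHER", OTHERMDTYPE="ISO19115")

    # Administrative metadata: wrappers where PREMIS records would go.
    amd = ET.SubElement(root, q("amdSec"), ID="AMD1")
    for name in ("techMD", "rightsMD", "sourceMD"):
        section = ET.SubElement(amd, q(name), ID=f"{name}1")
        ET.SubElement(section, q("mdWrap"), MDTYPE="PREMIS")

    # File inventory: a pointer to the HDF5 data file.
    file_sec = ET.SubElement(root, q("fileSec"))
    group = ET.SubElement(file_sec, q("fileGrp"))
    entry = ET.SubElement(group, q("file"), ID=file_id)
    ET.SubElement(
        entry, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": data_href}
    )

    # Structural map: one division that points back at the file.
    struct_map = ET.SubElement(root, q("structMap"))
    div = ET.SubElement(struct_map, q("div"), TYPE="granule")
    ET.SubElement(div, q("fptr"), FILEID=file_id)
    return root
```

    Calling ET.tostring(build_granule_mets("granule.h5"), encoding="unicode") yields the serialized skeleton; a real package would still need the metadata payloads and schema validation.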
  • 19. File Level AIP Activity Status
    - Developed a map from NSIDC/ECS metadata to METS/PREMIS/ISO 19115 components
    - Prototype software completed
    - Issues
      - What goes in PREMIS vs ISO 19115?
      - Auxiliary file handling - own AIP or not? (e.g., browse files, processing history, PGEs)
      - Granules vs files
  • 20. Issues and Questions
    - Inconsistent use of terminology between standards – for example, what is a data set?
    - Many of the standards care about distribution formats
      - Are these even relevant concepts any more?
      - Do you really want to have to update the metadata record just because a new distribution format was added?
      - What about new access services?
  • 21. Next Steps
    - NSIDC is updating its non-ECS data systems' handling of metadata, including support for PREMIS and related metadata on all holdings
    - Work is underway to upgrade granule level metadata for NSIDC flagship sea ice products (PREMIS/METS/ISO AIP packages)
    - Work to improve archivability of data stored in HDF formats is ongoing – NASA is implementing a standard XML description of contents across its archives
  • 22. Acknowledgement
    This work was supported under NOAA Scientific Stewardship Program grant number NA07OAR4310286. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of NOAA.