Rots RDAP11 Data Archives in Federal Agencies


Arnold Rots, VAO; Data Archives in Federal Agencies; RDAP11 Summit

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information


  1. Preservation of Astronomical Data
     Arnold Rots
     Smithsonian Astrophysical Observatory
     Virtual Astronomical Observatory
  2. Context
     • I am the Archive Astrophysicist for the Chandra Data Archive (CDA)
       • Chandra is an X-ray telescope, one of NASA’s Great Observatories
     • The CDA is operated by the Smithsonian Astrophysical Observatory under contract with NASA
       • That already makes two separate federal masters
       • As such it is one of the NASA astrophysics data centers
     • But I am also the lead for Data Curation and Preservation for the Virtual Astronomical Observatory (VAO)
       • The CDA is compliant with VAO standards
     • The VAO is a member of the International Virtual Observatory Alliance (IVOA)
  3. The Astronomical Data Universe
     • International Virtual Observatory Alliance
     • Virtual Astronomical Observatory (USA)
     • Chandra Data Archive
  4. The Other Data Universe
     • Smithsonian Institution
     • Smithsonian Astrophysical Observatory
     • Chandra Data Archive
  5. Our Complex Relations
     • As an observatory data archive we have multiple federal masters:
       • Smithsonian Institution
       • NASA
       • NSF (through VAO)
     • And more non-federal masters:
       • IVOA
         • Interoperability standards and protocols
       • US user community
       • International user community
     • Bright points:
       • No privacy issues
       • No national security issues
       • No commercial value
  6. The Smithsonian Side
     • SI is going digital
     • There is a world of difference between collections and scientific research data
     • On the one hand our experience is valuable for the non-research units
     • On the other hand, there are legitimate differences in approach and requirements
  7. Virtual Observatory
     • Objective:
       • Make all astronomical data seamlessly accessible
       • Provide analysis tools
     • This requires:
       • Interoperability standards: metadata and protocols
         • Ubiquity of FITS data file format helps
       • Development of general tools
     • IVOA
       • Standards authority
       • Collaborative consortium of national VO organizations
     • VAO
       • USA member of the IVOA
       • NSF & NASA-funded collaboration of nine institutions
  8. VAO: Virtual Astronomical Observatory
     • Standards and protocols
       • Collaborating in the IVOA framework
     • Tools development
       • Compliant with IVOA standards, US priorities, IVOA-coordinated
     • User support
       • Documentation, portal, …
     • Operations
       • Provides the necessary framework
     • Technology assessment
       • What technologies do we use/introduce?
     • EPO (Education and Public Outreach)
     • Data Curation and Preservation
  9. VAO DC&P Components
     • Mission/Observatory data archives
       • NASA centers: more or less OAIS- and TDR-compliant
       • For CDA, we keep it all (all versions, multiple copies) on spinning disk
     • Contributed datasets
       • This is a problem area
     • Bibliographic repository
       • The Astrophysics Data System (ADS) has the entire astronomical literature (excepting books) online for the entire international community
     • Semantic linking
       • Linking datasets with datasets, papers with papers, datasets with papers; another problem area; Dataset Identifiers
       • Discovery tools
  10. Ontology/Semantic Linking
      • Triplestore prototype discovery tool:
      • Just a factoid from collecting bibliographic links:
        • The amount of Chandra data published each year is the equivalent of 5-6 years of observations
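The idea behind a triplestore can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the VAO prototype: the identifiers (`paper:A`, `dataset:obs-1`) and the predicate names (`usesDataset`, `cites`, `derivedFrom`) are hypothetical, chosen only to show how datasets and papers can be linked and how a discovery query follows those links.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A toy in-memory triple store with wildcard pattern matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical links: papers to datasets, papers to papers, datasets to datasets.
store = TripleStore()
store.add("paper:A", "usesDataset", "dataset:obs-1")
store.add("paper:B", "usesDataset", "dataset:obs-1")
store.add("paper:B", "cites", "paper:A")
store.add("dataset:obs-2", "derivedFrom", "dataset:obs-1")


def papers_using(ts: TripleStore, dataset: str) -> List[str]:
    """Discovery query: which papers are linked to a given dataset?"""
    return [s for s, _, _ in ts.match(predicate="usesDataset", obj=dataset)]


def datasets_derived_from(ts: TripleStore, dataset: str) -> List[str]:
    """Discovery query: which datasets were derived from a given dataset?"""
    return [s for s, _, _ in ts.match(predicate="derivedFrom", obj=dataset)]
```

Calling `papers_using(store, "dataset:obs-1")` walks the paper-to-dataset links, and `datasets_derived_from` does the same for dataset-to-dataset links. A production system would use persistent dataset identifiers and a real RDF store rather than a Python set.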
  11. Challenges (and some solutions)
      • Data Management Plans
        • We have experience, we will provide guidance and support
      • Establish public repositories
        • If you want people to contribute datasets, you have to give them a place to put them; who is going to provide/run this?
      • Funding
        • DMPs should help here; beware of unfunded mandates
      • Make distributed repositories interoperate and transparently look like one
        • This is easier for the established mission/observatory archives than for repositories of contributed datasets
      • Define metadata requirements
        • Probably the most crucial challenge
  12. More Challenges
      • Get users to contribute their datasets
        • Highly processed products; data behind the plots and figures
        • One has to make this easy: special tools needed
      • Get the users to provide adequate and correct metadata
        • Even harder; but also even more important *
      • Get users to provide links
        • Equally hard; but essential for semantic data discovery
      • These three need to be part of the manuscript submission process
      * AVM standard for astronomical JPEGs is a step in the right direction
  13. And More Challenges
      • Get the community to release research data
        • Not a problem for NASA-funded data
        • The NSF DMP requirement is a good step forward
        • The culture is changing, but private institutions are problematic
      • OAIS compliance will probably come around naturally
      • Agency IT standards and requirements
      • Certification as a Trusted Digital Repository
        • It is harder to convince people that this is a good thing
        • But it may come to be required at some point
  14. Repository Requirements
      • A Trusted Digital Repository should provide these services:
        • Storage
        • Standardized metadata
        • Access
        • Authorization
        • Provenance
        • Curation
        • Authentication
        • Policy enforcement
  15. Repository Requirements
      • Preservation metadata need to cover:
        • Authenticity
        • Original arrangement
        • Integrity
        • Chain of custody and history
        • Trustworthiness
      • Data processing, archiving, and all else will cease at some point, but there are three areas that will never be finished:
        • Preservation
        • User support
        • User interface development
      • Preservation of software?
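As a rough sketch of what integrity and chain-of-custody metadata might look like in practice, the Python below records a SHA-256 checksum at ingest and logs every later fixity check as a custody event. The record layout, the field names, and the `ingest` and `verify` helpers are assumptions made for illustration; they are not the CDA's schema or that of any TDR standard.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class CustodyEvent:
    """One entry in the chain of custody: who did what, and when."""
    timestamp: str
    agent: str
    action: str


@dataclass
class PreservationRecord:
    """Hypothetical preservation metadata for a single archived object."""
    identifier: str
    sha256: str
    history: List[CustodyEvent] = field(default_factory=list)

    def log(self, agent: str, action: str) -> None:
        now = datetime.now(timezone.utc).isoformat()
        self.history.append(CustodyEvent(now, agent, action))


def checksum(data: bytes) -> str:
    """Compute the SHA-256 digest used as the integrity (fixity) value."""
    return hashlib.sha256(data).hexdigest()


def ingest(identifier: str, data: bytes, agent: str) -> PreservationRecord:
    """Create a record at ingest time and note the event in its history."""
    record = PreservationRecord(identifier, checksum(data))
    record.log(agent, "ingest")
    return record


def verify(record: PreservationRecord, data: bytes, agent: str) -> bool:
    """Re-check integrity against the stored digest and log the outcome."""
    ok = checksum(data) == record.sha256
    record.log(agent, "fixity check passed" if ok else "fixity check FAILED")
    return ok
```

For example, `ingest("dataset:example", payload, "archivist")` followed by periodic `verify(...)` calls yields a record whose `history` shows when the object entered the archive and every time its integrity was confirmed or found wanting.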