Rots RDAP11 Data Archives in Federal Agencies

Arnold Rots, VAO; Data Archives in Federal Agencies; RDAP11 Summit

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information

Published in: Technology

  • 1. Preservation of Astronomical Data
    • Arnold Rots
    • Smithsonian Astrophysical Observatory
    • Virtual Astronomical Observatory
  • 2. Context
    • I am the Archive Astrophysicist for the Chandra Data Archive (CDA)
      • Chandra is an X-ray telescope, one of NASA’s Great Observatories
    • The CDA is operated by the Smithsonian Astrophysical Observatory under contract with NASA
      • That already gives us two separate federal masters
      • As such, it is one of the NASA astrophysics data centers
    • I am also the lead for Data Curation and Preservation at the Virtual Astronomical Observatory (VAO)
      • The CDA is compliant with VAO standards
    • The VAO is a member of the International Virtual Observatory Alliance (IVOA)
  • 3. The Astronomical Data Universe
    • International Virtual Observatory Alliance
    • Virtual Astronomical Observatory (USA)
    • Chandra Data Archive
  • 4. The Other Data Universe
    • Smithsonian Institution
    • Smithsonian Astrophysical Observatory
    • Chandra Data Archive
  • 5. Our Complex Relations
    • As an observatory data archive we have multiple federal masters:
      • Smithsonian Institution
      • NASA
      • NSF (through VAO)
    • And more non-federal masters:
      • IVOA
        • Interoperability standards and protocols
      • US user community
      • International user community
    • Bright points:
      • No privacy issues
      • No national security issues
      • No commercial value
  • 6. The Smithsonian Side
    • SI is going digital
    • There is a world of difference between collections and scientific research data
    • On the one hand our experience is valuable for the non-research units
    • On the other hand, there are legitimate differences in approach and requirements
  • 7. Virtual Observatory
    • Objective:
      • Make all astronomical data seamlessly accessible
      • Provide analysis tools
    • This requires:
      • Interoperability standards: metadata and protocols
        • The ubiquity of the FITS data file format helps
      • Development of general tools
    • IVOA
      • Standards authority
      • Collaborative consortium of national VO organizations
    • VAO
      • USA member of the IVOA
      • NSF- and NASA-funded collaboration of nine institutions
  • 8. VAO: Virtual Astronomical Observatory
    • Standards and protocols
      • Collaborating in the IVOA framework
    • Tools development
      • Compliant with IVOA standards; reflecting US priorities; coordinated with the IVOA
    • User support
      • Documentation, portal, …
    • Operations
      • Provides the necessary framework
    • Technology assessment
      • What technologies do we use/introduce?
    • Education and Public Outreach (EPO)
    • Data Curation and Preservation
  • 9. VAO Data Curation and Preservation (DC&P) Components
    • Mission/Observatory data archives
      • NASA centers: more or less compliant with OAIS (Open Archival Information System) and TDR (Trusted Digital Repository) requirements
      • For CDA, we keep it all (all versions, multiple copies) on spinning disk
    • Contributed datasets
      • This is a problem area
    • Bibliographic repository
      • The Astrophysics Data System (ADS) has the entire astronomical literature (except books) online, available to the entire international community
    • Semantic linking
      • Linking datasets with datasets, papers with papers, and datasets with papers
      • Another problem area
      • Dataset identifiers
      • Discovery tools
  • 10. Ontology/Semantic Linking
    • Triplestore prototype discovery tool:
    • Just a factoid from collecting bibliographic links:
      • The amount of Chandra data published each year is the equivalent of 5-6 years of observations
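
The dataset-to-paper links described on slides 9 and 10 can be pictured as subject-predicate-object triples. The following is a minimal sketch in Python, assuming a plain in-memory set in place of a real triplestore. All identifiers and predicate names (`paper:A`, `usesDataset`, and so on) are hypothetical and do not correspond to actual ADS bibcodes or Chandra dataset identifiers.

```python
# Hypothetical triples (subject, predicate, object), for illustration only.
TRIPLES = {
    ("paper:A", "usesDataset", "dataset:obs-1"),
    ("paper:B", "usesDataset", "dataset:obs-1"),
    ("paper:B", "cites", "paper:A"),
    ("dataset:obs-2", "derivedFrom", "dataset:obs-1"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the sorted triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def papers_using(triples, dataset):
    """List the papers linked to a dataset through the assumed 'usesDataset' predicate."""
    return [s for s, _, _ in match(triples, predicate="usesDataset", obj=dataset)]


if __name__ == "__main__":
    print(papers_using(TRIPLES, "dataset:obs-1"))
    print(match(TRIPLES, subject="dataset:obs-2"))
```

Once such links exist, discovery becomes a pattern query in either direction: from a dataset to the papers that used it, or from a paper to its data. A production system would more likely use a dedicated triplestore and a query language such as SPARQL.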
  • 11. Challenges (and some solutions)
    • Data Management Plans
      • We have experience, we will provide guidance and support
    • Establish public repositories
      • If you want people to contribute datasets, you have to give them a place to put them; who is going to provide/run this?
    • Funding
      • DMPs should help here; beware of unfunded mandates
    • Make distributed repositories interoperate and appear transparently as a single repository
      • This is easier for the established mission/observatory archives than for repositories of contributed datasets
    • Define metadata requirements
      • Probably the most crucial challenge
  • 12. More Challenges
    • Get users to contribute their datasets
      • Highly processed products; data behind the plots and figures
      • One has to make this easy: special tools needed
    • Get the users to provide adequate and correct metadata
      • Even harder; but also even more important *
    • Get users to provide links
      • Equally hard; but essential for semantic data discovery
    • These three need to be part of the manuscript submission process
        • * The AVM (Astronomy Visualization Metadata) standard for astronomical JPEGs is a step in the right direction
  • 13. And More Challenges
    • Get the community to release research data
      • Not a problem for NASA-funded data
      • The NSF DMP requirement is a good step forward
      • The culture is changing, but private institutions are problematic
    • OAIS compliance will probably come around naturally
    • Agency IT standards and requirements
    • Certification as a Trusted Digital Repository
      • It is harder to convince people that this is a good thing
      • But it may come to be required at some point
  • 14. Repository Requirements
    • A Trusted Digital Repository should provide these services:
      • Storage
      • Standardized metadata
      • Access
      • Authorization
      • Provenance
      • Curation
      • Authentication
      • Policy enforcement
  • 15. Repository Requirements
    • Preservation metadata need to cover:
      • Authenticity
      • Original arrangement
      • Integrity
      • Chain of custody and history
      • Trustworthiness
    • Data processing, archiving, and all else will cease at some point, but there are three areas that will never be finished:
      • Preservation
      • User support
      • User interface development
    • Preservation of software?
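
Two of the preservation metadata items on slide 15, integrity and chain of custody, lend themselves to a small illustration. The sketch below is one possible way to record them, assuming SHA-256 as the fixity checksum and a simple in-memory record. The class, field, and function names are hypothetical and are not taken from the CDA, the VAO, or any repository standard.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PreservationRecord:
    """Hypothetical minimal preservation metadata for one dataset."""
    dataset_id: str
    sha256: str                                   # fixity value recorded at ingest
    custody: list = field(default_factory=list)   # (custodian, event) history


def checksum(data: bytes) -> str:
    """Compute the SHA-256 hex digest used here as the fixity value."""
    return hashlib.sha256(data).hexdigest()


def ingest(dataset_id: str, data: bytes, custodian: str) -> PreservationRecord:
    """Create a record and log the first custody event."""
    record = PreservationRecord(dataset_id, checksum(data))
    record.custody.append((custodian, "ingested"))
    return record


def verify(record: PreservationRecord, data: bytes, custodian: str) -> bool:
    """Re-compute the checksum, log the outcome, and report whether it matches."""
    ok = checksum(data) == record.sha256
    record.custody.append((custodian, "verified" if ok else "integrity failure"))
    return ok


if __name__ == "__main__":
    rec = ingest("dataset:demo", b"example payload", "archive-A")
    print(verify(rec, b"example payload", "archive-B"))
    print(rec.custody)
```

Keeping the checksum and the custody log in the same record means every later integrity check also extends the dataset's history, which covers two of the listed requirements with one routine operation.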