Rots RDAP11 Data Archives in Federal Agencies
Arnold Rots, VAO; Data Archives in Federal Agencies; RDAP11 Summit

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html

Published in: Technology

Transcript

  • 1. Preservation of Astronomical Data
    • Arnold Rots
    • Smithsonian Astrophysical Observatory
    • Virtual Astronomical Observatory
  • 2. Context
    • I am Archive Astrophysicist for the Chandra Data Archive
      • Chandra is an X-ray telescope, one of NASA’s Great Observatories
    • The CDA is operated by the Smithsonian Astrophysical Observatory under contract with NASA
      • That already makes two separate federal masters
      • As such it is one of the NASA astrophysics data centers
    • But I am also the lead for Data Curation and Preservation for the Virtual Astronomical Observatory
      • The CDA is compliant with VAO standards
    • The VAO is a member of the International Virtual Observatory Alliance
  • 3. The Astronomical Data Universe
    • International Virtual Observatory Alliance
    • Virtual Astronomical Observatory (USA)
    • Chandra Data Archive
  • 4. The Other Data Universe
    • Smithsonian Institution
    • Smithsonian Astrophysical Observatory
    • Chandra Data Archive
  • 5. Our Complex Relations
    • As an observatory data archive, we have multiple federal masters:
      • Smithsonian Institution
      • NASA
      • NSF (through VAO)
    • And more non-federal masters:
      • IVOA
        • Interoperability standards and protocols
      • US user community
      • International user community
    • Bright points:
      • No privacy issues
      • No national security issues
      • No commercial value
  • 6. The Smithsonian Side
    • SI is going digital
    • There is a world of difference between collections and scientific research data
    • On the one hand, our experience is valuable for the non-research units
    • On the other hand, there are legitimate differences in approach and requirements
  • 7. Virtual Observatory
    • Objective:
      • Make all astronomical data seamlessly accessible
      • Provide analysis tools
    • This requires:
      • Interoperability standards: metadata and protocols
        • The ubiquity of the FITS data file format helps
      • Development of general tools
    • IVOA
      • Standards authority
      • Collaborative consortium of national VO organizations
    • VAO
      • USA member of the IVOA
      • NSF & NASA-funded collaboration of nine institutions
  • 8. VAO: Virtual Astronomical Observatory
    • Standards and protocols
      • Collaborating in the IVOA framework
    • Tools development
      • Compliant with IVOA standards; driven by US priorities; coordinated through the IVOA
    • User support
      • Documentation, portal, …
    • Operations
      • Provides the necessary framework
    • Technology assessment
      • What technologies do we use/introduce?
    • Education and Public Outreach (EPO)
    • Data Curation and Preservation
  • 9. VAO DC&P Components
    • Mission/Observatory data archives
      • NASA centers: more or less OAIS- and TDR-compliant
      • For CDA, we keep it all (all versions, multiple copies) on spinning disk
    • Contributed datasets
      • This is a problem area
    • Bibliographic repository
      • The Astrophysics Data System (ADS) has the entire astronomical literature (excepting books) online for the entire international community
    • Semantic linking
      • Linking datasets with datasets, papers with papers, datasets with papers; another problem area; Dataset Identifiers
      • Discovery tools
  • 10. Ontology/Semantic Linking
    • Triplestore prototype discovery tool:
    • http://adslabs.harvard.edu/semantic/publications.html
    • Just a factoid from collecting bibliographic links:
      • The amount of Chandra data published each year is the equivalent of 5-6 years of observations
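
The dataset-to-paper linking described on slides 9 and 10 can be illustrated with a minimal in-memory triple store. This is only a sketch of the general idea, not the ADS prototype: the identifiers, the predicate names (`describedIn`, `cites`, `derivedFrom`), and the `TripleStore` class are all hypothetical and invented for illustration.

```python
# Minimal sketch of semantic linking with subject-predicate-object triples.
# All identifiers and predicate names below are hypothetical examples.

TRIPLES = [
    ("dataset:obs-0001", "describedIn", "paper:A"),
    ("dataset:obs-0002", "describedIn", "paper:A"),
    ("dataset:obs-0001", "describedIn", "paper:B"),
    ("paper:B", "cites", "paper:A"),
    ("dataset:obs-0002", "derivedFrom", "dataset:obs-0001"),
]


class TripleStore:
    """A toy triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self, triples=()):
        self._triples = set()
        for subject, predicate, obj in triples:
            self.add(subject, predicate, obj)

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


def datasets_for_paper(store, paper):
    """Datasets linked to a paper (datasets with papers)."""
    return [s for s, _, _ in store.match(predicate="describedIn", obj=paper)]


def papers_for_dataset(store, dataset):
    """Papers linked to a dataset (the reverse direction)."""
    return [o for _, _, o in store.match(subject=dataset, predicate="describedIn")]


store = TripleStore(TRIPLES)
print(datasets_for_paper(store, "paper:A"))
print(papers_for_dataset(store, "dataset:obs-0001"))
```

The same `match` call covers all three kinds of link named on slide 9 (datasets with datasets, papers with papers, datasets with papers), which is why stable dataset identifiers matter: every link is only as good as the identifiers at its two ends.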
  • 11. Challenges (and some solutions)
    • Data Management Plans
      • We have experience, we will provide guidance and support
    • Establish public repositories
      • If you want people to contribute datasets, you have to give them a place to put them; who is going to provide/run this?
    • Funding
      • DMPs should help here; beware of unfunded mandates
    • Make distributed repositories interoperate and transparently look like one
      • This is easier for the established mission/observatory archives than for repositories of contributed datasets
    • Define metadata requirements
      • Probably the most crucial challenge
  • 12. More Challenges
    • Get users to contribute their datasets
      • Highly processed products; data behind the plots and figures
      • One has to make this easy: special tools needed
    • Get the users to provide adequate and correct metadata
      • Even harder; but also even more important *
    • Get users to provide links
      • Equally hard; but essential for semantic data discovery
    • These three need to be part of the manuscript submission process
        • * The AVM (Astronomy Visualization Metadata) standard for astronomical JPEGs is a step in the right direction
  • 13. And More Challenges
    • Get the community to release research data
      • Not a problem for NASA-funded data
      • The NSF DMP requirement is a good step forward
      • The culture is changing, but private institutions are problematic
    • OAIS compliance will probably come around naturally
    • Agency IT standards and requirements
    • Certification as a Trusted Digital Repository
      • It is harder to convince people that this is a good thing
      • But it may come to be required at some point
  • 14. Repository Requirements
    • A Trusted Digital Repository should provide these services:
      • Storage
      • Standardized metadata
      • Access
      • Authorization
      • Provenance
      • Curation
      • Authentication
      • Policy enforcement
  • 15. Repository Requirements
    • Preservation metadata need to cover:
      • Authenticity
      • Original arrangement
      • Integrity
      • Chain of custody and history
      • Trustworthiness
    • Data processing, archiving, and all else will cease at some point, but there are three areas that will never be finished:
      • Preservation
      • User support
      • User interface development
    • Preservation of software?
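
Two of the preservation metadata items on slide 15, integrity and chain of custody, lend themselves to a small illustration. The sketch below assumes a SHA-256 checksum as the fixity value and a simple list of events as the custody history; the record layout, field names, and function names are hypothetical and are not taken from the Chandra Data Archive or from any repository standard.

```python
# Minimal sketch: a preservation record with a fixity checksum (integrity)
# and an append-only event log (chain of custody and history).
# Field names and functions are hypothetical, for illustration only.
import hashlib
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def _event(name: str, agent: str, **extra) -> dict:
    """Build one custody event with a UTC timestamp."""
    entry = {
        "event": name,
        "agent": agent,
        "time": datetime.now(timezone.utc).isoformat(),
    }
    entry.update(extra)
    return entry


def new_record(identifier: str, data: bytes, agent: str) -> dict:
    """Create the preservation record at ingest time."""
    return {
        "identifier": identifier,
        "sha256": sha256_hex(data),
        "size_bytes": len(data),
        "custody": [_event("ingest", agent)],
    }


def verify(record: dict, data: bytes, agent: str) -> bool:
    """Re-check the stored copy against the record and log the outcome."""
    ok = (
        len(data) == record["size_bytes"]
        and sha256_hex(data) == record["sha256"]
    )
    record["custody"].append(
        _event("fixity-check", agent, outcome="pass" if ok else "fail")
    )
    return ok


record = new_record("dataset:example-0001", b"example payload", "archive-operator")
print(verify(record, b"example payload", "nightly-audit"))
print(verify(record, b"example pay1oad", "nightly-audit"))
print([e["event"] for e in record["custody"]])
```

Because every check is appended to the custody list rather than overwriting earlier entries, the record doubles as a history of who verified the data and when, which is the kind of evidence a Trusted Digital Repository audit would look for.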