Comeaux RDAP11 Data Archives in Federal Agencies

Joey Comeaux, CISL RDA; Data Archives in Federal Agencies

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html

Notes
  • Cliff is here – I forgot to mention to him we put him on this talk. His talk follows Tom Karl – first thing Tuesday morning. He is now retired from NSF, but still works 4 days a week.
  • Doug Schuster, IIPs Session 6B Tuesday PM
  • Excellent – note something about each curve (Users and Size)
  • Excellent 2-punch slide.
  • How about “Six Requirements for Data Curation”
  • I made these changes on the slide. Bullet one: Data Management => data management. Bullets three and four are a part of “Allows flexibility”. Indent and phrase as: * Data management can evolve to meet science support expectations * Holistic approach, not project specific
  • I made these changes. Bullet 5 under 1: Researchers => researchers. Bullet 4 under 2: We find 5-10 years experience yields an expert. Bullet 5 under 2: RDA support group, cumulative 128 years experience.
  • Read carefully – I made edits in the slide. What does “Careful deliberation needed” mean?
  • Read carefully, I edited the slide directly
  • Read carefully, I changed this slide.
  • I changed Primary to User.
  • I made changes to the slide – read carefully – especially Metadata
  • Check text – I made changes.
  • Changed to match earlier slide
  • Transcript

    • 1. 04/12/11 Successful Data Curation for Large Data Archives. Joseph L. “Joey” Comeaux and Steven J. Worley (NCAR), and Cliff Jacobs (NSF). RDAP-2011
    • 2. 04/12/11 AMS-2011 NCAR’s Research Data Archive
        • Managed by 8 members of the Data Support Section
        • Within Computational and Information Systems Laboratory at NCAR
        • Started in ~1965
        • Primarily Meteorological and Oceanographic datasets
        • >200 person-years invested in RDA
    • 3. 04/12/11 Contents of the RDA > 600 datasets
    • 4. 04/12/11 AMS-2011
    • 5. RDAP-2011 04/12/11 CISL Data Life-Cycle Model
    • 6. RDAP-2011 04/12/11 CISL Data Life-Cycle Model
    • 7. AMS-2011 04/12/11 Six Requirements for Data Curation
        • Stable Funding
        • Knowledgeable / Consistent Staffing
        • Robust Storage
        • Backups
        • Partnerships
        • Formats
    • 8. 04/12/11 Sustainable Data Curation RDAP-2011 Stable Funding
        • Stable Funding
        • Focused on data management
        • Allows flexibility
            • Data management can evolve to meet science support expectations
            • Holistic approach, not project specific
        • Necessary to keep curated collection viable
    • 9. 04/12/11 Sustainable Data Curation RDAP-2011 Stable Funding Enriched Staff
    • 10. 04/12/11 Sustainable Data Curation Stable Funding Enriched Staff
        • Enriched Staff
        • Knowledgeable and educated in the specific discipline
            • Important for checking integrity of data
            • Choosing organization of data
            • Creating adequate metadata
            • Designing access system and assisting users
            • Staff is link to researchers
        • Consistent Staffing Levels
            • Dedicated to best practices in archiving and stewardship
            • Great deal of knowledge held by staff, regardless of documentation
            • Value of human-based knowledge should not be underestimated
            • We find 5-10 years experience yields an expert
            • RDA support group, cumulative 128 years experience
    • 11. 04/12/11 Sustainable Data Curation RDAP-2011 Stable Funding Enriched Staff Robust Storage
        • Robust Storage Facilities
        • Capability to meet growth and technology changes
            • NCAR
                • Tape based archive
                    • Size > 2x every 2.5 years : >10PB now
                    • Technology evolution, tapes : 20GB -> 60GB -> 200GB -> 1000GB
                    • Full RDA about 600TB (600 tapes)
                    • Changes ‘transparent’ for RDA staff and users
                • Enhance user access with disk
                    • Fast access
                    • Available for RDA : 1.5TB -> 12TB -> 70TB -> 300TB
                    • Careful deliberation needed
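The growth figure on the storage slide (size more than doubling every 2.5 years, more than 10 PB now) implies simple exponential arithmetic. The sketch below is illustrative only: it treats the slide's lower bounds as exact values and assumes the doubling period stays constant, and the function names are hypothetical, not part of any NCAR tool.

```python
import math


def projected_size_pb(current_pb: float, years_ahead: float,
                      doubling_years: float = 2.5) -> float:
    """Project archive size, assuming it doubles every `doubling_years`."""
    return current_pb * 2 ** (years_ahead / doubling_years)


def years_until(current_pb: float, target_pb: float,
                doubling_years: float = 2.5) -> float:
    """Years until the archive reaches `target_pb` under the same assumption."""
    return doubling_years * math.log2(target_pb / current_pb)


if __name__ == "__main__":
    # Assumed starting point: 10 PB, the lower bound quoted on the slide.
    for years in (2.5, 5.0, 10.0):
        print(f"after {years:>4} years: {projected_size_pb(10.0, years):.0f} PB")
```

Under these assumptions a 10 PB archive reaches 40 PB after five years, which is one way to read the slide's "capability to meet growth" requirement.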
    • 12. 04/12/11 Sustainable Data Curation RDAP-2011 Stable Funding Enriched Staff Robust Storage Backups
        • Backups
        • Loss of data attributed to 2 general causes
        • Physical equipment or facility failures
            • Environmental -> Fire, Flood, Earthquake…
            • Equipment -> Drive failures, Tape deterioration
        • Resolution
            • Store copies of irreplaceable data at separate physical locations
    • 13. 04/12/11 Sustainable Data Curation RDAP-2011 Stable Funding Enriched Staff Robust Storage Backups
        • Backups
        • Loss of data attributed to 2 general causes
        • Poor Curation/Stewardship, e.g. breaks in best practice
            • Loss of metadata
            • Accidental data over-writes and deletions
            • Lack of background knowledge
                • Misjudge importance of particular data
        • Resolution
            • Treat metadata and data equally
            • Implement safeguards to minimize risk of over-writes and accidental deletions
            • Knowledgeable staff
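The two backup slides call for copies at separate locations and for safeguards against accidental over-writes. One possible safeguard is sketched below; it is not the procedure RDA uses. The function names `sha256_of` and `replicate` are hypothetical, and local paths stand in for what would be a remote site in practice. The copy refuses to replace an existing file and is verified by checksum afterwards.

```python
import hashlib
from pathlib import Path

CHUNK = 1 << 20  # read files in 1 MiB pieces so large files fit in memory


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(src: Path, dest: Path) -> str:
    """Copy src to dest without overwriting, then verify the copy.

    Opening dest in "xb" mode raises FileExistsError if it already exists,
    which guards against accidental over-writes.
    """
    with src.open("rb") as fin, dest.open("xb") as fout:
        for chunk in iter(lambda: fin.read(CHUNK), b""):
            fout.write(chunk)
    source_digest = sha256_of(src)
    if sha256_of(dest) != source_digest:
        raise ValueError(f"checksum mismatch after copying {src} to {dest}")
    return source_digest
```

Storing the returned digest alongside the metadata would let later audits detect tape or disk deterioration by recomputing it.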
    • 14. 04/12/11 RDAP-2011 Growth of RDA
    • 15. 04/12/11 AMS-2011 Growth of RDA
    • 16. 04/12/11 RDAP-2011 Growth of RDA
    • 17. 04/12/11 RDAP-2011 Growth of RDA BACKUPS RDA : 40%
    • 18. 04/12/11 Sustainable Data Curation RDAP-2011 Stable Funding Enriched Staff Robust Storage Backups
    • 19. RDAP-2011 04/12/11 Sustainable Data Curation Stable Funding Enriched Staff Robust Storage Backups Formats
        • Data Format
        • Ensure data access for long-term
        • Fully documented to the byte level
        • Use standard formats
        • Non-proprietary
            • Not dependent on OS, hardware or applications
        • Choices can affect storage requirements
        • Metadata Format
        • Utilize a standard
        • Multiple levels are becoming more important
            • Discovery
            • File attribute
            • File content
        • Backup of metadata is important
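As an illustration of what "fully documented to the byte level" and "not dependent on OS, hardware or applications" can mean in practice, here is a minimal sketch of a fixed-width binary record. The layout (station id, timestamp, temperature) is invented for this example and is not an RDA format. Byte order and field widths are stated explicitly, so the bytes are the same on any machine.

```python
import struct

# Hypothetical record layout, big-endian (">"), no padding:
#   bytes 0-3   unsigned 32-bit integer  station id
#   bytes 4-11  signed 64-bit integer    seconds since 1970-01-01 UTC
#   bytes 12-15 IEEE 754 32-bit float    temperature in degrees Celsius
RECORD_LAYOUT = ">Iqf"
RECORD_SIZE = struct.calcsize(RECORD_LAYOUT)  # 16 bytes


def encode_record(station_id: int, unix_time: int, temperature_c: float) -> bytes:
    """Pack one observation into the documented 16-byte layout."""
    return struct.pack(RECORD_LAYOUT, station_id, unix_time, temperature_c)


def decode_record(raw: bytes) -> tuple:
    """Unpack a 16-byte record into (station_id, unix_time, temperature_c)."""
    return struct.unpack(RECORD_LAYOUT, raw)


if __name__ == "__main__":
    raw = encode_record(42, 1302566400, 21.5)
    print(len(raw), decode_record(raw))
```

The layout comment is the byte-level documentation. The slide's remark that format choices affect storage shows up here too: switching the temperature to a 64-bit float would grow every record from 16 to 20 bytes.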
    • 20. RDAP-2011 04/12/11 Sustainable Data Curation Stable Funding Enriched Staff Robust Storage Backups Formats
    • 21. RDAP-2011 04/12/11 Sustainable Data Curation Stable Funding Enriched Staff Robust Storage Backups Formats Partnerships
        • Partnerships
        • World-wide sharing and unrestricted open access provides greater research opportunities than any one center can provide
        • National and international
        • No single institute can “do it all”
        • Most users “need/want it all”
        • Good way to share some costs
        • Reanalysis projects are examples
    • 22. RDAP-2011 04/12/11 Sustainable Data Curation Stable Funding Enriched Staff Archive Pertinent Data Robust Storage Backups Formats Partnerships
    • 23. 1960’s – 1990’s 01/25/2011 RDAP 2011 Data Model
    • 24. 1990’s – 2010’s 01/25/2011 RDAP 2011 Data Model
    • 25. 04/12/11 RDAP-2011 SUMMARY
        • Six Requirements for Data Curation
            • Stable Funding
            • Enriched Staff
            • Robust Storage
            • Backups
            • Formats
            • Partnerships
