
Comeaux RDAP11 Data Archives in Federal Agencies


Joey Comeaux, CISL RDA; Data Archives in Federal Agencies

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html

  • Cliff is here – I forgot to mention to him we put him on this talk. His talk follows Tom Karl – first thing Tuesday morning. He is now retired from NSF, but still works 4 days a week.
  • Doug Schuster, IIPs Session 6B Tuesday PM
  • Excellent – note something about each curve (Users and Size)
  • Excellent 2-punch slide.
  • How about “Six Requirements for Data Curation”?
  • I made these changes on the slide. Bullet one: Data Management => data management. Bullets three and four are a part of “Allows flexibility”. Indent and phrase as: * Data management can evolve to meet science support expectations * Holistic approach, not project specific
  • I made these changes. Bullet 5 under 1: Researchers => researchers. Bullet 4 under 2: We find 5-10 years experience yields an expert. Bullet 5 under 2: RDA support group, cumulative 128 years experience.
  • Read carefully – I made edits in the slide. What does “Careful deliberation needed” mean?
  • Read carefully, I edited the slide directly
  • Read carefully, I changed this slide.
  • I changed Primary to User.
  • I made changes to the slide – read carefully – especially Metadata
  • Check text – I made changes.
  • Changed to match earlier slide

    1. Successful Data Curation for Large Data Archives. Joseph L. “Joey” Comeaux and Steven J. Worley (NCAR), and Cliff Jacobs (NSF). RDAP-2011, 04/12/11
    2. NCAR’s Research Data Archive (RDA)
         • Managed by 8 members of the Data Support Section
         • Within the Computational and Information Systems Laboratory (CISL) at NCAR
         • Started in ~1965
         • Primarily meteorological and oceanographic datasets
         • >200 person-years invested in the RDA
    3. Contents of the RDA: > 600 datasets
    4. [no slide text]
    5. CISL Data Life-Cycle Model
    6. CISL Data Life-Cycle Model
    7. Six Requirements for Data Curation
         • Stable Funding
         • Knowledgeable / Consistent Staffing
         • Robust Storage
         • Backups
         • Partnerships
         • Formats
    8. Sustainable Data Curation: Stable Funding
         • Focused on data management
         • Allows flexibility
             • Data management can evolve to meet science support expectations
             • Holistic approach, not project specific
         • Necessary to keep curated collection viable
    9. Sustainable Data Curation: Stable Funding, Enriched Staff
    10. Sustainable Data Curation: Enriched Staff
         • Knowledgeable and educated in the specific discipline
             • Important for checking integrity of data
             • Choosing organization of data
             • Creating adequate metadata
             • Designing access system and assisting users
             • Staff is link to researchers
         • Consistent staffing levels
             • Dedicated to best practices in archiving and stewardship
             • Great deal of knowledge held by staff, regardless of documentation
             • Value of human-based knowledge cannot be underestimated
             • We find 5-10 years experience yields an expert
             • RDA support group, cumulative 128 years experience
    11. Sustainable Data Curation: Robust Storage Facilities
         • Capability to meet growth and technology changes
             • NCAR
                 • Tape based archive
                     • Size > 2x every 2.5 years: >10 PB now
                     • Technology evolution, tapes: 20 GB -> 60 GB -> 200 GB -> 1000 GB
                     • Full RDA about 600 TB (600 tapes)
                     • Changes ‘transparent’ for RDA staff and users
                 • Enhance user access with disk
                     • Fast access
                     • Available for RDA: 1.5 TB -> 12 TB -> 70 TB -> 300 TB
                     • Careful deliberation needed
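The growth figures on slide 11 can be turned into a small back-of-the-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, it assumes exact doubling every 2.5 years (the slide says "> 2x", so this is a lower bound), and it assumes decimal units (1 TB = 1000 GB).

```python
import math


def projected_size_pb(current_pb: float, years: float,
                      doubling_years: float = 2.5) -> float:
    """Project archive size, assuming exact doubling every `doubling_years`.

    The slide states growth of more than 2x per 2.5 years, so this is a
    lower-bound assumption, not a forecast from the talk.
    """
    return current_pb * 2 ** (years / doubling_years)


def tapes_needed(size_tb: float, tape_capacity_gb: float) -> int:
    """Tapes required to hold `size_tb`, assuming 1 TB = 1000 GB."""
    return math.ceil(size_tb * 1000 / tape_capacity_gb)


if __name__ == "__main__":
    # Two doubling periods starting from the 10 PB quoted on the slide.
    print("Projected size in 5 years (PB):", projected_size_pb(10, 5))
    # Tape counts for a 600 TB collection across the tape generations listed.
    for capacity_gb in (20, 60, 200, 1000):
        print(capacity_gb, "GB tapes:", tapes_needed(600, capacity_gb))
```

With 1000 GB tapes the 600 TB collection needs 600 tapes, which matches the "600TBs (600 tapes)" figure on the slide.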
    12. Sustainable Data Curation: Backups
         • Loss of data attributed to 2 general causes
         • (1) Physical equipment or facility failures
             • Environmental -> fire, flood, earthquake, ...
             • Equipment -> drive failures, tape deterioration
         • Resolution
             • Store copies of irreplaceable data at separate physical locations
    13. Sustainable Data Curation: Backups
         • Loss of data attributed to 2 general causes
         • (2) Poor curation/stewardship, e.g. breaks in best practice
             • Loss of metadata
             • Accidental data overwrites and deletions
             • Lack of background knowledge
                 • Misjudge importance of particular data
         • Resolution
             • Treat metadata and data equally
             • Implement safeguards to minimize risk of overwrites and accidental deletions
             • Knowledgeable staff
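Two of the resolutions on slides 12 and 13 (separate copies, and safeguards against overwrites) lend themselves to a short illustration. The sketch below is not the RDA's actual tooling: the function names are hypothetical, and SHA-256 is an assumed choice of checksum. It compares a primary file with its backup copy and refuses to overwrite an existing file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary: Path, backup: Path) -> bool:
    """True if the primary file and its backup copy have identical content."""
    return sha256_of(primary) == sha256_of(backup)


def safe_write(path: Path, data: bytes) -> None:
    """Write a new file, refusing to overwrite an existing one.

    Mode "xb" raises FileExistsError if the path already exists, which
    guards against accidental overwrites.
    """
    with path.open("xb") as fh:
        fh.write(data)
```

In practice the backup path would point at storage in a separate physical location; the comparison itself is the same.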
    14. Growth of RDA
    15. Growth of RDA
    16. Growth of RDA
    17. Growth of RDA. BACKUPS RDA : 40%
    18. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage, Backups
    19. Sustainable Data Curation: Formats
         • Data Format
             • Ensure data access for long-term
             • Fully documented to the byte level
             • Use standard formats
             • Non-proprietary
                 • Not dependent on OS, hardware or applications
             • Choices can affect storage requirements
         • Metadata Format
             • Utilize a standard
             • Multiple levels are becoming more important
                 • Discovery
                 • File attribute
                 • File content
             • Backup of metadata is important
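To make "fully documented to the byte level" and "not dependent on OS, hardware or applications" concrete, here is a minimal sketch of a fixed-size binary record. The record layout, field names, and station code are entirely hypothetical and are not an RDA format; the point is only that an explicit byte order and documented field widths let any platform decode the data.

```python
import struct
from dataclasses import dataclass

# Hypothetical record layout, documented byte by byte:
#   bytes 0-3   station identifier, 4 ASCII characters
#   bytes 4-7   observation time, unsigned 32-bit integer,
#               seconds since 1970-01-01 00:00 UTC
#   bytes 8-11  temperature in kelvin, IEEE 754 32-bit float
# The ">" prefix fixes big-endian byte order and standard field sizes,
# so the encoding does not depend on the host machine.
RECORD = struct.Struct(">4sIf")


@dataclass
class Observation:
    station: str
    time: int
    temperature_k: float


def encode(obs: Observation) -> bytes:
    """Pack one observation into the 12-byte layout described above."""
    return RECORD.pack(obs.station.encode("ascii"), obs.time, obs.temperature_k)


def decode(raw: bytes) -> Observation:
    """Unpack a 12-byte record back into an Observation."""
    station, time, temperature_k = RECORD.unpack(raw)
    return Observation(station.decode("ascii"), time, temperature_k)
```

The format choice also affects storage, as the slide notes: storing temperature as a 32-bit float rather than a 64-bit one halves that field's size at the cost of precision.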
    20. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage, Backups, Formats
    21. Sustainable Data Curation: Partnerships
         • World-wide sharing and unrestricted open access provide greater research opportunities than any one center can provide
         • National and international
         • No single institute can “do it all”
         • Most users “need/want it all”
         • Good way to share some costs
         • Reanalysis projects are examples
    22. Sustainable Data Curation: Stable Funding, Enriched Staff, Archive Pertinent Data, Robust Storage, Backups, Formats, Partnerships
    23. Data Model, 1960’s – 1990’s (slide dated 01/25/2011)
    24. Data Model, 1990’s – 2010’s
    25. Summary: Six Requirements for Data Curation
         • Stable Funding
         • Enriched Staff
         • Robust Storage
         • Backups
         • Formats
         • Partnerships
