Comeaux RDAP11 Data Archives in Federal Agencies
Joey Comeaux, CISL RDA; Data Archives in Federal Agencies

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html

  • Cliff is here – I forgot to mention to him we put him on this talk. His talk follows Tom Karl – first thing Tuesday morning. He is now retired from NSF, but still works 4 days a week.
  • Doug Schuster, IIPs Session 6B Tuesday PM
  • Excellent – note something about each curve (Users and Size)
  • Excellent 2-punch slide.
  • How about “Six Requirement for Data Curation”
  • I made these changes on the slide. Bullet one – Data Management => data management Bullets three and four are a part of “Allows flexibility”. Indent and phrase as * Data management can evolve to meet science support expectation * Holistic approach, not project specific
  • I made these changes. Bullet 5 under 1, Researchers => researchers Bullet 4 under 2, We find 5-10 years experience yields an expert Bullet 5 under 2, RDA support group, cumulative 128 years experience
  • Read carefully – I made edits in the slide What does “Careful deliberation needed” mean?
  • Read carefully, I edited the slide directly
  • Read carefully, I changed this slide.
  • I changed Primary to User.
  • I made changes to the slide – read carefully – especially Metadata
  • Check text – I made changes.
  • Changed to match earlier slide
  • Transcript

    • 1. 04/12/11 RDAP-2011: Successful Data Curation for Large Data Archives. Joseph L. “Joey” Comeaux and Steven J. Worley (NCAR) and Cliff Jacobs (NSF)
    • 2.
      • NCAR’s Research Data Archive
      • Managed by 8 members of the Data Support Section
      • Within Computational and Information Systems Laboratory at NCAR
      • Started in ~1965
      • Primarily Meteorological and Oceanographic datasets
      • >200 person-years invested in RDA
    • 3. Contents of the RDA: > 600 datasets
    • 4.
    • 5. CISL Data Life-Cycle Model
    • 6. CISL Data Life-Cycle Model
    • 7.
      • Six Requirements for Data Curation
      • Stable Funding
      • Knowledgeable / Consistent Staffing
      • Robust Storage
      • Backups
      • Partnerships
      • Formats
    • 8. Sustainable Data Curation: Stable Funding
      • Stable Funding
      • Focused on data management
      • Allows flexibility
        • Data management can evolve to meet science support expectations
        • Holistic approach, not project specific
      • Necessary to keep curated collection viable
    • 9. Sustainable Data Curation: Stable Funding, Enriched Staff
    • 10. Sustainable Data Curation: Stable Funding, Enriched Staff
      • Enriched Staff
      • Knowledgeable and educated in the specific discipline
        • Important for checking integrity of data
        • Choosing organization of data
        • Creating adequate metadata
        • Designing access system and assisting users
        • Staff is the link to researchers
      • Consistent Staffing Levels
        • Dedicated to best practices in archiving and stewardship
        • Great deal of knowledge held by staff, regardless of documentation
        • Value of human-based knowledge should not be underestimated
        • We find 5-10 years of experience yields an expert
        • RDA support group: cumulative 128 years of experience
    • 11. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage
      • Robust Storage Facilities
      • Capability to meet growth and technology changes
        • NCAR
          • Tape based archive
            • Size more than doubles every 2.5 years: >10 PB now
            • Technology evolution, tapes: 20 GB -> 60 GB -> 200 GB -> 1000 GB
            • Full RDA about 600 TB (600 tapes)
            • Changes ‘transparent’ for RDA staff and users
          • Enhance user access with disk
            • Fast access
            • Available for RDA: 1.5 TB -> 12 TB -> 70 TB -> 300 TB
            • Careful deliberation needed
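The growth figures on this slide lend themselves to a quick capacity estimate. The sketch below is only illustrative: it assumes the doubling period stays constant at 2.5 years, that 1 TB = 1000 GB, and that compression and tape overhead can be ignored. The function names are hypothetical and are not part of any NCAR tooling.

```python
def projected_size_pb(current_pb: float, years: float, doubling_years: float = 2.5) -> float:
    """Projected archive size in PB, assuming a constant doubling period."""
    return current_pb * 2 ** (years / doubling_years)


def tapes_needed(size_gb: int, tape_capacity_gb: int) -> int:
    """Whole tapes needed to hold size_gb (no compression, no overhead)."""
    return -(-size_gb // tape_capacity_gb)  # integer ceiling division


if __name__ == "__main__":
    # Start from 10 PB (the slide says ">10PB now") and look 5 years ahead.
    print(projected_size_pb(10, 5))
    # The full RDA (about 600 TB) on each tape generation listed on the slide.
    for capacity_gb in (20, 60, 200, 1000):
        print(capacity_gb, tapes_needed(600_000, capacity_gb))
```

With 1000 GB tapes the estimate comes to 600 tapes, which agrees with the figure on the slide; on the original 20 GB tapes the same holdings would have needed 30,000.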
    • 12. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage, Backups
      • Backups
      • Loss of data attributed to 2 general causes
      • Physical equipment or facility failures
        • Environmental -> Fire, Flood, Earthquake, ...
        • Equipment -> Drive failures, Tape deterioration
      • Resolution
        • Store copies of irreplaceable data at separate physical locations
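Keeping copies at separate locations only helps if the copies are known to be identical. One common way to check this is to compare checksums, sketched below with SHA-256 from the Python standard library. The slides do not say how the RDA verifies its copies, so this is an assumption for illustration, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(primary: Path, replica: Path) -> bool:
    """True when the primary file and its off-site replica have the same digest."""
    return sha256_of(primary) == sha256_of(replica)
```

In practice the digest of the primary copy would be stored with the metadata, so the replica can be checked without reading both sites at once.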
    • 13. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage, Backups
      • Backups
      • Loss of data attributed to 2 general causes
      • Poor Curation/Stewardship, e.g. breaks in best practice
        • Loss of metadata
        • Accidental data over-writes and deletions
        • Lack of background knowledge
          • Misjudge importance of particular data
      • Resolution
        • Treat metadata and data equally
        • Implement safeguards to minimize risk of over-writes and accidental deletions
        • Knowledgeable staff
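One simple safeguard against accidental over-writes is to make archive writes "write-once": refuse to replace an existing file and drop write permission afterwards. The sketch below shows the idea with the Python standard library. It is an illustration under those assumptions, not a description of the RDA's actual safeguards, and `archive_write_once` is a hypothetical name.

```python
import shutil
from pathlib import Path


def archive_write_once(source: Path, destination: Path) -> None:
    """Copy source to destination, refusing to overwrite an existing file."""
    # Mode "xb" raises FileExistsError if the destination already exists.
    with source.open("rb") as src, destination.open("xb") as dst:
        shutil.copyfileobj(src, dst)
    # Remove write permission so a later change needs a deliberate chmod first.
    destination.chmod(0o444)
```

A second call with the same destination fails with `FileExistsError` and leaves the archived file untouched.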
    • 14. Growth of RDA
    • 15. Growth of RDA
    • 16. Growth of RDA
    • 17. Growth of RDA. BACKUPS RDA: 40%
    • 18. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage, Backups
    • 19. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage, Backups, Formats
      • Data Format
      • Ensure data access for long-term
      • Fully documented to the byte level
      • Use standard formats
      • Non-proprietary
        • Not dependent on OS, hardware or applications
      • Choices can affect storage requirements
      • Metadata Format
      • Utilize a standard
      • Multiple levels are becoming more important
        • Discovery
        • File attribute
        • File content
      • Backup of metadata is important
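The three metadata levels named on this slide (discovery, file attribute, file content) can be pictured as one nested record per dataset. The sketch below uses Python dataclasses. Every field name is a hypothetical choice made for illustration and does not come from the RDA or from any particular metadata standard; the JSON export stands in for the "backup of metadata" point.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DiscoveryMetadata:
    """Dataset-level information that helps users find the dataset."""
    title: str
    keywords: List[str]


@dataclass
class FileAttributeMetadata:
    """Per-file properties that do not require opening the file."""
    filename: str
    size_bytes: int
    data_format: str


@dataclass
class FileContentMetadata:
    """What a file holds: variables and time coverage."""
    filename: str
    variables: List[str]
    start_date: str
    end_date: str


@dataclass
class DatasetRecord:
    discovery: DiscoveryMetadata
    file_attributes: List[FileAttributeMetadata]
    file_contents: List[FileContentMetadata]

    def to_json(self) -> str:
        """Serialize all three levels so the metadata can be backed up with the data."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Keeping the levels separate lets a search service load only the discovery part while the access system uses the per-file parts.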
    • 20. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage, Backups, Formats
    • 21. Sustainable Data Curation: Stable Funding, Enriched Staff, Robust Storage, Backups, Formats, Partnerships
      • Partnerships
      • World-wide sharing and unrestricted open access provides greater research opportunities than any one center can provide
      • National and international
      • No single institute can “do it all”
      • Most users “need/want it all”
      • Good way to share some costs
      • Reanalysis projects are examples
    • 22. Sustainable Data Curation: Stable Funding, Enriched Staff, Archive Pertinent Data, Robust Storage, Backups, Formats, Partnerships
    • 23. 01/25/2011 RDAP 2011 Data Model: 1960’s – 1990’s
    • 24. Data Model: 1990’s – 2010’s
    • 25.
      • SUMMARY
      • Six Requirements for Data Curation
        • Stable Funding
        • Enriched Staff
        • Robust Storage
        • Backups
        • Formats
        • Partnerships
