Comeaux RDAP11 Data Archives in Federal Agencies

Joey Comeaux, CISL RDA; Data Archives in Federal Agencies

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html

Usage Rights

© All Rights Reserved

  • Cliff is here – I forgot to mention to him we put him on this talk. His talk follows Tom Karl – first thing Tuesday morning. He is now retired from NSF, but still works 4 days a week.
  • Doug Schuster, IIPs Session 6B Tuesday PM
  • Excellent – note something about each curve (Users and Size)
  • Excellent 2-punch slide.
  • How about “Six Requirements for Data Curation”?
  • I made these changes on the slide. Bullet one: Data Management => data management. Bullets three and four are a part of “Allows flexibility”. Indent and phrase as: * Data management can evolve to meet science support expectations * Holistic approach, not project specific
  • I made these changes. Bullet 5 under 1: Researchers => researchers. Bullet 4 under 2: We find 5-10 years experience yields an expert. Bullet 5 under 2: RDA support group, cumulative 128 years experience.
  • Read carefully – I made edits in the slide. What does “Careful deliberation needed” mean?
  • Read carefully, I edited the slide directly
  • Read carefully, I changed this slide.
  • I changed Primary to User.
  • I made changes to the slide – read carefully – especially Metadata
  • Check text – I made changes.
  • Changed to match earlier slide

Comeaux RDAP11 Data Archives in Federal Agencies: Presentation Transcript

  • 04/12/11 Successful Data Curation for Large Data Archives. Joseph L. “Joey” Comeaux and Steven J. Worley (NCAR), and Cliff Jacobs (NSF). RDAP-2011
  • NCAR’s Research Data Archive (RDA)
    • Managed by 8 members of the Data Support Section
    • Within the Computational and Information Systems Laboratory (CISL) at NCAR
    • Started in ~1965
    • Primarily meteorological and oceanographic datasets
    • >200 person-years invested in the RDA
  • Contents of the RDA: > 600 datasets
  • CISL Data Life-Cycle Model
  • Six Requirements for Data Curation
    • Stable Funding
    • Knowledgeable / Consistent Staffing
    • Robust Storage
    • Backups
    • Partnerships
    • Formats
  • Sustainable Data Curation: Stable Funding
    • Focused on data management
    • Allows flexibility
      • Data management can evolve to meet science support expectations
      • Holistic approach, not project specific
    • Necessary to keep curated collection viable
  • Sustainable Data Curation: Enriched Staff
    • Knowledgeable and educated in the specific discipline
      • Important for checking integrity of data
      • Choosing organization of data
      • Creating adequate meta-data
      • Designing access system and assisting users
      • Staff is the link to researchers
    • Consistent Staffing Levels
      • Dedicated to best practices in archiving and stewardship
      • Great deal of knowledge held by staff, regardless of documentation
      • Value of human-based knowledge should not be underestimated
      • We find 5-10 years of experience yields an expert
      • RDA support group: 128 years of cumulative experience
  • Sustainable Data Curation: Robust Storage Facilities
    • Capability to meet growth and technology changes
      • NCAR
        • Tape based archive
          • Size more than doubles every 2.5 years: > 10 PB now
          • Technology evolution, tape capacity: 20 GB -> 60 GB -> 200 GB -> 1000 GB
          • Full RDA is about 600 TB (600 tapes)
          • Changes ‘transparent’ for RDA staff and users
        • Enhance user access with disk
          • Fast access
          • Available for RDA: 1.5 TB -> 12 TB -> 70 TB -> 300 TB
          • Careful deliberation needed
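
The growth figures on this slide can be turned into a small worked example. The sketch below assumes the archive doubles exactly every 2.5 years (the slide says it more than doubles, so this is a lower bound) and uses decimal units (1 TB = 1000 GB). The function names are hypothetical and not part of any NCAR tooling.

```python
import math

# From the slide; treated here as an exact doubling period (assumption).
DOUBLING_PERIOD_YEARS = 2.5


def projected_size_pb(current_pb: float, years_ahead: float) -> float:
    """Project archive size, assuming one doubling every 2.5 years."""
    return current_pb * 2 ** (years_ahead / DOUBLING_PERIOD_YEARS)


def tapes_needed(archive_tb: float, tape_capacity_gb: float) -> int:
    """Number of tapes needed to hold the archive (1 TB = 1000 GB)."""
    return math.ceil(archive_tb * 1000 / tape_capacity_gb)


if __name__ == "__main__":
    # Start from the ~10 PB quoted on the slide.
    for years in (0, 2.5, 5, 10):
        print(f"+{years:>4} years: {projected_size_pb(10, years):6.1f} PB")
    # 600 TB on 1000 GB tapes matches the "600 tapes" quoted on the slide.
    print("Tapes for 600 TB at 1000 GB each:", tapes_needed(600, 1000))
```

Under these assumptions the same 600 TB would have needed five times as many 200 GB tapes, which illustrates why each tape generation change matters for a tape-based archive.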
  • Sustainable Data Curation: Backups
    • Loss of data attributed to 2 general causes
    • Physical equipment or facility failures
      • Environmental -> Fire, Flood, Earthquake…
      • Equipment -> Drive failures, Tape deterioration
    • Resolution
      • Store copies of irreplaceable data at separate physical locations
  • Sustainable Data Curation: Backups
    • Loss of data attributed to 2 general causes
    • Poor curation/stewardship, e.g. breaks in best practice
      • Loss of metadata
      • Accidental data over-writes and deletions
      • Lack of background knowledge
        • Misjudge importance of particular data
    • Resolution
      • Treat metadata and data equally
      • Implement safeguards to minimize risk of over-writes and accidental deletions
      • Knowledgeable staff
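
One way to picture the "safeguards" bullet is a copy routine that refuses to overwrite an existing file and verifies the copy by checksum. This is only a minimal sketch under stated assumptions: the helper names `sha256_of` and `safe_backup` are hypothetical, the second location is just a directory path, and nothing here reflects how the RDA actually implements its backups.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_backup(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir without overwriting, then verify the copy."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    # Mode "xb" raises FileExistsError if dest already exists,
    # so an earlier backup is never silently overwritten.
    with src.open("rb") as fin, dest.open("xb") as fout:
        shutil.copyfileobj(fin, fout)
    # Verify the copy; discard it if the checksums disagree.
    if sha256_of(src) != sha256_of(dest):
        dest.unlink()
        raise OSError(f"checksum mismatch while backing up {src}")
    return dest
```

The same routine would be applied to metadata files as well as data files, in keeping with "Treat metadata and data equally".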
  • Growth of RDA
  • Growth of RDA; backups, RDA: 40%
  • Sustainable Data Curation: Formats
    • Data Format
      • Ensure data access for the long term
      • Fully documented to the byte level
      • Use standard formats
      • Non-proprietary
        • Not dependent on OS, hardware or applications
      • Choices can affect storage requirements
    • Metadata Format
      • Utilize a standard
      • Multiple levels are becoming more important
        • Discovery
        • File attribute
        • File content
      • Backup of metadata is important
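
To illustrate what "fully documented to the byte level" and "not dependent on OS, hardware or applications" can mean in practice, the sketch below packs a record with an explicit byte order and fixed field sizes. The record layout, field names, and the `encode`/`decode` helpers are hypothetical and chosen only for illustration; they do not describe any actual RDA format.

```python
import struct

# Hypothetical record layout, documented to the byte level:
#   bytes 0-3   station id    unsigned 32-bit integer, big-endian
#   bytes 4-7   obs time      unsigned 32-bit integer, seconds since 1970-01-01 UTC
#   bytes 8-11  temperature   IEEE 754 32-bit float, kelvin, big-endian
# The ">" prefix fixes byte order and field sizes, so the bytes are the
# same regardless of the machine that writes or reads them.
RECORD = struct.Struct(">IIf")


def encode(station_id: int, obs_time: int, temp_k: float) -> bytes:
    """Pack one observation into its 12-byte on-disk form."""
    return RECORD.pack(station_id, obs_time, temp_k)


def decode(raw: bytes) -> tuple:
    """Unpack a 12-byte record into (station_id, obs_time, temp_k)."""
    return RECORD.unpack(raw)


if __name__ == "__main__":
    raw = encode(72469, 1302566400, 273.15)
    print(len(raw), "bytes:", raw.hex())
    print(decode(raw))
```

With the layout written down next to the data, a future reader can decode the file in any language, even if the original software is no longer available.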
  • Sustainable Data Curation: Partnerships
    • World-wide sharing and unrestricted open access provide greater research opportunities than any one center can provide
    • National and international
    • No single institute can “do it all”
    • Most users “need/want it all”
    • Good way to share some costs
    • Reanalysis projects are examples
  • Sustainable Data Curation: Stable Funding, Enriched Staff, Archive Pertinent Data, Robust Storage, Backups, Formats, Partnerships
  • Data Model: 1960s – 1990s
  • Data Model: 1990s – 2010s
  • Summary
    • Six Requirements for Data Curation
      • Stable Funding
      • Enriched Staff
      • Robust Storage
      • Backups
      • Formats
      • Partnerships