Beecher cni fall 2010 v4
This is a talk from the Coalition for Networked Information Fall 2010 Member Meeting (CNIfall2010). I talked about our project to use Fedora as archival storage for social science research data and documentation.

Published in: Technology

Transcript

  • 1. Preserving Social Science Research Data Using Fedora
    • Bryan Beecher
    • Inter-university Consortium for Political and Social Research (ICPSR)
    • CNI Fall 2010 Membership Meeting
  • 2. ICPSR
    • World’s largest social science research data archive
      • Lots of files (millions)
      • Small files (6TB total)
    • Long track record of success – 50 yrs
      • Trust us
      • Enormous legacy burden
  • 3. ICPSR
    • Survey data are our core
      • Low volume of new content compared to natural sciences
      • We curate each item extensively (disclosure, quality, format, usability)
    • Strong access orientation
      • Talk like an archive
      • Walk like an archive?
  • 4. Walking the walk
    • Good storage container for content and its metadata
    • OAIS-compliant
    • Generate SIPs and AIPs (and DIPs)
    • But…
  • 5. What should we do?
  • 6. Where to begin?
    • Focus areas
      • Preservation
      • Going forward
      • Reusable
    • Do not try to include
      • Access
      • Everything we have
  • 7. A Solution
    • Fedora objects
      • Container for stuff we ingest and preserve
    • Fedora services
      • To generate AIPs and SIPs
    • Tool to generate FOs from existing content and metadata
  • 8. Ingest
    • The Motivated Depositor
      • Eager to describe the research data in great detail
      • Uploads complete, machine-readable metadata
  • 9. Ingest (continued)
    • The Unmotivated Depositor
      • Uploads a variety of proprietary file formats for documentation and data
      • Leaves the baby on the doorstep
  • 10. Ingest – Nov 2010 deposits
  • 11. Ingest (continued)
    • Typical deposit
      • Research data in one of the common stat packages (SAS, SPSS, etc.)
      • Technical documentation in a proprietary format (Word, PDF)
      • A proto-SIP in quasi-OAIS terms
      • Minimal level of metadata regarding how the survey was conducted
  • 12. Ingest container – file level
    • Vanilla Fedora Object
      • Will never know what sort of content format to expect
      • Use the RELS-EXT to connect related files
  • 13. Ingest container – deposit
    • Another plain Fedora Object
      • Points to all of the files stored in the file-level objects
      • Relatively little metadata stored for this level of object
  • 14. Ingest container – example
  • 15. Ingest container – example
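The file-level and deposit-level containers described above are tied together through RELS-EXT. The following is a minimal sketch of how such a relationship could be serialized, not the project's actual tooling. It assumes the `isPartOf` predicate from Fedora's external relations namespace (the same predicate that appears on the later "Ingest – finale" slide); the function name `build_rels_ext` and the PIDs in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Fedora's external relations namespace (assumed to be the one used here).
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(pid: str, parent_pid: str) -> str:
    """Return RELS-EXT RDF/XML stating that `pid` isPartOf `parent_pid`."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # The subject of the statement is the file-level object itself.
    desc = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"},
    )
    # The object of the statement is the deposit-level (parent) object.
    ET.SubElement(
        desc,
        f"{{{REL_NS}}}isPartOf",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{parent_pid}"},
    )
    return ET.tostring(root, encoding="unicode")
```

For example, `build_rels_ext("icpsr:deposit-1-file-1", "icpsr:deposit-1")` (made-up PIDs) yields an RDF document that could be stored as the file object's RELS-EXT datastream.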
  • 16. Ingest and the OAIS PDI
    • Reference – unique Fedora PID
    • Fixity – Fedora-generated checksum
    • Provenance – identity of depositor recorded in the DC Datastream
    • Context – original file name captured in the content Datastream
    • Access Rights – terms of deposit
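The mapping on this slide can be pictured as a small record per deposited file. The sketch below is illustrative only: the function `build_pdi`, its parameter names, and the dictionary keys are assumptions, and the locally computed SHA-256 digest merely stands in for the checksum that Fedora generates itself (the slide does not say which algorithm is used).

```python
import hashlib


def build_pdi(pid, content, depositor, original_name, terms_of_deposit):
    """Collect the five OAIS PDI elements for one deposited file.

    Each key notes where the slide says the value lives in Fedora.
    """
    return {
        # Reference: the unique Fedora PID.
        "reference": pid,
        # Fixity: Fedora generates this checksum; hashlib is a stand-in here.
        "fixity": {
            "algorithm": "SHA-256",
            "value": hashlib.sha256(content).hexdigest(),
        },
        # Provenance: identity of the depositor (recorded in the DC datastream).
        "provenance": depositor,
        # Context: original file name (captured in the content datastream).
        "context": original_name,
        # Access Rights: the terms of deposit.
        "access_rights": terms_of_deposit,
    }
```

A call such as `build_pdi("icpsr:deposit-1-file-1", data_bytes, "Jane Doe", "survey.sav", "public use")` (all values made up) returns one dictionary per file, which makes it easy to check that none of the five PDI elements is missing.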
  • 17. Generating OAIS SIPs
    • Original content
      • Normalized version too, if applicable
      • What’s normalization in this context?
    • Preservation Description Information (PDI)
      • As described previously
    • Delivered via SDef/SDep combo
  • 18. Ingest – continued
    • Data
      • Disclosure analysis
      • Recoding
    • Documentation
      • Corrections
      • Clarifications
    • Normalized formats
  • 19. Ingest – finale
    • Packaged into a “study”
      • Data, doc, questionnaire, user guide, etc.
      • Normalized formats for preservation
      • Convenient formats for access
  • 20. Ingest – finale
    • PID
      • REPORT (text/plain), objectProperties, DC, RELS-EXT, AUDIT
    • icpsr:release-28748-file-3
      • QUESTIONNAIRE (application/pdf), objectProperties, DC, RELS-EXT (isPartOf: release-15868), AUDIT
    • icpsr:release-28748-file-1
      • STATA-DICT (text/plain), objectProperties, DC, RELS-EXT (isPartOf: release-15868), AUDIT
      • DATA (text/plain), DDI (text/xml), SAS-SETUPS (text/plain), SPSS-SETUPS (text/plain), STATA-SETUPS (text/plain)
    • icpsr:release-28748-file-2
      • CODEBOOK (application/pdf), objectProperties, DC, RELS-EXT (isPartOf: release-15868), AUDIT
  • 21. Generating OAIS AIPs
    • For each object (file)
      • Everything from the SIP plus
        • Preservation events
        • Description of the transformation used
        • Preservation commitment
      • Its post-processed version
    • Delivered via SDef/SDep combo
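Read as a data structure, the AIP on this slide is the SIP with a few extra fields. The sketch below only illustrates that "SIP plus" relationship; the class names, field names, and the helper `aip_from_sip` are hypothetical and say nothing about how the SDef/SDep services assemble the packages.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SIP:
    pid: str
    original: bytes                      # original content as deposited
    pdi: dict                            # Preservation Description Information
    normalized: Optional[bytes] = None   # normalized version, if applicable


@dataclass
class AIP(SIP):
    # Everything from the SIP, plus:
    events: List[str] = field(default_factory=list)  # preservation events
    transformation: str = ""                          # description of the transformation used
    commitment: str = ""                              # preservation commitment
    post_processed: Optional[bytes] = None            # the post-processed version


def aip_from_sip(sip, post_processed, transformation, commitment, events):
    """Build an AIP that carries the SIP's contents unchanged."""
    return AIP(
        pid=sip.pid,
        original=sip.original,
        pdi=dict(sip.pdi),
        normalized=sip.normalized,
        events=list(events),
        transformation=transformation,
        commitment=commitment,
        post_processed=post_processed,
    )
```

Modelling the AIP as a subclass keeps the SIP fields untouched, which mirrors the slide's point that nothing from the SIP is dropped when the AIP is generated.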
  • 22. Example AIP
    • PID
      • REPORT (text/plain), objectProperties, DC, RELS-EXT, AUDIT
    • icpsr:release-28748-file-3
      • QUESTIONNAIRE (application/pdf), objectProperties, DC, RELS-EXT (isPartOf: release-15868), AUDIT
    • icpsr:release-28748-file-1
      • STATA-DICT (text/plain), objectProperties, DC, RELS-EXT (isPartOf: release-15868), AUDIT
      • DATA (text/plain), DDI (text/xml), SAS-SETUPS (text/plain), SPSS-SETUPS (text/plain), STATA-SETUPS (text/plain)
    • icpsr:release-28748-file-2
      • CODEBOOK (application/pdf), objectProperties, DC, RELS-EXT (isPartOf: release-15868), AUDIT
    • PID
      • objectProperties, DC, RELS-EXT, AUDIT
  • 23. Questions we faced
    • Datastreams or relationships?
    • What about our XML?
    • AIPs or DIPs?
    • How to build FOXML?
  • 24. Datastreams/relationships?
    • PID
      • CONTENT X, objectProperties, DC, RELS-EXT, AUDIT
    • PID
      • CONTENT Y, objectProperties, DC, RELS-EXT, AUDIT
    • PID
      • CONTENT Y, objectProperties, DC, RELS-EXT, AUDIT, CONTENT X
  • 25. Our XML
    • DDI v2
      • Contains lots of the information one might expect to find in the DC
    • Strategy
      • Duplicate it
  • 26. AIPs or DIPs
    • Lots of copies
    • Destination
      • Archival Storage remote location
      • Repository for ingest
  • 27. Building FOXML
    • Source
      • Database
      • DDI XML
    • Re-usable tool
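A tool of this kind takes values from a database row and a DDI document and writes them into FOXML. The sketch below is a minimal illustration under stated assumptions, not the project's tool: `build_foxml`, the `db_row` dictionary, and `SAMPLE_DDI` are hypothetical, the DDI sample is heavily trimmed and un-namespaced, the choice of `dc:creator` for the depositor is a guess, and the FOXML element and attribute names follow my understanding of FOXML 1.1 and should be checked against the schema before use.

```python
import xml.etree.ElementTree as ET

FOXML_NS = "info:fedora/fedora-system:def/foxml#"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, heavily trimmed DDI 2 codebook; real ones are far larger.
SAMPLE_DDI = """<codeBook><stdyDscr><citation><titlStmt>
<titl>Example Survey</titl>
</titlStmt></citation></stdyDscr></codeBook>"""


def build_foxml(pid, db_row, ddi_xml):
    """Build a FOXML object whose DC datastream duplicates DDI and database values."""
    for prefix, uri in (("foxml", FOXML_NS), ("oai_dc", OAI_DC_NS), ("dc", DC_NS)):
        ET.register_namespace(prefix, uri)

    # Source 1: the DDI XML supplies the study title.
    title = ET.fromstring(ddi_xml).findtext("stdyDscr/citation/titlStmt/titl", default="")

    obj = ET.Element(f"{{{FOXML_NS}}}digitalObject", {"VERSION": "1.1", "PID": pid})
    props = ET.SubElement(obj, f"{{{FOXML_NS}}}objectProperties")
    ET.SubElement(
        props,
        f"{{{FOXML_NS}}}property",
        {"NAME": "info:fedora/fedora-system:def/model#state", "VALUE": "Active"},
    )

    # Inline XML ("X") datastream holding the Dublin Core record.
    ds = ET.SubElement(
        obj, f"{{{FOXML_NS}}}datastream", {"ID": "DC", "STATE": "A", "CONTROL_GROUP": "X"}
    )
    ver = ET.SubElement(
        ds,
        f"{{{FOXML_NS}}}datastreamVersion",
        {"ID": "DC.0", "MIMETYPE": "text/xml", "LABEL": "Dublin Core"},
    )
    content = ET.SubElement(ver, f"{{{FOXML_NS}}}xmlContent")
    dc = ET.SubElement(content, f"{{{OAI_DC_NS}}}dc")
    ET.SubElement(dc, f"{{{DC_NS}}}title").text = title
    # Source 2: the database row supplies the depositor (dc:creator is an assumption).
    ET.SubElement(dc, f"{{{DC_NS}}}creator").text = db_row["depositor"]

    return ET.tostring(obj, encoding="unicode")
```

Calling `build_foxml("icpsr:example-1", {"depositor": "Jane Doe"}, SAMPLE_DDI)` returns a FOXML string for a single object; keeping the database and DDI inputs as plain parameters is one way to make such a tool reusable across collections.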
  • 28. Special Thanks
    • The Team
      • Peggy Overcashier
      • Nathan Adams
      • Nancy McGovern
      • Mary Vardigan
    • The Funder
      • National Science Foundation Award 0958382
      • INTEROP EAGER program
