Multi-Archival Syndicated Storage Platform

  • 468 views
Uploaded on

 

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
468
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
3
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Bryan Beecher University of Michigan Director, Computing & Network Services E: [email_address] W: http://www.icpsr.umich.edu/ICPSR/staff/beecher.html/ Micah Altman Harvard University Archival Director, Henry A. Murray Research Archive Associate Director, Harvard-MIT Data Center Senior Research Scientist, Institute for Quantitative Social Sciences E: [email_address] W: http://maltman.hmdc.harvard.edu/
  • 2.
    • Roadmap
      • Why replicate for preservation?
      • What is institutional model for replication in Data-PASS use?
      • How do we build on LOCKSS to support these institutional needs?
    • Collaborators and Conspirators
      • Leonid Andreev, IQSS; Steve Burling, ICPSR; Jonathan Crabtree, Odum; Marc Maynard, Roper; Nancy McGovern, ICPSR;
  • 3.
    • Technical
      • Media failure: storage conditions, media characteristics
      • Format obsolescence
      • Preservation infrastructure software failure
      • Storage infrastructure software failure
      • Storage infrastructure hardware failure
    • External Threats to Institutions
      • Third party attacks
      • Institutional funding
      • Change in legal regimes
    • Quis custodiet ipsos custodes?
      • Unintentional curatorial modification
      • Loss of institutional knowledge & skills
      • Intentional curatorial deaccessioning
      • Change in institutional mission
  • 4.
    • There are potential single points of failure in both technology, organization and legal regimes:
      • Diversify your portfolio: multiple software systems, hardware, organization
      • Find diverse partners – diverse business models, legal regimes
    http://failblog.org/2008/02/08/floppy-fail/
  • 5.
    • Consider organizational credentials
    • No organization is absolutely certain to be reliable
    • Consider the trust relationships across institutions
    http://flickr.com/photos/phauly/35555985 /
  • 6.
    • Policy Driven
      • Institutional policy creates formal replication commitments
      • Replication commitments are described in metadata, using schema
      • Metadata drives
        • Configuration of replication network
        • Auditing of replication network
  • 7.
    • Asymmetric Commitments Partners vary in …
      • … storage commitments to replication
      • … size of holdings being replicated
      • … what holdings of other partners they replicate
  • 8.
    • Completeness
      • Complete public holdings of each partner
      • Retain previous version of holdings
      • Include
        • metadata
        • data
        • documentation
        • legal agreements
  • 9.
    • Restoration guarantees
      • Restore groups of versioned content
        • to owning archive
        • to replication hosts
      • Institutional failure restoration: – transfer entire holdings of an archive to another
  • 10.
    • Trust & Verification
      • Each partner is trusted …
        • … to hold the public content of other (not to disseminate improperly)
        • … to add units to be harvested
      • No partner is trusted to be “super-user”
        • No deletion (or directly manipulation of replication storage owned by another partner
      • Legal agreements reinforce trust model
      • Schema based auditing used to …
        • … verify replication guarantees are met
        • … record replication and storage commitments
        • … document related TRAC criteria
  • 11.
    • Network level:
      • Identification: name; description; contact; access point URI
      • Capabilities: protocol version; number of replicates maintained; replication frequency; versioning/deletion support
      • Human readable documentation: restrictions on content that may be placed in the network; services guaranteed by the network; Virtual Organization policies relating to network maintenance
    • Host level
      • Identification: name; description; contact; access point URI
      • Capabilities: protocol version; storage available
      • Human readable terms of use: Documentation of hardware, software and operating personnel in support of TRAC criteria 
    • Archival unit level
      • Identification: name; description; contact; access point URI
      • Attributes: update frequency, plugin required for harvesting, storage required
      • Terms of use: Required statement of content compliance with network terms. ; Dissemination terms and conditions
    • TRAC Integration
      • A number of elements comprise documentation showing how the replication system itself supports relevant TRAC criteria
      • Other elements that may be use to include text, or reference external text that documents evidence of compliance with TRAC criteria.
      • Specific TRAC criteria are identified implicitly, can be explicitly identified with attributes
      • Schema documentation describes each elements relevance to TRAC, and mapping to particular TRAC criteria
  • 12.
      • Initialization: Given schema instance distribute AU harvesting responsibility to hosts
      • Auditing: Does current host harvesting allocation & history match replication commitment in schema?
      • Recovery of hosts
      • Deliver AU content to source archive
      • Addition of AU’s, hosts
      • Growth of AU’s over initial commitment
      • Assumptions
      • Nothing is deleted
      • Resources in network grow monotonically
      • Off-the-path behavior is detected automatically, resolved manually
  • 13.  
  • 14.  
  • 15.
    • LOCKSS
    • CLOCKSS
    • Very easy to build and deploy
      • 5 minutes
    • Very easy to plug into public LOCKSS network
      • 5 minutes
    • Very easy to manage thereafter – it is basically an appliance
    • Also easy to set-up
    • Grouping your CLOCKSS devices into a private network and paring the 20k-line configuration file into the right 200-line configuration file is not
    • Managing a network now, not a device
  • 16.
    • Standard LOCKSS used for
      • Harvesting
      • Recovery
    • New LOCKSS bulk update mechanism used for
      • Initial configuration
      • Adding AU’s
    • CLOCKSS mechanisms (certificates, cache monitor)
      • Content delivery
      • Optimize recovery
      • Auditing
    • Data-PASS customizations for schema processing
      • Translating schema instance into bulk update requests
      • Reporting on compliance based on cache monitor database
  • 17.
    • Summer 2007: Attended the MetaArchive LOCKSS tutorial
      • Very good overview of LOCKSS
    • Summer 2007: SSP System Requirements Developed & Approved
    • Winter 2007: First public LOCKSS network nodes built at two Data-PASS sites
    • Winter 2007: SSP Replication Commitment Schema Developed
    • Spring 2008: Completed Test harvest of MRA collection into LOCKSS
    • Sprint 2008: SSP System Use Cases Developed
    • Spring 2008: Prototype plugin developed to harvest Dataverse Networks
    • Spring 2008: Data-PASS sites are joined into single Private LOCKSS Network (PLN)
    • Spring 2008: Met with LOCKSS developers to review use cases
      • SSP will leverage functionality “in the works” by LOCKSS team
  • 18.  
  • 19.
      • Replication ameliorates institutional risks to preservation
      • Data PASS requires policy based, auditable, asymmetric replication commitments
      • Formalize policy in schema
      • (Re)Configure & audit LOCKSS using schema
      • Replication uses standard LOCKSS mechanisms