Rule-based Preservation Systems
Upcoming SlideShare
Loading in...5
×
 

Rule-based Preservation Systems

on

  • 420 views

 

Statistics

Views

Total Views
420
Views on SlideShare
420
Embed Views
0

Actions

Likes
0
Downloads
0
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Rule-based Preservation Systems Rule-based Preservation Systems Presentation Transcript

  • Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano { moore , schroede , mwan , sekar , marciano }@sdsc.edu http://www.sdsc.edu/srb http://irods.sdsc.edu/
  • Data Grids
    • SRB - Storage Resource Broker
      • Persistent naming of distributed data
      • Management of data stored in multiple types of storage systems
      • Organization of data as a shared collection with descriptive metadata, access controls, audit traiils
    • iRODS - integrated Rule-Oriented Data System
      • Rules control execution of remote micro-services
      • Manage persistent state information
      • Validate assertions about collection
      • Automate execution of management policies
  • Extremely Successful
    • Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections
    • Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS
      • Astronomy Data grid
      • Bio-informatics Digital library
      • Earth Sciences Data grid
      • Ecology Collection
      • Education Persistent archive
      • Engineering Digital library
      • Environmental science Data grid
      • High energy physics Data grid
      • Humanities Data Grid
      • Medical community Digital library
      • Oceanography Real time sensor data, persistent archive
      • Seismology Digital library, real-time sensor data
    • Goal has been generic infrastructure for distributed data
    View slide
  •   View slide
  • Data Grids
    • Data virtualization
      • Provide the persistent, global identifiers needed to manage distributed data
      • Provide standard operations for interacting with heterogeneous storage system
      • Map from storage protocols to preferred clients
    • Trust virtualization
      • Manage authentication and authorization
      • Enable access controls on data, metadata, storage
    • Federation
      • Controlled sharing of name spaces, files, and metadata between independent data grids
      • Data grid chaining / Central archives / Master-slave data grids / Peer-to-Peer data grids
  • Production Data Grids: Observations
    • Data grids manage shared collections that are distributed across multiple storage systems and institutions
      • Data grids are responsible for providing recovery mechanisms for all errors that occur in the distributed environment
      • The number of observed problems is proportional to the size of the collections
    • Need to minimize labor costs by automating:
      • Application of management policies
      • Execution of administrative functions for error recovery
      • Validation of preservation assessment criteria
  • Observations of Production Data Grids
    • Each community implements different management polices
    • Need a mechanism to support the socialization of shared collections
      • Community specific preservation objectives
      • Community specific assertions about properties of the shared collection
      • Community specific management policies
  • Collection Management iRODS - integrated Rule-Oriented Data System
  • Rule-based Data Management
    • Map from management policies to rules controlling execution of remote micro-services
    • Manage persistent state information for results of each micro-service execution
    • Support an additional three logical name spaces
      • Rules
      • Micro-services
      • Persistent state information
    • Constitutes representation information for preservation environments
  • Example Rules
    • Rule composed of four parts:
      • Name | condition | micro-service set | recovery
    • Rule to automate replication of data for a specific collection
      • acPostProcForPut |
      • $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) |
      • nop
    • Rule types
      • Internal, administrative, user-defined
      • Atomic, deferred, periodic
  • Management Virtualization
    • Standard policies expressed as rules
      • Integrity
        • Validation of checksums
        • Synchronization of replicas
        • Data distribution
        • Data retention
        • Access controls
      • Authenticity
        • Chain of custody - audit trails
        • Required preservation metadata - templates
        • Generation of AIPs, DIPS
  • New Capabilities
    • Management capabilities
      • Rules to validate assessment criteria
      • Access controls on rules
      • Time-dependent access controls
      • Access controls on each micro-service
      • Redaction, access controls on structures in a file
      • Rule to parse audit trails, verify consistency of system
    • Data grid evolution
      • Dynamic addition of new rules / micro-services / persistent state information
      • Rules to control migration from old management policies to new management policies
    • Federation
      • Migration of rules and micro-services with data
  • Federation Between Data Grids
    • Data Grid
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical rule name space
    • Logical micro-service name
    • Logical persistent state
    Data Collection B Data Access Methods (Web Browser, DSpace, OAI-PMH)
    • Data Grid
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical rule name space
    • Logical micro-service name
    • Logical persistent state
    Data Collection A
  • Digital Preservation
    • Preservation is communication with the future
      • How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)?
      • SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology
    • Preservation manages communication from the past
      • What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)?
      • iRODS - integrated Rule-Oriented Data System
  • Theory of Digital Preservation
    •  Definition of the persistent name spaces
    •  Definition of the operations that are performed upon the persistent name spaces
    •  Characterization of the changes to the persistent state information associated with each persistent name space that occur for each operation
    •  Characterization of the transformations that are made to the records for each operation
    •  Demonstration that the set of operations is complete, enabling the decomposition of every preservation process onto the operation set.
    •  Demonstration that the preservation management policies are complete, enabling the validation of all preservation assessment criteria.
    •  Demonstration that the persistent state information is complete, enabling the validation of assessment criteria.
    •  The assertion is then: if the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record.
    •  A corollary is that such a system would allow records to be migrated between independent implementations of preservation environments, while maintaining authenticity and integrity .
  • For More Information
    • Reagan W. Moore
    • San Diego Supercomputer Center
    • [email_address] edu
    • http://www.sdsc.edu/srb/
    • http://irods.sdsc.edu/