Rule-based Preservation Systems

  • 285 views
Uploaded on

 

More in: Technology , Business
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
285
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
0
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano { moore , schroede , mwan , sekar , marciano }@sdsc.edu http://www.sdsc.edu/srb http://irods.sdsc.edu/
  • 2. Data Grids
    • SRB - Storage Resource Broker
      • Persistent naming of distributed data
      • Management of data stored in multiple types of storage systems
      • Organization of data as a shared collection with descriptive metadata, access controls, audit traiils
    • iRODS - integrated Rule-Oriented Data System
      • Rules control execution of remote micro-services
      • Manage persistent state information
      • Validate assertions about collection
      • Automate execution of management policies
  • 3. Extremely Successful
    • Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections
    • Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS
      • Astronomy Data grid
      • Bio-informatics Digital library
      • Earth Sciences Data grid
      • Ecology Collection
      • Education Persistent archive
      • Engineering Digital library
      • Environmental science Data grid
      • High energy physics Data grid
      • Humanities Data Grid
      • Medical community Digital library
      • Oceanography Real time sensor data, persistent archive
      • Seismology Digital library, real-time sensor data
    • Goal has been generic infrastructure for distributed data
  • 4.  
  • 5. Data Grids
    • Data virtualization
      • Provide the persistent, global identifiers needed to manage distributed data
      • Provide standard operations for interacting with heterogeneous storage system
      • Map from storage protocols to preferred clients
    • Trust virtualization
      • Manage authentication and authorization
      • Enable access controls on data, metadata, storage
    • Federation
      • Controlled sharing of name spaces, files, and metadata between independent data grids
      • Data grid chaining / Central archives / Master-slave data grids / Peer-to-Peer data grids
  • 6. Production Data Grids: Observations
    • Data grids manage shared collections that are distributed across multiple storage systems and institutions
      • Data grids are responsible for providing recovery mechanisms for all errors that occur in the distributed environment
      • The number of observed problems is proportional to the size of the collections
    • Need to minimize labor costs by automating:
      • Application of management policies
      • Execution of administrative functions for error recovery
      • Validation of preservation assessment criteria
  • 7. Observations of Production Data Grids
    • Each community implements different management polices
    • Need a mechanism to support the socialization of shared collections
      • Community specific preservation objectives
      • Community specific assertions about properties of the shared collection
      • Community specific management policies
  • 8. Collection Management iRODS - integrated Rule-Oriented Data System
  • 9. Rule-based Data Management
    • Map from management policies to rules controlling execution of remote micro-services
    • Manage persistent state information for results of each micro-service execution
    • Support an additional three logical name spaces
      • Rules
      • Micro-services
      • Persistent state information
    • Constitutes representation information for preservation environments
  • 10. Example Rules
    • Rule composed of four parts:
      • Name | condition | micro-service set | recovery
    • Rule to automate replication of data for a specific collection
      • acPostProcForPut |
      • $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) |
      • nop
    • Rule types
      • Internal, administrative, user-defined
      • Atomic, deferred, periodic
  • 11. Management Virtualization
    • Standard policies expressed as rules
      • Integrity
        • Validation of checksums
        • Synchronization of replicas
        • Data distribution
        • Data retention
        • Access controls
      • Authenticity
        • Chain of custody - audit trails
        • Required preservation metadata - templates
        • Generation of AIPs, DIPS
  • 12. New Capabilities
    • Management capabilities
      • Rules to validate assessment criteria
      • Access controls on rules
      • Time-dependent access controls
      • Access controls on each micro-service
      • Redaction, access controls on structures in a file
      • Rule to parse audit trails, verify consistency of system
    • Data grid evolution
      • Dynamic addition of new rules / micro-services / persistent state information
      • Rules to control migration from old management policies to new management policies
    • Federation
      • Migration of rules and micro-services with data
  • 13. Federation Between Data Grids
    • Data Grid
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical rule name space
    • Logical micro-service name
    • Logical persistent state
    Data Collection B Data Access Methods (Web Browser, DSpace, OAI-PMH)
    • Data Grid
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical rule name space
    • Logical micro-service name
    • Logical persistent state
    Data Collection A
  • 14. Digital Preservation
    • Preservation is communication with the future
      • How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)?
      • SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology
    • Preservation manages communication from the past
      • What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)?
      • iRODS - integrated Rule-Oriented Data System
  • 15. Theory of Digital Preservation
    •  Definition of the persistent name spaces
    •  Definition of the operations that are performed upon the persistent name spaces
    •  Characterization of the changes to the persistent state information associated with each persistent name space that occur for each operation
    •  Characterization of the transformations that are made to the records for each operation
    •  Demonstration that the set of operations is complete, enabling the decomposition of every preservation process onto the operation set.
    •  Demonstration that the preservation management policies are complete, enabling the validation of all preservation assessment criteria.
    •  Demonstration that the persistent state information is complete, enabling the validation of assessment criteria.
    •  The assertion is then: if the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record.
    •  A corollary is that such a system would allow records to be migrated between independent implementations of preservation environments, while maintaining authenticity and integrity .
  • 16. For More Information
    • Reagan W. Moore
    • San Diego Supercomputer Center
    • [email_address] edu
    • http://www.sdsc.edu/srb/
    • http://irods.sdsc.edu/