Rule-based Preservation Systems
Published in: Technology, Business

  • 1. Rule-Based Preservation Systems: Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar, Richard Marciano {moore, schroede, mwan, sekar, marciano}
  • 2. Data Grids
    • SRB - Storage Resource Broker
      • Persistent naming of distributed data
      • Management of data stored in multiple types of storage systems
      • Organization of data as a shared collection with descriptive metadata, access controls, audit trails
    • iRODS - integrated Rule-Oriented Data System
      • Rules control execution of remote micro-services
      • Manage persistent state information
      • Validate assertions about collection
      • Automate execution of management policies
  • 3. Extremely Successful
    • Storage Resource Broker (SRB) manages 2 PB of data in internationally shared collections
    • Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS
      • Astronomy: Data grid
      • Bio-informatics: Digital library
      • Earth Sciences: Data grid
      • Ecology: Collection
      • Education: Persistent archive
      • Engineering: Digital library
      • Environmental science: Data grid
      • High energy physics: Data grid
      • Humanities: Data grid
      • Medical community: Digital library
      • Oceanography: Real-time sensor data, persistent archive
      • Seismology: Digital library, real-time sensor data
    • Goal has been generic infrastructure for distributed data
  • 4.  
  • 5. Data Grids
    • Data virtualization
      • Provide the persistent, global identifiers needed to manage distributed data
      • Provide standard operations for interacting with heterogeneous storage systems
      • Map from storage protocols to preferred clients
    • Trust virtualization
      • Manage authentication and authorization
      • Enable access controls on data, metadata, storage
    • Federation
      • Controlled sharing of name spaces, files, and metadata between independent data grids
      • Data grid chaining / Central archives / Master-slave data grids / Peer-to-Peer data grids
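The data virtualization idea above, a persistent logical name that outlives any particular storage location, can be illustrated with a minimal Python sketch. This is not SRB or iRODS code: the `Replica` and `LogicalCatalog` classes, the resource names, and the paths are all hypothetical and exist only to show the mapping.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # location inside that resource


@dataclass
class LogicalCatalog:
    """Maps persistent logical names to physical replicas (illustrative only)."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, replica):
        self.entries.setdefault(logical_name, []).append(replica)

    def migrate(self, logical_name, old_resource, new_replica):
        # The logical name stays fixed while the physical location changes.
        replicas = self.entries[logical_name]
        self.entries[logical_name] = [
            new_replica if r.resource == old_resource else r for r in replicas
        ]

    def resolve(self, logical_name):
        return list(self.entries[logical_name])


name = "/zoneA/home/survey/image001.fits"  # made-up logical name
catalog = LogicalCatalog()
catalog.register(name, Replica("disk1", "/data/a/0001"))
catalog.register(name, Replica("tape1", "/archive/0001"))
catalog.migrate(name, "disk1", Replica("disk2", "/vol2/0001"))
resources = [r.resource for r in catalog.resolve(name)]
```

Clients keep using the same logical name before and after the migration; only the catalog entry changes.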
  • 6. Production Data Grids: Observations
    • Data grids manage shared collections that are distributed across multiple storage systems and institutions
      • Data grids are responsible for providing recovery mechanisms for all errors that occur in the distributed environment
      • The number of observed problems is proportional to the size of the collections
    • Need to minimize labor costs by automating:
      • Application of management policies
      • Execution of administrative functions for error recovery
      • Validation of preservation assessment criteria
  • 7. Observations of Production Data Grids
    • Each community implements different management policies
    • Need a mechanism to support the socialization of shared collections
      • Community specific preservation objectives
      • Community specific assertions about properties of the shared collection
      • Community specific management policies
  • 8. Collection Management: iRODS - integrated Rule-Oriented Data System
  • 9. Rule-based Data Management
    • Map from management policies to rules controlling execution of remote micro-services
    • Manage persistent state information for results of each micro-service execution
    • Support an additional three logical name spaces
      • Rules
      • Micro-services
      • Persistent state information
    • Constitutes representation information for preservation environments
  • 10. Example Rules
    • Rule composed of four parts:
      • Name | condition | micro-service set | recovery
    • Rule to automate replication of data for a specific collection
      • acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) | nop
    • Rule types
      • Internal, administrative, user-defined
      • Atomic, deferred, periodic
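To make the four-part rule structure concrete, here is a small Python sketch that parses the replication rule from this slide and decides whether it fires. It is not the iRODS rule engine. It assumes that `like` behaves as a shell-style wildcard match and that the micro-service set holds a single call; `parse_rule`, `condition_holds`, `run_call`, `fire`, and the stand-in `fake_replicate` are hypothetical names.

```python
import re
from fnmatch import fnmatchcase

CALL = re.compile(r"^(\w+)\((.*)\)$")


def parse_rule(text):
    """Split 'name | condition | micro-service set | recovery' into four parts."""
    parts = [p.strip() for p in text.split("|")]
    if len(parts) != 4:
        raise ValueError("expected four '|'-separated parts")
    return dict(zip(("name", "condition", "actions", "recovery"), parts))


def condition_holds(condition, session):
    """Evaluate only the '$variable like pattern' form shown on the slide."""
    variable, keyword, pattern = condition.split(None, 2)
    if keyword != "like" or not variable.startswith("$"):
        raise ValueError("unsupported condition: " + condition)
    # Assumption: 'like' is treated as a shell-style wildcard match.
    return fnmatchcase(session[variable[1:]], pattern)


def run_call(call, services, session):
    if call == "nop":
        return None
    match = CALL.match(call)
    if match is None:
        raise ValueError("unsupported call: " + call)
    name, raw_args = match.groups()
    args = [a.strip() for a in raw_args.split(",")] if raw_args.strip() else []
    return services[name](session, *args)


def fire(rule, event, session, services):
    if rule["name"] != event or not condition_holds(rule["condition"], session):
        return False
    try:
        run_call(rule["actions"], services, session)
    except Exception:
        run_call(rule["recovery"], services, session)  # recovery part of the rule
        raise
    return True


replicated = []


def fake_replicate(session, resource, _flag):
    # Stand-in for a real micro-service; it only records the request.
    replicated.append((session["objPath"], resource))


rule = parse_rule(
    "acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | "
    "msiSysReplDataObj(nvoReplResc,null) | nop"
)
services = {"msiSysReplDataObj": fake_replicate}
hit = fire(rule, "acPostProcForPut",
           {"objPath": "/tempZone/home/rods/nvo/img1.fits"}, services)
miss = fire(rule, "acPostProcForPut",
            {"objPath": "/tempZone/home/rods/other/img2.fits"}, services)
```

Only the object stored under the `nvo` collection triggers the replication call; the second object leaves `replicated` unchanged.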
  • 11. Management Virtualization
    • Standard policies expressed as rules
      • Integrity
        • Validation of checksums
        • Synchronization of replicas
        • Data distribution
        • Data retention
        • Access controls
      • Authenticity
        • Chain of custody - audit trails
        • Required preservation metadata - templates
        • Generation of AIPs, DIPs
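Two of the integrity policies listed above, validation of checksums and synchronization of replicas, can be sketched in a few lines of Python. The slides do not name a checksum algorithm, so SHA-256 is an assumption here, and the replica names and in-memory byte strings are made up for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Assumption: SHA-256; the slides do not say which algorithm is used.
    return hashlib.sha256(data).hexdigest()


def validate_replicas(registered_checksum, replicas):
    """Return names of replicas whose content no longer matches the registered checksum."""
    return sorted(
        name for name, data in replicas.items()
        if checksum(data) != registered_checksum
    )


def synchronize(registered_checksum, replicas):
    """Overwrite damaged replicas from any replica that still validates."""
    good = next(
        (d for d in replicas.values() if checksum(d) == registered_checksum), None
    )
    if good is None:
        raise RuntimeError("no valid replica left to recover from")
    repaired = validate_replicas(registered_checksum, replicas)
    for name in repaired:
        replicas[name] = good
    return repaired


original = b"record contents"
registered = checksum(original)
replicas = {"disk1": original, "tape1": b"record c0ntents", "disk2": original}
damaged = validate_replicas(registered, replicas)
repaired = synchronize(registered, replicas)
remaining = validate_replicas(registered, replicas)
```

In a production grid the same check would run periodically as a rule, with the registered checksum read from the persistent state information rather than from a local variable.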
  • 12. New Capabilities
    • Management capabilities
      • Rules to validate assessment criteria
      • Access controls on rules
      • Time-dependent access controls
      • Access controls on each micro-service
      • Redaction, access controls on structures in a file
      • Rule to parse audit trails, verify consistency of system
    • Data grid evolution
      • Dynamic addition of new rules / micro-services / persistent state information
      • Rules to control migration from old management policies to new management policies
    • Federation
      • Migration of rules and micro-services with data
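As one illustration of the management capabilities above, a time-dependent access control can be modelled as a grant with an optional validity window. This is a sketch under stated assumptions, not the iRODS access control model: the `(user, not_before, not_after)` tuple layout, the `access_allowed` function, and the embargo date are hypothetical.

```python
from datetime import datetime, timezone


def access_allowed(grants, user, when):
    """A grant is (user, not_before, not_after); either bound may be None."""
    for grantee, not_before, not_after in grants:
        if grantee != user:
            continue
        if not_before is not None and when < not_before:
            continue
        if not_after is not None and when > not_after:
            continue
        return True
    return False


embargo_end = datetime(2030, 1, 1, tzinfo=timezone.utc)  # made-up embargo date
grants = [
    ("public", embargo_end, None),  # public access opens when the embargo ends
    ("archivist", None, None),      # unrestricted in time
]
during = datetime(2029, 6, 1, tzinfo=timezone.utc)
later = datetime(2030, 6, 1, tzinfo=timezone.utc)

public_during = access_allowed(grants, "public", during)
public_later = access_allowed(grants, "public", later)
archivist_during = access_allowed(grants, "archivist", during)
```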
  • 13. Federation Between Data Grids
    • Data access methods (Web Browser, DSpace, OAI-PMH)
    • Two federated data grids, one holding Data Collection A and one holding Data Collection B, each with its own:
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical rule name space
      • Logical micro-service name
      • Logical persistent state
  • 14. Digital Preservation
    • Preservation is communication with the future
      • How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)?
      • SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology
    • Preservation manages communication from the past
      • What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)?
      • iRODS - integrated Rule-Oriented Data System
  • 15. Theory of Digital Preservation
    •  Definition of the persistent name spaces
    •  Definition of the operations that are performed upon the persistent name spaces
    •  Characterization of the changes to the persistent state information associated with each persistent name space that occur for each operation
    •  Characterization of the transformations that are made to the records for each operation
    •  Demonstration that the set of operations is complete, enabling the decomposition of every preservation process onto the operation set.
    •  Demonstration that the preservation management policies are complete, enabling the validation of all preservation assessment criteria.
    •  Demonstration that the persistent state information is complete, enabling the validation of assessment criteria.
    •  The assertion is then: if the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record.
    •  A corollary is that such a system would allow records to be migrated between independent implementations of preservation environments, while maintaining authenticity and integrity.
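The reversibility assertion above can be illustrated with a toy Python example: if each operation has an inverse and the sequence of applied operations is kept as persistent state, the original record can be recreated by undoing the operations in reverse order. The two operations chosen here (compression and base64 encoding) and the function names are assumptions for illustration, not operations named in the slides.

```python
import base64
import zlib

# Each hypothetical operation is paired with its inverse.
OPERATIONS = {
    "compress": (zlib.compress, zlib.decompress),
    "base64": (base64.b64encode, base64.b64decode),
}


def apply_operations(record, names):
    """Apply each operation and return the result plus the log of what was done."""
    log = []
    for name in names:
        forward, _ = OPERATIONS[name]
        record = forward(record)
        log.append(name)  # stands in for persistent state information
    return record, log


def recreate_original(record, log):
    """Undo the logged operations in reverse order."""
    for name in reversed(log):
        _, inverse = OPERATIONS[name]
        record = inverse(record)
    return record


original = b"minutes of the 1998 board meeting"
stored, log = apply_operations(original, ["compress", "base64"])
recovered = recreate_original(stored, log)
```

The log plays the role of the persistent state information: without it, a future environment would not know which inverses to apply or in what order.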
  • 16. For More Information
    • Reagan W. Moore
    • San Diego Supercomputer Center
    • [email_address] edu