Your SlideShare is downloading. ×
Rule-based Preservation Systems
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.


Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Rule-based Preservation Systems


Published on

Published in: Technology, Business

  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano { moore , schroede , mwan , sekar , marciano }
  • 2. Data Grids
    • SRB - Storage Resource Broker
      • Persistent naming of distributed data
      • Management of data stored in multiple types of storage systems
      • Organization of data as a shared collection with descriptive metadata, access controls, audit traiils
    • iRODS - integrated Rule-Oriented Data System
      • Rules control execution of remote micro-services
      • Manage persistent state information
      • Validate assertions about collection
      • Automate execution of management policies
  • 3. Extremely Successful
    • Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections
    • Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS
      • Astronomy Data grid
      • Bio-informatics Digital library
      • Earth Sciences Data grid
      • Ecology Collection
      • Education Persistent archive
      • Engineering Digital library
      • Environmental science Data grid
      • High energy physics Data grid
      • Humanities Data Grid
      • Medical community Digital library
      • Oceanography Real time sensor data, persistent archive
      • Seismology Digital library, real-time sensor data
    • Goal has been generic infrastructure for distributed data
  • 4.  
  • 5. Data Grids
    • Data virtualization
      • Provide the persistent, global identifiers needed to manage distributed data
      • Provide standard operations for interacting with heterogeneous storage system
      • Map from storage protocols to preferred clients
    • Trust virtualization
      • Manage authentication and authorization
      • Enable access controls on data, metadata, storage
    • Federation
      • Controlled sharing of name spaces, files, and metadata between independent data grids
      • Data grid chaining / Central archives / Master-slave data grids / Peer-to-Peer data grids
  • 6. Production Data Grids: Observations
    • Data grids manage shared collections that are distributed across multiple storage systems and institutions
      • Data grids are responsible for providing recovery mechanisms for all errors that occur in the distributed environment
      • The number of observed problems is proportional to the size of the collections
    • Need to minimize labor costs by automating:
      • Application of management policies
      • Execution of administrative functions for error recovery
      • Validation of preservation assessment criteria
  • 7. Observations of Production Data Grids
    • Each community implements different management polices
    • Need a mechanism to support the socialization of shared collections
      • Community specific preservation objectives
      • Community specific assertions about properties of the shared collection
      • Community specific management policies
  • 8. Collection Management iRODS - integrated Rule-Oriented Data System
  • 9. Rule-based Data Management
    • Map from management policies to rules controlling execution of remote micro-services
    • Manage persistent state information for results of each micro-service execution
    • Support an additional three logical name spaces
      • Rules
      • Micro-services
      • Persistent state information
    • Constitutes representation information for preservation environments
  • 10. Example Rules
    • Rule composed of four parts:
      • Name | condition | micro-service set | recovery
    • Rule to automate replication of data for a specific collection
      • acPostProcForPut |
      • $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) |
      • nop
    • Rule types
      • Internal, administrative, user-defined
      • Atomic, deferred, periodic
  • 11. Management Virtualization
    • Standard policies expressed as rules
      • Integrity
        • Validation of checksums
        • Synchronization of replicas
        • Data distribution
        • Data retention
        • Access controls
      • Authenticity
        • Chain of custody - audit trails
        • Required preservation metadata - templates
        • Generation of AIPs, DIPS
  • 12. New Capabilities
    • Management capabilities
      • Rules to validate assessment criteria
      • Access controls on rules
      • Time-dependent access controls
      • Access controls on each micro-service
      • Redaction, access controls on structures in a file
      • Rule to parse audit trails, verify consistency of system
    • Data grid evolution
      • Dynamic addition of new rules / micro-services / persistent state information
      • Rules to control migration from old management policies to new management policies
    • Federation
      • Migration of rules and micro-services with data
  • 13. Federation Between Data Grids
    • Data Grid
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical rule name space
    • Logical micro-service name
    • Logical persistent state
    Data Collection B Data Access Methods (Web Browser, DSpace, OAI-PMH)
    • Data Grid
    • Logical resource name space
    • Logical user name space
    • Logical file name space
    • Logical rule name space
    • Logical micro-service name
    • Logical persistent state
    Data Collection A
  • 14. Digital Preservation
    • Preservation is communication with the future
      • How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)?
      • SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology
    • Preservation manages communication from the past
      • What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)?
      • iRODS - integrated Rule-Oriented Data System
  • 15. Theory of Digital Preservation
    •  Definition of the persistent name spaces
    •  Definition of the operations that are performed upon the persistent name spaces
    •  Characterization of the changes to the persistent state information associated with each persistent name space that occur for each operation
    •  Characterization of the transformations that are made to the records for each operation
    •  Demonstration that the set of operations is complete, enabling the decomposition of every preservation process onto the operation set.
    •  Demonstration that the preservation management policies are complete, enabling the validation of all preservation assessment criteria.
    •  Demonstration that the persistent state information is complete, enabling the validation of assessment criteria.
    •  The assertion is then: if the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record.
    •  A corollary is that such a system would allow records to be migrated between independent implementations of preservation environments, while maintaining authenticity and integrity .
  • 16. For More Information
    • Reagan W. Moore
    • San Diego Supercomputer Center
    • [email_address] edu