Rule-based Preservation Systems


Published in: Technology, Business

  1. 1. Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano { moore , schroede , mwan , sekar , marciano }
  2. 2. Data Grids <ul><li>SRB - Storage Resource Broker </li></ul><ul><ul><li>Persistent naming of distributed data </li></ul></ul><ul><ul><li>Management of data stored in multiple types of storage systems </li></ul></ul><ul><ul><li>Organization of data as a shared collection with descriptive metadata, access controls, audit trails </li></ul></ul><ul><li>iRODS - integrated Rule-Oriented Data System </li></ul><ul><ul><li>Rules control execution of remote micro-services </li></ul></ul><ul><ul><li>Manage persistent state information </li></ul></ul><ul><ul><li>Validate assertions about collection </li></ul></ul><ul><ul><li>Automate execution of management policies </li></ul></ul>
  3. 3. Extremely Successful <ul><li>Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections </li></ul><ul><li>Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS </li></ul><ul><ul><li>Astronomy Data grid </li></ul></ul><ul><ul><li>Bio-informatics Digital library </li></ul></ul><ul><ul><li>Earth Sciences Data grid </li></ul></ul><ul><ul><li>Ecology Collection </li></ul></ul><ul><ul><li>Education Persistent archive </li></ul></ul><ul><ul><li>Engineering Digital library </li></ul></ul><ul><ul><li>Environmental science Data grid </li></ul></ul><ul><ul><li>High energy physics Data grid </li></ul></ul><ul><ul><li>Humanities Data Grid </li></ul></ul><ul><ul><li>Medical community Digital library </li></ul></ul><ul><ul><li>Oceanography Real time sensor data, persistent archive </li></ul></ul><ul><ul><li>Seismology Digital library, real-time sensor data </li></ul></ul><ul><li>Goal has been generic infrastructure for distributed data </li></ul>
  4. 5. Data Grids <ul><li>Data virtualization </li></ul><ul><ul><li>Provide the persistent, global identifiers needed to manage distributed data </li></ul></ul><ul><ul><li>Provide standard operations for interacting with heterogeneous storage systems </li></ul></ul><ul><ul><li>Map from storage protocols to preferred clients </li></ul></ul><ul><li>Trust virtualization </li></ul><ul><ul><li>Manage authentication and authorization </li></ul></ul><ul><ul><li>Enable access controls on data, metadata, storage </li></ul></ul><ul><li>Federation </li></ul><ul><ul><li>Controlled sharing of name spaces, files, and metadata between independent data grids </li></ul></ul><ul><ul><li>Data grid chaining / Central archives / Master-slave data grids / Peer-to-Peer data grids </li></ul></ul>
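The data virtualization idea on this slide, a persistent logical name that stays fixed while the physical copies move, can be sketched as follows. This is a minimal illustration, not the SRB or iRODS catalog: the class names `Replica` and `LogicalNamespace`, the resource names, and the paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy of a record (hypothetical structure)."""
    resource: str        # logical name of a storage resource
    physical_path: str   # location inside that storage system


class LogicalNamespace:
    """Maps persistent logical names to their current physical replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, replica):
        # A logical name may point at several replicas on different systems.
        self._replicas.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name):
        # Clients only ever use the logical name; return a copy of the list.
        return list(self._replicas.get(logical_name, []))

    def migrate(self, logical_name, old, new):
        # Moving data to new storage changes the mapping, not the name.
        replicas = self._replicas[logical_name]
        replicas[replicas.index(old)] = new


namespace = LogicalNamespace()
tape = Replica("archiveTape", "/tape/vol7/0001.dat")
disk = Replica("diskCache", "/data/cache/0001.dat")
namespace.register("/demoZone/home/demo/survey.dat", tape)
namespace.register("/demoZone/home/demo/survey.dat", disk)

# Migrating the tape copy leaves the logical name untouched.
new_tape = Replica("archiveTape2", "/tape2/vol1/0001.dat")
namespace.migrate("/demoZone/home/demo/survey.dat", tape, new_tape)
```

Keeping the mapping in one place is what lets storage technology be replaced without changing any identifier that users or other collections hold.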
  5. 6. Production Data Grids: Observations <ul><li>Data grids manage shared collections that are distributed across multiple storage systems and institutions </li></ul><ul><ul><li>Data grids are responsible for providing recovery mechanisms for all errors that occur in the distributed environment </li></ul></ul><ul><ul><li>The number of observed problems is proportional to the size of the collections </li></ul></ul><ul><li>Need to minimize labor costs by automating: </li></ul><ul><ul><li>Application of management policies </li></ul></ul><ul><ul><li>Execution of administrative functions for error recovery </li></ul></ul><ul><ul><li>Validation of preservation assessment criteria </li></ul></ul>
  6. 7. Observations of Production Data Grids <ul><li>Each community implements different management policies </li></ul><ul><li>Need a mechanism to support the socialization of shared collections </li></ul><ul><ul><li>Community-specific preservation objectives </li></ul></ul><ul><ul><li>Community-specific assertions about properties of the shared collection </li></ul></ul><ul><ul><li>Community-specific management policies </li></ul></ul>
  7. 8. Collection Management iRODS - integrated Rule-Oriented Data System
  8. 9. Rule-based Data Management <ul><li>Map from management policies to rules controlling execution of remote micro-services </li></ul><ul><li>Manage persistent state information for results of each micro-service execution </li></ul><ul><li>Support an additional three logical name spaces </li></ul><ul><ul><li>Rules </li></ul></ul><ul><ul><li>Micro-services </li></ul></ul><ul><ul><li>Persistent state information </li></ul></ul><ul><li>Constitutes representation information for preservation environments </li></ul>
  9. 10. Example Rules <ul><li>Rule composed of four parts: </li></ul><ul><ul><li>Name | condition | micro-service set | recovery </li></ul></ul><ul><li>Rule to automate replication of data for a specific collection </li></ul><ul><ul><li>acPostProcForPut | </li></ul></ul><ul><ul><li>$objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) | </li></ul></ul><ul><ul><li>nop </li></ul></ul><ul><li>Rule types </li></ul><ul><ul><li>Internal, administrative, user-defined </li></ul></ul><ul><ul><li>Atomic, deferred, periodic </li></ul></ul>
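The four-part rule layout on this slide (name | condition | micro-service set | recovery) can be sketched as a small parser plus a condition check. This is an illustration only, not the iRODS rule engine: the `Rule` class and both functions are hypothetical, only the `like` operator is handled, and the `##` separator used to split a chain of micro-services is an assumption that the slide does not show.

```python
import fnmatch
from dataclasses import dataclass

# The example rule from the slide, joined onto one line.
EXAMPLE_RULE = (
    "acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | "
    "msiSysReplDataObj(nvoReplResc,null) | nop"
)


@dataclass
class Rule:
    name: str
    condition: str
    micro_services: list
    recovery: list


def split_chain(text, separator="##"):
    # ASSUMPTION: multiple micro-services are separated by "##".
    return [item.strip() for item in text.split(separator) if item.strip()]


def parse_rule(text):
    """Split a rule into name, condition, micro-service set, recovery."""
    parts = [part.strip() for part in text.split("|")]
    if len(parts) != 4:
        raise ValueError("expected 4 '|'-separated parts, got %d" % len(parts))
    name, condition, services, recovery = parts
    return Rule(name, condition, split_chain(services), split_chain(recovery))


def condition_holds(condition, session):
    """Evaluate '$variable like pattern' against session variables."""
    if not condition:
        return True  # an empty condition always applies
    variable, operator, pattern = condition.split(None, 2)
    if operator != "like":
        raise ValueError("only the 'like' operator is sketched here")
    return fnmatch.fnmatchcase(session[variable.lstrip("$")], pattern)


rule = parse_rule(EXAMPLE_RULE)
applies = condition_holds(
    rule.condition, {"objPath": "/tempZone/home/rods/nvo/image.fits"}
)
```

When the condition holds, a rule engine would run each entry in `rule.micro_services` in order and fall back to the matching `rule.recovery` entry if one fails; that dispatch step is left out here.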
  10. 11. Management Virtualization <ul><li>Standard policies expressed as rules </li></ul><ul><ul><li>Integrity </li></ul></ul><ul><ul><ul><li>Validation of checksums </li></ul></ul></ul><ul><ul><ul><li>Synchronization of replicas </li></ul></ul></ul><ul><ul><ul><li>Data distribution </li></ul></ul></ul><ul><ul><ul><li>Data retention </li></ul></ul></ul><ul><ul><ul><li>Access controls </li></ul></ul></ul><ul><ul><li>Authenticity </li></ul></ul><ul><ul><ul><li>Chain of custody - audit trails </li></ul></ul></ul><ul><ul><ul><li>Required preservation metadata - templates </li></ul></ul></ul><ul><ul><ul><li>Generation of AIPs, DIPs </li></ul></ul></ul>
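The integrity policy "validation of checksums" listed above can be sketched as a check of every replica against a checksum registered at ingest. This is a minimal sketch under stated assumptions: SHA-256 is chosen for illustration and is not necessarily the algorithm a given data grid uses, and the function names `file_checksum` and `find_bad_replicas` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_replicas(registered_checksum, replica_paths):
    """List replicas that are missing or no longer match the checksum."""
    bad = []
    for path in replica_paths:
        if not Path(path).is_file():
            bad.append(path)   # replica lost
        elif file_checksum(path) != registered_checksum:
            bad.append(path)   # replica corrupted
    return bad
```

The replicas returned by `find_bad_replicas` are the candidates for the next policy in the list, synchronization of replicas, where a bad copy would be rebuilt from one that still validates.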
  11. 12. New Capabilities <ul><li>Management capabilities </li></ul><ul><ul><li>Rules to validate assessment criteria </li></ul></ul><ul><ul><li>Access controls on rules </li></ul></ul><ul><ul><li>Time-dependent access controls </li></ul></ul><ul><ul><li>Access controls on each micro-service </li></ul></ul><ul><ul><li>Redaction, access controls on structures in a file </li></ul></ul><ul><ul><li>Rule to parse audit trails, verify consistency of system </li></ul></ul><ul><li>Data grid evolution </li></ul><ul><ul><li>Dynamic addition of new rules / micro-services / persistent state information </li></ul></ul><ul><ul><li>Rules to control migration from old management policies to new management policies </li></ul></ul><ul><li>Federation </li></ul><ul><ul><li>Migration of rules and micro-services with data </li></ul></ul>
  12. 13. Federation Between Data Grids <ul><li>Data Grid </li></ul><ul><li>Logical resource name space </li></ul><ul><li>Logical user name space </li></ul><ul><li>Logical file name space </li></ul><ul><li>Logical rule name space </li></ul><ul><li>Logical micro-service name </li></ul><ul><li>Logical persistent state </li></ul>Data Collection B Data Access Methods (Web Browser, DSpace, OAI-PMH) <ul><li>Data Grid </li></ul><ul><li>Logical resource name space </li></ul><ul><li>Logical user name space </li></ul><ul><li>Logical file name space </li></ul><ul><li>Logical rule name space </li></ul><ul><li>Logical micro-service name </li></ul><ul><li>Logical persistent state </li></ul>Data Collection A
  13. 14. Digital Preservation <ul><li>Preservation is communication with the future </li></ul><ul><ul><li>How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)? </li></ul></ul><ul><ul><li>SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology </li></ul></ul><ul><li>Preservation manages communication from the past </li></ul><ul><ul><li>What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)? </li></ul></ul><ul><ul><li>iRODS - integrated Rule-Oriented Data System </li></ul></ul>
  14. 15. Theory of Digital Preservation <ul><li> Definition of the persistent name spaces </li></ul><ul><li> Definition of the operations that are performed upon the persistent name spaces </li></ul><ul><li> Characterization of the changes to the persistent state information associated with each persistent name space that occur for each operation </li></ul><ul><li> Characterization of the transformations that are made to the records for each operation </li></ul><ul><li> Demonstration that the set of operations is complete, enabling the decomposition of every preservation process onto the operation set. </li></ul><ul><li> Demonstration that the preservation management policies are complete, enabling the validation of all preservation assessment criteria. </li></ul><ul><li> Demonstration that the persistent state information is complete, enabling the validation of assessment criteria. </li></ul><ul><li> The assertion is then: if the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record. </li></ul><ul><li> A corollary is that such a system would allow records to be migrated between independent implementations of preservation environments, while maintaining authenticity and integrity. </li></ul>
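The reversibility assertion on this slide can be illustrated with a toy example: if every operation applied to a record has an inverse, and the sequence of operations is kept as persistent state, the original record can be recreated by applying the inverses in reverse order. The two operations below (compression and base64 encoding) and the function names are hypothetical stand-ins for real preservation operations.

```python
import base64
import zlib

# Each operation is a (forward, inverse) pair over bytes.
OPERATIONS = {
    "compress": (zlib.compress, zlib.decompress),
    "base64": (base64.b64encode, base64.b64decode),
}


def apply_operations(record, names):
    """Apply operations in order; return the result and the operation log."""
    log = []
    for name in names:
        forward, _ = OPERATIONS[name]
        record = forward(record)
        log.append(name)  # persistent state: what was done, in what order
    return record, log


def recreate_original(record, log):
    """Undo the logged operations, last applied first."""
    for name in reversed(log):
        _, inverse = OPERATIONS[name]
        record = inverse(record)
    return record


original = b"Minutes of the 1998 committee meeting"
stored, log = apply_operations(original, ["compress", "base64"])
recovered = recreate_original(stored, log)
```

The sketch also hints at the corollary: a second, independent environment that knows the same operation set and receives `stored` together with `log` could recover the record without help from the first.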
  15. 16. For More Information <ul><li>Reagan W. Moore </li></ul><ul><li>San Diego Supercomputer Center </li></ul><ul><li>[email_address] edu </li></ul>