Moore RDAP11 Policy-based Data Management


Reagan Moore, UNC-RENCI; Policy-based Data Management; RDAP11 Summit

The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information

Published in: Technology, Education

  1. Policy Based Data Management
       • Micah Altman – Harvard
       • Monica Omodei – Australian National Data Service
       • Eliot Metsger – Johns Hopkins University
       • Reagan Moore – UNC-CH
  2. Policy Based Data Management
     Reagan W. Moore, Arcot Rajasekar, Mike Wan, Wayne Schroeder, Mike Conway, Jason Coposky
     {moore,sekar,mwan, schroeder} [email_address]
  3. Environment types: Data Processing Pipeline, Preservation Environment, Digital Library, Data Grid
     Projects shown:
       • Ocean Observatories Initiative
       • NARA Transcontinental Persistent Archive Prototype
       • Carolina Digital Repository
       • Large Synoptic Survey Telescope
       • Texas Digital Library
       • French National Library
       • Teragrid
       • Temporal Dynamics of Learning Center
       • Australian Research Collaboration Service
       • Taiwan National Archive
  4. Policy-based Data Environments
       • Purpose - reason a collection is assembled
       • Properties - attributes needed to ensure the purpose
       • Policies - controls for enforcing desired properties, mapped to computer actionable rules
       • Procedures - functions that implement the policies, mapped to computer executable workflows
       • State information - results of applying the procedures, mapped to system metadata
       • Assessment criteria - validation that state information conforms to the desired purpose, mapped to periodically executed policies
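The chain from purpose to assessment criteria on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`DataObject`, `Policy`, `Collection`, `replicate`) are hypothetical and are not part of iRODS or any other system named in the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataObject:
    """Hypothetical data object; `metadata` holds the state information."""
    name: str
    replicas: int = 1
    metadata: Dict[str, str] = field(default_factory=dict)


def replicate(obj: DataObject, target: int) -> None:
    """Procedure: raise the replica count and record the result as state."""
    obj.replicas = max(obj.replicas, target)
    obj.metadata["replica_count"] = str(obj.replicas)


@dataclass
class Policy:
    """Policy: enforces one desired property by running a procedure."""
    property_name: str
    procedure: Callable[[DataObject], None]


@dataclass
class Collection:
    """Collection assembled for a purpose and governed by policies."""
    purpose: str
    policies: List[Policy]
    objects: List[DataObject] = field(default_factory=list)

    def ingest(self, obj: DataObject) -> None:
        # Apply every policy's procedure before the object joins the collection.
        for policy in self.policies:
            policy.procedure(obj)
        self.objects.append(obj)

    def assess(self, criterion: Callable[[DataObject], bool]) -> List[str]:
        # Assessment: names of objects whose state fails the criterion.
        return [o.name for o in self.objects if not criterion(o)]
```

In this sketch an assessment is just a query over recorded state, which mirrors the slide's point that assessment criteria are themselves policies run periodically against system metadata.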
  5. Data Life Cycle
       • Project Collection / Private / Local Policy
       • Data Grid / Shared / Distribution Policy
       • Digital Library / Published / Description Policy
       • Data Processing Pipeline / Analyzed / Service Policy
       • Reference Collection / Preserved / Representation Policy
       • Federation / Sustained / Re-purposing Policy
     Stages correspond to addition of new policies for a broader community.
     Virtualize the stages of the data life cycle through policy evolution.
     Consensus of a broader community at each data life cycle stage.
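One way to read "policy evolution" is that each life cycle stage keeps the policies of earlier stages and adds its own. The sketch below encodes that reading; the cumulative behavior is an assumption drawn from the phrase "addition of new policies", and the function name `policies_in_force` is hypothetical.

```python
from typing import List, Tuple

# (environment, state of the data, policy added at this stage), as on the slide.
STAGES: List[Tuple[str, str, str]] = [
    ("Project Collection", "Private", "Local Policy"),
    ("Data Grid", "Shared", "Distribution Policy"),
    ("Digital Library", "Published", "Description Policy"),
    ("Data Processing Pipeline", "Analyzed", "Service Policy"),
    ("Reference Collection", "Preserved", "Representation Policy"),
    ("Federation", "Sustained", "Re-purposing Policy"),
]


def policies_in_force(environment: str) -> List[str]:
    """Return the policies accumulated up to and including `environment`.

    Assumes policies are cumulative across stages (an interpretation,
    not something the slide states explicitly).
    """
    accumulated: List[str] = []
    for name, _state, policy in STAGES:
        accumulated.append(policy)
        if name == environment:
            return accumulated
    raise ValueError(f"unknown life cycle stage: {environment}")
```

For example, `policies_in_force("Digital Library")` yields the local, distribution, and description policies in that order.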
  6. Build Shared Collections
       • Enable data processing pipeline that spans multiple institutions
       • Enable collaborative research on a shared collection
       • Enable formation of a digital library that incorporates material from multiple sources
       • Enable creation of a preservation environment that incorporates a deep archive
  7. Policy Based Interoperability
  8. Generic Infrastructure
       • Clients – specific to discipline and life cycle state
       • Policies – specific to discipline
       • Procedures – specific to discipline
       • Remaining infrastructure is generic
           • Network transport
           • Authentication / Authorization
           • Distributed storage access
           • Remote execution
           • Metadata management
           • Message passing
           • Rule engine
  9. Data Virtualization
     Layers in the data grid: Access Interface, Policy Enforcement Points, Standard Micro-services, Standard I/O Operations, Storage Protocol, Storage System
       • Map from the actions requested by the client to multiple policy enforcement points.
       • Map from policy to standard micro-services.
       • Map from micro-services to standard POSIX I/O operations.
       • Map standard I/O operations to the protocol supported by the storage system.
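The four mappings on this slide can be sketched as lookup tables composed in sequence. Every table entry below (the action, enforcement point, micro-service, and storage names) is made up for illustration and does not reflect actual iRODS identifiers.

```python
from typing import Dict, List

# Client action -> policy enforcement points (hypothetical names).
ACTION_TO_PEPS: Dict[str, List[str]] = {
    "put": ["pre_put", "post_put"],
}

# Policy enforcement point -> standard micro-services.
PEP_TO_MICROSERVICES: Dict[str, List[str]] = {
    "pre_put": ["check_quota"],
    "post_put": ["checksum", "replicate"],
}

# Micro-service -> standard POSIX-style I/O operations.
MICROSERVICE_TO_IO: Dict[str, List[str]] = {
    "check_quota": ["stat"],
    "checksum": ["open", "read", "close"],
    "replicate": ["open", "read", "write", "close"],
}


def to_protocol(storage: str, io_op: str) -> str:
    """Map a standard I/O operation to a storage-specific call (illustrative)."""
    return f"{storage}:{io_op}"


def resolve(action: str, storage: str) -> List[str]:
    """Expand a client action into the storage protocol calls it triggers."""
    calls: List[str] = []
    for pep in ACTION_TO_PEPS[action]:
        for service in PEP_TO_MICROSERVICES[pep]:
            for io_op in MICROSERVICE_TO_IO[service]:
                calls.append(to_protocol(storage, io_op))
    return calls
```

Only `to_protocol` depends on the storage system, so in this sketch swapping the back end changes the final layer and nothing above it.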
  10. iRODS - Policy-based Data Management
       • Turn policies into computer actionable rules
           • Constrain application of policies by user group, storage resource, file type, file size, processing flag, system property, time dependence
       • Compose rules by chaining standard operations
           • Standard operations (micro-services) executed at the remote storage location
       • Manage state information as attributes on name spaces:
           • Files / collections / users / resources / rules
       • Validate assessment criteria
           • Queries on state information, parsing of audit trails
       • Automate administrative functions
           • Minimize labor costs
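The rule model described here (a condition that constrains when a policy applies, plus a chain of micro-services that record state) can be approximated with a toy rule engine. This is a minimal sketch in plain Python, not the iRODS rule language or its API; `Rule`, `RuleEngine`, and both micro-service functions are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


def compute_checksum(ctx: Context) -> None:
    """Micro-service: store a SHA-256 digest of the data as state information."""
    ctx["state"]["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()


def mark_for_replication(ctx: Context) -> None:
    """Micro-service: record a (hypothetical) replication target."""
    ctx["state"]["replicate_to"] = "archive"


@dataclass
class Rule:
    event: str                                # when the rule is considered
    condition: Callable[[Context], bool]      # constraint (group, size, ...)
    chain: List[Callable[[Context], None]]    # micro-services run in order


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)
    audit_trail: List[str] = field(default_factory=list)

    def fire(self, event: str, ctx: Context) -> None:
        ctx.setdefault("state", {})
        for rule in self.rules:
            if rule.event == event and rule.condition(ctx):
                for step in rule.chain:
                    step(ctx)
                    # Each executed step is logged for later assessment.
                    self.audit_trail.append(f"{event}:{step.__name__}")
```

A rule constrained by user group and file size might then be written as `Rule("post_put", lambda c: c["group"] == "curators" and c["size"] > 3, [compute_checksum, mark_for_replication])`; assessment criteria would be checked by querying `ctx["state"]` and parsing `audit_trail`.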
  11. Administrative Policies
       • Authenticity / required metadata
       • Integrity / replication
       • Distribution / migration / load leveling
       • Retention / disposition
       • Ingestion / redaction
       • Access controls / storage quotas
       • Audit trails / arrangement
       • Remote execution / deferred rules
  12. Open Source Software
       • Community driven software development
           • Focus on features required by user communities
           • Focus on bug-free software
           • Focus on highly reliable software
           • Focus on highly extensible software
           • Approximately 3-4 software releases per year
       • Distributed under a BSD license
           • International collaborations on software development
           • IN2P3 (France), SHAMAN (UK), ARCS (Australia), Academia Sinica (Taiwan)
  13. iRODS is a "coordinated NSF/OCI-Nat'l Archives research activity" under the auspices of the President's NITRD Program and is identified as among the priorities underlying the President's 2011 Budget Supplement in the area of Human and Computer Interaction Information Management technology research.
      Reagan W. Moore
      [email_address]
      NSF OCI-0848296 “NARA Transcontinental Persistent Archives Prototype”
      NSF SDCI-0721400 “Data Grids for Community Driven Applications”