Rule-Based Data Management Systems


  1. Rule-Based Data Management Systems
     Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar
     {moore, schroede, mwan, sekar}
  2. Topics
     - Managing distributed shared collections
       - Data grids
     - Control of name spaces (SRB)
       - Production system
       - Data and trust virtualization
       - Infrastructure independence
     - Control of management policies (iRODS)
       - Next-generation technology
       - Management virtualization
       - Rules controlling remote operations
       - Constraints on the rules and remote operations
  3. Data Management Applications
     - Data grids
       - Share data
     - Digital libraries
       - Publish data
     - Persistent archives
       - Preserve data
     - Real-time sensor streams
       - Data federation
     - Data analysis
       - Automate access to distributed data
  4. Concepts
     - Distributed data management concepts
       - Data virtualization
         - Manage the properties of a shared collection independently of the storage systems
       - Trust virtualization
         - Administrative domain independence
       - Federation
         - Managing interactions between data grids
     - Rule-based data management
       - Policy virtualization
         - Automating execution of management policies
         - Applying management policies to remote operations
  5. Using a Data Grid: In the Abstract
     [Diagram: the user asks the data grid for data ("Ask for data") and receives it ("Data delivered")]
     - User asks the data grid for data
     - The data is found and returned
       - Where and how details are hidden
  6. Using a Data Grid: Details
     [Diagram: two Storage Resource Broker (SRB) servers and a metadata catalog database]
     - User asks for data
     - Data request goes to an SRB server
     - Server looks up information in the catalog
     - Catalog tells which SRB server has the data
     - 1st server asks the 2nd for the data
     - The data is found and returned
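
The request flow on this slide can be sketched in a few lines of Python. This is a toy in-memory model, not the SRB client or server API: the `Server` class, its shared `catalog` dictionary (logical name to owning server), and the server names are hypothetical. The point it illustrates is that the user talks to one server and never learns which server actually held the data.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Server:
    """A toy storage server (hypothetical; not the SRB API)."""
    name: str
    catalog: dict[str, str]                              # shared: logical name -> owning server
    files: dict[str, bytes] = field(default_factory=dict)
    peers: dict[str, Server] = field(default_factory=dict)

    def get(self, logical_name: str) -> bytes:
        owner = self.catalog[logical_name]               # look up the metadata catalog
        if owner == self.name:
            return self.files[logical_name]              # data is stored locally
        return self.peers[owner].get(logical_name)       # ask the server that has the data


catalog = {"/home/alice/run42.dat": "srb-b"}
a = Server("srb-a", catalog)
b = Server("srb-b", catalog, files={"/home/alice/run42.dat": b"event data"})
a.peers["srb-b"] = b
print(a.get("/home/alice/run42.dat"))  # b'event data'
```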
  7. Data Virtualization
     - Manage the properties of each digital entity independently of the remote storage systems
       - Infrastructure independence
     - Properties of the shared collection
       - Name spaces
       - Persistent state information (location, size, …)
     - Manage standard operations
       - Map from client requests to standard operations
       - Map from standard operations to the remote storage system protocol
  8. Data Virtualization
     Data is organized as a shared collection, reached through data access methods (C library, Unix, web browser).

     Storage repository                 Data grid
     Storage location                   Logical resource name space
     User name                          Logical user name space
     File name                          Logical file name space
     File context (creation date, …)    Logical context (metadata)
     Access controls                    Access constraints
  9. Data Virtualization
     [Diagram layers: Storage System, Storage Protocol, Access Interface, Standard Access Actions, Data Grid, Standard Micro-services]
     The data grid maps the actions requested by the access method to a standard set of micro-services used to interact with the storage system.
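
A minimal sketch of data virtualization, assuming an in-memory stand-in for each storage system. The names `StorageDriver`, `InMemoryDriver`, `DataGrid`, and the `/vault/...` physical paths are hypothetical, not SRB or iRODS interfaces. The catalog maps each logical file name to a logical resource and a physical path, and each driver translates the standard operations (read, write, delete) into one storage protocol, so a file can move between storage systems without its logical name changing.

```python
from __future__ import annotations
import itertools
from typing import Protocol


class StorageDriver(Protocol):
    """Standard operations every storage protocol must implement (hypothetical)."""
    def write(self, path: str, data: bytes) -> None: ...
    def read(self, path: str) -> bytes: ...
    def delete(self, path: str) -> None: ...


class InMemoryDriver:
    """Stand-in for one real storage system (file system, tape archive, object store)."""
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.blobs[path] = data

    def read(self, path: str) -> bytes:
        return self.blobs[path]

    def delete(self, path: str) -> None:
        del self.blobs[path]


class DataGrid:
    def __init__(self) -> None:
        self.resources: dict[str, StorageDriver] = {}   # logical resource name -> driver
        self.catalog: dict[str, tuple[str, str]] = {}   # logical file name -> (resource, physical path)
        self._ids = itertools.count()

    def put(self, logical: str, resource: str, data: bytes) -> None:
        physical = f"/vault/{next(self._ids):06d}"       # physical layout is never shown to users
        self.resources[resource].write(physical, data)
        self.catalog[logical] = (resource, physical)

    def get(self, logical: str) -> bytes:
        resource, physical = self.catalog[logical]
        return self.resources[resource].read(physical)

    def migrate(self, logical: str, target: str) -> None:
        """Move the bytes to another storage system; the logical name stays the same."""
        source, physical = self.catalog[logical]
        if source == target:
            return
        self.resources[target].write(physical, self.resources[source].read(physical))
        self.resources[source].delete(physical)
        self.catalog[logical] = (target, physical)


grid = DataGrid()
grid.resources["sdsc-disk"] = InMemoryDriver()
grid.resources["archive-tape"] = InMemoryDriver()
grid.put("/babar/run1.dat", "sdsc-disk", b"events")
grid.migrate("/babar/run1.dat", "archive-tape")
print(grid.catalog["/babar/run1.dat"], grid.get("/babar/run1.dat"))
# ('archive-tape', '/vault/000000') b'events'
```

Because clients only ever hold the logical name, `migrate` can replace the storage system underneath them; this is the infrastructure independence listed among the data virtualization goals.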
  10. Standard Operations
      - File manipulation
        - POSIX I/O calls: open, close, read, write, seek, …
        - Register, replicate, checksum, synchronize
      - Bulk operations
        - Bulk data transport, metadata load
        - Parallel I/O streams
      - Remote procedures
        - Data filtering, subsetting, metadata extraction
        - Remote library execution (HDF5, DataCutter)
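
Parallel I/O streams split a large transfer into byte ranges that move concurrently. The sketch below models only that splitting, with threads copying ranges between two in-memory buffers; a real data grid would open one network connection per stream. `parallel_transfer` and its `n_streams` parameter are illustrative names, not an SRB option.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_transfer(src: bytes, n_streams: int = 4) -> bytes:
    """Copy src into a new buffer using n_streams concurrent streams.

    Each stream moves one contiguous byte range, the way a parallel transfer
    splits a large file into offset ranges that travel independently.
    """
    if n_streams < 1:
        raise ValueError("need at least one stream")
    size = len(src)
    if size == 0:
        return b""
    chunk = -(-size // n_streams)          # ceiling division: bytes per stream
    dest = bytearray(size)

    def stream(offset: int) -> int:
        piece = src[offset:offset + chunk]             # read at an offset on the source
        dest[offset:offset + len(piece)] = piece       # seek + write at the same offset
        return len(piece)

    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        moved = sum(pool.map(stream, range(0, size, chunk)))
    if moved != size:
        raise IOError(f"moved {moved} of {size} bytes")
    return bytes(dest)


data = bytes(range(256)) * 4096          # 1 MiB test payload
assert parallel_transfer(data, n_streams=8) == data
```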
  11. BaBar High-Energy Physics
      [Map of sites: Stanford Linear Accelerator; IN2P3 (Lyon, France); Rome, Italy; San Diego; RAL, UK; Manchester-SDSC mirror]
      - A functioning international data grid for high-energy physics
      - Moved over 300 TB of data
      - Increasing to 5 TB per day
  12. Next-Generation Technology
      - Every fault that occurs in the distributed environment is the responsibility of the data grid
        - Network outage, system crash, operator error
        - Minimize risk through checksums, replicas, synchronization, federation
      - Management of large collections is labor-intensive
        - Recovery operations must be initiated after a remote system failure
      - Need to automate the execution of management policies
  13. Controlling Remote Operations
      iRODS: integrated Rule-Oriented Data System
      - Supports unique organizational and social management policies for each collection
  14. Rule-based Data Management
      - Express assessment criteria as sets of required persistent state information
      - Express management policies as sets of rules controlling the execution of micro-services
      - Express capabilities as sets of micro-services
        - Manage the persistent state information that results from applying rules that control the execution of remote micro-services
  15. Management Virtualization
      - Examples of management policies
        - Integrity
          - Validation of checksums
          - Synchronization of replicas
          - Data distribution
          - Data retention
          - Access controls
        - Authenticity
          - Chain of custody: audit trails
          - Track required preservation metadata: templates
          - Generation of Archival Information Packages
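
The first two integrity policies (validate checksums, synchronize replicas) can be expressed as one periodic pass over a collection. A sketch, assuming each entry records a SHA-256 digest at registration and keeps its replicas as bytes keyed by a hypothetical resource name; the choice of SHA-256 and the `validate_and_sync` function are illustrative assumptions, not the iRODS implementation.

```python
from __future__ import annotations
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_and_sync(collection: dict[str, dict]) -> dict[str, list[str]]:
    """One pass of a hypothetical periodic integrity rule.

    collection maps logical name -> {"checksum": registered digest,
                                     "replicas": {resource name: bytes}}.
    Returns, per logical name, the resources whose replica was repaired.
    Raises if no replica of an object still matches its registered checksum.
    """
    repaired: dict[str, list[str]] = {}
    for name, entry in collection.items():
        replicas = entry["replicas"]
        good = [r for r, data in replicas.items() if sha256(data) == entry["checksum"]]
        if not good:
            raise RuntimeError(f"{name}: no valid replica left")
        bad = [r for r in replicas if r not in good]
        for r in bad:
            replicas[r] = replicas[good[0]]      # synchronize from a verified copy
        if bad:
            repaired[name] = bad
    return repaired


payload = b"calibration constants"
coll = {"/babar/calib.dat": {"checksum": sha256(payload),
                             "replicas": {"sdsc-disk": payload, "in2p3-disk": b"bitrot"}}}
print(validate_and_sync(coll))  # {'/babar/calib.dat': ['in2p3-disk']}
```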
  16. Rule-based Data Management
      - Rules required for standard operations
        - POSIX I/O control
        - Standard SRB operations
      - Administrator-controlled rules that implement management policies
        - Administrative: adding and deleting users and resources
        - Data ingestion: pre-processing, post-processing
        - Data transport and deletion: parallel I/O streams, disposition
      - User-defined rules: create your own server-side workflow
        - Rule sets for a particular collection, user group, storage system, or micro-service
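
One way to pick among rule sets scoped to a collection, user group, or storage system is to choose the most specific scope that matches the request. This is a sketch under that assumption (the slides do not say how overlapping scopes are resolved); `Scope`, `select_rule_set`, and the rule set names are hypothetical. The same lookup lets an old rule set stay on a legacy collection while a new one governs a newer collection, as described under the evolution of rule-based systems.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical scope selector; None means 'any'."""
    collection: str | None = None     # collection path prefix
    group: str | None = None          # user group
    resource: str | None = None       # storage system

    def matches(self, path: str, group: str, resource: str) -> bool:
        return ((self.collection is None or path.startswith(self.collection))
                and (self.group is None or self.group == group)
                and (self.resource is None or self.resource == resource))

    def specificity(self) -> int:
        return sum(x is not None for x in (self.collection, self.group, self.resource))


def select_rule_set(table: list[tuple[Scope, str]], path: str, group: str, resource: str) -> str:
    """Return the most specific matching rule set; ties go to the earlier table entry."""
    candidates = [(s.specificity(), name) for s, name in table if s.matches(path, group, resource)]
    if not candidates:
        raise LookupError("no rule set applies")
    return max(candidates, key=lambda c: c[0])[1]


table = [
    (Scope(), "core-v1"),                                      # default for everything
    (Scope(collection="/legacy/"), "legacy-v1"),               # old rules kept on the legacy collection
    (Scope(collection="/survey/"), "survey-v2"),               # new rules on a new collection
    (Scope(collection="/survey/", group="curators"), "survey-v2-curation"),
]
print(select_rule_set(table, "/survey/img1.fits", "students", "disk"))   # survey-v2
print(select_rule_set(table, "/survey/img1.fits", "curators", "disk"))   # survey-v2-curation
print(select_rule_set(table, "/legacy/old.dat", "curators", "tape"))     # legacy-v1
```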
  17. iRODS Rule
      - Each rule defines
        - Event
        - Condition
        - Action set (micro-services and rules)
        - Recovery set
      - Rule types
        - Atomic: applied immediately
        - Deferred: supports deferred consistency constraints
        - Periodic: typically used to validate assertions
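
The event / condition / action set / recovery set structure can be sketched as a small interpreter. This is a hypothetical Python model, not the iRODS rule language: micro-services are plain functions that mutate a state dictionary, `recovery[i]` is assumed to undo `actions[i]`, and on failure the completed actions are rolled back in reverse order before the next rule for the same event is tried. The event name `post_put` is invented for the example.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable

MicroService = Callable[[dict], None]   # hypothetical: a micro-service mutates a state dict


@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]
    actions: list[MicroService]          # executed in order
    recovery: list[MicroService]         # recovery[i] undoes actions[i]


def fire(rules: list[Rule], event: str, state: dict) -> bool:
    """Apply the first rule for `event` whose condition holds and whose actions all succeed."""
    for rule in rules:
        if rule.event != event or not rule.condition(state):
            continue
        completed = 0
        try:
            for action in rule.actions:
                action(state)
                completed += 1
            return True
        except Exception:
            # roll back the actions that finished, newest first, then try the next rule
            for undo in reversed(rule.recovery[:completed]):
                undo(state)
    return False


def tag(state: dict) -> None:
    state["tagged"] = True


def untag(state: dict) -> None:
    state.pop("tagged", None)


def replicate(state: dict) -> None:
    raise IOError("archive resource offline")   # simulated remote failure


def no_op(state: dict) -> None:
    pass


rules = [
    # preferred rule for the archive collection: tag, then replicate
    Rule("post_put", lambda s: s["path"].startswith("/archive/"), [tag, replicate], [untag, no_op]),
    # fallback rule: just tag
    Rule("post_put", lambda s: True, [tag], [untag]),
]
state = {"path": "/archive/a.dat"}
print(fire(rules, "post_put", state), state)
# True {'path': '/archive/a.dat', 'tagged': True}
```

This covers atomic rules only. Deferred and periodic rules could reuse `fire` unchanged and differ only in when it is called (from a queue or a scheduler), which is left out here.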
  18. Rule-based Access
      - Associate security policies with each digital entity
        - Redaction, access controls on structures within a file
        - Time-dependent access controls (how long to keep data proprietary)
      - Associate access controls with each rule
        - Restrict the ability to modify and apply rules
      - Associate access controls with each micro-service
        - Explicit control of operation execution within a given collection
        - Much finer control than Unix r:w:x permissions
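
A sketch of per-micro-service access control combined with a time-dependent embargo. The `Entity` fields, the owner-always-allowed policy, and micro-service names such as `extract_metadata` are assumptions for illustration. The example shows the finer grain the slide contrasts with Unix permissions: a user can be allowed to run metadata extraction on a file without being allowed to read it.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entity:
    """Hypothetical digital entity carrying its own access policy."""
    path: str
    owner: str
    embargo_until: datetime | None = None       # data stays proprietary until this time
    # micro-service name -> users allowed to run it on this entity
    allowed: dict[str, set[str]] = field(default_factory=dict)


def may_execute(user: str, microservice: str, entity: Entity, now: datetime) -> bool:
    if user == entity.owner:
        return True
    if entity.embargo_until is not None and now < entity.embargo_until:
        return False        # time-dependent control: only the owner during the proprietary period
    return user in entity.allowed.get(microservice, set())


e = Entity("/sky/survey/tile7.fits", owner="alice",
           embargo_until=datetime(2025, 1, 1, tzinfo=timezone.utc),
           allowed={"read": {"bob"}, "extract_metadata": {"bob", "carol"}})
before = datetime(2024, 6, 1, tzinfo=timezone.utc)
after = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(may_execute("bob", "read", e, before))    # False: still under embargo
print(may_execute("bob", "read", e, after))     # True
print(may_execute("carol", "read", e, after))   # False: carol may only extract metadata
```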
  19. Federation Between Data Grids
      [Diagram: the data grids for Data Collection A and Data Collection B, federated; data access methods shown: web browser, DSpace, OAI-PMH]
      Each data grid maintains its own:
      - Logical resource name space
      - Logical user name space
      - Logical file name space
      - Logical rule name space
      - Logical micro-service name space
      - Logical persistent state
  20. Rule-based Federation
      - When registering a digital entity into another data grid, register the required management rules along with the digital entity
        - Move management policies with the data
      - Expectation: each operation on each digital entity can be controlled across federated data grids
        - Example: end-to-end encryption
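
A sketch of registering a digital entity into a federated grid together with its management rules, assuming each grid keeps a logical rule name space that maps rule names to definitions. `Grid`, `register_remote`, and the `require_encryption` rule are hypothetical. The design choice shown is to refuse the registration when the target grid already uses the same rule name for a different definition, rather than let the data arrive governed by the wrong policy.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Grid:
    """Hypothetical data grid with a logical file name space and a logical rule name space."""
    name: str
    entities: dict[str, bytes] = field(default_factory=dict)
    attached: dict[str, list[str]] = field(default_factory=dict)   # logical name -> rule names
    rule_defs: dict[str, str] = field(default_factory=dict)        # rule name -> rule body


def register_remote(src: Grid, dst: Grid, name: str) -> None:
    """Register `name` from src into dst together with the rules that manage it."""
    rules = src.attached.get(name, [])
    # check the whole rule set first so a conflict leaves dst unchanged
    for r in rules:
        existing = dst.rule_defs.get(r)
        if existing is not None and existing != src.rule_defs[r]:
            raise ValueError(f"{dst.name} already defines a different rule named {r!r}")
    for r in rules:
        dst.rule_defs[r] = src.rule_defs[r]          # the policy moves with the data
    dst.entities[name] = src.entities[name]
    dst.attached[name] = list(rules)


a = Grid("grid-a", entities={"/a/report.pdf": b"%PDF..."},
         attached={"/a/report.pdf": ["require_encryption"]},
         rule_defs={"require_encryption": "deny any transfer that is not end-to-end encrypted"})
b = Grid("grid-b")
register_remote(a, b, "/a/report.pdf")
print(b.attached["/a/report.pdf"], b.rule_defs["require_encryption"])
```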
  21. Evolution of Rule-based Systems
      - Logical name spaces enable dynamic addition of new rules, micro-services, and state information
        - Apply new rules to one collection while applying old rule sets to a legacy collection
        - Run old and new rule sets in parallel
      - Build a system that manages its own evolution
        - Create rules that track the evolution of the rule-based system
        - Create rules that govern migration to new rule sets
  22. Assessment Rules
      - Build a system that monitors its own state information
        - Parse audit trails to verify that accesses were made by authorized persons
        - Parse persistent state information for compliance with management rules
        - Test micro-services for compliance with rules
        - Audit all accesses to a collection
        - Compare system properties to desired outcomes
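
An assessment rule that parses an audit trail can be sketched as below. The audit line format (`timestamp user=... op=... path=...`) and the ACL table keyed by `(path, op)` are assumptions, not an SRB or iRODS log format. Lines that do not parse are reported as violations, on the view that an unreadable audit record cannot demonstrate an authorized access.

```python
import re

# hypothetical audit record format: "<timestamp> user=<u> op=<op> path=<path>"
AUDIT_LINE = re.compile(r"(?P<ts>\S+) user=(?P<user>\S+) op=(?P<op>\S+) path=(?P<path>\S+)")


def unauthorized_accesses(audit_lines, acl):
    """acl maps (path, op) -> set of authorized users.

    Returns a list of (timestamp, user, op, path) tuples for every access
    that the ACL does not authorize, plus any line that cannot be parsed.
    """
    violations = []
    for line in audit_lines:
        m = AUDIT_LINE.fullmatch(line.strip())
        if m is None:
            violations.append(("?", "?", "unparseable", line.strip()))
            continue
        if m["user"] not in acl.get((m["path"], m["op"]), set()):
            violations.append((m["ts"], m["user"], m["op"], m["path"]))
    return violations


audit_log = [
    "2024-05-01T10:00:00Z user=alice op=read path=/archive/rec1.xml",
    "2024-05-01T10:05:00Z user=mallory op=delete path=/archive/rec1.xml",
]
acl = {("/archive/rec1.xml", "read"): {"alice", "bob"},
       ("/archive/rec1.xml", "delete"): {"archivist"}}
print(unauthorized_accesses(audit_log, acl))
# [('2024-05-01T10:05:00Z', 'mallory', 'delete', '/archive/rec1.xml')]
```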
  23. For More Information
      - Reagan W. Moore
      - San Diego Supercomputer Center
      - [email_address] edu
      - SRB:
      - iRODS: