Rule-Based Data Management Systems
Presentation Transcript

  • Rule-Based Data Management Systems
    • Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar
    • {moore, schroede, mwan, sekar}@sdsc.edu
    • http://www.sdsc.edu/srb
    • http://irods.sdsc.edu/
  • Topics
    • Managing distributed shared collections
      • Data grids
    • Control of name spaces - SRB
      • Production system
      • Data and trust virtualization
      • Infrastructure independence
    • Control of management policies - iRODS
      • Next generation technology
      • Management virtualization
      • Rules controlling remote operations
      • Constraints on the rules and remote operations
  • Data Management Applications
    • Data grids
      • Share data
    • Digital libraries
      • Publish data
    • Persistent archives
      • Preserve data
    • Real-time sensor streams
      • Data federation
    • Data analysis
      • Automate access to distributed data
  • Concepts
    • Distributed Data Management Concepts
      • Data virtualization
        • Manage the properties of a shared collection independently of the storage systems
      • Trust virtualization
        • Administrative domain independence
      • Federation
        • Managing interactions between data grids
    • Rule-based Data Management
      • Policy virtualization
        • Automating execution of management policies
        • Applying management policies to remote operations
  • Using a Data Grid - in Abstract
    • User asks for data from the data grid
    • The data is found and returned
      • Where & how details are hidden
    • Diagram: user sends "Ask for data" to the Data Grid; "Data delivered" comes back
  • Using a Data Grid - Details
    • User asks for data
    • Data request goes to SRB server
    • Server looks up information in catalog
    • Catalog tells which SRB server has data
    • 1st server asks 2nd for data
    • The data is found and returned
    • Diagram components: two Storage Resource Broker (SRB) servers and a Metadata Catalog DB
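
The request path on this slide can be sketched in a few lines of Python. This is only an illustration of the catalog lookup and server-to-server hand-off, not SRB code: the `Server` and `DataGrid` classes, the server names, and the logical file name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical broker server that holds some files locally."""
    name: str
    files: dict = field(default_factory=dict)  # logical name -> bytes


class DataGrid:
    """Toy data grid: a catalog maps each logical name to the server holding it."""

    def __init__(self, servers):
        self.servers = {s.name: s for s in servers}
        # Stand-in for the metadata catalog: logical file name -> server name.
        self.catalog = {
            logical: s.name for s in servers for logical in s.files
        }

    def get(self, contact_server, logical_name):
        """Request data through any server; return (data, servers involved)."""
        hops = [contact_server]
        holder = self.catalog[logical_name]   # catalog lookup
        if holder != contact_server:
            hops.append(holder)               # 1st server asks 2nd for data
        return self.servers[holder].files[logical_name], hops


grid = DataGrid([
    Server("srb1"),
    Server("srb2", {"/home/demo/run42.dat": b"payload"}),
])
data, hops = grid.get("srb1", "/home/demo/run42.dat")
print(data, hops)
```

The caller only names the logical file and a contact server; which server actually holds the bytes stays hidden behind the catalog.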
  • Data Virtualization
    • Manage properties of each digital entity independently of the remote storage systems
      • Infrastructure independence
    • Properties of the shared collection
      • Name spaces
      • Persistent state information (location, size,…)
    • Manage standard operations
      • Map from client requests to standard operations
      • Map from standard operations to remote storage system protocol
  • Data Virtualization
    • Storage Repository
      • Storage location
      • User name
      • File name
      • File context (creation date,…)
      • Access controls
    • Data Grid
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical context (metadata)
      • Access constraints
    • Data is organized as a shared collection (Data Collection)
    • Data Access Methods (C library, Unix, Web Browser)
  • Data Virtualization
    • Diagram layers: Access Interface, Standard Access Actions, Data Grid, Standard Micro-services, Storage Protocol, Storage System
    • Data Grid: map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system
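
A minimal Python sketch of this two-stage mapping follows. It assumes a tiny driver interface with only `read` and `write`; the class names (`StorageDriver`, `MemoryDriver`, `PosixDriver`, `VirtualCollection`) and the flat physical naming scheme are hypothetical and stand in for the standard operations and storage protocols named on the slides.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations that every storage system must support."""

    @abstractmethod
    def read(self, physical_path: str) -> bytes: ...

    @abstractmethod
    def write(self, physical_path: str, data: bytes) -> None: ...


class MemoryDriver(StorageDriver):
    """In-memory storage, standing in for one kind of storage system."""

    def __init__(self):
        self._blobs = {}

    def read(self, physical_path):
        return self._blobs[physical_path]

    def write(self, physical_path, data):
        self._blobs[physical_path] = data


class PosixDriver(StorageDriver):
    """File-system storage rooted at a directory."""

    def __init__(self, root):
        self.root = root

    def read(self, physical_path):
        with open(os.path.join(self.root, physical_path), "rb") as f:
            return f.read()

    def write(self, physical_path, data):
        with open(os.path.join(self.root, physical_path), "wb") as f:
            f.write(data)


class VirtualCollection:
    """Maps logical file names onto (logical resource, physical path)."""

    def __init__(self, resources):
        self.resources = resources  # logical resource name -> driver
        self.locations = {}         # logical file name -> (resource, path)

    def put(self, logical_name, data, resource):
        physical = logical_name.strip("/").replace("/", "_")
        self.resources[resource].write(physical, data)
        self.locations[logical_name] = (resource, physical)

    def get(self, logical_name):
        resource, physical = self.locations[logical_name]
        return self.resources[resource].read(physical)


with tempfile.TemporaryDirectory() as root:
    collection = VirtualCollection(
        {"cache": MemoryDriver(), "disk": PosixDriver(root)}
    )
    collection.put("/demo/a.txt", b"alpha", resource="cache")
    collection.put("/demo/b.txt", b"beta", resource="disk")
    results = [collection.get("/demo/a.txt"), collection.get("/demo/b.txt")]
print(results)
```

Clients use the same `get` call for both files; only the driver knows whether the bytes sit in memory or on disk.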
  • Standard Operations
    • File manipulation
      • POSIX I/O calls - open, close, read, write, seek, …
      • Register, replicate, checksum, synchronize
    • Bulk operations
      • Bulk data transport, metadata load
      • Parallel I/O streams
    • Remote procedures
      • Data filtering, subsetting, metadata extraction
      • Remote library execution (HDFv5, DataCutter)
  • BaBar High-Energy Physics
    • Stanford Linear Accelerator
    • IN2P3
    • Lyon, France
    • Rome, Italy
    • San Diego
    • RAL, UK
    • A functioning international Data Grid for high-energy physics
    • Manchester-SDSC mirror
    • Moved over 300 TB of data
    • Increasing to 5 TB per day
  • Next Generation Technology
    • Every fault that occurs in the distributed environment is the responsibility of the data grid
      • Network outage / system crash / operator error
      • Minimize risk through checksums, replicas, synchronization, federation
    • Management of large collections is labor intensive
      • Initiation of recovery operations after remote system failure
    • Need to automate execution of management policies
  • Controlling Remote Operations
    • iRODS - integrated Rule-Oriented Data System
    • Support unique organizational / social management policies for each collection
  • Rule-based Data Management
    • Express assessment criteria through sets of required persistent state information
    • Express management policies as sets of rules controlling the execution of micro-services
    • Express capabilities as sets of micro-services
      • Manage persistent state information resulting from the application of rules controlling execution of remote micro-services
  • Management Virtualization
    • Examples of management policies
      • Integrity
        • Validation of checksums
        • Synchronization of replicas
        • Data distribution
        • Data retention
        • Access controls
      • Authenticity
        • Chain of custody - audit trails
        • Track required preservation metadata - templates
        • Generation of Archival Information Packages
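
As one concrete reading of the integrity policies above (validation of checksums, synchronization of replicas), here is a hedged Python sketch. The function name `validate_and_sync`, the replica names, and the choice of SHA-256 are assumptions for illustration; the slides do not specify a checksum algorithm or a repair procedure.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_and_sync(replicas: dict, expected: str) -> list:
    """Overwrite replicas whose checksum is wrong; return the names repaired.

    replicas maps a (hypothetical) storage name to the bytes stored there.
    """
    good = [name for name, data in replicas.items() if sha256(data) == expected]
    if not good:
        raise RuntimeError("no valid replica left to synchronize from")
    source = replicas[good[0]]
    repaired = []
    for name in list(replicas):
        if sha256(replicas[name]) != expected:
            replicas[name] = source  # synchronize from a known-good copy
            repaired.append(name)
    return repaired


replicas = {"disk": b"abc", "tape": b"abX", "remote": b"abc"}
repaired = validate_and_sync(replicas, sha256(b"abc"))
print(repaired)  # ['tape']
```

Run periodically, a check like this is the kind of recovery operation that would otherwise have to be initiated by hand after a remote failure.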
  • Rule-based Data Management
    • Rules required for standard operations
      • POSIX I/O control
      • Standard SRB operations
    • Administrator controlled rules to implement management policies
      • Administrative - adding / deleting users, resources
      • Data ingestion - pre-processing, post-processing
      • Data transport / deletion - parallel I/O streams, disposition
    • User-defined rules, create your own server-side workflow
      • Rule set for a particular collection, particular user group, particular storage system, particular micro-service
  • iRODS Rule
    • Each rule defines
      • Event
      • Condition
      • Action sets (micro-services and rules)
      • Recovery sets
    • Rule types
      • Atomic, applied immediately
      • Deferred, support deferred consistency constraints
      • Periodic, typically used to validate assertions
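
The event / condition / action set / recovery set structure can be sketched in Python as follows. This is not iRODS rule syntax or its engine: `Rule`, `apply_rule`, and the toy micro-services are hypothetical, and the choice to run recoveries only for actions that already completed, in reverse order, is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List

MicroService = Callable[[dict], None]


@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]
    actions: List[MicroService]
    recoveries: List[MicroService]  # recoveries[i] undoes actions[i]


def apply_rule(rule: Rule, event: str, ctx: dict) -> bool:
    """Run the rule's actions if event and condition match.

    On failure, undo the completed actions in reverse order and return False.
    """
    if event != rule.event or not rule.condition(ctx):
        return False
    completed = []
    try:
        for action, recovery in zip(rule.actions, rule.recoveries):
            action(ctx)
            completed.append(recovery)
    except Exception:
        for recovery in reversed(completed):
            recovery(ctx)
        return False
    return True


# Toy micro-services operating on a context dictionary.
def replicate(ctx):
    ctx["replicas"].append("archive")


def undo_replicate(ctx):
    ctx["replicas"].remove("archive")


def verify_checksum(ctx):
    if ctx["checksum"] != ctx["expected"]:
        raise ValueError("checksum mismatch")


def noop(ctx):
    pass


rule = Rule(
    event="put",
    condition=lambda ctx: ctx["collection"] == "/demo",
    actions=[replicate, verify_checksum],
    recoveries=[undo_replicate, noop],
)

ok = {"collection": "/demo", "replicas": ["disk"], "checksum": "a", "expected": "a"}
bad = {"collection": "/demo", "replicas": ["disk"], "checksum": "a", "expected": "b"}
print(apply_rule(rule, "put", ok), ok["replicas"])    # True ['disk', 'archive']
print(apply_rule(rule, "put", bad), bad["replicas"])  # False ['disk']
```

The second call shows the recovery set at work: the replica created before the failed checksum is removed again, so the state is unchanged.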
  • Rule-based Access
    • Associate security policies with each digital entity
      • Redaction, access controls on structures within a file
      • Time-dependent access controls (how long to hold data proprietary)
    • Associate access controls with each rule
      • Restrict ability to modify, apply rules
    • Associate access controls with each micro-service
      • Explicit control of operation execution within a given collection
      • Much finer control than provided by Unix r:w:x
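
Two of the controls above, time-dependent access and per-micro-service access, can be illustrated with a short Python sketch. The entity record, the group names, and the functions `may_read` and `may_execute` are hypothetical; they show the idea, not how iRODS stores or checks permissions.

```python
from datetime import date


def may_read(user, entity, today):
    """Owners may always read; others only after the proprietary period ends."""
    return user in entity["owners"] or today >= entity["public_after"]


def may_execute(user_groups, micro_service, acl):
    """A micro-service runs only if one of the user's groups is listed for it."""
    return bool(set(user_groups) & acl.get(micro_service, set()))


entity = {"owners": {"alice"}, "public_after": date(2009, 1, 1)}
acl = {"replicate": {"curators"}, "delete": {"admins"}}

print(may_read("guest", entity, date(2008, 6, 1)))   # False
print(may_read("guest", entity, date(2009, 6, 1)))   # True
print(may_read("alice", entity, date(2008, 6, 1)))   # True
print(may_execute(["curators"], "replicate", acl))   # True
print(may_execute(["curators"], "delete", acl))      # False
```

Because the check is per operation rather than per file, a curator can be allowed to replicate data without being allowed to delete it.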
  • Federation Between Data Grids
    • Data Grid (Data Collection B)
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical rule name space
      • Logical micro-service name
      • Logical persistent state
    • Data Grid (Data Collection A)
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical rule name space
      • Logical micro-service name
      • Logical persistent state
    • Data Access Methods (Web Browser, DSpace, OAI-PMH)
  • Rule-based Federation
    • When registering a digital entity into another data grid, register required management rules along with the digital entity
      • Move management policies with data
    • Expectation that each operation on each digital entity can be controlled across federated data grids
      • Example is end-to-end encryption
  • Evolution of Rule-based Systems
    • Logical name spaces enable dynamic addition of new rules, micro-services, and state information
      • Apply new rules on one collection while applying old rule sets on a legacy collection
      • Can run old and new rule sets in parallel
    • Can build a system that manages its evolution
      • Can create rules that track the evolution of the rule-based system
      • Can create rules that govern migration to new rule sets
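
One way to picture old and new rule sets running side by side is a registry that assigns a rule-set version to each collection and records every reassignment. The sketch below is an assumption-laden illustration: `RULE_SETS`, `RuleRegistry`, the version labels, and the collection names are all hypothetical.

```python
# Hypothetical rule sets: each maps an event to the micro-services it triggers.
RULE_SETS = {
    "v1": {"put": ["checksum"]},
    "v2": {"put": ["checksum", "replicate"]},
}


class RuleRegistry:
    """Tracks which rule set governs each collection, and how that changed."""

    def __init__(self):
        self.assigned = {}  # collection -> rule set version
        self.history = []   # (collection, old version, new version)

    def assign(self, collection, version):
        if version not in RULE_SETS:
            raise KeyError(version)
        self.history.append((collection, self.assigned.get(collection), version))
        self.assigned[collection] = version

    def actions_for(self, collection, event):
        return RULE_SETS[self.assigned[collection]].get(event, [])


registry = RuleRegistry()
registry.assign("/legacy", "v1")
registry.assign("/new", "v2")
before = registry.actions_for("/legacy", "put")   # old rules still apply
registry.assign("/legacy", "v2")                  # migrate the legacy collection
after = registry.actions_for("/legacy", "put")
print(before, after, registry.history[-1])
```

The `history` list plays the role of state that tracks the system's own evolution: each migration to a new rule set leaves a record.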
  • Assessment Rules
    • Can build a system that monitors its own state information
      • Parse audit trails to verify accesses by authorized persons
      • Parse persistent state information for compliance with management rules
      • Test micro-services for compliance with rules
      • Audit all accesses to a collection
      • Compare system properties to desired outcomes
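
Parsing an audit trail to verify accesses can be as simple as the Python sketch below. The record layout (`user`, `collection`, `op`) and the `authorized` mapping are assumptions made for the example; the slides do not describe the actual audit-trail format.

```python
def unauthorized_accesses(audit_trail, authorized):
    """Return audit records whose user is not authorized for the collection."""
    return [
        record
        for record in audit_trail
        if record["user"] not in authorized.get(record["collection"], set())
    ]


audit_trail = [
    {"user": "alice", "collection": "/demo", "op": "read"},
    {"user": "mallory", "collection": "/demo", "op": "read"},
    {"user": "bob", "collection": "/archive", "op": "write"},
]
authorized = {"/demo": {"alice"}, "/archive": {"bob"}}

violations = unauthorized_accesses(audit_trail, authorized)
print(violations)
```

An empty result is the desired outcome; anything else is a record to compare against the collection's management policy.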
  • For More Information
    • Reagan W. Moore
    • San Diego Supercomputer Center
    • moore@sdsc.edu
    • SRB: http://www.sdsc.edu/srb/
    • iRODS: http://irods.sdsc.edu/