Rule-Based Data Management Systems
Published in Technology, Business
Transcript

  • 1. Rule-Based Data Management Systems
    • Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar
    • {moore, schroede, mwan, sekar}@sdsc.edu
    • http://www.sdsc.edu/srb
    • http://irods.sdsc.edu/
  • 2. Topics
    • Managing distributed shared collections
      • Data grids
    • Control of name spaces - SRB
      • Production system
      • Data and trust virtualization
      • Infrastructure independence
    • Control of management policies - iRODS
      • Next generation technology
      • Management virtualization
      • Rules controlling remote operations
      • Constraints on the rules and remote operations
  • 3. Data Management Applications
    • Data grids
      • Share data
    • Digital libraries
      • Publish data
    • Persistent archives
      • Preserve data
    • Real-time sensor streams
      • Data federation
    • Data analysis
      • Automate access to distributed data
  • 4. Concepts
    • Distributed Data Management Concepts
      • Data virtualization
        • Manage the properties of a shared collection independently of the storage systems
      • Trust virtualization
        • Administrative domain independence
      • Federation
        • Managing interactions between data grids
    • Rule-based Data Management
      • Policy virtualization
        • Automating execution of management policies
        • Applying management policies to remote operations
  • 5. Using a Data Grid – in Abstract
    • User asks for data from the data grid
    • The data is found and returned
      • Where & how details are hidden
    Figure: Data Grid (Ask for data, Data delivered)
  • 6. Using a Data Grid - Details
    • User asks for data
    • Data request goes to SRB Server
    • Server looks up information in catalog
    • Catalog tells which SRB server has data
    • 1st server asks 2nd for data
    • The data is found and returned
    Figure: Storage Resource Broker Server, Storage Resource Broker Server, Metadata Catalog DB
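    The request path on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MetadataCatalog` and `Server` classes, the server names, and the logical path are hypothetical and are not the SRB API.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: logical file name -> name of the server holding it."""
    locations: dict = field(default_factory=dict)

    def locate(self, logical_name: str) -> str:
        return self.locations[logical_name]


@dataclass
class Server:
    """Hypothetical broker server with local storage and known peer servers."""
    name: str
    catalog: MetadataCatalog
    peers: dict = field(default_factory=dict)    # server name -> Server
    storage: dict = field(default_factory=dict)  # logical name -> bytes

    def get(self, logical_name: str) -> bytes:
        # The server looks up in the catalog which server has the data.
        owner = self.catalog.locate(logical_name)
        if owner == self.name:
            return self.storage[logical_name]
        # The first server asks the second server for the data.
        return self.peers[owner].fetch_local(logical_name)

    def fetch_local(self, logical_name: str) -> bytes:
        return self.storage[logical_name]


catalog = MetadataCatalog({"/home/demo/run1.dat": "server_b"})
server_b = Server("server_b", catalog, storage={"/home/demo/run1.dat": b"payload"})
server_a = Server("server_a", catalog, peers={"server_b": server_b})

# The user talks only to server_a; where the data lives stays hidden.
data = server_a.get("/home/demo/run1.dat")
```

    The caller never names `server_b`; the catalog lookup alone decides which server answers.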
  • 7. Data Virtualization
    • Manage properties of each digital entity independently of the remote storage systems
      • Infrastructure independence
    • Properties of the shared collection
      • Name spaces
      • Persistent state information (location, size,…)
    • Manage standard operations
      • Map from client requests to standard operations
      • Map from standard operations to remote storage system protocol
  • 8. Data Virtualization
    • Storage Repository
      • Storage location
      • User name
      • File name
      • File context (creation date,…)
      • Access controls
    • Data Grid
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical context (metadata)
      • Access constraints
    • Data is organized as a shared collection
    Figure: Data Collection, Data Access Methods (C library, Unix, Web Browser)
  • 9. Data Virtualization
    • Map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system
    Figure: Access Interface, Standard Access Actions, Data Grid, Standard Micro-services, Storage Protocol, Storage System
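    One way to picture this mapping is a small driver layer: the grid exposes the same standard operations regardless of the storage system underneath. The sketch below is an assumption-laden illustration, not SRB or iRODS code; `StorageDriver`, `MemoryDriver`, `FileDriver`, `DataGrid`, and the resource names are all hypothetical.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical set of standard operations every storage system exposes."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    """Stands in for one storage protocol: an in-memory store."""

    def __init__(self) -> None:
        self._blobs = {}

    def write(self, key, data):
        self._blobs[key] = data

    def read(self, key):
        return self._blobs[key]


class FileDriver(StorageDriver):
    """Stands in for a second storage protocol: a local file system."""

    def __init__(self, root: str) -> None:
        self._root = root

    def write(self, key, data):
        with open(os.path.join(self._root, key), "wb") as fh:
            fh.write(data)

    def read(self, key):
        with open(os.path.join(self._root, key), "rb") as fh:
            return fh.read()


class DataGrid:
    """Maps client requests on logical names to driver calls."""

    def __init__(self, drivers):
        self._drivers = drivers
        self._catalog = {}  # logical name -> (resource name, storage key)

    def put(self, logical_name, resource, data):
        key = f"obj{len(self._catalog)}"
        self._drivers[resource].write(key, data)
        self._catalog[logical_name] = (resource, key)

    def get(self, logical_name):
        resource, key = self._catalog[logical_name]
        return self._drivers[resource].read(key)


grid = DataGrid({"memory": MemoryDriver(), "disk": FileDriver(tempfile.mkdtemp())})
grid.put("/zone/a.txt", "memory", b"alpha")
grid.put("/zone/b.txt", "disk", b"beta")
```

    Clients call `get` with a logical name in both cases; only the driver knows the storage protocol.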
  • 10. Standard Operations
    • File manipulation
      • POSIX I/O calls - open, close, read, write, seek, …
      • Register, replicate, checksum, synchronize
    • Bulk operations
      • Bulk data transport, metadata load
      • Parallel I/O streams
    • Remote procedures
      • Data filtering, subsetting, metadata extraction
      • Remote library execution (HDFv5, DataCutter)
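    As a rough illustration of the register, checksum, and replicate operations listed above, the following sketch keeps a checksum and a replica list as state for each logical name. The function names, the dictionary layout, and the resource names are assumptions made for this example; SHA-256 is chosen only because it is available in the Python standard library.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum of the content (SHA-256 here, an arbitrary choice)."""
    return hashlib.sha256(data).hexdigest()


def register(catalog, stores, logical_name, resource, data):
    """Store the data on one resource and record its state in the catalog."""
    stores[resource][logical_name] = data
    catalog[logical_name] = {"checksum": checksum(data), "replicas": [resource]}


def replicate(catalog, stores, logical_name, source, target):
    """Copy a replica to another resource after verifying the source copy."""
    data = stores[source][logical_name]
    if checksum(data) != catalog[logical_name]["checksum"]:
        raise ValueError("source replica does not match the registered checksum")
    stores[target][logical_name] = data
    catalog[logical_name]["replicas"].append(target)


stores = {"resource_1": {}, "resource_2": {}}  # hypothetical storage resources
catalog = {}
register(catalog, stores, "/zone/events.dat", "resource_1", b"event data")
replicate(catalog, stores, "/zone/events.dat", "resource_1", "resource_2")
```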
  • 11. BaBar High-Energy Physics
    • Stanford Linear Accelerator
    • IN2P3, Lyon, France
    • Rome, Italy
    • San Diego
    • RAL, UK
    • A functioning international Data Grid for high-energy physics
    • Manchester-SDSC mirror
    • Moved over 300 TBs of data
    • Increasing to 5 TBs per day
  • 12. Next Generation Technology
    • Every fault that occurs in the distributed environment is the responsibility of the data grid
      • Network outage / system crash / operator error
      • Minimize risk through checksums, replicas, synchronization, federation
    • Management of large collections is labor intensive
      • Initiation of recovery operations after remote system failure
    • Need to automate execution of management policies
  • 13. Controlling Remote Operations
    • iRODS - integrated Rule-Oriented Data System
    • Support unique organizational / social management policies for each collection
  • 14. Rule-based Data Management
    • Express assessment criteria through sets of required persistent state information
    • Express management policies as sets of rules controlling the execution of micro-services
    • Express capabilities as sets of micro-services
      • Manage persistent state information resulting from the application of rules controlling execution of remote micro-services
  • 15. Management Virtualization
    • Examples of management policies
      • Integrity
        • Validation of checksums
        • Synchronization of replicas
        • Data distribution
        • Data retention
        • Access controls
      • Authenticity
        • Chain of custody - audit trails
        • Track required preservation metadata - templates
        • Generation of Archival Information Packages
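    The two integrity policies at the top of this list, validating checksums and synchronizing replicas, could be automated along these lines. This is a minimal sketch under assumptions: `validate_and_repair` and the resource names are hypothetical, and replicas are modeled as in-memory byte strings.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_and_repair(expected: str, replicas: dict) -> list:
    """Check every replica against the expected checksum and overwrite
    bad copies from a good one. Returns the names of repaired resources."""
    good = [name for name, data in replicas.items() if sha256(data) == expected]
    if not good:
        raise RuntimeError("no valid replica left to recover from")
    source = replicas[good[0]]
    repaired = []
    for name, data in list(replicas.items()):
        if sha256(data) != expected:
            replicas[name] = source  # synchronize the bad replica
            repaired.append(name)
    return repaired


original = b"event data"
replicas = {"resource_1": original, "resource_2": b"event dat\x00"}  # second copy is corrupt
repaired = validate_and_repair(sha256(original), replicas)
```

    Run periodically, a check like this replaces the manual recovery step that makes large collections labor intensive.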
  • 16. Rule-based Data Management
    • Rules required for standard operations
      • POSIX I/O control
      • Standard SRB operations
    • Administrator controlled rules to implement management policies
      • Administrative - adding / deleting users, resources
      • Data ingestion - pre-processing, post-processing
      • Data transport / deletion - parallel I/O streams, disposition
    • User-defined rules: create your own server-side workflow
      • Rule set for a particular collection, particular user group, particular storage system, particular micro-service
  • 17. iRODS Rule
    • Each rule defines
      • Event
      • Condition
      • Action sets (micro-services and rules)
      • Recovery sets
    • Rule types
      • Atomic, applied immediately
      • Deferred, supporting deferred consistency constraints
      • Periodic, typically used to validate assertions
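    The four parts of a rule can be modeled as a small data structure. The sketch below is not the iRODS rule syntax or engine; `Rule`, `apply_rule`, and the example micro-services are hypothetical. It also assumes that recovery steps pair one-to-one with actions and that only completed actions are undone, in reverse order, which the slide does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    event: str                                 # what triggers the rule
    condition: Callable[[dict], bool]          # when the rule applies
    actions: List[Callable[[dict], None]]      # micro-services to run
    recoveries: List[Callable[[dict], None]]   # one undo step per action


def apply_rule(rule: Rule, event: str, ctx: dict) -> bool:
    """Run the rule's actions; on failure undo the completed ones in reverse."""
    if rule.event != event or not rule.condition(ctx):
        return False
    done = 0
    try:
        for action in rule.actions:
            action(ctx)
            done += 1
    except Exception:
        for recover in reversed(rule.recoveries[:done]):
            recover(ctx)
        return False
    return True


# Hypothetical micro-services and their recovery counterparts.
def store(ctx): ctx["stored"] = True
def unstore(ctx): ctx["stored"] = False

def replicate(ctx):
    if not ctx["replica_resource_up"]:
        raise RuntimeError("replica resource offline")
    ctx["replicas"] = 2

def unreplicate(ctx): ctx["replicas"] = 1


rule = Rule(
    event="put",
    condition=lambda c: c["collection"] == "/zone/archive",
    actions=[store, replicate],
    recoveries=[unstore, unreplicate],
)

ok_ctx = {"collection": "/zone/archive", "replica_resource_up": True}
bad_ctx = {"collection": "/zone/archive", "replica_resource_up": False}
ok = apply_rule(rule, "put", ok_ctx)
failed = apply_rule(rule, "put", bad_ctx)
```

    In the failing case the store step is rolled back, so the context does not record a stored object without its replica.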
  • 18. Rule-based Access
    • Associate security policies with each digital entity
      • Redaction, access controls on structures within a file
      • Time-dependent access controls (how long to hold data proprietary)
    • Associate access controls with each rule
      • Restrict ability to modify, apply rules
    • Associate access controls with each micro-service
      • Explicit control of operation execution within a given collection
      • Much finer control than provided by Unix read/write/execute permissions
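    Two of these ideas, time-dependent access and access controls on individual micro-services, can be illustrated with a short sketch. Everything named here is an assumption for the example: the object record layout, the roles, the collection path, and the `may_read` and `may_execute` helpers.

```python
from datetime import date


def may_read(user: str, obj: dict, today: date) -> bool:
    """Owners can always read; everyone else only after the proprietary period."""
    if user in obj["owners"]:
        return True
    return today >= obj["public_after"]


# Hypothetical table: (collection, micro-service) -> roles allowed to run it.
MICROSERVICE_ACL = {
    ("/zone/archive", "checksum"): {"curator", "auditor"},
    ("/zone/archive", "delete"): {"curator"},
}


def may_execute(role: str, collection: str, microservice: str) -> bool:
    """Deny by default: a role must be listed for that collection and operation."""
    return role in MICROSERVICE_ACL.get((collection, microservice), set())


obj = {"owners": {"alice"}, "public_after": date(2008, 6, 1)}
```

    For example, `may_execute("auditor", "/zone/archive", "delete")` is false even though the auditor may run `checksum` in the same collection, a distinction that file permission bits alone cannot express.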
  • 19. Federation Between Data Grids
    • Each data grid (Data Collection A, Data Collection B) maintains its own
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical rule name space
      • Logical micro-service name
      • Logical persistent state
    • Data Access Methods (Web Browser, DSpace, OAI-PMH)
  • 20. Rule-based Federation
    • When registering a digital entity into another data grid, register required management rules along with the digital entity
      • Move management policies with data
    • Expectation that each operation on each digital entity can be controlled across federated data grids
      • Example is end-to-end encryption
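    A minimal sketch of moving management policies with the data is shown below. The grid dictionaries, the `register_in` helper, the rule name, and the choice to prefix copied rules with the source zone name are all assumptions for illustration; the slide does not describe how rule names are kept distinct across grids.

```python
def register_in(target: dict, source: dict, logical_name: str) -> None:
    """Register an object from the source grid into the target grid,
    copying the rules it requires along with it."""
    record = source["objects"][logical_name]
    qualified = []
    for rule_name in record["rules"]:
        # Prefix with the source zone so names do not clash in the target grid.
        key = f'{source["zone"]}:{rule_name}'
        target["rules"][key] = source["rules"][rule_name]
        qualified.append(key)
    target["objects"][logical_name] = {"data": record["data"], "rules": qualified}


grid_a = {
    "zone": "A",
    "rules": {"encrypt_in_transit": "encrypt every transfer of this object"},
    "objects": {"/A/doc": {"data": b"x", "rules": ["encrypt_in_transit"]}},
}
grid_b = {"zone": "B", "rules": {}, "objects": {}}

register_in(grid_b, grid_a, "/A/doc")
```

    After registration, grid B holds both the object and the rule that governs it, so the policy can still be applied to operations performed there.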
  • 21. Evolution of Rule-based Systems
    • Logical name spaces enable dynamic addition of new rules, micro-services, and state information
      • Apply new rules on one collection while applying old rule sets on a legacy collection
      • Can run old and new rule sets in parallel
    • Can build a system that manages its evolution
      • Can create rules that track the evolution of the rule-based system
      • Can create rules that govern migration to new rule sets
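    Running old and new rule sets side by side, and tracking migration between them, might look like the following sketch. The rule set contents, collection paths, and the `policy_for` and `migrate` helpers are hypothetical.

```python
# Hypothetical rule sets, identified by logical name.
RULE_SETS = {
    "v1": {"replicas": 1},
    "v2": {"replicas": 2},
}

# Each collection names the rule set that governs it.
COLLECTION_RULE_SET = {"/zone/legacy": "v1", "/zone/new": "v2"}

# State information that tracks how the rule base evolves.
MIGRATION_LOG = []


def policy_for(collection: str) -> dict:
    return RULE_SETS[COLLECTION_RULE_SET[collection]]


def migrate(collection: str, new_version: str) -> None:
    """Switch a collection to another rule set and record the change."""
    if new_version not in RULE_SETS:
        raise KeyError(new_version)
    old_version = COLLECTION_RULE_SET[collection]
    COLLECTION_RULE_SET[collection] = new_version
    MIGRATION_LOG.append((collection, old_version, new_version))


legacy_before = policy_for("/zone/legacy")["replicas"]
migrate("/zone/legacy", "v2")
legacy_after = policy_for("/zone/legacy")["replicas"]
```

    Because collections refer to rule sets by name, adding a "v3" entry does not disturb collections still governed by "v1" or "v2".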
  • 22. Assessment Rules
    • Can build a system that monitors its own state information
      • Parse audit trails to verify accesses by authorized persons
      • Parse persistent state information for compliance with management rules
      • Test micro-services for compliance with rules
      • Audit all accesses to a collection
      • Compare system properties to desired outcomes
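    The first assessment above, parsing audit trails to verify accesses by authorized persons, can be sketched as follows. The audit line format (timestamp, user, operation, path separated by whitespace), the user names, and the `unauthorized_accesses` helper are assumptions for this example.

```python
def unauthorized_accesses(audit_trail, authorized):
    """Return audit entries whose user is not authorized for the collection."""
    violations = []
    for line in audit_trail:
        timestamp, user, operation, path = line.split()
        collection = path.rsplit("/", 1)[0]  # parent collection of the object
        if user not in authorized.get(collection, set()):
            violations.append((timestamp, user, operation, path))
    return violations


# Hypothetical authorization table and audit trail.
authorized = {"/zone/archive": {"alice", "curator"}}
audit_trail = [
    "2007-03-01T10:00:00 alice read /zone/archive/a.dat",
    "2007-03-01T10:05:00 mallory read /zone/archive/a.dat",
    "2007-03-01T10:07:00 curator checksum /zone/archive/b.dat",
]

violations = unauthorized_accesses(audit_trail, authorized)
```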
  • 23. For More Information
    • Reagan W. Moore
    • San Diego Supercomputer Center
    • moore@sdsc.edu
    • SRB: http://www.sdsc.edu/srb/
    • iRODS: http://irods.sdsc.edu/