Rule-Based Data Management Systems
Published in: Technology, Business

Transcript

  • 1. Rule-Based Data Management Systems
    • Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar
    • {moore, schroede, mwan, sekar}@sdsc.edu
    • http://www.sdsc.edu/srb
    • http://irods.sdsc.edu/
  • 2. Topics
    • Managing distributed shared collections
      • Data grids
    • Control of name spaces - SRB
      • Production system
      • Data and trust virtualization
      • Infrastructure independence
    • Control of management policies - iRODS
      • Next generation technology
      • Management virtualization
      • Rules controlling remote operations
      • Constraints on the rules and remote operations
  • 3. Data Management Applications
    • Data grids
      • Share data
    • Digital libraries
      • Publish data
    • Persistent archives
      • Preserve data
    • Real-time sensor streams
      • Data federation
    • Data analysis
      • Automate access to distributed data
  • 4. Concepts
    • Distributed Data Management Concepts
      • Data virtualization
        • Manage the properties of a shared collection independently of the storage systems
      • Trust virtualization
        • Administrative domain independence
      • Federation
        • Managing interactions between data grids
    • Rule-based Data Management
      • Policy virtualization
        • Automating execution of management policies
        • Applying management policies to remote operations
  • 5. Using a Data Grid - in Abstract
    • User asks for data from the data grid
    • The data is found and returned
      • Where & how details are hidden
    [Figure: the user asks the data grid for data; the data grid delivers the data]
  • 6. Using a Data Grid - Details
    • User asks for data
    • Data request goes to an SRB (Storage Resource Broker) server
    • Server looks up information in the metadata catalog (database)
    • Catalog tells which SRB server has the data
    • 1st server asks 2nd server for the data
    • The data is found and returned
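The request path on this slide can be sketched in a few lines of Python. This is a toy model, not SRB code: the `Catalog` and `Server` classes, the server names, and the logical path are all hypothetical and exist only to show the catalog lookup followed by a server-to-server hand-off.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical metadata catalog: logical name -> name of the server holding the data."""
    locations: dict = field(default_factory=dict)

    def lookup(self, logical_name):
        return self.locations[logical_name]


@dataclass
class Server:
    """Hypothetical broker server that holds some objects locally."""
    name: str
    catalog: Catalog
    peers: dict = field(default_factory=dict)    # server name -> Server
    storage: dict = field(default_factory=dict)  # logical name -> bytes

    def get(self, logical_name):
        # Ask the catalog which server owns the data.
        owner = self.catalog.lookup(logical_name)
        if owner == self.name:
            return self.storage[logical_name]
        # Not local: ask the owning server on the user's behalf.
        return self.peers[owner].get(logical_name)


catalog = Catalog()
first = Server("server-1", catalog)
second = Server("server-2", catalog)
first.peers["server-2"] = second

second.storage["/home/demo/file.dat"] = b"payload"
catalog.locations["/home/demo/file.dat"] = "server-2"

# The user only talks to the first server; the location stays hidden.
data = first.get("/home/demo/file.dat")
```

The caller never learns that the bytes came from `server-2`, which is the "where & how details are hidden" property described on the previous slide.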
  • 7. Data Virtualization
    • Manage properties of each digital entity independently of the remote storage systems
      • Infrastructure independence
    • Properties of the shared collection
      • Name spaces
      • Persistent state information (location, size,…)
    • Manage standard operations
      • Map from client requests to standard operations
      • Map from standard operations to remote storage system protocol
  • 8. Data Virtualization
    • Storage Repository
      • Storage location
      • User name
      • File name
      • File context (creation date,…)
      • Access controls
    • Data Grid
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical context (metadata)
      • Access constraints
    • Data Collection
      • Data is organized as a shared collection
    • Data Access Methods (C library, Unix, Web Browser)
  • 9. Data Virtualization
    • Data Grid
      • Map from the actions requested by the access method to a standard set of micro-services used to interact with the storage system
    [Figure labels: Access Interface, Standard Access Actions, Data Grid, Standard Micro-services, Storage Protocol, Storage System]
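A minimal sketch of this mapping, assuming a single in-memory back end. `StorageDriver`, `InMemoryDriver`, and `DataGrid` are hypothetical names, and the `/vault/...` physical paths are invented for illustration. Clients use logical names only, and each storage system hides its protocol behind the same small set of operations.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations that every storage system must provide."""

    @abstractmethod
    def read(self, physical_path): ...

    @abstractmethod
    def write(self, physical_path, data): ...


class InMemoryDriver(StorageDriver):
    """Stand-in for a real storage protocol (disk, tape, object store)."""

    def __init__(self):
        self._blobs = {}

    def read(self, physical_path):
        return self._blobs[physical_path]

    def write(self, physical_path, data):
        self._blobs[physical_path] = bytes(data)


class DataGrid:
    def __init__(self):
        self.drivers = {}  # logical resource name -> driver
        self.catalog = {}  # logical file name -> (resource, physical path)

    def add_resource(self, name, driver):
        self.drivers[name] = driver

    def put(self, logical_name, data, resource):
        # The physical path is an internal detail chosen by the grid.
        physical_path = f"/vault/{len(self.catalog)}"
        self.drivers[resource].write(physical_path, data)
        self.catalog[logical_name] = (resource, physical_path)

    def get(self, logical_name):
        resource, physical_path = self.catalog[logical_name]
        return self.drivers[resource].read(physical_path)


grid = DataGrid()
grid.add_resource("disk", InMemoryDriver())
grid.put("/zone/home/a.txt", b"hello", resource="disk")
content = grid.get("/zone/home/a.txt")
```

Swapping `InMemoryDriver` for another driver changes nothing for the client, which is the infrastructure independence the slides describe.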
  • 10. Standard Operations
    • File manipulation
      • POSIX I/O calls - open, close, read, write, seek, …
      • Register, replicate, checksum, synchronize
    • Bulk operations
      • Bulk data transport, metadata load
      • Parallel I/O streams
    • Remote procedures
      • Data filtering, subsetting, metadata extraction
      • Remote library execution (HDF5, DataCutter)
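Two of the file operations listed above, checksum and replicate, can be sketched on local files using only the Python standard library. The choice of SHA-256, the function names, and the temporary directory are assumptions for illustration; the slides do not say which checksum algorithm the data grid uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, target):
    """Copy source to target, then verify the copy by comparing checksums."""
    shutil.copyfile(source, target)
    target_sum = checksum(target)
    if checksum(source) != target_sum:
        raise IOError(f"replica {target} does not match {source}")
    return target_sum


workdir = Path(tempfile.mkdtemp())
original = workdir / "original.dat"
original.write_bytes(b"example data")

replica_sum = replicate(original, workdir / "replica.dat")
```

The returned digest is the kind of persistent state a catalog could record for later validation.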
  • 11. BaBar High-Energy Physics
    • A functioning international Data Grid for high-energy physics
    • Sites
      • Stanford Linear Accelerator
      • IN2P3, Lyon, France
      • Rome, Italy
      • San Diego
      • RAL, UK
    • Manchester-SDSC mirror
    • Moved over 300 TB of data
    • Increasing to 5 TB per day
  • 12. Next Generation Technology
    • Every fault that occurs in the distributed environment is the responsibility of the data grid
      • Network outage / system crash / operator error
      • Minimize risk through checksums, replicas, synchronization, federation
    • Management of large collections is labor intensive
      • Initiation of recovery operations after remote system failure
    • Need to automate execution of management policies
  • 13. Controlling Remote Operations
    • iRODS - integrated Rule-Oriented Data System
    • Support unique organizational / social management policies for each collection
  • 14. Rule-based Data Management
    • Express assessment criteria through sets of required persistent state information
    • Express management policies as sets of rules controlling the execution of micro-services
    • Express capabilities as sets of micro-services
      • Manage persistent state information resulting from the application of rules controlling execution of remote micro-services
  • 15. Management Virtualization
    • Examples of management policies
      • Integrity
        • Validation of checksums
        • Synchronization of replicas
        • Data distribution
        • Data retention
        • Access controls
      • Authenticity
        • Chain of custody - audit trails
        • Track required preservation metadata - templates
        • Generation of Archival Information Packages
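As an illustration of the first two integrity policies above (checksum validation and replica synchronization), the sketch below compares each replica against a recorded checksum and repairs bad copies from a good one. The site names, the in-memory `replicas` dictionary, and the use of SHA-256 are assumptions, not details from the slides.

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).hexdigest()


def synchronize(replicas, expected):
    """Overwrite every replica whose checksum differs from the recorded one.

    replicas maps a site name to the bytes stored there; expected is the
    checksum recorded when the data was ingested. Returns the repaired sites.
    """
    good = [site for site, data in replicas.items() if sha256(data) == expected]
    if not good:
        raise ValueError("no replica matches the recorded checksum")
    repaired = []
    for site in replicas:
        if site not in good:
            replicas[site] = replicas[good[0]]
            repaired.append(site)
    return repaired


recorded = sha256(b"data")
replicas = {"sdsc": b"data", "ral": b"dat4"}  # the second copy is corrupted
repaired = synchronize(replicas, recorded)
```

Run periodically, a check like this turns a manual recovery step into an automated policy.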
  • 16. Rule-based Data Management
    • Rules required for standard operations
      • POSIX I/O control
      • Standard SRB operations
    • Administrator controlled rules to implement management policies
      • Administrative - adding / deleting users, resources
      • Data ingestion - pre-processing, post-processing
      • Data transport / deletion - parallel I/O streams, disposition
    • User-defined rules - create your own server-side workflow
      • Rule set for a particular collection, particular user group, particular storage system, particular micro-service
  • 17. iRODS Rule
    • Each rule defines
      • Event
      • Condition
      • Action sets (micro-services and rules)
      • Recovery sets
    • Rule types
      • Atomic, applied immediately
      • Deferred, support deferred consistency constraints
      • Periodic, typically used to validate assertions
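The four parts of a rule can be modeled as a small data structure plus an interpreter. This is a sketch in Python, not the iRODS rule syntax or engine. The `Rule` class, `apply_rules`, and the example micro-services are hypothetical, and the pairing of each action with one recovery step is an assumption about how recovery sets are used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    event: str                                # what triggers the rule
    condition: Callable[[dict], bool]         # when the rule applies
    actions: List[Callable[[dict], None]]     # micro-services to run in order
    recovery: List[Callable[[dict], None]]    # recovery[i] undoes actions[i]


def apply_rules(rules, event, context):
    """Run the first matching rule; on failure, undo the completed actions."""
    for rule in rules:
        if rule.event != event or not rule.condition(context):
            continue
        done = 0
        try:
            for action in rule.actions:
                action(context)
                done += 1
            return True
        except Exception:
            # Undo completed actions in reverse order.
            for undo in reversed(rule.recovery[:done]):
                undo(context)
            return False
    return False


def register(ctx):
    ctx["log"].append("registered")


def unregister(ctx):
    ctx["log"].append("unregistered")


def replicate(ctx):
    raise RuntimeError("remote storage offline")


def noop(ctx):
    pass


rule = Rule(
    event="put",
    condition=lambda ctx: ctx["collection"] == "/zone/archive",
    actions=[register, replicate],
    recovery=[unregister, noop],
)
context = {"collection": "/zone/archive", "log": []}
ok = apply_rules([rule], "put", context)
```

Here the second micro-service fails, so the registration is rolled back and the context log ends up recording both the action and its recovery.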
  • 18. Rule-based Access
    • Associate security policies with each digital entity
      • Redaction, access controls on structures within a file
      • Time-dependent access controls (how long to hold data proprietary)
    • Associate access controls with each rule
      • Restrict ability to modify, apply rules
    • Associate access controls with each micro-service
      • Explicit control of operation execution within a given collection
      • Much finer control than provided by Unix r/w/x permissions
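One way to picture access controls attached to individual micro-services, together with a time-dependent release of proprietary data, is the sketch below. `AccessPolicy`, the collection path, the user names, and the embargo date are all hypothetical; the slides do not describe how iRODS stores these controls.

```python
from datetime import date


class AccessPolicy:
    def __init__(self):
        self.grants = set()  # (collection, micro-service, user) triples
        self.embargo = {}    # collection -> date after which reads are public

    def grant(self, collection, service, user):
        self.grants.add((collection, service, user))

    def allowed(self, collection, service, user, today):
        # An explicit grant on this micro-service always wins.
        if (collection, service, user) in self.grants:
            return True
        # Otherwise only reads are allowed, and only after the embargo ends.
        release = self.embargo.get(collection)
        return service == "read" and release is not None and today >= release


policy = AccessPolicy()
policy.embargo["/zone/survey"] = date(2008, 1, 1)
policy.grant("/zone/survey", "checksum", "curator")

before = policy.allowed("/zone/survey", "read", "guest", date(2007, 6, 1))
after = policy.allowed("/zone/survey", "read", "guest", date(2008, 6, 1))
curator = policy.allowed("/zone/survey", "checksum", "curator", date(2007, 6, 1))
```

Because the check is keyed on the micro-service name, a user can be allowed to run `checksum` on a collection without being allowed to read it.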
  • 19. Federation Between Data Grids
    • Two data grids (Data Collection A and Data Collection B), each with its own
      • Logical resource name space
      • Logical user name space
      • Logical file name space
      • Logical rule name space
      • Logical micro-service name
      • Logical persistent state
    • Data Access Methods (Web Browser, DSpace, OAI-PMH)
  • 20. Rule-based Federation
    • When registering a digital entity into another data grid, register required management rules along with the digital entity
      • Move management policies with data
    • Expectation that each operation on each digital entity can be controlled across federated data grids
      • Example is end-to-end encryption
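A toy model of moving management policies with the data: when one grid registers an entity from another, it copies the rule names attached to that entity as well. The `DataGrid` class, grid names, and rule names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class DataGrid:
    name: str
    entities: dict = field(default_factory=dict)  # logical name -> data
    rules: dict = field(default_factory=dict)     # logical name -> rule names

    def register_from(self, other, logical_name):
        """Register an entity from another grid together with its rules."""
        self.entities[logical_name] = other.entities[logical_name]
        self.rules[logical_name] = list(other.rules.get(logical_name, []))


grid_a = DataGrid("A")
grid_a.entities["/A/report.pdf"] = b"%PDF"
grid_a.rules["/A/report.pdf"] = ["encrypt-in-transit", "audit-access"]

grid_b = DataGrid("B")
grid_b.register_from(grid_a, "/A/report.pdf")
```

After registration, grid B knows which policies it is expected to enforce on the entity.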
  • 21. Evolution of Rule-based Systems
    • Logical name spaces enable dynamic addition of new rules, micro-services, and state information
      • Apply new rules on one collection while applying old rule sets on a legacy collection
      • Can run old and new rule sets in parallel
    • Can build a system that manages its evolution
      • Can create rules that track the evolution of the rule-based system
      • Can create rules that govern migration to new rule sets
  • 22. Assessment Rules
    • Can build a system that monitors its own state information
      • Parse audit trails to verify accesses by authorized persons
      • Parse persistent state information for compliance with management rules
      • Test micro-services for compliance with rules
      • Audit all accesses to a collection
      • Compare system properties to desired outcomes
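The first assessment above, parsing audit trails to verify accesses by authorized persons, might look like the following. The comma-separated record layout (timestamp, user, operation, path) and all names are assumptions; the slides do not specify an audit trail format.

```python
def unauthorized_accesses(audit_trail, authorized):
    """Return audit records whose user is not authorized for the accessed path.

    audit_trail is text with one "timestamp,user,operation,path" record per
    line; authorized maps a path to the set of users allowed to access it.
    """
    violations = []
    for line in audit_trail.splitlines():
        if not line.strip():
            continue
        timestamp, user, operation, path = line.strip().split(",")
        if user not in authorized.get(path, set()):
            violations.append((timestamp, user, operation, path))
    return violations


trail = """
2007-03-01T10:00:00,alice,read,/zone/archive/a.dat
2007-03-01T10:05:00,mallory,read,/zone/archive/a.dat
2007-03-01T10:07:00,bob,write,/zone/archive/b.dat
"""
authorized = {
    "/zone/archive/a.dat": {"alice"},
    "/zone/archive/b.dat": {"bob"},
}
violations = unauthorized_accesses(trail, authorized)
```

A periodic rule could run a check like this and flag any non-empty result for the collection's administrator.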
  • 23. For More Information
    • Reagan W. Moore
    • San Diego Supercomputer Center
    • [email_address] edu
    • SRB: http://www.sdsc.edu/srb/
    • iRODS: http://irods.sdsc.edu/