Rule-Based Distributed Data Management Introduction

Transcript of "Rule-Based Distributed Data Management Introduction"

  1. Rule-Based Distributed Data Management Introduction
     iRODS 1.0 - Jan 23, 2008
     Reagan W. Moore, Mike Wan, Arcot Rajasekar, Wayne Schroeder
     San Diego Supercomputer Center
     {moore, mwan, sekar, schroede}@sdsc.edu
     http://irods.sdsc.edu
     http://www.sdsc.edu/srb/
  2. Tutorial Topics
       • Data management applications
           • Concepts behind distributed data management
       • Data grids
           • Infrastructure independence
       • Rule-based data grids
           • Automation of management policies
       • Open source software
           • iRODS - integrated Rule-Oriented Data System
           • Installation and use of iRODS data grid
  3. Using a Data Grid - in Abstract
       • User asks for data from the data grid
       • The data is found and returned
           • Where & how details are hidden
     Diagram labels: Data Grid; Ask for data; Data delivered
  4. Data Grid Capabilities
       • Logical file name space
           • Directory hierarchy / soft links
           • Versions / backups / replicas
           • Aggregation / containers
           • Descriptive metadata
           • Digital entities
       • Authentication and authorization
           • GSI, challenge-response, Shibboleth
           • ACLs, audit trails
           • Checksums, synchronization
           • Logical user name space
           • Aggregation / groups
  5. Using an SRB Data Grid - Details
       • User asks for data
       • Data request goes to SRB Server
       • Server looks up information in catalog
       • Catalog tells which SRB server has data
       • 1st server asks 2nd for data
       • The 2nd SRB server supplies the data
     Diagram labels: SRB Server; SRB Server; Metadata Catalog; DB
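The request path on this slide can be sketched as a toy model. This is only an illustration of the lookup-then-forward idea, not SRB code: the `Catalog` and `Server` classes, the server names, and the logical path are all hypothetical.

```python
class Catalog:
    """Hypothetical stand-in for the metadata catalog:
    maps a logical file name to the name of the server holding the data."""
    def __init__(self):
        self.location = {}


class Server:
    """Hypothetical data grid server that consults the shared catalog."""
    def __init__(self, name, catalog, peers):
        self.name = name
        self.catalog = catalog
        self.peers = peers        # shared dict: server name -> Server
        self.storage = {}         # logical name -> bytes held locally
        peers[name] = self

    def put(self, logical_name, data):
        # Store locally and record the location in the catalog.
        self.storage[logical_name] = data
        self.catalog.location[logical_name] = self.name

    def get(self, logical_name):
        # Look up which server has the data; serve it or forward the request.
        holder = self.catalog.location[logical_name]
        if holder == self.name:
            return self.storage[logical_name]
        return self.peers[holder].get(logical_name)


catalog = Catalog()
peers = {}
first = Server("server-1", catalog, peers)
second = Server("server-2", catalog, peers)

second.put("/grid/home/demo/report.txt", b"hello")
# The user contacts the first server, which forwards to the second.
result = first.get("/grid/home/demo/report.txt")
```

The user only ever names the logical path and talks to one server; where the bytes live is resolved through the catalog.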
  6. Trust Virtualization
       • Users are authenticated by the data grid
       • Authorization is done with access controls
       • Authentication/authorization stored in MCAT
       • Centralized information management
       • Shared data collection
       • Data are owned by the data grid
     Diagram labels: SRB Server; SRB Server; Metadata Catalog; DB
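A minimal sketch of the idea that the grid itself, rather than each storage system, authenticates users and checks access controls. The `Grid` class, the permission names, and the password-hash scheme are assumptions made for illustration; the slides do not describe how MCAT stores credentials.

```python
import hashlib
import hmac
import os


class Grid:
    """Hypothetical central store for credentials and access controls."""
    ITERATIONS = 10_000  # illustrative value only

    def __init__(self):
        self.users = {}  # user -> (salt, derived key)
        self.acl = {}    # logical name -> {user: set of permissions}

    def _derive(self, password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self.ITERATIONS)

    def add_user(self, user, password):
        salt = os.urandom(16)
        self.users[user] = (salt, self._derive(password, salt))

    def authenticate(self, user, password):
        if user not in self.users:
            return False
        salt, stored = self.users[user]
        return hmac.compare_digest(stored, self._derive(password, salt))

    def grant(self, logical_name, user, permission):
        self.acl.setdefault(logical_name, {}).setdefault(user, set()).add(permission)

    def authorized(self, user, password, logical_name, permission):
        # Authentication first, then the access-control check.
        if not self.authenticate(user, password):
            return False
        return permission in self.acl.get(logical_name, {}).get(user, set())


grid = Grid()
grid.add_user("alice", "s3cret")
grid.grant("/grid/shared/survey.dat", "alice", "read")

can_read = grid.authorized("alice", "s3cret", "/grid/shared/survey.dat", "read")
can_write = grid.authorized("alice", "s3cret", "/grid/shared/survey.dat", "write")
bad_login = grid.authorized("alice", "wrong", "/grid/shared/survey.dat", "read")
```

Because both the credentials and the access controls live in one place, a storage site does not need its own account for every user of the shared collection.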
  7. Data Grid Capabilities
       • Logical storage names
           • Dynamic resource creation
           • Standard operations
           • Heterogeneous storage systems
           • Trash
           • Collective operations / storage groups
       • Data transport
           • Parallel I/O
           • Small file transport
           • Message engine
           • Containers / tar files / HDF5
           • Aggregation of I/O commands - remote procedures
  8. Data Virtualization
       • Storage Repository
           • Storage location
           • User name
           • File name
           • File context (creation date, …)
           • Access controls
       • Data Grid
           • Logical resource name space
           • Logical user name space
           • Logical file name space
           • Logical context (metadata)
           • Access constraints
     Diagram labels: Data Collection; Data Access Methods (C library, Unix, Web Browser)
     Data is organized as a shared collection
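The mapping from logical names to storage-specific names can be sketched with a small in-memory catalog. All class names, resource names, and paths below are hypothetical; the sketch only shows that a logical file name can carry an owner, metadata, and several physical replicas.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str        # logical resource name
    physical_path: str   # location inside that storage system


@dataclass
class CatalogEntry:
    owner: str                                    # logical user name
    metadata: dict = field(default_factory=dict)  # logical context
    replicas: list = field(default_factory=list)


namespace = {}  # logical file name -> CatalogEntry


def register(logical_name, owner, resource, physical_path, **metadata):
    """Record a physical copy under a logical file name."""
    entry = namespace.setdefault(logical_name, CatalogEntry(owner=owner))
    entry.metadata.update(metadata)
    entry.replicas.append(Replica(resource, physical_path))
    return entry


def resolve(logical_name, preferred_resource=None):
    """Pick a replica, preferring the named resource when one is given."""
    replicas = namespace[logical_name].replicas
    for replica in replicas:
        if replica.resource == preferred_resource:
            return replica
    return replicas[0]


register("/grid/home/demo/survey.dat", "demo", "disk-cache",
         "/vault/a1/0042", project="sky-survey")
register("/grid/home/demo/survey.dat", "demo", "tape-archive",
         "/hpss/vol7/0042")

chosen = resolve("/grid/home/demo/survey.dat", preferred_resource="tape-archive")
default = resolve("/grid/home/demo/survey.dat")
```

Clients keep using the same logical name even if a replica is moved to a different storage system; only the catalog entry changes.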
  9. iRODS Data Grid Capabilities
       • Remote procedures
           • Atomic / deferred / periodic
           • Procedure execution / chaining
           • Structured information
       • Structured information
           • Metadata catalog interactions / 205 queries
           • Information transmission
           • Template parsing
           • Memory structures
           • Report generation / audit trail parsing
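The three execution modes named above (atomic, deferred, periodic) can be illustrated with a toy scheduler driven by a simulated clock. This is a sketch of the general idea only; the `Scheduler` class and its methods are hypothetical and do not reflect how iRODS queues work.

```python
import heapq
import itertools


class Scheduler:
    """Hypothetical scheduler with a simulated clock (integer ticks)."""
    def __init__(self):
        self.now = 0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker so callables are never compared

    def run_now(self, procedure):
        # Atomic: execute immediately.
        return procedure()

    def defer(self, delay, procedure, period=None):
        # Deferred: run once after `delay` ticks.
        # Periodic: also pass a positive `period` to repeat.
        heapq.heappush(self._queue,
                       (self.now + delay, next(self._counter), procedure, period))

    def advance(self, until):
        # Run everything that falls due up to and including tick `until`.
        while self._queue and self._queue[0][0] <= until:
            due, _, procedure, period = heapq.heappop(self._queue)
            self.now = due
            procedure()
            if period is not None:
                heapq.heappush(self._queue,
                               (due + period, next(self._counter), procedure, period))
        self.now = until


log = []
scheduler = Scheduler()
scheduler.run_now(lambda: log.append("atomic"))
scheduler.defer(5, lambda: log.append("deferred"))
scheduler.defer(2, lambda: log.append("periodic"), period=4)
scheduler.advance(10)
```

With these inputs the periodic procedure falls due at ticks 2, 6, and 10, and the deferred one at tick 5.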
  10. Data Virtualization
      Layers (diagram): Access Interface; Storage Protocol; Storage System
      Traditional approach: the client (for example, Microsoft Word) talks directly to the storage system using Unix I/O.
  11. Data Virtualization (Digital Library)
      Layers (diagram): Access Interface; Digital Library; Storage Protocol; Storage System
      The client talks to the Digital Library, which then interacts with the storage system using Unix I/O.
  12. Data Virtualization (iRODS)
      Layers (diagram): Access Interface; Standard Micro-services; Data Grid; Standard Operations; Storage Protocol; Storage System
       • Map from the actions requested by the access method to a standard set of micro-services.
       • The standard micro-services use standard operations.
       • Separate protocol drivers are written for each storage system.
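The layering described here can be sketched as follows: micro-services are written once against a small set of standard operations, and each storage system gets its own driver behind that interface. The driver classes and service functions are hypothetical examples, not iRODS interfaces.

```python
import hashlib
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations that every protocol driver must provide."""
    @abstractmethod
    def read(self, path): ...

    @abstractmethod
    def write(self, path, data): ...


class DictDriver(StorageDriver):
    """Hypothetical driver that keeps whole files in a dict."""
    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class ChunkedDriver(StorageDriver):
    """Hypothetical driver that stores files as fixed-size chunks."""
    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.chunks = {}

    def read(self, path):
        return b"".join(self.chunks[path])

    def write(self, path, data):
        size = self.chunk_size
        self.chunks[path] = [data[i:i + size] for i in range(0, len(data), size)]


# Micro-services use only the standard operations, never driver internals.
def checksum_service(driver, path):
    return hashlib.sha256(driver.read(path)).hexdigest()


def replicate_service(source, target, path):
    target.write(path, source.read(path))


disk = DictDriver()
archive = ChunkedDriver()
disk.write("/vault/0042", b"sensor readings")
replicate_service(disk, archive, "/vault/0042")

same = checksum_service(disk, "/vault/0042") == checksum_service(archive, "/vault/0042")
```

Adding a new storage system then means writing one more driver; the micro-services stay unchanged.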
  13. Data Grid Capabilities
       • Rules
           • User / administrative / internal
           • Remote web service invocation
           • Rule & micro-service creation
           • Standards / XAM, SNIA
       • Installation
           • CVS / modules
           • System dependencies
           • Automation
  14. Data Management
      iRODS - integrated Rule-Oriented Data System
  15. Data Grids
       • Administration
           • User creation
           • Resource creation
           • Token management
           • Listing
       • Collaborations
           • Development plans
           • International collaborators
           • Federations
  16. Using an iRODS Data Grid - Details
       • User asks for data
       • Data request goes to iRODS Server
       • Server looks up information in catalog
       • Catalog tells which iRODS server has data
       • 1st server asks 2nd for data
       • The 2nd iRODS server applies rules
      Diagram labels: iRODS Server; Rule Engine; iRODS Server; Rule Engine; Metadata Catalog; Rule Base; DB
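The step "the 2nd iRODS server applies rules" can be illustrated with a toy rule engine: each rule names an event, a condition, and a chain of micro-services to run. The rule structure, the micro-service functions, and the first-match policy are assumptions for this sketch and are not the iRODS rule syntax.

```python
import hashlib


# Hypothetical micro-services: each one reads and updates a shared context.
def record_audit(context):
    context["audit"].append((context["user"], context["logical_name"]))


def attach_checksum(context):
    context["checksum"] = hashlib.sha256(context["data"]).hexdigest()


# Hypothetical rule base: the first rule whose event and condition match fires.
rule_base = [
    {
        "event": "get",
        "condition": lambda c: c["logical_name"].startswith("/grid/archive/"),
        "actions": [record_audit, attach_checksum],
    },
    {
        "event": "get",
        "condition": lambda c: True,
        "actions": [record_audit],
    },
]


def apply_rules(rule_base, event, context):
    """Run the micro-service chain of the first matching rule."""
    for rule in rule_base:
        if rule["event"] == event and rule["condition"](context):
            for micro_service in rule["actions"]:
                micro_service(context)
            return rule
    return None


context = {
    "user": "demo",
    "logical_name": "/grid/archive/report.txt",
    "data": b"hello",
    "audit": [],
}
fired = apply_rules(rule_base, "get", context)
```

Changing a management policy then means editing the rule base rather than the server code.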
  17. Transcontinental Persistent Archive Prototype
       • Distributed Data Management Concepts
           • Data virtualization
               • Storage system independence
           • Trust virtualization
               • Administration independence
       • Risk mitigation
           • Federation of multiple independent data grids
               • Operation independence
  18. National Archives and Records Administration Transcontinental Persistent Archive Prototype
      Federation of Seven Independent Data Grids, each with its own MCAT: U Md, SDSC, NARA I, U NC, Georgia Tech, NARA II, Rocket Center
      Extensible environment; can federate with additional research and education sites. Each data grid uses different vendor products.
  19. Federation Between SRB Data Grids
       • Data Grid (Data Collection A)
           • Logical resource name space
           • Logical user name space
           • Logical file name space
           • System attributes
       • Data Grid (Data Collection B)
           • Logical resource name space
           • Logical user name space
           • Logical file name space
           • System attributes
      Data Access Methods (Web Browser, DSpace, OAI-PMH)
  20. Federation Between iRODS Data Grids
       • Data Grid (Data Collection A)
           • Logical resource name space
           • Logical user name space
           • Logical file name space
           • Logical rule name space
           • Logical micro-service name
           • Logical persistent state
       • Data Grid (Data Collection B)
           • Logical resource name space
           • Logical user name space
           • Logical file name space
           • Logical rule name space
           • Logical micro-service name
           • Logical persistent state
      Data Access Methods (Web Browser, DSpace, OAI-PMH)
  21. Data Management Applications (What do they have in common?)
       • Data grids
           • Share data - organize distributed data as a collection
       • Digital libraries
           • Publish data - support browsing and discovery
       • Persistent archives
           • Preserve data - manage technology evolution
       • Real-time sensor systems
           • Federate sensor data - integrate across sensor streams
       • Workflow systems
           • Analyze data - integrate client- & server-side workflows
  22. Approach
       • To meet the diverse requirements, the architecture must:
           • Be highly modular
           • Be highly extensible
           • Provide infrastructure independence
           • Enforce management policies
           • Provide scalability mechanisms
           • Manipulate structured information
           • Enable community standards
  23. Generic Infrastructure
       • Data grids manage data distributed across multiple types of storage systems
           • File systems, tape archives, object ring buffers
       • Data grids manage collection attributes
           • Provenance, descriptive, system metadata
       • Data grids manage technology evolution
           • At the point in time when new technology is available, both the old and new systems can be integrated
  24. Extremely Successful
       • Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections
       • Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS: APAC, UK e-Science, IN2P3, WUNgrid
           • Astronomy: data grid
           • Bio-informatics: digital library
           • Earth sciences: data grid
           • Ecology: collection
           • Education: persistent archive
           • Engineering: digital library
           • Environmental science: data grid
           • High energy physics: data grid
           • Humanities: data grid
           • Medical community: digital library
           • Oceanography: real-time sensor data, persistent archive
           • Seismology: digital library, real-time sensor data
       • Goal has been generic infrastructure for distributed data
  26. Observations of Production Data Grids
       • Each community implements different management policies
           • Community-specific preservation objectives
           • Community-specific assertions about properties of the shared collection
           • Community-specific management policies
       • Need a mechanism to support the socialization of shared collections
           • Map from assertions made by collection creators to expectations of the users
  27. Tension between Common and Unique Components
       • Synergism - common infrastructure
           • Distributed data
               • Sources, users, performance, reliability, analysis
           • Technology management
               • Incorporate new technology
       • Unique components - extensibility
           • Information management
               • Semantics, formats, services
           • Management policies
               • Integrity, authenticity, availability, authorization
  28. Data Grid Evolution
       • Implement essential components needed for synergism
           • Storage Resource Broker - SRB
           • Infrastructure independence
           • Data and trust virtualization
       • Implement components needed for specific management policies and processes
           • integrated Rule-Oriented Data System - iRODS
           • Policy management virtualization
           • Map processes to standard micro-services
           • Structured information management and transmission
  29. integrated Rule-Oriented Data System
      Architecture diagram components: Client Interface; Admin Interface; Rule Invoker; Rule Engine; Rule Base; Current State; Confs; Service Manager; Micro Service Modules (Metadata-based Services); Micro Service Modules (Resource-based Services); Resources; Metadata Persistent Repository; Rule Modifier Module; Config Modifier Module; Metadata Modifier Module; Consistency Check Module (shown three times)
  30. iRODS Development
       • NSF - SDCI grant “Adaptive Middleware for Community Shared Collections”
           • iRODS development, SRB maintenance
       • NARA - Transcontinental Persistent Archive Prototype
           • Trusted repository assessment criteria
       • NSF - Ocean Research Interactive Observatory Network (ORION)
           • Real-time sensor data stream management
       • NSF - Temporal Dynamics of Learning Center data grid
           • Management of IRB approval
  31. Distributed Management System
      Diagram components: Rule Engine; Data Transport; Metadata Catalog; Execution Control; Messaging System; Execution Engine; Virtualization; Server Side Workflow; Persistent State Information; Scheduling; Policy Management
  32. iRODS Development Status
       • Production release is version 1.0
           • January 24, 2008
       • International collaborations
           • SHAMAN - University of Liverpool
               • Sustaining Heritage Access through Multivalent ArchiviNg
           • UK e-Science data grid
           • IN2P3 in Lyon, France
           • DSpace policy management
           • Shibboleth - collaboration with ASPIS
  33. Planned Development
       • GSI support (1)
       • Time-limited sessions via a one-way hash authentication
       • Python client library
       • GUI browser (AJAX in development)
       • Driver for HPSS (in development)
       • Driver for SAM-QFS
       • Porting to additional versions of Unix/Linux
       • Porting to Windows
       • Support for MySQL as the metadata catalog
       • API support packages based on existing mounted collection driver
       • MCAT to ICAT migration tools (2)
       • Extensible Metadata including Databases Access Interface (6)
       • Zones/Federation (4)
       • Auditing - mechanisms to record and track iRODS metadata changes
  34. For More Information
       • Reagan W. Moore
       • San Diego Supercomputer Center
       • [email_address] edu
       • http://www.sdsc.edu/srb/
       • http://irods.sdsc.edu/