Transcontinental Persistent Archive Prototype
Presentation Transcript

  • 1. Transcontinental Persistent Archive Prototype
    • Reagan W. Moore, Mike Wan, Arcot Rajasekar, Wayne Schroeder
      • Data Intensive Cyber Environments
      • rwmoore@renci.org, mwan@diceresearch.org, rajasekar@unc.edu, schroede@diceresearch.org
      • http://irods.diceresearch.org
      • http://srb.diceresearch.org
    • Joseph JaJa, University of Maryland, College Park
    • Leesa Brieger, University of North Carolina, Chapel Hill
  • 2. Preservation Concepts
    • Infrastructure independence
      • Data virtualization
      • Trust virtualization
      • Policy virtualization
    • Persistent objects
      • Structured information
    • Policy-based preservation environments
      • Automation of assessment criteria validation
    • Federation of persistent archives
      • Migration virtualization
    • Theory of digital preservation
      • Preservation environment reference implementation
  • 3. National Archives and Records Administration Transcontinental Persistent Archive Prototype
    • Federation of Seven Independent Data Grids, each with its own MCAT: NARA I, NARA II, U Md, UCSD, U NC, Georgia Tech, Rocket Center
    • Extensible environment: can federate with additional research and education sites
    • Each data grid uses different vendor products
  • 4. Transcontinental Persistent Archive Prototype (TPAP)
    • Use data grid technology to implement two preservation environments:
      • Storage Resource Broker data grid - SRB
      • Integrated Rule-Oriented data grid - iRODS
    • Topics
      • Why the need for a second generation of data grid technology?
      • What new capabilities are enabled?
  • 5. SRB Data Grid
    • Users are authenticated by the data grid
    • Authorization is done with access controls
    • Authentication/authorization stored in MCAT
    • Centralized information management
    • Shared data collection
    • Data are owned by the data grid
    (Diagram: two SRB Servers and the Metadata Catalog DB)
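The slide describes a grid in which identity and permissions live in the central catalog rather than in each storage system. A minimal sketch of that check follows, assuming a toy in-memory catalog; `MetadataCatalog`, `request`, the user name, and the collection path are hypothetical illustrations, not the SRB API, and the real MCAT is a relational database.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Toy stand-in for the MCAT: users and access controls kept in one place."""
    # user -> SHA-256 hex digest (toy only; a real system would use a salted, slow hash)
    password_hashes: dict = field(default_factory=dict)
    # logical path -> {user: set of permitted operations}
    acls: dict = field(default_factory=dict)

    def authenticate(self, user: str, password: str) -> bool:
        expected = self.password_hashes.get(user)
        if expected is None:
            return False
        supplied = hashlib.sha256(password.encode()).hexdigest()
        return hmac.compare_digest(expected, supplied)

    def authorize(self, user: str, path: str, operation: str) -> bool:
        return operation in self.acls.get(path, {}).get(user, set())


def request(catalog: MetadataCatalog, user: str, password: str,
            path: str, operation: str) -> bool:
    """The grid checks every request: authenticate first, then consult the ACL."""
    if not catalog.authenticate(user, password):
        raise PermissionError(f"authentication failed for {user}")
    if not catalog.authorize(user, path, operation):
        raise PermissionError(f"{user} is not allowed to {operation} {path}")
    return True


# Hypothetical user, collection path, and permission names for illustration.
mcat = MetadataCatalog()
mcat.password_hashes["archivist1"] = hashlib.sha256(b"secret").hexdigest()
mcat.acls["/nara/series1/rec001"] = {"archivist1": {"read"}}
```

Because the storage systems never see the user's identity, adding or replacing a storage system does not change who may access what.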
  • 6. Advantage of SRB Data Grid
    • Generic infrastructure
      • Used in other data management applications
    • Scalable
      • Manages hundreds of millions of files and petabytes of data
    • Extensible
      • Can add new storage systems dynamically, create new record series, add new archivists
    • Trustworthy
      • Consistent management of preservation metadata
  • 7. Lessons Learned
    • Infrastructure independence
      • Management of properties of the archives independently of the choice of storage technology
      • Support for standard operations for record manipulation across all storage systems
    • Scalable system
      • SRB organizes petabytes of internationally distributed data into shared collections
      • Administration of large collections needs to be automated
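Infrastructure independence, as described above, amounts to mapping logical record names onto any storage system that supports a common set of operations. The sketch below illustrates that idea under stated assumptions: the `StorageDriver` interface, the two drivers, and the `DataGrid` class are hypothetical, and a real data grid supports far more operations than `put` and `get`.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations every storage system must support (hypothetical interface)."""

    @abstractmethod
    def put(self, physical_name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, physical_name: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    def __init__(self):
        self._blobs = {}

    def put(self, physical_name, data):
        self._blobs[physical_name] = data

    def get(self, physical_name):
        return self._blobs[physical_name]


class FileDriver(StorageDriver):
    def __init__(self, root):
        self.root = root

    def put(self, physical_name, data):
        with open(os.path.join(self.root, physical_name), "wb") as f:
            f.write(data)

    def get(self, physical_name):
        with open(os.path.join(self.root, physical_name), "rb") as f:
            return f.read()


class DataGrid:
    """Users address records by logical name; the catalog hides which driver holds them."""

    def __init__(self):
        self.resources = {}
        self.catalog = {}  # logical name -> (resource name, physical name)

    def add_resource(self, name, driver):
        self.resources[name] = driver  # storage systems can be added at run time

    def put(self, logical, data, resource):
        physical = logical.strip("/").replace("/", "_")
        self.resources[resource].put(physical, data)
        self.catalog[logical] = (resource, physical)

    def get(self, logical):
        resource, physical = self.catalog[logical]
        return self.resources[resource].get(physical)

    def migrate(self, logical, target):
        # The logical name is unchanged; only the catalog entry moves.
        # This sketch leaves the old copy in place.
        self.put(logical, self.get(logical), target)


grid = DataGrid()
grid.add_resource("disk-cache", MemoryDriver())
grid.add_resource("archive-fs", FileDriver(tempfile.mkdtemp()))
grid.put("/nara/series1/rec001", b"record bytes", "disk-cache")
grid.migrate("/nara/series1/rec001", "archive-fs")
```

The resource names and record path are illustrative. The design point is that archival properties are attached to the logical name in the catalog, so they survive a change of storage technology.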
  • 8. TPAP - Rule-Based Data Grids
    • iRODS "integrated Rule-Oriented Data System" data grid technology
      • Generic software used to implement preservation environments, digital libraries, and real-time sensor systems
    • The iRODS system casts preservation policies as rules that control the execution of preservation processes.
      • Rules can also be defined that validate assertions about the preservation environment, such as integrity, authenticity, and chain of custody
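As one example of a rule that validates an assertion about the environment, an integrity check can compare each stored object against a checksum recorded at ingest. This is a minimal sketch, assuming plain dictionaries stand in for the storage system and the catalog; SHA-256 is chosen here for illustration, and the function names are hypothetical rather than iRODS micro-services.

```python
import hashlib

storage = {}    # object name -> bytes (stands in for the storage system)
checksums = {}  # object name -> digest recorded at ingest (stands in for catalog metadata)


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ingest(name: str, data: bytes) -> None:
    """Store the object and record its checksum as preservation metadata."""
    storage[name] = data
    checksums[name] = digest(data)


def validate_integrity() -> list:
    """Assertion: every stored object still matches the checksum recorded at ingest.
    Returns the sorted names of objects that are missing or altered."""
    failures = []
    for name, expected in checksums.items():
        data = storage.get(name)
        if data is None or digest(data) != expected:
            failures.append(name)
    return sorted(failures)


ingest("rec001", b"first record")
ingest("rec002", b"second record")
storage["rec002"] = b"second record, silently altered"  # simulate corruption
```

Run periodically, a check like this turns "the archive preserves integrity" from a claim into an auditable result.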
  • 9. Preservation Environments iRODS - integrated Rule-Oriented Data System
  • 10. Rule Specification
    • Rule - Event : Condition : Action set : Recovery Procedure
      • Event - atomic, deferred, periodic
      • Condition - test on any state information attribute
      • Action set - chained micro-services and rules
      • Recovery procedure - ensure transaction semantics in a distributed world
    • Rule types
      • System level, administrative level, user level
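The Event : Condition : Action set : Recovery Procedure structure can be illustrated with a small rule engine. This sketch assumes only immediate (atomic) events; deferred and periodic scheduling are omitted. It treats recovery as undoing completed micro-services in reverse order. `Rule`, `MicroService`, `fire`, and the example micro-services are hypothetical names, not the iRODS rule language.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MicroService:
    name: str
    run: Callable[[dict], None]
    undo: Callable[[dict], None]  # recovery step that reverses run()


@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]  # test on state information
    actions: List[MicroService]        # chained micro-services


def fire(rules: List[Rule], event: str, state: dict) -> List[str]:
    """Run every rule whose event matches and whose condition holds.
    If a micro-service fails, undo the ones that already completed, newest first."""
    outcomes = []
    for rule in rules:
        if rule.event != event or not rule.condition(state):
            continue
        completed = []
        try:
            for ms in rule.actions:
                ms.run(state)
                completed.append(ms)
            outcomes.append("committed")
        except Exception:
            for ms in reversed(completed):
                ms.undo(state)
            outcomes.append("rolled back")
    return outcomes


# Hypothetical micro-services operating on a dict of state information.
def replicate(s):
    s["replicas"] += 1


def drop_replica(s):
    s["replicas"] -= 1


def register(s):
    if s.get("catalog_down"):
        raise RuntimeError("catalog unavailable")
    s["registered"] = True


def unregister(s):
    s["registered"] = False


rules = [Rule(
    event="post_ingest",
    condition=lambda s: s["collection"].startswith("/nara"),
    actions=[MicroService("replicate", replicate, drop_replica),
             MicroService("register", register, unregister)],
)]
```

If the catalog is unavailable, the replica created by the first micro-service is removed again, so the distributed state does not end up half-updated.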
  • 11. Electronic Records Archive
    • Analyzed capabilities list (852 operations)
      • Identified 600 functions that should be automated by the data grid
      • Compose functions from 177 micro-services
      • Derived 212 metadata attributes that are needed
      • Half of the attributes are related to characterizations of structured information
  • 12. RLG/NARA - TRAC Criteria
    • Assessment categories
      • Organizational infrastructure
      • Digital Object Management
      • Technologies, Technical Infrastructure and Security
    • Example criteria
      • B6.4 Repository has documented and implemented access policies (authorization rules, authentication requirements) consistent with deposit agreements for stored objects.
  • 13. Mapped to a Set of 105 Rules
    • List staff who have archivist execution permission on collection
    • List all persons with access permissions on collection
    • Analyze audit trails to verify identity of all persons accessing the data, and compare their roles with desired access controls
    • Generate report listing all persons who accessed or applied archival functions on the collection
    • Compare report with the deposition agreement
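The audit steps listed above can be sketched as one function over an audit trail. This assumes the trail is a list of (user, operation) pairs for a single collection and that the deposit agreement can be reduced to a set of permitted roles. Both are simplifying assumptions, and all names and data are hypothetical.

```python
from collections import defaultdict


def audit_report(audit_trail, acl, roles, agreement_roles):
    """
    audit_trail: list of (user, operation) events on one collection
    acl: user -> set of operations the desired access controls permit
    roles: user -> role name
    agreement_roles: roles the deposit agreement allows to access the collection
    """
    accessed = defaultdict(set)
    acl_violations = []
    for user, op in audit_trail:
        accessed[user].add(op)
        if op not in acl.get(user, set()):
            acl_violations.append((user, roles.get(user, "unknown"), op))
    # Report listing everyone who accessed the collection and what they did.
    report = {u: {"role": roles.get(u, "unknown"), "operations": sorted(ops)}
              for u, ops in accessed.items()}
    # Compare the report with the deposit agreement.
    outside_agreement = sorted(u for u in accessed
                               if roles.get(u, "unknown") not in agreement_roles)
    return report, acl_violations, outside_agreement


trail = [("alice", "read"), ("bob", "read"), ("bob", "delete"), ("eve", "read")]
acl = {"alice": {"read", "write"}, "bob": {"read"}}
roles = {"alice": "archivist", "bob": "researcher"}
report, violations, outside = audit_report(trail, acl, roles, {"archivist", "researcher"})
```

Here `bob` exceeded his access controls and `eve` has no role the agreement recognizes, so both would be flagged for review.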
  • 14. Lessons Learned
    • Management of structured information
      • No longer sufficient to manipulate bit streams
    • Structured information
      • Send structured information over the network
      • Parse attributes from structured information
      • Format metadata into reports
      • Access structured information resources
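Parsing attributes out of structured information and formatting them into a report can look like the following sketch. It assumes records are small XML documents. The element names and sample values are hypothetical, and Python's standard `xml.etree.ElementTree` module is used for parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; element names and values are for illustration only.
RECORD = """<record>
  <title>Annual report</title>
  <creator>Agency X</creator>
  <date>1998-06-30</date>
  <format>application/pdf</format>
</record>"""


def parse_attributes(xml_text: str, fields: list) -> dict:
    """Extract the requested child elements; a missing element yields an empty string."""
    root = ET.fromstring(xml_text)
    return {f: root.findtext(f, default="") for f in fields}


def format_report(attrs: dict) -> str:
    """Format the attributes as aligned 'name : value' lines."""
    width = max(len(k) for k in attrs)
    return "\n".join(f"{k.ljust(width)} : {v}" for k, v in attrs.items())


attrs = parse_attributes(RECORD, ["title", "creator", "date", "format", "rights"])
report = format_report(attrs)
print(report)
```

The empty `rights` line makes a missing attribute visible in the report instead of silently omitting it.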
  • 15. Theory of Digital Preservation
    • Prove compliance of the data management system with specified assertions
    • Three components
      • Define the purpose for the collection: assessment criteria, management policies, and management procedures
      • Analyze completeness of the system
        • For each criterion, persistent state is generated that can be audited
        • Persistent state attributes are generated by specific procedure versions
        • For each procedure version there are specific management policy versions
        • For each policy, there are evaluation criteria
      • Audit properties of the system
        • Periodic rules validate assessment criteria
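The completeness analysis describes a chain from each assessment criterion to the persistent state it produces, the procedure version that generates that state, and the policy version behind the procedure. A minimal sketch of checking that chain and auditing the recorded state follows. The `Link` structure, attribute names, and version labels are hypothetical, and only `B6.4` comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    criterion: str
    state_attribute: str
    procedure_version: str
    policy_version: str


def completeness_gaps(criteria, links):
    """Criteria lacking a full chain: criterion -> state -> procedure -> policy."""
    covered = {l.criterion for l in links
               if l.state_attribute and l.procedure_version and l.policy_version}
    return sorted(set(criteria) - covered)


def audit(links, state, checks):
    """Body of a periodic rule: evaluate each criterion's check on recorded state."""
    return {l.criterion: checks[l.criterion](state.get(l.state_attribute))
            for l in links if l.criterion in checks}


criteria = ["B6.4", "integrity"]
links = [
    Link("B6.4", "access_policy_report_date", "acl-report v2", "access-policy v3"),
    Link("integrity", "last_checksum_audit", "checksum-verify v1", ""),  # no policy recorded
]
state = {"access_policy_report_date": "2008-05-01"}
checks = {"B6.4": lambda v: v is not None}
```

A gap means the system cannot yet prove compliance for that criterion, even if the underlying procedure runs.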
  • 16. For More Information
    • Reagan W. Moore
    • University of North Carolina at Chapel Hill
    • [email_address]
    • http://srb.diceresearch.org
    • http://irods.diceresearch.org
    • TPAP research - supported by NARA
    • iRODS - support from NSF for creation of open source generic software