  • 1. Transcontinental Persistent Archive Prototype. Reagan W. Moore, Mike Wan, Arcot Rajasekar, Wayne Schroeder (Data Intensive Cyber Environments); Joseph JaJa (University of Maryland, College Park); Leesa Brieger (University of North Carolina, Chapel Hill)
  • 2. Preservation Concepts
    • Infrastructure independence
      • Data virtualization
      • Trust virtualization
      • Policy virtualization
    • Persistent objects
      • Structured information
    • Policy-based preservation environments
      • Automation of assessment criteria validation
    • Federation of persistent archives
      • Migration virtualization
    • Theory of digital preservation
      • Preservation environment reference implementation
  • 3. National Archives and Records Administration Transcontinental Persistent Archive Prototype
    • Federation of seven independent data grids, each with its own MCAT: U Md, UCSD, NARA I, U NC, Georgia Tech, NARA II, Rocket Center
    • Extensible environment; can federate with additional research and education sites
    • Each data grid uses different vendor products
  • 4. Transcontinental Persistent Archive Prototype (TPAP)
    • Use data grid technology to implement two preservation environments:
      • Storage Resource Broker data grid - SRB
      • Integrated Rule-Oriented data grid - iRODS
    • Topics
      • Why the need for a second generation of data grid technology?
      • What new capabilities are enabled?
  • 5. SRB Data Grid
    • Users are authenticated by the data grid
    • Authorization is done with access controls
    • Authentication/authorization stored in MCAT
    • Centralized information management
    • Shared data collection
    • Data are owned by the data grid
    • [Diagram labels: SRB Server, SRB Server, Metadata Catalog DB]
  • 6. Advantage of SRB Data Grid
    • Generic infrastructure
      • Used in other data management applications
    • Scalable
      • Manages hundreds of millions of files and petabytes of data
    • Extensible
      • Can add new storage systems dynamically, create new record series, add new archivists
    • Trustworthy
      • Consistent management of preservation metadata
  • 7. Lessons Learned
    • Infrastructure independence
      • Management of properties of the archives independently of the choice of storage technology
      • Support for standard operations for record manipulation across all storage systems
    • Scalable system
      • SRB organizes petabytes of internationally distributed data into shared collections
      • Administration of large collections needs to be automated
  • 8. TPAP - Rule-Based Data Grids
    • iRODS ("integrated Rule-Oriented Data System") data grid technology
      • Generic software used to implement preservation environments, digital libraries, real-time sensor systems
    • The iRODS system casts preservation policies as rules that control the execution of preservation processes.
      • Rules can also be defined that validate assertions about the preservation environment, such as integrity, authenticity, and chain of custody
  • 9. Preservation Environments iRODS - integrated Rule-Oriented Data System
  • 10. Rule Specification
    • Rule - Event : Condition : Action set : Recovery procedure
      • Event - atomic, deferred, periodic
      • Condition - test on any state information attribute
      • Action set - chained micro-services and rules
      • Recovery procedure - ensure transaction semantics in a distributed world
    • Rule types
      • System level, administrative level, user level
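The Event : Condition : Action set : Recovery procedure structure above can be illustrated with a minimal Python sketch. This is not iRODS rule syntax or an iRODS API; the `Rule` class, the `fire` method, and the example actions (`register`, `replicate`, and their undo functions) are hypothetical names chosen for illustration. The sketch assumes each action in the chain is paired with a recovery step, and that recovery steps for completed actions run in reverse order when a later action fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
# Each step pairs an action with the recovery procedure that undoes it.
Step = Tuple[Callable[[State], None], Callable[[State], None]]


@dataclass
class Rule:
    event: str                          # event name that triggers the rule
    condition: Callable[[State], bool]  # test on state information
    steps: List[Step]                   # chained (action, recovery) pairs

    def fire(self, event: str, state: State) -> bool:
        """Run the action set if the event and condition match.

        Returns True only when every action completes. If an action raises,
        the recovery steps of the already completed actions run in reverse.
        """
        if event != self.event or not self.condition(state):
            return False
        completed: List[Callable[[State], None]] = []
        try:
            for action, recover in self.steps:
                action(state)
                completed.append(recover)
        except Exception:
            for recover in reversed(completed):
                recover(state)
            return False
        return True


# Hypothetical actions, standing in for micro-services.
def register(state: State) -> None:
    state["registered"] = True


def unregister(state: State) -> None:
    state.pop("registered", None)


def replicate(state: State) -> None:
    if not state.get("replica_resource"):
        raise RuntimeError("no replica resource available")
    state["replicas"] = 2


def drop_replica(state: State) -> None:
    state.pop("replicas", None)


rule = Rule(
    event="ingest",
    condition=lambda s: s.get("collection") == "records",
    steps=[(register, unregister), (replicate, drop_replica)],
)

ok_state: State = {"collection": "records", "replica_resource": "tape"}
bad_state: State = {"collection": "records"}  # replication will fail
ok = rule.fire("ingest", ok_state)
failed = rule.fire("ingest", bad_state)
```

In the failing case the registration is rolled back, so the state is left as it was before the rule fired, which is the transaction-like behavior the recovery procedure is meant to provide.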
  • 11. Electronic Records Archive
    • Analyzed capabilities list (852 operations)
      • Identified 600 functions that should be automated by the data grid
      • Compose functions from 177 micro-services
      • Derived 212 metadata attributes that are needed
      • Half of the attributes are related to characterizations of structured information
  • 12. RLG/NARA - TRAC Criteria
    • Assessment categories
      • Organizational infrastructure
      • Digital Object Management
      • Technologies, Technical Infrastructure and Security
    • Example criteria
      • B6.4 Repository has documented and implemented access policies (authorization rules, authentication requirements) consistent with deposit agreements for stored objects.
  • 13. Mapped to a Set of 105 Rules
    • List staff who have archivist execution permission on collection
    • List all persons with access permissions on collection
    • Analyze audit trails to verify identity of all persons accessing the data, and compare their roles with desired access controls
    • Generate report listing all persons who accessed or applied archival functions on the collection
    • Compare report with the deposit agreement
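The audit steps listed above (analyze the audit trail, compare it with the desired access controls, and generate a report) can be sketched in Python as follows. The data layout is an assumption made for illustration: the audit trail is a list of hypothetical `(user, operation)` pairs and the access controls are a mapping from user to permitted operations. None of these names come from SRB or iRODS.

```python
from typing import Dict, Iterable, List, Set, Tuple

AuditEntry = Tuple[str, str]  # (user, operation), a hypothetical record layout


def access_report(audit_trail: Iterable[AuditEntry]) -> Dict[str, Set[str]]:
    """List every person who accessed the collection and what they did."""
    report: Dict[str, Set[str]] = {}
    for user, operation in audit_trail:
        report.setdefault(user, set()).add(operation)
    return report


def audit_violations(
    audit_trail: Iterable[AuditEntry],
    permissions: Dict[str, Set[str]],
) -> List[AuditEntry]:
    """Return audit entries that the access controls do not allow."""
    return [
        (user, operation)
        for user, operation in audit_trail
        if operation not in permissions.get(user, set())
    ]


# Hypothetical sample data.
permissions = {
    "archivist_a": {"read", "ingest", "replicate"},
    "reader_b": {"read"},
}
trail = [
    ("archivist_a", "ingest"),
    ("reader_b", "read"),
    ("reader_b", "replicate"),  # not permitted for this role
    ("unknown_c", "read"),      # no access permissions recorded
]

report = access_report(trail)
violations = audit_violations(trail, permissions)
```

An empty `violations` list would indicate that observed access is consistent with the recorded access controls; comparing the report with the deposit agreement remains a separate step.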
  • 14. Lessons Learned
    • Management of structured information
      • No longer sufficient to manipulate bit streams
    • Structured information
      • Send structured information over the network
      • Parse attributes from structured information
      • Format metadata into reports
      • Access structured information resource
  • 15. Theory of Digital Preservation
    • Prove compliance of the data management system with specified assertions
    • Three components
      • Define the purpose for the collection: assessment criteria, management policies, and management procedures
      • Analyze completeness of the system
        • For each criterion, persistent state is generated that can be audited
        • Persistent state attributes are generated by specific procedure versions
        • For each procedure version there are specific management policy versions
        • For each policy, there are evaluation criteria
      • Audit properties of the system
        • Periodic rules validate assessment criteria
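The completeness analysis above can be read as following a chain from each criterion to a persistent state attribute, to the procedure version that generates it, and to the policy version that governs that procedure. A minimal Python sketch of such a check is given below, assuming the three links are available as plain dictionaries. The function name, the dictionary layout, and all sample identifiers other than the criterion label B6.4 are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def completeness_gaps(
    criteria: Iterable[str],
    state_for: Dict[str, str],      # criterion -> persistent state attribute
    procedure_for: Dict[str, str],  # state attribute -> procedure version
    policy_for: Dict[str, str],     # procedure version -> policy version
) -> List[Tuple[str, str]]:
    """Return (criterion, reason) for every criterion whose chain is broken."""
    gaps: List[Tuple[str, str]] = []
    for criterion in criteria:
        attribute = state_for.get(criterion)
        if attribute is None:
            gaps.append((criterion, "no persistent state attribute"))
            continue
        procedure = procedure_for.get(attribute)
        if procedure is None:
            gaps.append((criterion, "no procedure version"))
            continue
        if policy_for.get(procedure) is None:
            gaps.append((criterion, "no policy version"))
    return gaps


# Hypothetical mappings; only the label "B6.4" appears in the slides.
criteria = ["B6.4", "X1", "X2"]
state_for = {"B6.4": "access_audit_report", "X1": "checksum_status"}
procedure_for = {
    "access_audit_report": "audit_access_v2",
    "checksum_status": "verify_checksum_v1",
}
policy_for = {"audit_access_v2": "access_policy_v3"}

gaps = completeness_gaps(criteria, state_for, procedure_for, policy_for)
```

A periodic rule could run a check of this kind on a schedule and record the result as auditable state; how that scheduling is done is outside the scope of this sketch.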
  • 16. For More Information
    • Reagan W. Moore
    • University of North Carolina at Chapel Hill
    • [email_address]
    • TPAP research - supported by NARA
    • iRODS - support from NSF for creation of open source generic software