"Transcontinental Persistent Archive Prototype" PowerPoint icon
  • 1. Transcontinental Persistent Archive Prototype
    • Data Intensive Cyber Environments
      • Reagan W. Moore, rwmoore@renci.org
      • Mike Wan, mwan@diceresearch.org
      • Arcot Rajasekar, rajasekar@unc.edu
      • Wayne Schroeder, schroede@diceresearch.org
      • http://irods.diceresearch.org
      • http://srb.diceresearch.org
    • Joseph JaJa, University of Maryland, College Park
    • Leesa Brieger, University of North Carolina, Chapel Hill
  • 2. Preservation Concepts
    • Infrastructure independence
      • Data virtualization
      • Trust virtualization
      • Policy virtualization
    • Persistent objects
      • Structured information
    • Policy-based preservation environments
      • Automation of assessment criteria validation
    • Federation of persistent archives
      • Migration virtualization
    • Theory of digital preservation
      • Preservation environment reference implementation
  • 3. National Archives and Records Administration Transcontinental Persistent Archive Prototype
    • Federation of Seven Independent Data Grids, each with its own MCAT
      • U Md
      • UCSD
      • NARA I
      • U NC
      • Georgia Tech
      • NARA II
      • Rocket Center
    • Extensible environment: can federate with additional research and education sites
    • Each data grid uses different vendor products
  • 4. Transcontinental Persistent Archive Prototype (TPAP)
    • Use data grid technology to implement two preservation environments:
      • Storage Resource Broker data grid - SRB
      • integrated Rule-Oriented Data System data grid - iRODS
    • Topics
      • Why the need for a second generation of data grid technology?
      • What new capabilities are enabled?
  • 5. SRB Data Grid
    • Users are authenticated by the data grid
    • Authorization is done with access controls
    • Authentication/authorization stored in MCAT
    • Centralized information management
    • Shared data collection
    • Data are owned by the data grid
    • Diagram labels: SRB Server (two), Metadata Catalog, DB
  • 6. Advantages of SRB Data Grid
    • Generic infrastructure
      • Used in other data management applications
    • Scalable
      • Manages hundreds of millions of files and petabytes of data
    • Extensible
      • Can add new storage systems dynamically, create new record series, add new archivists
    • Trustworthy
      • Consistent management of preservation metadata
  • 7. Lessons Learned
    • Infrastructure independence
      • Management of properties of the archives independently of the choice of storage technology
      • Support for standard operations for record manipulation across all storage systems
    • Scalable system
      • SRB organizes petabytes of internationally distributed data into shared collections
      • Administration of large collections needs to be automated
  • 8. TPAP - Rule-Based Data Grids
    • iRODS "integrated Rule-Oriented Data System" data grid technology
      • Generic software used to implement preservation environments, digital libraries, real-time sensor systems
    • The iRODS system casts preservation policies as rules that control the execution of preservation processes.
      • Rules can also be defined that validate assertions about the preservation environment such as integrity, authenticity, and chain of custody
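An integrity assertion of the kind mentioned on this slide can be illustrated with a checksum comparison. The following is a minimal sketch, not the iRODS implementation: the names `sha256_hex`, `validate_integrity`, `stored`, and `catalog` are hypothetical, and an in-memory dictionary stands in for both the storage system and the metadata catalog.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def validate_integrity(stored: Dict[str, bytes], catalog: Dict[str, str]) -> List[str]:
    """Return the names of objects that fail the integrity assertion.

    An object fails if it is missing from storage or if its current
    checksum differs from the checksum registered in the catalog.
    """
    failed = []
    for name, expected in catalog.items():
        data = stored.get(name)
        if data is None or sha256_hex(data) != expected:
            failed.append(name)
    return failed


# Hypothetical records; checksums are registered at ingest time.
stored = {"a.txt": b"record A", "b.txt": b"record B"}
catalog = {name: sha256_hex(data) for name, data in stored.items()}

# Simulate silent corruption of one object after ingest.
stored["b.txt"] = b"record B (corrupted)"

failed = validate_integrity(stored, catalog)
print(failed)
```

Run periodically, a check like this turns the integrity assertion into something that can be verified rather than assumed.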
  • 9. Preservation Environments iRODS - integrated Rule-Oriented Data System
  • 10. Rule Specification
    • Rule - Event : Condition : Action set : Recovery procedure
      • Event - atomic, deferred, periodic
      • Condition - test on any state information attribute
      • Action set - chained micro-services and rules
      • Recovery procedure - ensure transaction semantics in a distributed world
    • Rule types
      • System level, administrative level, user level
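The Event : Condition : Action set : Recovery procedure structure described on this slide can be sketched in a few lines. This is an illustrative model under stated assumptions, not the iRODS rule engine or its rule syntax: the `Rule` class, the pairing of each micro-service with its own recovery step, and the example functions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
# Each action is a (micro-service, recovery step) pair; the recovery step
# undoes the micro-service if a later action in the chain fails.
Step = Tuple[Callable[[State], None], Callable[[State], None]]


@dataclass
class Rule:
    event: str
    condition: Callable[[State], bool]
    actions: List[Step] = field(default_factory=list)

    def fire(self, event: str, state: State) -> bool:
        """Run the action set if the event and condition match.

        Returns True when every action succeeded. Returns False when the
        rule did not apply, or when an action failed and the completed
        actions were rolled back in reverse order.
        """
        if event != self.event or not self.condition(state):
            return False
        done: List[Step] = []
        try:
            for step in self.actions:
                step[0](state)
                done.append(step)
        except Exception:
            for _, recover in reversed(done):
                recover(state)
            return False
        return True


def make_replica(state: State) -> None:
    state["replicas"] = int(state.get("replicas", 0)) + 1


def drop_replica(state: State) -> None:
    state["replicas"] = int(state["replicas"]) - 1


def register_checksum(state: State) -> None:
    raise RuntimeError("catalog unavailable")  # simulated failure


def noop(state: State) -> None:
    pass


rule = Rule(
    event="ingest",
    condition=lambda s: int(s["size"]) > 0,
    actions=[(make_replica, drop_replica), (register_checksum, noop)],
)
state: State = {"size": 10}
ok = rule.fire("ingest", state)
print(ok, state["replicas"])
```

In the example the second micro-service fails, so the replica created by the first is removed again, which is the transaction-like behavior the recovery procedure is meant to provide.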
  • 11. Electronic Records Archive
    • Analyzed capabilities list (852 operations)
      • Identified 600 functions that should be automated by the data grid
      • Compose functions from 177 micro-services
      • Derived 212 metadata attributes that are needed
      • Half of the attributes are related to characterizations of structured information
  • 12. RLG/NARA - TRAC Criteria
    • Assessment categories
      • Organizational infrastructure
      • Digital Object Management
      • Technologies, Technical Infrastructure and Security
    • Example criteria
      • B6.4 Repository has documented and implemented access policies (authorization rules, authentication requirements) consistent with deposit agreements for stored objects.
  • 13. Mapped to a Set of 105 Rules
    • List staff who have archivist execution permission on collection
    • List all persons with access permissions on collection
    • Analyze audit trails to verify identity of all persons accessing the data, and compare their roles with desired access controls
    • Generate report listing all persons who accessed or applied archival functions on the collection
    • Compare report with the deposit agreement
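The checks listed on this slide amount to comparing an audit trail with the access controls that are supposed to be in force. A minimal sketch follows, assuming a hypothetical in-memory representation: `acl` maps each person to the operations they are permitted, and `audit_trail` is a list of (person, operation) events. None of these names come from SRB or iRODS.

```python
from typing import Dict, List, Set, Tuple


def list_archivists(acl: Dict[str, Set[str]]) -> List[str]:
    """List persons who hold the 'archive' permission on the collection."""
    return sorted(user for user, ops in acl.items() if "archive" in ops)


def audit_access(
    acl: Dict[str, Set[str]], audit_trail: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return audit events that the access controls do not permit."""
    violations = []
    for user, operation in audit_trail:
        if operation not in acl.get(user, set()):
            violations.append((user, operation))
    return violations


# Hypothetical access controls and audit trail for one collection.
acl = {"alice": {"read", "archive"}, "bob": {"read"}}
audit_trail = [
    ("alice", "archive"),
    ("bob", "read"),
    ("bob", "archive"),  # bob has no archivist permission
    ("carol", "read"),   # carol has no permissions at all
]

archivists = list_archivists(acl)
violations = audit_access(acl, audit_trail)
print(archivists, violations)
```

The resulting list of violations is the kind of report that could then be compared against the deposit agreement.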
  • 14. Lessons Learned
    • Management of structured information
      • No longer sufficient to manipulate bit streams
    • Structured information
      • Send structured information over the network
      • Parse attributes from structured information
      • Format metadata into reports
      • Access structured information resource
  • 15. Theory of Digital Preservation
    • Prove compliance of the data management system with specified assertions
    • Three components
      • Define the purpose for the collection: assessment criteria, management policies, and management procedures
      • Analyze completeness of the system
        • For each criterion, persistent state is generated that can be audited
        • Persistent state attributes are generated by specific procedure versions
        • For each procedure version there are specific management policy versions
        • For each policy, there are evaluation criteria
      • Audit properties of the system
        • Periodic rules validate assessment criteria
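The completeness analysis described on this slide can be read as a chain: each criterion needs a persistent state attribute, each attribute needs a procedure version that generates it, and each procedure version needs a policy version that controls it. The sketch below checks that chain for gaps. The function `find_gaps`, the three mapping tables, and all example names are assumptions made for illustration.

```python
from typing import Dict, List


def find_gaps(
    criteria: List[str],
    state_for: Dict[str, str],
    procedure_for: Dict[str, str],
    policy_for: Dict[str, str],
) -> Dict[str, str]:
    """Map each incomplete criterion to the first missing link in its chain."""
    gaps: Dict[str, str] = {}
    for criterion in criteria:
        attribute = state_for.get(criterion)
        if attribute is None:
            gaps[criterion] = "no persistent state attribute"
            continue
        procedure = procedure_for.get(attribute)
        if procedure is None:
            gaps[criterion] = "no procedure version generates " + attribute
            continue
        if procedure not in policy_for:
            gaps[criterion] = "no policy version controls " + procedure
    return gaps


# Hypothetical assessment criteria and mappings.
criteria = ["integrity", "authenticity", "chain_of_custody"]
state_for = {
    "integrity": "checksum_verified_at",
    "authenticity": "provenance_record",
}
procedure_for = {"checksum_verified_at": "verify_checksum/v2"}
policy_for = {"verify_checksum/v2": "integrity_policy/v1"}

gaps = find_gaps(criteria, state_for, procedure_for, policy_for)
print(gaps)
```

An empty result would indicate that every criterion can be traced to auditable state, a procedure version, and a policy version; the periodic audit rules would then check the recorded state itself.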
  • 16. For More Information
    • Reagan W. Moore
    • University of North Carolina at Chapel Hill
    • [email_address]
    • http://srb.diceresearch.org
    • http://irods.diceresearch.org
    • TPAP research - supported by NARA
    • iRODS - support from NSF for creation of open source generic software