Applying Repository Systems to Audiovisual Preservation

Jon W. Dunn and Karen Cariani. "Applying Repository Systems to Audiovisual Preservation." Open Repositories 2017, Brisbane, Australia, June 30, 2017.

  1. Applying Repository Systems to Audiovisual Preservation. Jon W. Dunn, Indiana University Libraries; Karen Cariani, WGBH Media Library and Archives. #OR2017
  2. Who we are: WGBH Media Library and Archives
  3. Who we are: Indiana University
     • 3 million+ special collections items
     • Large focus on AV: music and other performing arts; ethnomusicology and anthropology; public broadcasting stations; archival film collections; athletics
     • Media Digitization and Preservation Initiative: 300,000 AV items; 25,000 reels of film; 80 campus units plus other IU campuses; 27 PB by the year 2020; 30+ TB per day at peak
     • http://mdpi.iu.edu/
  4. Challenges of AV
     • Large files, individually and in aggregate
     • Multiple related files
     • Much metadata
     • Esoteric and ephemeral formats, physical and digital
     • Lack of clear standards, especially for video and film
  5. Storage Strategy: WGBH
     • Difficult history with a commercial DAM and HSM system: issues of cost, capacity, performance, networking, and vendor lock-in
     • Using LTFS-formatted LTO-6 tape with HP LTO-6 Ultrium 6250 drives
  6. Storage Strategy: Indiana University
     • Nearline storage in a university-supported HSM environment
     • IBM HPSS software with enterprise tape (IBM TS1140), typically accessed via the hsi tool
     • Mirrored between Bloomington and Indianapolis
     • Centrally funded
     • Very fast research network (10, 20, and 40 Gbit connections)
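
The Scholarly Data Archive's HPSS environment is typically accessed via the hsi client, as noted above. Purely as an illustration of how a repository process might push and recall masters through hsi, here is a minimal Python sketch; the paths are hypothetical, and authentication (e.g., a keytab) plus the exact hsi invocation are site-specific details not covered in the talk.

```python
import subprocess

def hsi_put(local_path: str, hpss_path: str) -> None:
    """Copy a local file into HPSS nearline storage via the hsi client."""
    # hsi's command syntax is "put <local> : <hpss>"; authentication is
    # assumed to be configured outside this script (e.g., via a keytab).
    subprocess.run(["hsi", "put", local_path, ":", hpss_path], check=True)

def hsi_get(hpss_path: str, local_path: str) -> None:
    """Stage a file back out of HPSS to local disk for delivery or checking."""
    subprocess.run(["hsi", "get", local_path, ":", hpss_path], check=True)

if __name__ == "__main__":
    # Hypothetical file and HPSS paths, for illustration only.
    hsi_put("masters/MDPI_example.mkv", "/hpss/projects/mdpi/masters/MDPI_example.mkv")
```
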
  7. Need for a preservation repository
     • Track preservation master files in local and external storage
     • Connect metadata: descriptive, technical, process history, preservation
     • Ensure fixity: regular fixity checking and logging
     • Support retrieval/delivery of master files to authorized users
     • Future: support file format migration
     • We are separating the concerns of preservation and access
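
Regular fixity checking with logging is one of the core requirements listed above. The following is a minimal sketch of that idea, assuming a hypothetical inventory of expected SHA-256 values; it is illustrative only and not the actual HydraDAM2/Phydo implementation.

```python
import hashlib
import logging
from pathlib import Path

logging.basicConfig(filename="fixity_audit.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large preservation masters never need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def audit(inventory: dict[str, str]) -> None:
    """Compare current checksums against expected values and log each result."""
    for filename, expected in inventory.items():
        path = Path(filename)
        if not path.exists():
            logging.error("MISSING %s", filename)
            continue
        actual = sha256_of(path)
        if actual == expected:
            logging.info("OK %s", filename)
        else:
            logging.error("FIXITY FAILURE %s expected=%s actual=%s",
                          filename, expected, actual)

if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from the
    # repository's stored technical/preservation metadata.
    audit({"masters/example_master.mkv": "0" * 64})
```
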
  8. HydraDAM1
     • Developed by WGBH with previous support from NEH
     • Based on Sufia and Fedora 3.x
     • Focused on user self-deposit
     • Adapted to add bulk ingest, bulk edit, characterization of files, and transcoding of proxies
     • Limitations: assumed a full workflow pipeline for ingestion of AV materials; processing performance problems
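
HydraDAM1 added file characterization during ingest. As a rough illustration of what such a step can look like, the sketch below shells out to ffprobe to extract technical metadata as JSON; the slides do not say which characterization tool HydraDAM actually uses, and the file path is hypothetical.

```python
import json
import subprocess

def characterize(path: str) -> dict:
    """Extract container and stream technical metadata from an AV file with ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

if __name__ == "__main__":
    info = characterize("masters/example_master.mkv")  # hypothetical path
    video = next(s for s in info["streams"] if s["codec_type"] == "video")
    print(video["codec_name"], video["width"], video["height"],
          info["format"]["duration"])
```
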
  9. HydraDAM2: Goals
     • Move to Fedora 4
     • Develop Fedora 4 / Hydra content models for AV preservation
     • Support multiple storage strategies: offline, online, nearline
     • Integrate with access systems: Avalon, OpenVault
     • January 2015 – December 2017
  10. Curation Concerns Framework: based on (diagram)
  11. File Ingest: Conceptually Simple
  12. Reality
  13. File Ingest (WGBH): workflow diagram with content arriving on a delivered hard drive or via FTP, passing through fixity checks, file characterization, and proxy file creation, with the delivered database and metadata recorded in PHYDO.
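
Proxy file creation is one step in the WGBH ingest workflow above. The sketch below shows one plausible way to cut an H.264/AAC access proxy from a preservation master with ffmpeg; the codec and quality settings are illustrative rather than WGBH's actual transcode recipe.

```python
import subprocess

def make_proxy(master: str, proxy: str) -> None:
    """Transcode a preservation master to a small H.264/AAC proxy for access."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", master,
         "-c:v", "libx264", "-crf", "23", "-preset", "medium",
         "-c:a", "aac", "-b:a", "128k",
         proxy],
        check=True)

if __name__ == "__main__":
    # Hypothetical file names.
    make_proxy("masters/example_master.mkv", "proxies/example_proxy.mp4")
```
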
  14. PHYDO Storage (WGBH): diagram covering file ingest with fixity checks to LTO-6 tape, a backup LTO-6 tape sent to off-site vault storage, retention of the original delivered drive, and location data tracked in PHYDO.
  15. Access (WGBH): diagram showing WGBH users and MLA staff working through PHYDO and the WGBH access website to edit metadata, handle circulation, and request files or drives, with delivery from LTO-6 tape or the original delivered drive to a user hard drive.
  16. File Ingest and Storage (IU): architecture diagram showing media files and metadata flowing to the digital preservation repository (Phydo) and the access repository (Avalon); elements include masters and mezzanines, transcodes, IU Scholarly Data Archive copies at IUB and IUPUI, and possible out-of-region storage.
  17. Pre-Ingest Steps (IU)
     • Master file and metadata uploaded by Memnon or the IU facility
     • Manifest contents verified
     • Files pushed to tape storage
     • Checksums verified
     • File characterization / technical metadata extraction
     • Transcoding of derivatives for Avalon
     • Files and metadata pushed to Avalon via Switchyard for access
     • SIP created for ingest into Phydo
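
Several of these pre-ingest steps (manifest verification, checksum verification) come down to comparing what arrived against what a manifest says should have arrived. A minimal sketch follows, assuming a simple "checksum filename" manifest in the style of md5sum output; the actual Memnon/IU manifest format is not specified in the slides.

```python
import hashlib
from pathlib import Path

def load_manifest(manifest_path: Path) -> dict[str, str]:
    """Parse lines of the form '<md5>  <relative/path>' into a dict."""
    entries = {}
    for line in manifest_path.read_text().splitlines():
        if line.strip():
            checksum, name = line.split(maxsplit=1)
            entries[name.strip()] = checksum.lower()
    return entries

def verify_shipment(shipment_dir: Path, manifest_name: str = "manifest.md5") -> list[str]:
    """Return a list of problems: missing files or checksum mismatches."""
    problems = []
    manifest = load_manifest(shipment_dir / manifest_name)
    for name, expected in manifest.items():
        path = shipment_dir / name
        if not path.exists():
            problems.append(f"missing: {name}")
            continue
        digest = hashlib.md5()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems

if __name__ == "__main__":
    # Hypothetical shipment directory.
    for problem in verify_shipment(Path("incoming/shipment_0001")):
        print(problem)
```
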
  18. Phydo Content Model (IU)
  19. Asynchronous Storage: Proof of Concept. Diagram components: Fedora 4, Apache Camel routes, an asynchronous storage proxy, a Rails application with the AS UI gem, a notify step, and service translation blueprints in front of local tape storage services, large files on disk, and cloud storage services. Annotations: an asynchronous-aware user interface provides interactions; the proxy provides an API with common endpoints and responses; translations map from the common API to specific storage APIs; this should be able to be an API-X sharable service.
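
The central idea in this proof of concept is a proxy that exposes one common storage API and translates each request to a specific backend (tape, disk, cloud). The real implementation uses Apache Camel routes and a Rails UI; the Python sketch below only illustrates the translation pattern, and all backend behavior shown is hypothetical.

```python
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    """One 'service translation blueprint': maps common operations to a specific store."""

    @abstractmethod
    def store(self, object_id: str, local_path: str) -> None: ...

    @abstractmethod
    def request_retrieval(self, object_id: str) -> str:
        """Start an asynchronous retrieval and return a ticket to poll later."""

class TapeBackend(StorageBackend):
    def store(self, object_id, local_path):
        print(f"[tape] queueing {local_path} for migration as {object_id}")

    def request_retrieval(self, object_id):
        # Tape recalls are slow, so the proxy only hands back a ticket here.
        return f"tape-recall-{object_id}"

class CloudBackend(StorageBackend):
    def store(self, object_id, local_path):
        print(f"[cloud] uploading {local_path} as {object_id}")

    def request_retrieval(self, object_id):
        return f"cloud-restore-{object_id}"

class AsyncStorageProxy:
    """Common endpoints; translation to the chosen backend happens behind them."""

    def __init__(self, backends: dict[str, StorageBackend]):
        self.backends = backends

    def store(self, tier: str, object_id: str, local_path: str) -> None:
        self.backends[tier].store(object_id, local_path)

    def retrieve(self, tier: str, object_id: str) -> str:
        return self.backends[tier].request_retrieval(object_id)

if __name__ == "__main__":
    proxy = AsyncStorageProxy({"tape": TapeBackend(), "cloud": CloudBackend()})
    proxy.store("tape", "obj-123", "/staging/obj-123.mkv")
    print(proxy.retrieve("tape", "obj-123"))
```
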
  20. Asynchronous interactions: diagram components include a Fedora 4 RDF resource container node, a non-RDF resource node with a URL redirect, the Asynchronous Interactions UI, Apache Camel routes, the asynchronous storage proxy, and a slow storage service. Annotations: interactions are invoked asynchronously from the Fedora 4 API; the redirecting node uses an external-body MIME type, which can be set using the Fedora 4 API and also via Hydra Works file behaviors; the URL to redirect to would be wherever the Asynchronous Interactions UI is deployed, immediately invoking interactions for a unique identifier (preferably using persistent URLs); access to redirecting nodes via the Fedora 4 API invokes an immediate redirect to the stored URL.
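
As a rough sketch of the redirect mechanism described above, the snippet below creates a non-RDF resource in Fedora 4 whose content type is an external-body MIME type pointing at an external URL, so that later reads redirect there. The Fedora base URL, container path, and redirect target are hypothetical, and in HydraDAM2 this would normally be set via Hydra Works file behaviors rather than raw HTTP.

```python
import requests

FEDORA = "http://localhost:8080/rest"  # hypothetical Fedora 4 base URL

def create_redirecting_binary(container_path: str, redirect_url: str) -> str:
    """Create a non-RDF resource whose content redirects to an external URL."""
    headers = {
        "Content-Type": (
            'message/external-body; access-type=URL; '
            f'URL="{redirect_url}"'
        )
    }
    resp = requests.post(f"{FEDORA}{container_path}", headers=headers, data=b"")
    resp.raise_for_status()
    return resp.headers["Location"]  # URI of the new redirecting node

if __name__ == "__main__":
    node = create_redirecting_binary(
        "/objects/example",                                 # hypothetical container
        "https://async-ui.example.edu/retrieve/obj-123")    # hypothetical UI URL
    # A later GET on the node should answer with a redirect to the stored URL.
    print(requests.get(node, allow_redirects=False).status_code)
```
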
  21. PREMIS Ingest
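
Slide 21 covers PREMIS ingest. Below is a minimal sketch of generating a PREMIS fixity-check event record with Python's standard library; element names follow the published PREMIS 3 schema, but the specific PREMIS profile used by Phydo and how events are attached to objects are not detailed in the slides.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)

def fixity_event(event_id: str, outcome: str, detail: str) -> bytes:
    """Build a minimal PREMIS 3 event describing a fixity check."""
    def el(parent, name, text=None):
        node = ET.SubElement(parent, f"{{{PREMIS_NS}}}{name}")
        if text is not None:
            node.text = text
        return node

    event = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = el(event, "eventIdentifier")
    el(ident, "eventIdentifierType", "UUID")
    el(ident, "eventIdentifierValue", event_id)
    el(event, "eventType", "fixity check")
    el(event, "eventDateTime", datetime.now(timezone.utc).isoformat())
    outcome_info = el(event, "eventOutcomeInformation")
    el(outcome_info, "eventOutcome", outcome)
    detail_info = el(outcome_info, "eventOutcomeDetail")
    el(detail_info, "eventOutcomeDetailNote", detail)
    return ET.tostring(event, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    # Hypothetical identifier and outcome note.
    print(fixity_event("hypothetical-uuid-1234", "success",
                       "SHA-256 matched stored value").decode())
```
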
  22. Where We’re Going
     • Continue development: rebuilding on Hyrax
     • Build out WGBH storage implementation
     • Additional user functionality
     • Build out descriptive metadata / PBCore support
     • Batch ingest
     • Feed to/from Avalon Media System
     • Pilot implementation
     • Production implementation
  23. Questions?
     • https://wiki.dlib.indiana.edu/display/HD2/PHYDO
     • https://github.com/IUBLibTech/phydo
     • jwd@iu.edu
     • karen_cariani@wgbh.org
