Applying Repository Systems to
Audiovisual Preservation
Jon W. Dunn, Indiana University Libraries
Karen Cariani, WGBH Media Library and Archives
#OR2017
Who we are: WGBH Media Library and Archives
Who we are: Indiana University
• 3 million+ special collections items
• Large focus on AV:
  • Music and other performing arts
  • Ethnomusicology, anthropology
  • Public broadcasting stations
  • Archival film collections
  • Athletics
• Media Digitization and Preservation Initiative:
  • 300,000 AV items
  • 25,000 reels of film
  • 80 campus units + other IU campuses
  • 27 PB by the year 2020
  • 30+ TB per day peak
  • http://mdpi.iu.edu/
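As a rough sanity check on the figures above, the sketch below converts the 30 TB per day peak into a sustained line rate and estimates how long 27 PB would take at that peak. Decimal units and a perfectly even 24-hour transfer are assumptions made for illustration, not figures from the project.

```python
# Back-of-the-envelope arithmetic only.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbit_per_s(bytes_per_day: int) -> float:
    """Average line rate needed to move bytes_per_day evenly over 24 hours."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 10**9


def days_at_peak(total_bytes: int, bytes_per_day: int) -> float:
    """Days needed to write total_bytes if every day ran at the peak rate."""
    return total_bytes / bytes_per_day


peak_rate = sustained_gbit_per_s(30 * TB)
days = days_at_peak(27 * PB, 30 * TB)
print(f"{peak_rate:.2f} Gbit/s sustained, {days:.0f} days at peak")
```

Under those assumptions the peak day needs a little under 3 Gbit/s sustained, and 27 PB corresponds to 900 days of ingest at the peak rate.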
Challenges of AV
• Large files
  • Individually and in aggregate
• Multiple related files
• Much metadata
• Esoteric and ephemeral formats
• Physical and digital
• Lack of clear standards
  • Especially for video and film
Storage Strategy: WGBH
• Difficult history with commercial DAM and HSM system
  • Issues of cost, capacity, performance, network issues, vendor lock-in
• Using LTFS-formatted LTO-6 tape
  • HP LTO-6 Ultrium 6250 drives
Storage Strategy: Indiana University
• Nearline storage in university-supported HSM environment
• IBM HPSS software
• Enterprise tape (IBM TS1140)
• Typically accessed via hsi tool
• Mirrored between Bloomington and Indianapolis
• Centrally-funded
• Very fast research network (10, 20, 40 Gbit connections)
Need for a preservation repository
• Track preservation master files in local and external storage
• Connect metadata
  • Descriptive, technical, process history, preservation
• Ensure fixity
  • Regular fixity checking, logging
• Support retrieval/delivery of master files to authorized users
• Future: support file format migration
• We are separating concerns of preservation and access
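The "regular fixity checking, logging" requirement above can be sketched roughly as follows. This is a minimal illustration, not the Phydo implementation: the function names, the choice of SHA-256, and the JSON event shape are all assumptions.

```python
import hashlib
import json
import logging
from datetime import datetime, timezone
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("fixity")  # hypothetical logger name


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large master files never sit fully in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_fixity(path: Path, expected: str, algorithm: str = "sha256") -> dict:
    """Compare the current digest with the stored one and log the outcome.

    The returned dict is an illustrative event record; a real repository
    would more likely store it as preservation metadata.
    """
    actual = file_digest(path, algorithm)
    event = {
        "file": str(path),
        "algorithm": algorithm,
        "expected": expected.lower(),
        "actual": actual,
        "outcome": "pass" if actual == expected.lower() else "fail",
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    log.info(json.dumps(event))
    return event
```

Running `check_fixity` on a schedule and keeping the logged events gives a simple audit trail; for files on tape the read itself is the expensive step, so the check interval is a policy decision.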
HydraDAM1
• Developed by WGBH with previous support from NEH
• Based on Sufia and Fedora 3.x
• Focused on user self-deposit
• Adapted to add bulk ingest, bulk edit, characterization of files, transcoding of proxies
• Limitations:
  • Assumed full workflow pipeline for ingestion of A/V materials
  • Processing performance problems
HydraDAM2: Goals
• Move to Fedora 4
• Develop Fedora 4 / Hydra content models for AV preservation
• Support multiple storage strategies: offline, online, nearline
• Integrate with access systems: Avalon, OpenVault
• January 2015 – December 2017
Curation
Concerns
Framework:
Based on:
File Ingest: Conceptually Simple
Reality
File Ingest (WGBH)
• DELIVERED HARD DRIVE OR FTP
• PHYDO
• DELIVERED DATABASE & METADATA
• FIXITY CHECKS
• FILE CHARACTERIZATION
• PROXY FILE CREATION
Storage (WGBH)
• PHYDO
• LTO 6 TAPE
• FIXITY CHECKS
• FILE INGEST
• FIXITY CHECKS
• BACKUP LTO 6 TAPE
• OFF-SITE STORAGE
• VAULT STORAGE
• ORIGINAL DELIVERED DRIVE
• LOCATION DATA
Access (WGBH)
• PHYDO
• WGBH ACCESS WEBSITE
• EDIT METADATA
• CIRCULATION
• REQUEST FILES OR DRIVES
• MLA STAFF
• LTO 6 TAPE
• ORIGINAL DELIVERED DRIVE
• WGBH USER
• HARD DRIVE
File Ingest and Storage (IU)
• Media Files and Metadata
• Digital Preservation Repository (Phydo)
• Access Repository (Avalon)
• Masters, Mezzanines
• Transcodes
• Out-of-Region Storage??
• IU Scholarly Data Archive copies at IUB and IUPUI
Pre-Ingest Steps (IU)
• Master file and metadata uploaded by Memnon or IU facility
• Manifest contents verified
• Files pushed to tape storage
• Checksums verified
• File characterization / technical metadata extraction
• Transcoding of derivatives for Avalon
• Files and metadata pushed to Avalon via Switchyard for access
• SIP created for ingest into Phydo
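The "manifest contents verified" and "checksums verified" steps above might look something like the sketch below. The manifest layout (one `checksum  relative/path` pair per line, SHA-256) and the function names are assumptions for illustration; the actual MDPI manifest format is not described in these slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Chunked SHA-256 so large media files are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest: Path, root: Path) -> dict:
    """Compare a delivery directory against its manifest.

    Assumed manifest format: "<hex digest>  <relative path>" per line.
    Returns names that are missing, unexpected, or have a wrong checksum.
    """
    expected = {}
    for line in manifest.read_text().splitlines():
        if line.strip():
            checksum, name = line.split(maxsplit=1)
            expected[name.strip()] = checksum.lower()

    manifest_resolved = manifest.resolve()
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.resolve() != manifest_resolved
    }

    listed = set(expected)
    return {
        "missing": sorted(listed - on_disk),
        "unexpected": sorted(on_disk - listed),
        "mismatched": sorted(
            n for n in listed & on_disk if sha256_of(root / n) != expected[n]
        ),
    }
```

A delivery would only move on to tape storage and characterization when all three lists come back empty.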
Phydo Content Model (IU)
• Apache Camel Routes
• Asynchronous Storage Proxy
• Rails application with AS UI gem
• Local Tape Storage Services
• Large files on Disk
• Notify
• Cloud Storage Services
• Service translation blueprint (shown three times)
• Asynchronous aware user interface provides interactions
• Proxy provides API with common endpoints and responses
• Translations map from common API to specific storage APIs
• Should be able to be an API-X sharable service
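The idea of a proxy with common endpoints and per-service translations can be sketched in a few lines. This is an in-memory illustration only: the class names, the two-state status model, and the fake tape backend are hypothetical, and the actual proof of concept uses Apache Camel routes rather than Python.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StageResponse:
    """Common response shape, whatever storage service sits behind the proxy."""
    identifier: str
    status: str                      # "staging" or "ready" (assumed states)
    location: Optional[str] = None


class Translation(ABC):
    """Maps the common API onto one specific storage API."""

    @abstractmethod
    def request_stage(self, identifier: str) -> None: ...

    @abstractmethod
    def poll(self, identifier: str) -> Optional[str]: ...


class FakeTapeTranslation(Translation):
    """Stand-in for a slow tape service: ready on the second poll."""

    def __init__(self) -> None:
        self._polls: Dict[str, int] = {}

    def request_stage(self, identifier: str) -> None:
        self._polls.setdefault(identifier, 0)

    def poll(self, identifier: str) -> Optional[str]:
        self._polls[identifier] = self._polls.get(identifier, 0) + 1
        if self._polls[identifier] >= 2:
            return f"file:///staging/{identifier}"  # hypothetical location
        return None


class AsyncStorageProxy:
    """Exposes the same stage/status calls for every registered service."""

    def __init__(self, translations: Dict[str, Translation]) -> None:
        self._translations = translations

    def stage(self, service: str, identifier: str) -> StageResponse:
        self._translations[service].request_stage(identifier)
        return StageResponse(identifier, "staging")

    def status(self, service: str, identifier: str) -> StageResponse:
        location = self._translations[service].poll(identifier)
        if location is None:
            return StageResponse(identifier, "staging")
        return StageResponse(identifier, "ready", location)
```

Adding a cloud or disk backend would then mean writing one more `Translation` subclass, while the user interface keeps talking to the same two calls.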
Fedora 4 Asynchronous Storage: Proof of Concept
• Fedora 4
• RDF resource container node
• Non-RDF resource node
• URL redirect
• Asynchronous Interactions UI
• Apache Camel Routes
• Asynchronous Storage Proxy
• Slow storage service
Invoking asynchronous interactions from Fedora 4 API
• Redirecting node via external-body MIME type; can be set using Fedora 4 API and also via Hydra Works file behaviors
• The URL to redirect to would be wherever the Asynchronous Interactions UI is deployed, immediately invoking interactions for a unique identifier (preferably using persistent URLs)
• Access to redirecting nodes via Fedora 4 API invokes immediate redirect to stored URL
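For reference, a redirecting node is created by giving the non-RDF resource a `message/external-body` Content-Type that carries the target URL. The sketch below only builds and parses that header value; it makes no HTTP request, and the example URL and function names are hypothetical.

```python
from email.message import Message
from typing import Optional


def external_body_content_type(redirect_url: str) -> str:
    """Build the Content-Type value that marks a node as a URL redirect."""
    if '"' in redirect_url:
        raise ValueError("URL must not contain a double quote")
    return f'message/external-body; access-type=URL; URL="{redirect_url}"'


def redirect_target(content_type: str) -> Optional[str]:
    """Return the redirect URL if the header is an external-body type, else None."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "message/external-body":
        return None
    return msg.get_param("URL")


# Hypothetical deployment URL for the Asynchronous Interactions UI.
header = external_body_content_type("https://async.example.org/interactions/abc123")
```

Sending `header` as the Content-Type when creating the binary is what the first bullet above refers to; a later GET on that node is then answered with a redirect to the stored URL.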
PREMIS Ingest
Where We’re Going
• Continue development
• Rebuilding on Hyrax
• Build out WGBH storage implementation
• Additional user functionality
• Build out descriptive metadata / PBCore support
• Batch ingest
• Feed to/from Avalon Media System
• Pilot implementation
• Production implementation
Questions?
• https://wiki.dlib.indiana.edu/display/HD2/PHYDO
• https://github.com/IUBLibTech/phydo
• jwd@iu.edu
• karen_cariani@wgbh.org

Applying Repository Systems to Audiovisual Preservation

  • 1. Applying Repository Systems to Audiovisual Preservation Jon W. Dunn, Indiana University Libraries Karen Cariani, WGBH Media Library and Archives #OR2017
  • 2. Who we are: WGBH Media Library and Archives
  • 3. Who we are: Indiana University • 3 million+ special collections items • Large focus on AV: • Music and other performing arts • Ethnomusicology, anthropology • Public broadcasting stations • Archival film collections • Athletics • Media Digitization and Preservation Initiative: • 300,000 AV items • 25,000 reels of film • 80 campus units + other IU campuses • 27 PB by the year 2020 • 30+ TB per day peak • http://mdpi.iu.edu/
  • 4. Challenges of AV • Large files • Individually and in aggregate • Multiple related files • Much metadata • Esoteric and ephemeral formats • Physical and digital • Lack of clear standards • Especially for video and film
  • 5. Storage Strategy: WGBH • Difficult history with commercial DAM and HSM system • Issues of cost, capacity, performance, network issues, vendor lock-in • Using LTFS-formatted LTO-6 tape • HP LTO-6 Ultrium 6250 drives
  • 6. Storage Strategy: Indiana University • Nearline storage in university-supported HSM environment • IBM HPSS software • Enterprise tape (IBM TS1140) • Typically accessed via hsi tool • Mirrored between Bloomington and Indianapolis • Centrally funded • Very fast research network (10, 20, 40 Gbit connections)
  • 7. Need for a preservation repository • Track preservation master files in local and external storage • Connect metadata • Descriptive, technical, process history, preservation • Ensure fixity • Regular fixity checking, logging • Support retrieval/delivery of master files to authorized users • Future: support file format migration • We are separating concerns of preservation and access
  • 8. HydraDAM1 • Developed by WGBH with previous support from NEH • Based on Sufia and Fedora 3.x • Focused on user self-deposit • Adapted to add bulk ingest, bulk edit, characterization of files, transcoding of proxies • Limitations: • Assumed full workflow pipeline for ingestion of A/V materials • Processing performance problems
  • 9. HydraDAM2: Goals • Move to Fedora 4 • Develop Fedora 4 / Hydra content models for AV preservation • Support multiple storage strategies: offline, online, nearline • Integrate with access systems: Avalon, OpenVault • January 2015 – December 2017
  • 13. DELIVERED HARD DRIVE OR FTP PHYDO DELIVERED DATABASE & METADATA FIXITY CHECKS FILE CHARACTERIZATION PROXY FILE CREATION File Ingest (WGBH)
  • 14. PHYDO Storage (WGBH) LTO 6 TAPE FIXITY CHECKS FILE INGEST FIXITY CHECKS BACKUP LTO 6 TAPE OFF-SITE STORAGE VAULT STORAGE ORIGINAL DELIVERED DRIVE LOCATION DATA
  • 15. Access (WGBH) PHYDO WGBH ACCESS WEBSITE EDIT METADATA CIRCULATION REQUEST FILES OR DRIVES MLA STAFF LTO 6 TAPE ORIGINAL DELIVERED DRIVE WGBH USER HARD DRIVE
  • 16. Media Files and Metadata Digital Preservation Repository (Phydo) Access Repository (Avalon) Masters, Mezzanines Transcodes Out-of-Region Storage?? IU Scholarly Data Archive copies at IUB and IUPUI File Ingest and Storage (IU)
  • 17. Pre-Ingest Steps (IU) • Master file and metadata uploaded by Memnon or IU facility • Manifest contents verified • Files pushed to tape storage • Checksums verified • File characterization / technical metadata extraction • Transcoding of derivatives for Avalon • Files and metadata pushed to Avalon via Switchyard for access • SIP created for ingest into Phydo
  • 19. Fedora 4 Asynchronous Storage: Proof of Concept • Diagram components: Rails application with AS UI gem; Asynchronous Storage Proxy; Apache Camel Routes; three service translation blueprints; Local Tape Storage Services; Large files on Disk; Cloud Storage Services; Notify • Asynchronous aware user interface provides interactions • Proxy provides API with common endpoints and responses • Translations map from common API to specific storage APIs • Should be able to be an API-X sharable service
  • 20. Invoking asynchronous interactions from Fedora 4 API • Diagram components: Fedora 4 RDF resource container node; Non-RDF resource node; URL redirect; Asynchronous Interactions UI; Apache Camel Routes; Asynchronous Storage Proxy; Slow storage service • Redirecting node via external-body MIME type; can be set using Fedora 4 API and also via Hydra Works file behaviors • The URL to redirect to would be wherever the Asynchronous Interactions UI is deployed, immediately invoking interactions for a unique identifier (preferably using persistent URLs) • Access to redirecting nodes via Fedora 4 API invokes immediate redirect to stored URL
  • 32. Where We’re Going • Continue development • Rebuilding on Hyrax • Build out WGBH storage implementation • Additional user functionality • Build out descriptive metadata / PBCore support • Batch ingest • Feed to/from Avalon Media System • Pilot implementation • Production implementation
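
The "Manifest contents verified" and "Checksums verified" steps on slide 17, and the fixity checks on slides 13 and 14, can be illustrated with a minimal sketch. This is not Phydo's actual code: the manifest layout (a filename-to-checksum mapping), the choice of MD5, and the function names `file_checksum` and `verify_manifest` are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so a large master file is never fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, base_dir, algorithm="md5"):
    """Compare each file against its expected checksum.

    `manifest` is a hypothetical {filename: expected_hex_digest} mapping.
    Returns {filename: "ok" | "mismatch" | "missing"}.
    """
    results = {}
    for name, expected in manifest.items():
        path = Path(base_dir) / name
        if not path.is_file():
            results[name] = "missing"
        elif file_checksum(path, algorithm) == expected.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Running the same comparison on a schedule and logging each result would correspond to the "regular fixity checking, logging" requirement on slide 7; with tape or nearline storage, each check also implies staging the file back to disk first.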

Editor's Notes

  1. Who are we? WGBH is Boston’s public television station. We produce fully one third of the content broadcast on PBS, including the series you see here, as well as Downton Abbey and Sherlock. In addition to television, we have 2 radio stations and a large, award-winning Interactive department that is the number one producer for the sites you’ll find on PBS.org. As you can see, we produce a wide variety of programming, from public affairs to history and science, children’s programming, arts, culture, drama and how-tos. We have been on the air since 1951 with radio and 1955 with television. At heart and through our mission we are an educational and cultural institution. We originated out of a consortium of academic universities in the Boston area. Because we have produced so much, we have a large archive of educational programming that is of interest to scholars and researchers, in addition to the public.