Casey, Michael, Jon Dunn, and Jenn Riley. “Building an Audio Preservation System at Indiana University Using Standards and Best Practices.” April 14, 2008.
1. Building an Audio Preservation System at
Indiana University Using Standards
and Best Practices
Mike Casey, Archives of Traditional Music
Jon Dunn, Digital Library Program
Jenn Riley, Digital Library Program
April 14, 2008
3. August 2, 2014
Plus many more!
Audio/Video at IUB
• AAAMC
• Music Library
• University Archives
• ATM
• HPER
• Radio/TV
• Center for the Study of History and
Memory
• Kinsey Institute
• Athletics
• Emeriti House
• Office of University Marketing
• Wells Library
• AISRI
• Office of Dean of the Faculties
• Lilly Library
• Alumni Relations
• School of Journalism
• School of Music
• School of Law
• Traditional Arts Indiana
• Department of History
• Department of Anthropology
• Department of
Folklore/Ethnomusicology
• Black Film Center/Archive
4. By the Numbers
• ATM: 110,000 (mostly) audio recordings
• Wells Library, Kent Cooper Room: 20,000 videos
• Music Library: 137,000 audio recordings
– 2,000 lacquer discs
– 8,000 DATs
– 50,000 open reel tapes
• CSHM: 3,200 audio recordings
• AAAMC: 5,900 audio recordings
13. Preservation in the Analog Domain
• Life expectancy critically important
• Predicting when a recording will fail
• Quest for the eternal carrier
• Target preservation format: mastering-quality
open reel tape
• Standards set in the mid-1980s (ARSC/AAA)
14. The New Paradigm
• Eternal sound carriers never available
• Maintaining equipment long-term
unmanageable
Therefore, classical preservation strategy is
hopeless
15. The New Paradigm
• Preserve the content, not the carrier
• The eternal file, not the eternal carrier
• Use digital mass storage systems
Longevity of carriers in mass storage systems of
minor importance
16. Standards and Best Practices
• Ensure Quality
• Provide Philosophical/Ethical Foundation
• Encourage Sustainability
• Foster Interoperability
• Provide a Migration Path
17. Preserving Digital Information
• Advantage: Digital information may be copied
without degradation
• Disadvantage: Digital information requires
active management in order to remain
accessible
18. Risks of Digital Information:
Bit Loss
• Degradation of physical media
– Optical, magnetic
• Damage or theft of physical media
• Media obsolescence
– Ability to read physical media
– Ability to read logical media format
19. Risks of Digital Information:
Semantic Loss
• Even if the bits are intact, can a file still be
understood?
• File format obsolescence
• Loss of context
– Insufficient metadata
20. Risks of Digital Information:
Integrity
• How do we know whether or not information
has been altered, whether intentionally or
unintentionally?
21. Methods of Mitigating Risks
• Migration
– Migration of data to new physical media
– Migration of data to new file formats
• Replication
– Multiple copies of data in multiple locations
• Validation
– Retain checksums for files, routinely retrieve
files and compare against checksums
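The replication and validation bullets above can be sketched together: keep one retained checksum per file, then routinely compare every copy in every location against it. This is a minimal illustration, not the system described in the talk; the function names, the manifest shape (relative file name mapped to hex digest), and the use of local directories to stand in for storage locations are all assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio files never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def audit_replicas(manifest: dict[str, str], replica_roots: list[Path],
                   algorithm: str = "md5") -> list[tuple[Path, str]]:
    """Compare every copy of every file against its retained checksum.

    `manifest` maps a relative file name to the checksum recorded at creation
    (a hypothetical structure). Returns (path, problem) pairs; an empty list
    means every copy in every location validated.
    """
    problems = []
    for root in replica_roots:
        for relative_name, expected in manifest.items():
            candidate = root / relative_name
            if not candidate.is_file():
                problems.append((candidate, "missing"))
            elif file_digest(candidate, algorithm) != expected.lower():
                problems.append((candidate, "checksum mismatch"))
    return problems
```

A failed copy found this way could then be repaired from a copy that still validates, which is the practical reason replication and validation are paired.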
22. Scaling Digital Preservation
• Migration, replication, and validation require:
– Automated processes
– Ongoing monitoring, management, and
planning
– Ongoing funding for technology refresh
23. Digital Repositories
• Centrally-managed systems for storage (and
delivery) of digital information
• Leverage economies of scale for storage and
management costs
• Support preservation integrity functions
(migration, replication, validation)
• Much easier to manage than many little pockets
of digital information
24. OAIS:
Open Archival Information System
• ISO Standard 14721:2003
• Origins in space science community
• Conceptual framework for an archival system
dedicated to preserving and maintaining access
to digital information over the long term
• Basis for much work on digital preservation
within the library and archive community
26. Preservation Packages in OAIS
• Preservation package
– Digital content plus metadata
• SIP: Submission Information Package
• AIP: Archival Information Package
• DIP: Dissemination Information Package
27. From OAIS to Trusted Digital
Repositories
• 2002 OCLC-RLG task force report:
– Trusted Digital Repositories: Attributes and Responsibilities
• What are the attributes of a trusted repository?
– OAIS compliance
– Administrative responsibility
– Organizational viability
– Financial sustainability
– System security
– Procedural accountability
28. Trusted Digital Repositories:
Auditing and Certification
• Digital Repository Audit Method Based on Risk
Assessment (DRAMBORA)
– http://www.repositoryaudit.eu/
• Trustworthy Repositories Audit & Certification
(TRAC): Criteria and Checklist
– OCLC/NARA/CRL report
– http://www.crl.edu/PDF/trac.pdf
29. Archives of Traditional Music
• Established 1948
• 110,000 recordings
• 1890s to present
• Field—30%
• World music traditions
• Endangered/extinct world languages
30. Sound Directions
Digital Preservation and Access for Global Audio Heritage
• Collaboration between Harvard University and
Indiana University
• Phase 1 an R&D project funded by NEH
• Focus on preservation
31. Sound Directions
Digital Preservation and Access for Global Audio Heritage
Project Partners
• Archives of Traditional Music, Indiana University
• Archive of World Music, Harvard University
• Harvard College Library Audio Preservation Services
• Digital Library Program, Indiana University
• Office for Information Systems, Harvard University
32. Sound Directions
Digital Preservation and Access for Global Audio Heritage
Objectives
• Research best practices in areas without standards or
best practices
• Develop best practices to meet existing and emerging
standards
• Test existing and emerging standards/best practices
with a real world project
33. Sound Directions
Digital Preservation and Access for Global Audio Heritage
Results
Publication—
Sound Directions: Best Practices for Audio Preservation
Development of audio preservation system
Software tools
Preservation of field collections
35. Sound Directions
Digital Preservation and Access for Global Audio Heritage
Project Future
• “Preservation” Phase funded by NEH
• Increase throughput
• Simultaneous transfer
• Indiana automation
• Release ATMC
• Develop new access system for field collections
36. Workflow diagram
• System / Project Planning & Development
– Funding
– Personnel / Vendor
– Equipment
• Software Tools
– Creation / maintenance of software and scripts
• Selection for Preservation
– Assess research value
– Evaluate condition
– Consider political, technical, and other issues
– Establish priorities
• Collection Setup
– Gather and assess documentation
– Evaluate collection needs / condition
– Assess cataloging / descriptive metadata issues
– Develop digitization plan
– Assess and calibrate equipment
• Preliminary Work / Pilot Project
– Exploratory transfers and metadata collection
– Quality control
– Reassessment of digitization plan
• Digitization
– Analog playback
– A/D conversion
– Creation of Preservation Master Files
– Local filenames
– Technical metadata
– Structural metadata
– Checksums
– Quality control
– Local storage solution
• Post-Transfer Processing
– Quality control
– Generation of derivatives
– Marking areas of interest in files
– Signal processing (if appropriate)
• Ingestion into / Copy to Long-Term Storage Solution
– Preservation packages
• Periodic Evaluation
– Data integrity checking
– Format obsolescence analysis
• Migration
– New carrier
– New format
• Other diagram labels
– Migration decision
– Workflow management
– Workflow management / scheduling
– Cleaning or physical restoration as needed
37. Common sense definition of a system:
• Set of interacting units or elements
• Forms an integrated whole
• Performs a function
38. A few basic principles…
• Each element/part affects the whole
• Whole is greater than sum of parts
• Inputs and outputs
• Equifinality
39. What should we preserve?
Selection for Preservation
• Analysis of research value
• Evaluation of preservation condition and risk
45. Where should preservation
work be done?
• In-house or outsource?
• Issues: studio space, technical expertise,
amount of work, future location of expertise
• Critical listening spaces
• Development of preservation studio
47. Who should do preservation transfer
work?
• Audio engineer
• Importance of analog playback stage
• Audio examples
48. Who and Where Best Practices
• Use audio engineers in the workflow where
their skill is required
• Critical listening environment
• Use cleanest, most direct signal path to
converter
• Instant comparison from playback machine and
post A/D converter
• Test/calibration chain
49. What is the target preservation format?
• Digital file
• Broadcast Wave Format (BWF or BWAV)
Preservation involves a long-term responsibility to the
digital file
50. What do we look for in a file format?
• Disclosure
• Adoption
• Transparency
• Self-documentation
• External dependencies
• Impact of patents
• Technical protection mechanisms
http://www.digitalpreservation.gov/formats/sustain/sustain.shtml
51. Broadcast Wave Format
• Audio file format based on .wav files
• Introduced by the EBU (1996) for the exchange of files
• Non-proprietary
• Recommended by IASA, AES, NARAS, Sound
Directions for preservation
• “Chunk” for metadata residing with the file
• Time stamp
52. Broadcast Wave Format
Metadata elements include:
• Description of the sound sequence
• Name of the originator
• Date/time
• Coding history (signal chain components)
• Format independent, sample accurate time
stamp
• “Catastrophic” metadata
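The metadata elements listed above live in the file's "bext" chunk, so they can be read back with nothing more than a RIFF chunk walk. The sketch below assumes the fixed field layout published in EBU Tech 3285 (602 bytes before the variable-length coding history) and treats the loudness and reserved area as opaque; the function and dictionary key names are illustrative choices, not part of any standard API.

```python
import struct
from typing import BinaryIO

# Fixed-size part of the bext chunk, per EBU Tech 3285:
# Description(256) Originator(32) OriginatorReference(32) OriginationDate(10)
# OriginationTime(8) TimeReference(low, high: 2 x uint32) Version(uint16)
# UMID(64), then 190 bytes of loudness/reserved fields (ignored here).
# The CodingHistory text follows and runs to the end of the chunk.
BEXT_FIXED = struct.Struct("<256s32s32s10s8sIIH64s190s")


def _text(raw: bytes) -> str:
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace").strip()


def read_bext(stream: BinaryIO) -> dict:
    """Walk the RIFF chunks of a WAVE stream and decode its bext chunk."""
    riff, _size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no bext chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"bext":
            body = stream.read(chunk_size)
            break
        # Skip other chunks; RIFF pads odd-sized chunks to an even boundary.
        stream.seek(chunk_size + (chunk_size & 1), 1)
    if len(body) < BEXT_FIXED.size:
        raise ValueError("bext chunk is shorter than the fixed layout")
    (description, originator, reference, date, time,
     time_low, time_high, version, _umid, _reserved) = BEXT_FIXED.unpack_from(body)
    return {
        "description": _text(description),
        "originator": _text(originator),
        "originator_reference": _text(reference),
        "origination_date": _text(date),
        "origination_time": _text(time),
        # Sample-accurate time stamp: a 64-bit sample count since midnight.
        "time_reference_samples": (time_high << 32) | time_low,
        "version": version,
        "coding_history": _text(body[BEXT_FIXED.size:]),
    }
```

Because the chunk travels inside the audio file, this minimal description survives even if the file is separated from its database record, which is the sense of "catastrophic" metadata.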
54. How do we define the files we create?
• What is in them?
• How are they created?
• What do they represent?
55. Preservation (Archival) Master Files
Best Practice Documents
• Unmodified
• No subjective alterations or improvements
• Preserve history, not re-write it
• As true to the original source as possible
56. Preservation (Archival) Master Files
• Complete, unaltered stream from playback
machine
• Carrier of raw material from transfer
• No editing, signal processing, data reduction,
gain manipulation, announcements (slates)
• 24 bit, 96 kHz
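For planning storage, the 24 bit, 96 kHz target translates directly into a data rate. The arithmetic below is a rough sketch; the slide does not state a channel count, so stereo is an assumption here, and header and metadata overhead are ignored.

```python
def pcm_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: float) -> int:
    """Size of uncompressed PCM audio data, excluding file headers and metadata."""
    bytes_per_sample = bit_depth // 8
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


# One hour of stereo audio (channel count is an assumption) at 24 bit, 96 kHz:
# 96,000 samples/s x 3 bytes x 2 channels x 3,600 s
one_hour = pcm_bytes(96_000, 24, 2, 3_600)
print(f"{one_hour:,} bytes, about {one_hour / 1e9:.2f} GB per hour")
```

Under these assumptions an hour of audio comes to 2,073,600,000 bytes, a little over 2 GB per recorded hour before any derivative files are generated.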
57. Preservation (Archival) Master Files
Best Practices
• Define purpose of every digital file
• Written guidelines on characteristics of files
• Written guidelines on “technical” and content
edits
• Maintain common reference timeline
58. Data Integrity
Data integrity checking
“Checksums”
MD5 hash or algorithm
A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav
59. Data Integrity
• All files with enduring value
• As soon as possible
• Critical metadata stored in database and in
preservation package
• Verify before trusting
A7F1DAD8A7BF5E88EF44495E19683B18 *atm_01007_cass6936_010101_pres_20080228.wav
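The checksum line shown above has the shape produced by common `md5sum`-style tools: a 32-character hexadecimal digest, a space, an asterisk marking binary mode, and the file name. A minimal sketch of "verify before trusting" with such a line follows; the function names are hypothetical, and the directory argument simply says where the named file is expected to sit.

```python
import hashlib
from pathlib import Path

HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_checksum_line(line: str) -> tuple[str, str]:
    """Split 'HEXDIGEST *filename' into (lowercase digest, filename)."""
    digest, _, name = line.strip().partition(" ")
    name = name.lstrip(" ").removeprefix("*")  # '*' marks binary mode
    if len(digest) != 32 or not set(digest) <= HEX_DIGITS or not name:
        raise ValueError(f"not an MD5 checksum line: {line!r}")
    return digest.lower(), name


def verify_checksum_line(line: str, directory: Path) -> bool:
    """Recompute the MD5 of the named file and compare it with the stored value."""
    expected, name = parse_checksum_line(line)
    md5 = hashlib.md5()
    with (directory / name).open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            md5.update(block)
    return md5.hexdigest() == expected
```

Running the check at each hand-off (after transfer, after copying to interim storage, before ingest) keeps a silent corruption from propagating into later copies.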
60. How do we make the preserved content
understandable and manageable?
• Descriptive Metadata
• Administrative—Technical Metadata
• Administrative—Digital Provenance
• Administrative—Rights Management
• Structural Metadata
61. Audio Technical Metadata Collector
(ATMC)
• Enter/edit technical and structural metadata
• Audio object and process history metadata
• Enter/edit audio object evaluations
• Parse files to collect metadata
67. Quality Control and Assurance
• Quality control vs. quality assurance
• QC at ATM: aural, visual, software tools
• Collection setup—preliminary transfers
• Role of permanent staff
• QA at ATM
68. How do we store the data immediately
after capture?
• Local, interim storage
• Backup copies at each stage
• ATM NAS
• Additional redundant copy
69. Staff roles (diagram)
• Director
– Project Development
– Selection for Preservation
• Archivist
– Selection
– Preview Collections
– QC Documentation
• Librarian
– Cataloging Issues
• Associate Director
– Project Management
– Selection: Format Issues
– Scheduling Coordination
– QC
– R&D
• Audio Engineer
– Preservation Transfer
– Preservation Master Files
– Technical MD Collection
– Checksums
– BWAV MD
– ADLs
– Signal Processing
• Project Assistant
– Content Division
– Production Masters
– QC
– ADLs
– Workflow Management
– Collection Setup
– Ingestion Process
• Programmer
– Software/Script Development
• Digital Library Program
– Preservation Repository Services
– Deliverables
– Access System
71. What is metadata?
• “The stuff we need to know in order to discover
and manage data over the long term”
• Here’s a better definition:
“Metadata is structured information that
describes, explains, locates, or otherwise
makes it easier to retrieve, use, or manage
an information resource.”
NISO. “Understanding Metadata.” 2004.
<http://www.niso.org/standards/resources/UnderstandingMetadata.pdf>
72. Metadata standards
• Standards define mutually agreed-upon:
– Definitions of key terms
– “Fields” of data to record
– Rules for structuring data in these fields
• In this area, generally expressed in XML
• Allow us to benefit from community experience
• Promote preservation by providing for more
predictable data
73. Evaluating metadata standards
• Good fit for the type of material I have?
• Supports my access/management/preservation
needs?
• Are there existing tools to help me create it?
• Has it been used before in similar situations?
• Who maintains it?
• How quickly are the standards in this
environment changing?
74. Creating metadata
• Generally not done by humans encoding data
directly in the storage format
• Instead:
– Humans use tools designed for specific
purposes
– Derived computationally from the digital
resource itself
75. Technical metadata
• Tracks properties of a digital file necessary for
its rendering and processing
• Can also include data about the circumstances
of creation of a digital file
• Often format- or media-specific
• Much can be generated automatically from
digital file
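As an illustration of the last bullet, several technical properties can be pulled from a WAVE file with Python's standard `wave` module alone. This is a sketch under assumptions: the field names are invented for the example and do not follow any particular technical metadata schema, and the module handles plain PCM files (other encodings may be rejected).

```python
import wave
from pathlib import Path


def technical_metadata(path: Path) -> dict:
    """Derive basic technical metadata from a PCM WAVE file's own header."""
    with wave.open(str(path), "rb") as audio:
        frames = audio.getnframes()
        rate = audio.getframerate()
        return {
            "file_name": path.name,
            "file_size_bytes": path.stat().st_size,
            "channels": audio.getnchannels(),
            # Container width in bits; a 20-bit signal stored in 3 bytes reports 24.
            "bit_depth": audio.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }
```

Properties of the analog source and the transfer chain cannot be derived this way and still have to be entered by a person or carried over from the capture tool.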
76. Digital provenance metadata
• Tracks the history of a set of related digital files
– Can include the methodology by which the
“master” file was created from an analog
source (overlap with technical metadata)
– What transformative processes have been
applied to the file
– Relationship of “derivative” files to the
“master”
77. Structural metadata
• Documents relationships within and between
digital files
– Locating the same intellectual content on
multiple representations
– Noting points of interest within a single
resource
– Grouping and sequencing multiple files that
make up a logical whole
78. Rights metadata
• Covers legal, moral/ethical, financial rights over
resources
– Rights holders
– Copyright status
– Conditions on access
– Usage fees/royalty payments
• Can be in human- or machine-readable format
79. Descriptive metadata
• Like “cataloging”
• Allows users and collection managers to find
and identify resources of interest
• Factual information such as creator, date
created, running time (overlap with technical
metadata)
• Constructed information such as title
• Subjective information such as topic, genre
80. Preservation metadata
• Some overlap with technical and process
history metadata
• Catch-all for all the metadata we need to
support the preservation process that’s not
recorded elsewhere
• Most important feature: tracking events that
occur during the preservation process
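Event tracking can be pictured as an append-only log keyed by object identifier. The record below is loosely modeled on the kind of information a preservation event carries (type, date and time, outcome, detail, responsible agent); the class and field names are assumptions made for this sketch and are not taken from any schema or from the IU repository.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class PreservationEvent:
    """One thing that happened to one digital object (hypothetical record shape)."""
    object_id: str
    event_type: str          # e.g. "fixity check", "migration", "ingestion"
    outcome: str             # e.g. "pass", "fail"
    detail: str = ""
    agent: str = ""          # person or software responsible
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class EventLog:
    """Append-only history; events are never edited or removed."""

    def __init__(self) -> None:
        self._events: list[PreservationEvent] = []

    def record(self, event: PreservationEvent) -> PreservationEvent:
        self._events.append(event)
        return event

    def history(self, object_id: str) -> list[PreservationEvent]:
        """All events for one object, in the order they were recorded."""
        return [e for e in self._events if e.object_id == object_id]
```

Keeping the records immutable and ordered is the design point: the log should be able to answer what was done to a file, when, by whom, and whether it succeeded.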
82. Types of preservation packages
• According to OAIS:
– Submission information package (SIP)
– Archival information package (AIP)
– Dissemination information package (DIP)
• The AIP is what is stored (potentially broken up
into pieces) in the IU repository
• Metadata Encoding and Transmission Standard
(METS) used to wrap various pieces together
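To make the wrapping role of METS concrete, the sketch below builds a very small METS document that lists a package's files with their checksums and sequences them in a structural map. It uses only element and attribute names defined by the METS schema, but the package identifier, file names, and the choice of what to include are illustrative assumptions, and the output is not claimed to match the preservation packages actually produced at IU.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def _mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_package_manifest(object_id: str, files: list[tuple[str, str]]) -> ET.Element:
    """Wrap (file name, MD5 hex digest) pairs in a minimal METS document."""
    mets = ET.Element(_mets("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(mets, _mets("fileSec"))
    group = ET.SubElement(file_sec, _mets("fileGrp"), {"USE": "preservation master"})
    struct_map = ET.SubElement(mets, _mets("structMap"), {"TYPE": "physical"})
    top = ET.SubElement(struct_map, _mets("div"), {"LABEL": object_id})
    for order, (name, md5) in enumerate(files, start=1):
        file_id = f"FILE{order}"
        # fileSec entry: where the file is and how to verify it.
        file_el = ET.SubElement(group, _mets("file"), {
            "ID": file_id, "CHECKSUM": md5, "CHECKSUMTYPE": "MD5",
        })
        ET.SubElement(file_el, _mets("FLocat"), {
            "LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name,
        })
        # structMap entry: where the file falls in the sequence.
        part = ET.SubElement(top, _mets("div"), {"ORDER": str(order), "LABEL": name})
        ET.SubElement(part, _mets("fptr"), {"FILEID": file_id})
    return mets
```

Calling `ET.tostring(build_package_manifest(...), encoding="unicode")` yields the XML text; descriptive, technical, and provenance metadata would be attached in further METS sections that this sketch leaves out.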
86. Structural metadata (1)
Audio Engineering Society, Audio
Decision List. AES 31-3
and
Metadata Encoding and Transmission
Standard (METS), <structMap> section
87. Structural metadata (2)
Audio Engineering Society, Audio
Decision List. AES 31-3
and
Metadata Encoding and Transmission
Standard (METS), <structMap> section
88. Rights metadata
• For field audio collections, the ATM knows:
– Collector
– Terms of deposit governing access
• This area still under development for the IU
repository
• No decision yet on metadata format; need more
thorough analysis of the functions this metadata
needs to support
91. Preservation metadata
• Still under investigation for IU repository, for all
formats of material
• Will need to implement before any preservation
events occur
• Will likely use the PREservation Metadata:
Implementation Strategies (PREMIS) data
dictionary and schema
92. Need to share
• Copies in multiple repositories can help ensure
preservation
• Sound Directions did a test exchange of content
between IU and Harvard
– Different repository architectures
– Different preservation package structures
• ...demonstrated how different levels of
preservation are possible
93. Two Repositories Supported by the
Digital Library Program
• IUScholarWorks Repository
– “Institutional Repository”
• For preserving and providing access to IU’s
research output: articles, papers, etc.
– Based on DSpace software
• IU Digital Library Repository
– General-purpose digital content repository
– Based on Fedora software
94. Fedora
• Flexible Extensible Digital Object
Repository Architecture
• Open source digital repository software
developed by Cornell and the University of
Virginia
• Supported by new organization:
Fedora Commons
• Basis for IU Digital Library Repository
95. Moving Content to a Digital
Repository – Idealized Workflow
• ATMC/Audio Workstation
– Upload preservation package
• Temporary Server Disk Storage
– Validate and ingest
• Fedora Repository
– Master audio files in MDSS
– Delivery audio files on streaming server
– Metadata records on disk
96. IU Massive Data Storage System
(MDSS)
• Hierarchical storage management
– Some storage on hard disks
– Much more storage on automated tape
• Managed by UITS Research Technologies
• Servers in Bloomington and Indianapolis
connected via I-Light high-speed fiber link
• Total capacity: 2+ petabytes
• Need to build Fedora-MDSS connection
97. Repository Status
• Fedora is running in production
– Supporting access to image and text
collections
– Experiments with loading audio and video
• Need to improve tools for ingest and retrieval to
support audio projects
• Not yet a true preservation repository
98. Toward a Preservation Repository
• Need to add:
– File integrity validation
– Integration with MDSS – replication of data
– Eventually, file format obsolescence
monitoring and migration
• Self-audit and/or external certification as
Trusted Digital Repository
– DRAMBORA, TRAC
99. Access Systems
• Variations2
– variations2.indiana.edu
– Provides access to cataloged commercial
recordings from the Music Library and ATM
• Need access system to provide discovery and
delivery of field collections and other types of
archival audio collections