2. Fixity and Checksums
● Fixity refers to the property of a digital file / object being fixed,
or unchanged
● Checksums are a means for determining fixity, or the sameness
between two copies of a digital object or the same object at
different times, and/or before and after certain events
● Checksums feature prominently in digital preservation
● Checksum-based fixity checking workflows are commonly used by
libraries and archives
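As a minimal sketch of a checksum-based fixity check, the following Python computes a digest for a file and compares it with a digest recorded at an earlier time. The function names `file_digest` and `is_fixed` and the choice of SHA-256 are illustrative assumptions, not part of any particular preservation system.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_fixed(path, recorded_digest, algorithm="sha256"):
    """True when the file's current digest equals the digest recorded earlier."""
    return file_digest(path, algorithm) == recorded_digest.lower()
```

A workflow would typically call `file_digest` once at ingest, store the result alongside the object, and call `is_fixed` at later times or after events such as a transfer.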
3. IT use cases
● 5 years at ND, 3 roles
○ Strategy development for digital asset management (including digital archiving
and preservation, in collaboration with the ND Libraries and Archives)
○ Implementation of enterprise Digital Asset Management (DAM) service
○ Now an Enterprise Architect
● Verify software packages before distributing to many endpoints
● Data migration - correctness
● Mostly rely on built-in fault-tolerance features of storage systems
or data management systems for data storage and transfer
integrity
○ Isilon, Globus
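One way to check data-migration correctness is to build a checksum manifest of the source and destination trees and compare them. The sketch below assumes both trees are reachable as local directories. `build_manifest` and `compare_manifests` are hypothetical helper names, and SHA-256 is an arbitrary choice.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path, relative to root, to its SHA-256 hex digest."""
    manifest = {}
    # os.walk visits every directory and lists every file, including dotfiles.
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(directory, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            with open(full_path, "rb") as handle:
                # Reads the whole file at once: fine for a sketch, not for huge files.
                manifest[relative] = hashlib.sha256(handle.read()).hexdigest()
    return manifest


def compare_manifests(source, destination):
    """Report paths missing from, extra in, or changed in the destination."""
    shared = set(source) & set(destination)
    return {
        "missing": sorted(set(source) - set(destination)),
        "extra": sorted(set(destination) - set(source)),
        "changed": sorted(p for p in shared if source[p] != destination[p]),
    }
```

An empty report in all three categories suggests the migration copied every file intact. Anything under `extra` is a file that exists only at the destination.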
4. Digital Preservation risks
● Bit flip is not a top reason for
data loss
● “The real culprits are a
combination of human error,
viruses, bugs in application
software, and malicious
employees or intruders. Almost
everyone has accidentally
erased or overwritten a file.”
● 13 threats listed in “Requirements for
Digital Preservation Systems: A
Bottom-Up Approach”
5. A matter of where to spend energy
● Format migration - another prominently featured digital preservation task, on which our thoughts have evolved
○ Relaxed approach to format obsolescence: preserve the bits and deal with obsolescence if and
when it happens
● We still need to preserve the bits - but does this mean we need to run checksums on everything by ourselves?
○ Can we make use of the hashes that cloud services compute for stored data? (Issues: trust and egress)
○ Pass hash values with upload
● More importantly, what other things have we overlooked that can lead to data loss?
○ Not collecting content
○ Dependency on storage technology (storage intermediary) that
■ Challenges the notion of “redundancy” or “lots of copies”
■ Introduces a single point of failure
■ Holds the unique knowledge of file-to-object mapping
■ May allow hidden and invisible files to be added to the storage location inadvertently
● Scalability and automation
● Work together!
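The idea of passing hash values with an upload, raised earlier on this slide, can be sketched as follows. The `Content-MD5` header value is the Base64 encoding of the raw MD5 digest (RFC 1864). Whether a given storage service accepts that header, or reports a hex digest back, is an assumption to verify against its documentation. `content_md5` and `matches_reported_hex` are hypothetical helper names, and no network call is made.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Base64-encoded MD5 digest, the value format of a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def matches_reported_hex(data: bytes, reported_hex: str) -> bool:
    """Compare local data with a hex MD5 reported by a service (assumed format)."""
    # Some services wrap the reported value in double quotes; strip them if present.
    return hashlib.md5(data).hexdigest() == reported_hex.strip('"').lower()
```

Sending the digest with the upload lets the receiving side reject a corrupted transfer. Comparing a reported digest afterwards avoids downloading the object again, at the cost of trusting the service's own calculation.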