8. COMPACT DISC
• Developed in 1982
• Storage capacity –
• 650-700 MB
• 70,000 .doc files
• 140 minutes of low-resolution video
• Machine readable
9. SOLID STATE STORAGE DEVICES
• Developed in 1999
• Storage capacity – varies
• 64MB – 8TB
• Data stored in low
voltage switches,
transistors
• Machine readable
10. CLOUD STORAGE
• Web based data storage,
2006
• Data stored in datacenters
on magnetic hard disk
drives
• Storage capacity –
• Potentially unbounded,
limited by power
consumption and physical
space
• Infrastructure dependent
Council Bluffs, Iowa – Data Centers. (n.d.).
https://www.google.com/about/datacenters/locations/council-bluffs/
11. WHAT IS DIGITAL PRESERVATION?
… series of managed activities necessary to ensure
continued access to digital materials for as long as
necessary.
Digital Preservation Handbook: Glossary. (2020). Retrieved December 03, 2020, from
https://www.dpconline.org/handbook/glossary
12. DIGITAL PRESERVATION RISKS
1. Physical loss of the data object
2. Losing the means to interpret the data object into meaningful, authentic information.
13. MEDIA DECAY
Shahani, C. J., Manns, B., & Youket, M. (n.d.).
LONGEVITY OF CD MEDIA RESEARCH AT THE LIBRARY OF
CONGRESS.
https://www.loc.gov/preservation/resources/rt/studyofCDlongevity.pdf
Physical preservation
Data Migration
16. TECHNOLOGICAL OBSOLESCENCE
By Ramon Vasconcellos, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=90795401
Fixity Checks
Multiple Backups
Physical preservation
Data Migration
Hardware Redundancy
Legacy access
17. AUTHENTICITY OR LACK OF AUDIT
Friedman, U. (2013, October 30). Floating Chinese Officials: A History of Badly Doctored Photos. Retrieved December 08, 2020, from https://www.theatlantic.com/international/archive/2011/07/floating-chinese-officials-history-badly-doctored-photos/352390/
Fixity Checks
Multiple Backups
Physical preservation
Data Migration
Hardware Redundancy
Legacy access
Automated Audit Logs
Digital Preservation Service
18. A SERIES OF MANAGED ACTIVITIES
Memory of Mankind Project - https://www.memory-of-mankind.com
Fixity Checks
Multiple Backups
Physical preservation
Data Migration
Hardware Redundancy
Legacy access
Automated Audit Logs
Digital Preservation Service
19. RESOURCES
• Digital Preservation Coalition -
https://www.dpconline.org/
• National Digital Stewardship Alliance -
https://ndsa.org/
• KAUST Library LibGuide -
https://libguides.kaust.edu.sa/Digital_Preservation
20. Pasquini, A. (2013). Di-Rotta - Alice - Google Arts & Culture. https://artsandculture.google.com/asset/MAGWXekj3tsx0g
Questions?
Editor's Notes
Good afternoon, thank you for joining me.
Today I will be giving a brief introduction to digital preservation. It will take about 20-30 minutes, and we will have time for questions afterward.
Before we fully delve into what we mean by digital preservation, I want to give some context and talk about the history of information objects.
Our society is digital now, but we used to capture, share and preserve information more in physical space, or at least in physical space that is a bit more meaningful to humans. Digital objects are still physical: their 1s and 0s exist in the world, but that is very difficult for us to see or even imagine. In fact, one of the fundamental risks to digital objects is the loss or destruction of their physical manifestation.
After we have a look and understand what we mean by information objects, including digital objects, I’ll define digital preservation.
Then we will look at the risks to digital objects and what we can do to mitigate them.
Alright, let’s start way back: this is an ancient Sumerian cuneiform tablet.
It’s a bit more than 4,000 years old and was found in modern Iraq. Many of these tablets detail the arrival of livestock received from individuals (an early form of taxation) and record which of these animals were distributed to members of the royal court, to temples for sacrifice, or to the army for food.
As these objects are more than 4,000 years old, they fall under the category “great for preservation”: we can still read them easily (if we understand ancient Sumerian cuneiform). But they are lacking in information storage capacity: only 36 characters per side, or 36 bytes. That is about the size of a pretty average tweet, which is 34 characters. The tablets themselves are pretty small, the size of a large, thick coin, but you would still need a lot of them to convey extensive information.
Human readable; we will touch on this with each information object.
A little bit further on, a 500-year-old book.
A printed book, one of the first mass-produced books in Europe. As technology progresses, machines enable us to store more information, store it more accurately, and transmit it more quickly. Our first step in this direction is the printing press. We used machines to help us make this book, but it’s still meant to be read by humans.
Books are an amazingly resilient technology. Their form has changed very little over the last 500 years and they are still widely in use. They are pretty resilient in terms of preservation as well. Keep them cool and dry and they can last quite a bit of time. Most mass-produced books will last 40-50 years if kept out of the sun in a cool, dry environment. Books printed on permanent acid-free paper with special pigment-based inks can last considerably longer, with little action needed to preserve the book.
A big jump in information storage capacity from the clay tablet. For context, it would take 8,931 clay tablets to equal the information in this book. I know those tablets are small, but it would take some pretty big pockets to hold nearly 9,000 of them. That brings to mind another piece of great technology: bookbinding! Can you imagine sorting almost 9,000 tablets into the correct order to read them? It sounds like a pretty long and boring performance art piece. The book is still human readable. Notice the nice artwork, meant to be read and enjoyed by humans.
OK, moving right along: the IBM punch card, our first machine-readable information object.
Herman Hollerith invented a tabulation machine that was used in the 1890 U.S. census to help track the results. The image shown is actually the decoding master card; the tabulation cards themselves were blank but correspond to the alphanumeric codes you see here on the master card. How it worked: the gathered census information was entered into the tabulation machine, which had a number of dials and gauges so they could keep track of the results over time.
You’ll notice that the information storage capacity is not impressive. But encoded information is the prototype for the digital objects to come.
A big tradeoff with machine-readable technology, though, is that you need a working machine to render the information for us to understand. If there is no machine, or if, say, the decoding master is lost or destroyed and there are no backups, the data is still there as cards with holes in them, but there is no way to tell what they mean.
This basically means that if the material can be manifested by a bitstream, and appears to the user as a digital file, it can be thought of as a digital object. Some digital objects can be simple, like a text file. Video, being composed of multiple elements (video track, audio track, container file and possibly others) may be considered a complex digital object.
OAIS - Open Archival Information System developed by the Consultative Committee for Space Data Systems
Our first digital object. The Universal Automatic Computer (UNIVAC) was used to help tabulate the 1950 US Census. It was one of the first civilian electronic computers.
Census data was tabulated from punch cards using vacuum tubes and some of the world’s first transistors. Data was stored on magnetic tape.
Machine readable, but again you need a working UNIVAC or equivalent to read the data.
The storage capacity of a tape does not yet equal the Gutenberg Bible, but tapes were easy to mass produce and pretty compact to store, so you could use a lot of them. Even today we still use magnetic tapes for slow-access data storage.
We’ll discuss media failure more in a bit, but magnetic tape suffers from something called sticky-shed syndrome. Magnetic tapes, especially those produced in the 1970s, have an issue where the tape begins to come apart and shed both the binder and the magnetic side. Debris gets into the machine’s reading heads. The result is physical data loss.
The CD. This is actually the first CD I owned, given to me by a friend on my soccer team.
A massive jump in data storage capacity: one CD can hold 109 Gutenberg printed books. Machine readable.
Now, all digital objects have a format. Whether as simple as a text file or as complex as a video file, all digital objects have a set of instructions at their beginning that tell the computer how to render them. A PDF and an MP3 instruct the computer on how to interpret and display their 1s and 0s, so we get either a recipe for cookies or the classic ’90s hit Informer by Snow.
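As a minimal sketch of that idea, many formats begin with a short byte signature that software can check before trying to render the file. PDF files start with `%PDF-`, and MP3 files that carry an ID3v2 tag start with `ID3`. The function names and the tiny signature table below are illustrative assumptions, not a complete format registry.

```python
# Illustrative signature table: a few well-known leading byte sequences.
# Real format-identification tools rely on far larger registries.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"ID3": "MP3 audio (with ID3v2 tag)",
    b"\x89PNG\r\n\x1a\n": "PNG image",
}


def sniff_format(data: bytes) -> str:
    """Guess a format from the first bytes of a file's content."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 16 bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

If the signature is missing or no software recognizes it, the bits are still there but the computer no longer knows how to render them.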
As we’ll see when we talk about risks to digital objects, CDs are a medium that does not do well with the passage of time.
Okay, we are nearing our current time, and I’m sure almost all of you have multiple flash drives or solid state drives at work and home. They hold our research, our work projects, our kids’ photos and videos, our music collections, books, movies, video games… all sorts of digital objects.
Capacity varies widely in solid state drives, from small USB sticks to large external drives. For context, though, an 8 TB drive can contain as much information as 120 billion Sumerian tablets, or 13 million Gutenberg printed books, or 12,000 CDs.
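The CD comparison can be checked with simple division. This is a rough sketch that assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and the 650-700 MB CD capacity quoted on the CD slide; the helper name is illustrative.

```python
MB = 10**6   # decimal megabyte (assumption: capacities quoted in decimal units)
TB = 10**12  # decimal terabyte


def units_that_fit(container_bytes: int, unit_bytes: int) -> int:
    """Whole number of units of size unit_bytes that fit in container_bytes."""
    return container_bytes // unit_bytes


drive = 8 * TB
cds_at_650 = units_that_fit(drive, 650 * MB)  # smaller CDs, so more of them fit
cds_at_700 = units_that_fit(drive, 700 * MB)
print(cds_at_650, cds_at_700)
```

Either CD capacity puts the result in the neighborhood of 12,000 discs.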
Machine readable, but it depends on what type of machine the drive is formatted to be read by.
Okay, our last stop on our information storage object journey. Again I’m sure most of us use some form of cloud storage for work or personal files, and probably both.
Storage capacity is limited only by our ability to generate power for server farms and find space to locate them.
It’s potentially unlimited storage, but it comes at a cost: a heavy reliance on power and network infrastructure. Without a working power grid and online networks, the ability of cloud storage to provide us access to data is severely limited.
Books might be heavy and not hold a ton of information but all you need is light and eyes to read them.
So the key to understanding digital preservation is that we need to support a number of systems and perform actions to ensure that digital objects are maintained, accessible, readable, and authentic.
Preserving digital objects is complex for a few reasons:
To maintain them, we must maintain both their physical media and the means to access their data.
We cannot read them without the aid of computers.
To ensure they are accessible, we must monitor their formats and software environment so that we can continue to access them.
Digital files can become corrupted for a number of reasons, and because they are easy to copy and move around, it can be difficult to determine the authenticity of a digital file.
Now that we have defined digital preservation, let’s talk about risks to digital objects. All digital preservation risks fall under two fundamental categories:
1. Physical loss of the data object: powerful magnets scrambling flash drives, scratched CDs, a flood in your server room.
2. Losing the means to interpret the data object: obsolete file formats or network failures. A good analog example is the tabulation machine cards. You may have the cards with holes punched out representing the data, but without a way to render that data it’s locked away and potentially lost.
Physical Decay of storage media
All storage media and formats are susceptible, although some are more fragile than others.
See the CD. As you can see in images 1 and 2, and maybe you’ve seen this in real life, there are small specks on the CD where the metal layer, in which the digital bits are inscribed, is disintegrating. In image 3, you’ll see that the dots and dashes that represent these bits are distorted or completely gone.
What can we do?
1. Monitor media
2. Migration to preservation formats
What is bit rot? Bit rot is the gradual decay of storage media where the individual bits (1s and 0s) of digital files ‘flip’, leading to a corrupted or inaccessible file. This can be caused by dust, contaminants, background radiation and high heat.
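To make a ‘flipped’ bit concrete, here is a small sketch that inverts a single bit in a byte string and shows that a checksum notices the change. The sample text and the function name are illustrative; SHA-256 is just one common choice of checksum.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


original = b"Recipe for cookies"
damaged = flip_bit(original, 0)  # invert the lowest bit of the first byte

# The two byte strings differ by one bit, and their digests no longer match.
digests_match = (
    hashlib.sha256(original).hexdigest() == hashlib.sha256(damaged).hexdigest()
)
print(damaged, digests_match)
```

A single flipped bit is enough to change the content, which is why checksums are a practical way to detect this kind of silent damage.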
What can we do?
Regular fixity checks
Multiple backups to enable restoration of corrupted files
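A fixity check can be sketched as follows: record a checksum for every file once, then recompute and compare on a schedule. This is a minimal illustration using SHA-256 from Python’s standard library; the function names and the in-memory manifest are assumptions, and a preservation system would store the manifest somewhere safe and separate from the files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Record a digest for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose digest has changed."""
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

Any path returned by `check_fixity` is a candidate for restoration from a backup copy.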
Oh no, blue screen of death
Other forms of hardware - such as servers, drives and network components - are also susceptible to failure. At best this may cause a temporary interruption or degradation of operational capability; at worst it may cause temporary or permanent data loss.
My personal hardware failure story.
What can we do?
Hardware redundancies
Regular backups
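As a rough sketch of how backups and redundancy work together, the code below copies a file to several locations, verifies each copy, and later picks the first copy that is still intact. The directories stand in for separate drives or sites, and all names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 digest of a file's bytes (reads the whole file; fine for a sketch)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(source: Path, backup_dirs: list) -> list:
    """Copy source into each backup directory and verify every copy."""
    expected = digest(source)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(source, directory / source.name))
        if digest(copy_path) != expected:
            raise OSError(f"copy to {copy_path} does not match the source")
        copies.append(copy_path)
    return copies


def first_intact_copy(copies: list, expected: str):
    """Return the first copy whose digest still matches, or None."""
    for copy_path in copies:
        if copy_path.is_file() and digest(copy_path) == expected:
            return copy_path
    return None
```

The point of keeping more than one copy is that a failure of one drive, or one corrupted file, still leaves an intact copy to restore from.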
Technology moves forward, and sooner or later it’s replaced and becomes obsolete: LaserDisc, CDs, floppy disks, or MiniDiscs.
MiniDiscs like this one were put out to pasture by the development of MP3 players.
Storage media, or the technology required to access the media, become obsolete, rendering the content inaccessible and lost.
Maintain legacy access systems
Data migration to current technologies
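Before migrating anything, it helps to know which formats a collection actually holds. The sketch below counts files by extension and lists those on a watch list. The watch list is a hypothetical example of a local policy decision, not a standard, and file extensions are only a rough proxy for the real format.

```python
from collections import Counter
from pathlib import Path

# Hypothetical watch list: extensions an institution has decided to migrate.
AT_RISK_EXTENSIONS = {".wpd", ".wks", ".mdi"}


def format_inventory(root: Path) -> Counter:
    """Count files under root by lowercase extension."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def migration_candidates(root: Path, at_risk=AT_RISK_EXTENSIONS) -> list:
    """List files whose extension is on the watch list, sorted by path."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in at_risk
    )
```

The resulting list is a starting point for planning migration, not a migration tool in itself.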
Which photo is the original? In this case it may be fairly easy to determine which file is the original, but in some cases the edits to files may be very subtle. We don’t really have time to open the discussion of why files may or may not be edited and for what purpose. What we’ll focus on is how we maintain a sound audit trail to confirm the authenticity of digital objects by tracking any and all changes to the files.
How do we do this?
Automated audit logs
Centralized management of digital objects within an Archival Management System
Regular fixity checks
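One way an automated audit log can be made tamper-evident is to chain entries together with hashes, so that each entry’s hash covers its own content plus the hash of the entry before it. This is a minimal sketch under that assumption, not a description of any particular archival management system; the field names are illustrative, and a real log would also record timestamps.

```python
import hashlib
import json


def append_event(log: list, actor: str, action: str, object_id: str) -> dict:
    """Append an event whose hash covers its content and the previous hash."""
    previous_hash = log[-1]["hash"] if log else "0" * 64
    event = {
        "actor": actor,
        "action": action,
        "object_id": object_id,
        "previous_hash": previous_hash,
    }
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    event["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(event)
    return event


def log_is_intact(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    previous_hash = "0" * 64
    for event in log:
        body = {k: v for k, v in event.items() if k != "hash"}
        if body["previous_hash"] != previous_hash:
            return False
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != event["hash"]:
            return False
        previous_hash = event["hash"]
    return True
```

Note that this sketch does not detect entries removed from the end of the log; guarding against that would require storing the latest hash somewhere independent.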
So what can we do?
Time is cruel, time cannot be overcome. Everything will decay. But just remember, it is important. The little objects that we create today, photos, documents, data, we are sending them into the future, plain and simple. And the actions that we discussed today are some of the things we can do to ensure that these objects travel far into the future.
But there is an alternative.
We’ve come full circle: the Memory of Mankind project. Remember the Sumerian ceramic tablets? Well, there is a group using this ancient technology for preservation purposes.
They are preserving 1000 books in a salt mine in Austria. The books are inscribed on ceramic tablets and ceramic microfilm. The preservation goal is 1 million years. The problem is that this hardly encompasses the wealth of human knowledge that we have readily at our fingertips. In comparison to the books preserved in ceramic, digitally preserved objects are bound to change much over the course of a million years. But if they can be effectively maintained and preserved, they can continue to contribute to the growing trove of human knowledge. Surely a worthy endeavor.
Now, before we get to questions, I just want to share some resources.
These are two organizations that are working very hard to promote and educate on how to advocate for, develop, and implement proper digital preservation.
I have also included a link to our KAUST Library Digital Preservation Libguide where you can find links to these resources and many more.