Facing Up to ARMA-geddon: Preserving Cultural Records in the Digital Age
ARMA-CT, New Haven, CT. May 22, 2013
Greg Colati, University of Connecticut

NOTE: The presentation was far more informal than this text indicates, with lots of discussion throughout. In fact, while all of the main ideas contained here were discussed, very little of what I said followed this script. I make it available here for those of you who might be interested in a more orderly development of the ideas that I presented.

Slide 1

I want to thank the organizers for giving me a few moments to speak about sustaining digital content over the long term. It is a topic that archivists and records managers are sometimes seen as viewing from opposite sides of the same coin. But I don’t think that is necessarily true. At the highest level, we have common interests. I will speak tonight from my perspective as an archivist, and I hope it will resonate a bit with your vision of the world. And I sincerely hope it begins a dialog, tonight and beyond, about the commonalities of our two related professions.

For centuries, the desire to document human activity and culture was hampered by the lack of records. Governments and other official entities controlled information, and the historical record was filled with the stories of the privileged. Underdocumented communities had no voice and made few appearances in the story of human history.

Over time, technology has made it more possible for unofficial voices to be heard and to endure. The printing press, home photography, wood pulp paper, the phonograph, the photocopier, videotape, and digital cameras have all increased the output and convenience of information creation and distribution.
Since the arrival of the Internet, overabundance has replaced scarcity as the primary challenge to preserving and making sense of the cultural record.

With so much being generated, how do we decide what to preserve, how do we ensure that it will endure, and how can we help people of the future make sense of it?

Slide 2

From the beginning of recorded information, culture has depended on the sustainability and authenticity of the historical record. Myriad dystopian novels and films based on the distortion and manipulation of the historical record echo George Orwell’s oft-quoted line: “Who controls the past controls the future. Who controls the present controls the past.”

However, it may be true that as much culture, if not more, has been lost through the more mundane causes of neglect, inefficiency, and apathy as through purposeful destruction for nefarious goals. And the principal culprit has been our own culture and human attitudes. The very things that make a modern free society possible are the things that make it difficult to document a free society: decentralization, the lack of controls, the ability to think and do as you wish. This has perhaps never been truer than it is today, when everyone can be a publisher, a chronicler, and a documentarian with the click of a mouse. Yet those publications, chronicles, and documents exist in what is potentially a more ephemeral recording environment than at any time since the invention of writing.

Slide 3

We can fast-forward through the technological development of record keeping and cultural documentation with two thoughts in mind. In nearly every case, technological advances over the centuries have expanded the ability of people to create and distribute their own records. And given the choice, society always chooses the recording means that is most convenient over the one that is most permanent.

Some years ago, Paul Conway, now at the University of Michigan, created what has become a much-reproduced graphic representation (or what we would now call a data visualization) of what he called the Dilemma of Modern Media. It was made before cell phone cameras, Facebook, and YouTube, and it shows the inverse relationship between the information density of recorded media and the longevity of those media. Papyrus replaces clay tablets, scrolls replace papyrus, and the codex (by the way, the first random-access memory device) replaces the scroll. And so on through movable type, wood pulp paper, newsprint, microfilm, compact disks and other optical media, and by extension to the cloud services that are nearly ubiquitous today.

But as we see in the graph, as early as the end of the 19th century, the medium of the historical record began to become as much of an issue as the contents of the record itself.

As long as history was recorded with scratch marks on a physical medium or, to a lesser extent, photographs on glass or film, it was not only permanent but also interpretable with the human eye.

Slide 4

The widespread use of “coded” information transfer began to change all that.
Information transfer that requires an intervening technology, whether a telegraph operator or an optical drive, becomes inaccessible to the average human without the reading or interpreting device. This occurs perhaps as early as the telegraph, and certainly in popular culture by the time of the wax cylinder. I cannot simply look at a wax cylinder of a spoken-word recording and “read” the words on it. But even the wax cylinder and the flat LP disk are analog renderings of actual sound waves. The advent of magnetic, optical, and digital media further changed the landscape. Now even the written word was subject to an intervening technology, and the permanence of the documentary record was subject not only to the vagaries of humidity and temperature but also, and much more so, to the marketplace of technology.

Slide 5: Cultural Armageddon, 1980s: Brittle Books
Even so, the primacy of print persisted well into the 20th century, to the point where, by the 1980s, the crisis of cultural documentation was brittle books. The solution was “mass deacidification” and the advent of mass-produced, acid-free paper as the “permanent” solution to the crisis of preserving the cultural record of the printed word.

Slide 6: Cultural Armageddon, 1990s: Media Obsolescence

At the same time, though much less noticed, typewriters began to give way to word processors, and film to videotape and then to digital recording. By the 1990s the historical record had become dependent upon the recording medium as never before. Marshall McLuhan famously said that “the medium is the message.” For archivists, he could have said that “the medium of the medium is the message.” By the 1990s we were looking at a NEW cultural Armageddon of media obsolescence. Mass migration and emulation began to replace mass deacidification in an attempt to preserve the cultural record of the early Information Age. We began to be faced with questions about what exactly the “record” was.

For example, early in my career, archivists spent a good bit of time determining the “record copy” of archival records, tracing “originality” to something called the “ribbon copy” of a letter or document. The ribbon copy was, of course, the piece of paper that had come into contact with the typewriter ribbon, as opposed to a carbon copy or even a xerographic reproduction. Today the concept of originality is much less clear, as every copy of a digital file is in some respects an identical twin of every other copy, and the experience of anyone viewing the information in that digital file depends not only on the characteristics of the file itself but also on the viewing environment of that particular user. This question of originality and authenticity is out of our scope today, but it is nonetheless an important topic of conversation.

Digital content preserved on high-quality carrier media was the standard solution for the time, and migration to new forms of carriers was the permanent solution to the problem.

Slide 7: Cultural Armageddon, 2000s: File format obsolescence and the Digital Dark Age

By the turn of the current century, there were fears that the ephemerality of not just media but also digital file formats would lead to a “Digital Dark Age,” when obsolete media and the inability of modern equipment to interpret old formats would render mute the voices of the computer age. Again, thanks to the work of archivists, computer scientists, librarians, and many others, these fears were shown to be largely unfounded. File format normalization and the identification of “archival,” or sustainable, file formats was the NEW permanent solution.

While we continue to lose the historical record at an alarming rate for traditional reasons like natural disasters, societal collapse, and media obsolescence, we lose much less of it due to file format obsolescence.

These solutions addressed what was and is a backward-looking problem: how do we sustain access to scarce information resources? They did not prepare us to address the next great challenge to preserving human culture.
Slide 8: Content overload: the NEW cultural Armageddon

The new cultural Armageddon is not how we can sustain access to scarce information, but how we can collect, manage, and make sense of the explosion of information to be collected. Not long ago the BBC estimated that by 2007, 94% of stored information was in digital form, and that the total amount of stored information at that time was in excess of 295 exabytes. Researchers at the University of Southern California estimated that the sheer quantity of digital information has increased exponentially in the last 25 years and shows no signs of slowing.

An exabyte is one billion gigabytes, or 1,000 petabytes: 1 followed by 18 zeros (10^18 bytes).

Slide 9

More recently, the Digital Universe report estimated that in 2011 alone more than 1.8 zettabytes (1 ZB = 1,000 exabytes) were created, and that in the near future the number of files created will increase by a factor of 75. But most importantly, it found that 75% of all that information will be created by INDIVIDUALS, not associated with any formal or organizational records management system. Even more significantly, much of this information will be stored in systems that are NOT dedicated to preservation or recordkeeping.

Slide 10

MANY of those individuals will use third-party web-based applications with EULAs that are seldom read or understood by their users.

Slide 11: Digital Attic

Others will simply not bother to clean out the digital attic.

Slide 12: Life Documented

Beyond this, the technology available makes it potentially possible to record virtually everything. From lifelogging to lifecasting, we can all be part of the 24/7 social community.

Slide 13: Google Glass

It just keeps getting easier and easier to record every look, every activity, either in person…

Slide 14
…or remotely.

In the sense of information creation, we have truly crossed the Digital Divide, and there is no going back.

Slide 15

More information than we need?

Slide 16: Lifecycles

I think there is at least one thing we can take away from this: if we are not able to select, manage, and preserve digital content, we are missing out on more than 90% of the material that could constitute the historical record.

And THAT is a lot more than is being lost through any other means. Archivists have adopted the idea of the curation lifecycle, something records managers should find familiar, to think about how to deal with the mountains of digital content. It is not much different from what we have always done, just with better graphics.

The way forward is becoming clearer, however. Rather than developing strategies for managing different material types, carrier media, and file formats, we are beginning to think in terms of separating the informational content from its storage and even its delivery medium.

Slide 17

We want to think of these cultural artifacts less as objects and more as data: data that is maintained in a way that makes it possible to be used, arranged, and rearranged to tell stories.

Slide 18: Five Equations

The flood of information is just one of the challenges of the current age. What users expect from us also drives what we do and how we do it; the desire for permanence is just one of those factors. This evolution in content creation is coupled with changing attitudes toward research. Researchers and archivists are applying sophisticated tools and applications to digital objects that are seen as pieces of data. Data, in an archival context, is any information suitable for manipulation, use, or reuse in an electronic environment.
This includes metadata, which is the “sum total of anything we know about an object,” as well as digital objects themselves, which, by their binary nature, are inherently data.

Combining our collections into aggregations of data that can be manipulated, used, and reused without losing their authenticity is a way to transform our digital objects into useful and valuable data.
Visualizations are modern ways to tell stories that are not all that different from traditional storytelling; they just use information in a different way. If primary sources are the raw material of storytelling, our primary sources must support modern visualization.

Slide 19

The ephemerality of digital archives, and of the systems that manage and deliver them, poses a dilemma for scholarship that is based on citation. How do we ensure that something I cite today is going to be there tomorrow?

Slide 20: Digital Repositories

As we have seen, the humanities community has embraced digital scholarship. And like any scholarship, digital humanities scholarship is dependent on the availability of its resources. We know that today’s resources require not only an intervening technology to experience, but also an intervening technology infrastructure, which we call cyberinfrastructure, to make it possible for scholars to interact with our data and turn it into stories. Collaborating with our partners on the other side of the reference desk is a way for us to help them tell their stories.

Slides 21, 22, 23, 24: Use and Reuse

If all of our content has become data, our mission and activities nevertheless remain the same, even if our tools are changing. We continue to appraise, collect, contextualize, and make available our collections in ways suitable to each and all of our communities of interest.

Slide 25: Four “-Itys”

With the evolution of the material in our care, the tools required to manage it have also evolved. Right now, these tools mean systems that can support preservation activities like error checking to detect “bit rot,” multiple redundant storage arrays, and automated extraction of technical metadata that can be used to plan format migration. Creating and maintaining systems to preserve digital assets is expensive, and usually beyond the reach of all but the largest organizations.
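The error checking for “bit rot” mentioned above is usually done as a fixity check: record a cryptographic checksum for each file when it is ingested, then periodically recompute the checksums and compare. The following is a minimal sketch of that idea, assuming files on a local disk and Python’s standard `hashlib`; the function names `sha256_of`, `build_manifest`, and `audit` are hypothetical and do not come from any particular repository system.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Recompute checksums and compare them with a stored manifest.

    'changed' lists files whose bits differ from when the manifest was made
    (possible bit rot), 'missing' lists files that have disappeared, and
    'added' lists files that were not in the manifest.
    """
    current = build_manifest(root)
    return {
        "changed": [p for p, h in manifest.items() if p in current and current[p] != h],
        "missing": [p for p in manifest if p not in current],
        "added": [p for p in current if p not in manifest],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "letter.txt").write_bytes(b"ribbon copy")
        manifest = build_manifest(root)  # taken at ingest
        (root / "letter.txt").write_bytes(b"ribbon c0py")  # simulate silent corruption
        print(audit(root, manifest))
        # prints {'changed': ['letter.txt'], 'missing': [], 'added': []}
```

The sketch only reports a problem. In practice, this is why such checks are paired with multiple redundant copies: a file that fails its check can be replaced from a copy whose checksum still matches.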
However, in aggregate, the so-called long tail of small to medium organizations probably contains more historical content than all but the very biggest collections.

Slide 26
If we can collaborate to build a cyberinfrastructure for digital culture in Connecticut, we will accomplish a number of things together that we cannot do alone, no matter what funding might be available:

- support sustainability of digital assets for all
- create coherent and managed digital collections that are comprehensive rather than exemplary
- reflect a commitment to share digital assets on a fair and equitable basis for everyone

This has a number of advantages:

The University of Connecticut, the Connecticut State Library, and other cultural heritage organizations in Connecticut are working together to make it possible to connect the assets of even the smallest historical society with colleagues, scholars, and enthusiasts not only locally but globally.

The Connecticut Digital Archive will aggregate not only access derivatives but also digital masters for preservation from cultural heritage organizations based in CT.

Contributing to a content aggregator like the CTDA will make it possible to connect to even larger aggregators like the Digital Public Library of America.

Collaborative digital preservation works to sustain Connecticut’s digital heritage because it makes it possible for each organization to prove its worth and sustain its own collections. Secondarily, but perhaps more importantly in a larger sense, it supports a community of knowledge that is larger than any one organization.

Slide 27

We have the opportunity, today, to do something that many of us in this room have dreamed about. The reality, of course, will be imperfect, the details will be messy, and progress will seem glacial at times. As an archivist, I see that as standard operations, and it is not daunting. From the first archives of clay tablets to the digital repositories of the future, we are part of a long and respected tradition. We have solved so many other challenges of preserving and making available the historical record; this is just the next one, and the one we have been given in our lifetime.