Do Digital Archivists Dream of Electronic Records


The information age has ushered in the biggest changes in human communication since the rise of printed text. The dynamic and ephemeral nature of electronic communication presents stark challenges to the fundamental principles of the archival practice. Join us for a look at how the tradition of collecting and creating archives is facing this paradigm shift and how the historical record will be shaped for the future.






Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • Hi, I’m Gretchen Gueguen and I’m the Digital Archivist here at UVA in the Albert and Shirley Small Special Collections Library, just next door. Today, I’m going to talk about archives: what they actually are, what those who work in them do, and how all of that is changing because of the information age we are now working in. The theme I’m referencing here is a science fiction work by Philip K. Dick called “Do Androids Dream of Electric Sheep?” I’ll admit that I originally thought of the title because it sounded cool and had no deeper motivation than that, but I think some of the themes of the novel and its film adaptation, Blade Runner, like authenticity and replication, really do have parallels to themes I’ll talk about. But if you’ve never read the book or seen the movie, don’t worry. I mostly make references to it in order to use some cool pictures.
  • So, to start off with, let’s talk about what archives are
  • The Society of American Archivists gives this definition: Materials created or received by a person, family, or organization, public or private, in the conduct of their affairs and preserved because of the enduring value contained in the information they contain or as evidence of the functions and responsibilities of their creator, especially those materials maintained using the principles of provenance, original order, and collective control; permanent records. So, I’d like to point out that if you want to endear yourself to the snooty archivist in your life, you should never use the word “archive,” and heaven forfend you use it as a verb! It isn’t. It’s a noun and it’s plural. But in seriousness, archives are different from their sister-in-crime, the library. Archives exist to store the documentary record of life. While we do have many works that might also be in a library, or that were created as works to be read and studied, primarily we collect these as artifacts of the lives of those who created them.
  • To this end, archival practice follows a couple of primary tenets. First, we follow the principle of “respect des fonds,” which means that we organize collections according to who created or received them. Let me just get this out of the way… This is also called provenance. Generally, these materials belong together for some organic reason: they are the output of an organization or a person’s life. We generally use the term “archives” to describe the materials created and saved by an organization or person, so an archival repository is often part of a larger organization (the Coca-Cola corporate archives, for example). Organizations that create collections by accepting or purchasing other materials are more rightly called “manuscript repositories,” not archives. There is some selection that goes on in an archives, but it’s done within the realm of the records of the organization. In the real world, though, many organizations, like my own “library,” both collect manuscripts and house the archives of the institution (the university, in our case), as well as the records of smaller organizations of interest that don’t have the capacity to house their own (we have, for example, the records of the Libertarian Party here). We apply respect des fonds to both. There are some implications of this choice, of course. Access by provenance necessarily means that the connection between a subject and the provenance of a collection must be understood. In this way, provenance is a means of protecting the authenticity and integrity of materials, but not necessarily an enhancement to access. We also try to retain the original order (if one exists) that the creator of the collection used. Original order captures the context of creation, which is crucial to understanding why the records exist. Take a record out of its context and it may have no meaning. A famous example comes from the Iran-Contra hearings. An email was written by John Poindexter that said simply “Well done.” Within the context of the entire email discussion it was purported to show that he knew of Oliver North’s misleading of the House Intelligence Committee. Without that context, it could mean anything. In this sense, I think archivists are almost more similar to archeologists than to librarians. We document and preserve the context of documents, though, not stuff in the ground. Along with this, archivists exercise collective control. This means that we manage the collection as a group, not as individual unrelated items. This is another big difference between the library and the archives. We create our arrangement based on series or groupings of like materials, and then we write an overall description of them. Highly useful or important materials might get some more individualized attention, but generally the aggregate is what matters, again because of context. If you are interested in a more in-depth discussion of these principles, I’d suggest this article by Kate Theimer in the Journal of Digital Humanities. She
  • So what is really meant by context? It’s worth thinking about at least a little before we move on, because it’s a loosely defined concept. The following three distinct definitions were put forward recently as a basis for developing conceptual models of archival description. (1) The document’s place in a larger information landscape: we could think of a copy of the novel as part of the genre of 20th-century science fiction. (2) The objective or social environment in which the document existed: the novel came out in 1968 and has a different meaning in the context of the social changes in the US at that time. (3) The mental or physical state and identity of the creator of the document: the novel occupies yet another context when considered within the collection of Dick’s other works.
  • This is an example of the traditional archival arrangement tool, the finding aid. It follows a very hierarchical model, and shows the importance of the archival principles. The finding aid is a single document describing the entire collection, which as you can see here was organized according to the family that produced it. The guide contains some overall description of the entire thing at the top, and then a very basic contents list that is arranged hierarchically into series. This is, of course, a representation of context, not actual context. The work of archives, libraries, and museums in many ways involves creating representations to aid in understanding. This representation unfortunately can never be identical to context, and we have to accept that it might be biased or contain over-generalization or imposed order. The archivist obviously works to avoid this as much as possible. This representation can help to overcome the access bias of provenance-based order as described earlier. By making it a searchable text and providing access points such as subjects and name authorities, other aspects of the contents of the collection can be discovered. But it is, admittedly, in some ways very limiting.
  • So, what I’ve described to you very briefly is the traditional practice of archives. It is very much based on a world of paper and on the traditional hierarchy of the creation of knowledge. In this world there is a lot of data that exists. Some of that data gets compiled and worked together into information. That information gets refined and applied into knowledge, and finally wisdom is attained.
  • This is paralleled in the creation of archival collections (or anything printed, really). Stuff Happens (an accident happens in the workplace, a research study is carried out, a writer gets an idea). It’s Written Down (a report is filled out, a research paper is written, the short story is drafted), but not everything that happens is written down, so there are fewer of these. It’s Saved (the report is filed, the paper is published and purchased by the library, the draft is put in the filing cabinet). It ends up in the Archives (the organization turns over records, the journal is collected by the rare book department, the author’s papers are donated). So here at the top of the pyramid, we have the smallest amount of stuff: the “important” stuff.
  • This model is currently being disrupted. The first and major source of disruption is the overwhelming production of paper-based information in the late 19th and 20th centuries. Facsimiles became very easy to create with advances in printing technology, the creation of the typewriter, and especially the word-processor/printer combo. Archives are drowning in the astronomical increase in paper records from the 20th century.
  • This challenges the pyramid model by bloating the “it’s written down” stage.
  • But that was just the beginning. In the second half of the 20th century something happened that caused an even bigger problem. Yes, that’s right. Bill Cosby broke the archives…
  • This transition to an immediately documentable record of nearly all communication and the ability to create perfect clones has the effect of further bloating the pyramid. Now there is even more stuff that is easy to record and save. In addition, the increased complexity of the environment in which those documents exist and are created makes it difficult to deal with them at the top of the pyramid
  • In short, there no longer is a pyramid. Instead, we’ve moved into a new paradigm: a network. Wisdom no longer works its way up to the top through a hierarchy. Documents don’t become important because they are saved. They are important if they are connected to other documents or if they can be found to answer the right question at the right time. This is having a staggering impact on the tradition of archival practice, as we’ll see later, but first, let’s talk about… Just as a quick aside, if you are interested in this idea of networked knowledge replacing traditional hierarchy, I’d recommend the book Too Big to Know, by David Weinberger, which just came out last year.
  • Sorry for the poor word choice…
  • We are now in an age of networked knowledge. With Digital Humanists in the room, I obviously don’t need to tell you all this, but just to recap, in the networked age: texts can link to each other and create a cluster of relationships; these relationships can greatly influence the perceived status and trustworthiness of the information; and there is almost no barrier to entry, democratizing “knowledge” and allowing anyone to publish themselves (all of this is with caveats, of course. I recognize that there is a deep digital divide in access to technology and the internet, but theoretically, it is more equal than the previous hierarchy of knowledge). The interesting thing is that these networks are replacing the previous pyramid. Instead of the smartest and best info coming out on top, network dynamics are replacing that. So any combination of question and person with the answer is the “best” one, and those with the greatest access to knowledge are those with the most connections. In a sense, these documents are self-contextualizing. The nature of links between documents makes them both documents and context to each other.
  • Another feature of digital information that is interesting for archives is the fact that the actual contents of documents can be searched, instead of just the human- or machine-produced metadata that was all we had in the past. You can now search the contents of a text, of course, but facial recognition and other spatial technologies are opening up new methods of discovery with images and video. There are limits to searching the text itself, however; many times texts are more than what they say they are. Metadata is still extremely useful. Methods are increasing for automatic metadata assignment through semantic analysis. Human-created metadata is often still the gold standard, and the good part is that the network can also scale to this metadata creation through techniques like crowdsourcing.
  • Another hallmark of digital documents is that they can be perfectly duplicated again and again. With paper we typically see degradation over time as duplicates are made. In addition, the process of making duplicates can be labor-intensive. One of the more important activities of archivists is determining the authenticity and originality of texts. In the paper world, the “original” carries with it a lot of importance. But digital duplication can be achieved effortlessly, and perfectly. This screenshot shows a tool called the “Fast Duplicate File Finder.” I loaded into it a collection of digital papers of a local organization that is making a donation of organizational records to the library. By comparing the checksums of files in that collection, the tool found that there were 135 “groups” of duplicates comprising 793 files. Some of these duplications can be very meaningful. For example, in this group, the organization has a photo in their photo series, which is also re-used on their website and exists in the website support files. Keeping the duplicate in both places is meaningful. In another case, the same file was saved twice in the same folder under two different names, with no meaningful difference between the copies. In that case, the duplication is much less meaningful.
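A minimal sketch of the kind of checksum comparison described above. This is not how Fast Duplicate File Finder itself works internally (the notes do not say); it only assumes that files with identical SHA-256 digests are byte-for-byte duplicates. The function names `file_checksum` and `find_duplicate_groups` are made up for this illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root):
    """Group files under `root` by checksum and keep groups of two or more."""
    by_checksum = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_checksum[file_checksum(path)].append(path)
    return [paths for paths in by_checksum.values() if len(paths) > 1]
```

Each returned group is a list of paths with identical contents. Deciding whether a given group is a meaningful duplication (the photo reused on the website) or an accidental one (the same file saved twice in one folder) is still left to the archivist.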
  • While the duplication of a digital document can be perfect, that can only take it so far. Digital objects are performative. They rely on a complex interplay of data, software, and operating system (at the very least). The “document” therefore has a long list of dependencies that also have to be met in order to interact with it. The context of the performance environment can be crucial to preservation. “Viewing” a document may be entirely contingent on knowing what software and operating system are needed. Simple word processing documents have fewer dependencies and can easily be emulated, or at the very least their most essential characteristic – the text – can be rendered in many ways (not that there aren’t problems with that, but that’s a digression). Video games and other more complicated programs have greater dependencies and can be harder to emulate.
  • Documents are far more fluidly changing. The paper-based archive was based on “capturing” a document at the end of its lifecycle (a final tax return, for example) as well as at previous points (manuscript drafts). In the medium of electronic communication, this happens much more quickly and in more complex ways. Let’s use Facebook as an example. Since media like these are dynamic (that is, the “page” that you are viewing is a composite of real-time data pulled from various sources), the same page can contain different data every time you look at it. For example, profile pictures: here is a snippet from my Facebook page which was posted the day after my birthday in December 2010. I just grabbed the screenshot at a later point. This comment refers to my profile picture at the time, but what we see here is in fact my profile picture last week. The comment doesn’t necessarily make a lot of sense in reference to that picture (here’s what they were). One could potentially figure this out from looking at the history of my profile pictures (something on a separate page), but it raises the question: what is the actual Facebook “page” that was supposed to be saved? If the original is a born-digital document and I don’t have a method of versioning in place, the original is completely replaced by the altered version.
  • So finally, we come back around to the question, what are digital archives? Really, what I want to take a look at is, how are those aspects of digital documents that we just discussed affecting the traditional work of the archives.
  • One of the first is in the growing number of standards for archival information. EAD (Encoded Archival Description) is an XML markup language for encoding finding aids that was developed in the 90s. While it is not a metadata schema per se, EAD has become a de facto standard for the presentation of finding aids. An associated standard, EAC-CPF, which stands for Encoded Archival Context: Corporate Bodies, Persons, and Families, has just been released and will standardize the way that some of that archival context, namely authority records, is created and shared. ISAD(G), ISAAR, ISDF, and ISDIAH are international standards set by the International Council on Archives (ICA) that outline basic elements for archival description. They are more general than EAD. Finally, DACS and RAD are the American and Canadian input standards. An input standard describes how to actually fill in the fields of the metadata record. For example, EAD requires a title, and DACS specifies how that title should be formulated. There is not yet an underlying conceptual model for archival information, but that is changing. The ICA has just formed an expert group to develop such a model, which will help unify and codify standards across the board. I am very fortunate to be the research assistant to that group and it will be exciting to see what they develop.
  • This kind of standardization means that archives can better share what they have. This is a screenshot of ArchiveGrid, a kind of WorldCat for archival collections.
  • This reliance on standards for descriptive material has led to another really interesting project, wherein the context created around an archival collection is what has become networked knowledge, not just the materials themselves. This is SNAC: Social Networks and Archival Context. It’s jokingly referred to as “Facebook for dead people” and it is the product of IATH here at UVA. It takes a body of EAC-CPF records and links them. So here I am discovering the entry for Robert Oppenheimer, and I can see that he is referred to in 14 different collections across several repositories. These 63 other people have some connection to him based on these records. One of the most interesting features of the project is the ability to then explore Oppenheimer’s web of connections. Who was he friends with, and who were his friends of friends? This is a graph showing the relationships he shared with FDR. You can then follow this to find records that share this context. With more granular data we could begin to track all kinds of contextual relationships.
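The linking idea above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not SNAC’s actual method: it treats two people as connected whenever they are named in the same collection. The collection titles, person names, and function names are all placeholders invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: each collection lists the people named in its description.
records = {
    "Collection 1": ["Person A", "Person B"],
    "Collection 2": ["Person A", "Person C"],
    "Collection 3": ["Person B", "Person C", "Person D"],
}


def build_network(records):
    """Connect every pair of people who appear in the same collection."""
    network = defaultdict(set)
    for names in records.values():
        for first, second in combinations(set(names), 2):
            network[first].add(second)
            network[second].add(first)
    return network


def shared_connections(network, first, second):
    """Return the people connected to both `first` and `second`."""
    return sorted(network.get(first, set()) & network.get(second, set()))


def collections_mentioning(records, name):
    """Return the titles of the collections that name a given person."""
    return sorted(title for title, names in records.items() if name in names)
```

With this toy data, `shared_connections(build_network(records), "Person A", "Person D")` finds the two people who link A and D even though A and D never appear in the same collection, which is the “friends of friends” question in miniature.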
  • Another area where archivists are making strides is in adapting technological tools to help them deal with the increasing scale of born-digital materials. Many of these tools allow us to identify and preserve technical metadata, which preserves the context of materials. As an example, knowing that a file was created in 1987 using a word processor called WordStar, on a DOS operating system, tells you what environment you might need to actually access the file. Other tools allow us to automate the collection of materials from the web and analyze whether or not they belong in a collection. Rather than relying on the President’s office to send us a capture of their website monthly, we can automate collecting it ourselves.
  • This is a screenshot of a tool called FTK Imager. It allows for the “forensic” analysis of digital material, meaning that it exposes certain technical characteristics and takes away some of the abstractions of the computer environment (i.e. we see the hash code here, rather than viewing the file within its software environment). This allows the archivist to gather information on those contextual environmental factors (date created, file system of the disk, software, etc.) that can aid in rendering later. Other technology can help establish authenticity and fixity over time. For example, when transferring files from one location to another, an archivist can verify that files haven’t been corrupted by comparing checksums (effectively unique alphanumeric codes generated from files algorithmically) before and after a transfer.
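A minimal sketch of the before-and-after checksum comparison described above, assuming SHA-256 as the checksum algorithm. It does not reproduce FTK Imager or any particular transfer tool, and `build_manifest` and `compare_manifests` are names made up for this example.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to `root`) to its SHA-256 checksum."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for a sketch,
            # but large files would be better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(before, after):
    """Report files that went missing, changed, or appeared after a transfer."""
    common = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),
        "changed": sorted(name for name in common if before[name] != after[name]),
        "unexpected": sorted(set(after) - set(before)),
    }
```

In use, one manifest would be built at the source before the transfer and another at the destination afterwards. A report with three empty lists suggests the files arrived intact.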
  • One of the biggest changes that all of this self-evident metadata and standardization is bringing about is a rethinking of the way in which archival materials are discovered. We are used to the Google and library paradigm of searching for discrete materials and evaluating them out of their context. Archivists are now thinking about ways to re-establish context in this paradigm.
  • This is an example of a project from Princeton that is re-imagining finding aids (and I should mention that we are working on something similar here at the moment). Because Princeton’s finding aids were all done using the same standard encoding, they could engineer a search engine that “drills down” into finding aid components. So a search for “Albert Einstein” finds collections of Einstein’s papers, series of his correspondence with others in other collections, and, where the archivist determined that level of description was appropriate, individual items. When you look at the record for that individual item, you see it within the context of the rest of the collection. So all of that context that was described at a higher level (the collection, why it is here, how it was collected, the dates it covers, where this fits in that collection) is easily findable. This is in contrast to the prevailing “digital collection” model, where individual items are scanned and have their own metadata. If this item were digitized, it could be displayed right there as well. It could have its own linked data ID, and then when it was shared it could, again, be traced back to this context.
  • Among the most “sci-fi” activities in digital archives these days are attempts to emulate or recreate the environments in which digital objects were created. As we discussed earlier, traditional archival finding aids are representations of context, but these tools are actual recreations of it. These have been termed “enhanced curation” techniques by those at the British Library and include things like taking a high-resolution panoramic photo of the creator’s workspace, doing a video oral history with them while they use their equipment, and in a special case at Emory University…
  • Recreating the online environment of the laptop of Salman Rushdie. Emory received four computers from Rushdie. They created traditional tools like finding aids to describe what they found there, but also created a virtualization of Rushdie’s workspace. Users can browse and search using the emulated version of the Mac OS on Rushdie’s laptop. They can see his Mac stickies and view his drafts in his folders as he organized them. This was a really interesting project for a really high-profile collection. There were a lot of associated issues as well (you can’t write or save anything while in the virtualization, you can only view it on one particular workstation, Rushdie’s personal material was removed from the computer, etc.). In the end, this kind of emulation is not possible for many, if truly any other, collections at this time. However, if such recreation were to become easier, would this preservation of context be better than the traditional one of the archivist? I admit, in some ways yes.
  • So, as my final note today, I’d like to think about some of the implications of these changes. I believe that in the future archives will do less appraising (i.e. deciding what to take in and preserve) and that there will be more searching to determine what things are important. In the past, if it was put in the archives, it was presumably important. Now it’s a matter of finding what’s important in the network of information for the particular purpose. I believe that the traditional authority of the archives is waning, but along with it perhaps bias is as well. As I said earlier, theoretically the information age has democratized, or at least expanded, the creation of knowledge. Our traditional model of authorizing what is worth keeping has fallen apart, but that means that our biases about what will be important are going away as well. Traditional, archives-produced summary, as in the finding aid, will become less important as self-contextualization and environment emulation increase. The drawback is that summaries do aid understanding and can occasionally recognize things with hindsight. However, generalization and lack of specificity will be less of a problem with this increase in granularity. Finally, I just want to point out that there are significant barriers to converting everything to this paradigm. There’s no reason to believe that an entirely paperless landscape is in our near future. Nor should it be. There is inherent loss in trying to convert analog to digital and I believe that humans will have needs for analog experiences for quite a while to come.

Presentation Transcript

  • What Are Archives?
  • Archives • Materials created or received by a person, family, or organization, public or private, in the conduct of their affairs and preserved because of the enduring value contained in the information they contain or as evidence of the functions and responsibilities of their creator, especially those materials maintained using the principles of provenance, original order, and collective control; permanent records. - Society of American Archivists
  • What do Archivists Do? • Respect des fonds • Original order • Collective control. Theimer, Kate. 2012. “Archives in Context and as Context.” Journal of Digital Humanities. 1:2. http://journalofdigitalhumanities.org/1-2/archives-in-context-and-as-context-by-kate-theimer/
  • Context • The document’s place in a larger information landscape • The objective or social environment in which the document existed • The mental or physical state and identity of the creator of the document. - Lee, Cal. (2011). A Framework for contextual information in digital collections. Journal of Documentation. 67:1. 95-143.
  • Wisdom, Knowledge, Information, Data
  • It’s in the Archives, It’s Saved, It’s Written Down, Stuff Happens
  • Weinberger, David. (2012). Too Big to Know. New York: Basic Books.
  • What is “Digital”
  • Self-contextualizing
  • Searchable
  • Perfect Duplication
  • Dependencies
  • Transformative
  • What are Digital Archives?
  • Standardization • Archival Description Standards o EAD and EAC-CPF o ISAD(G), ISAAR, ISDF, ISDIAH o DACS and RAD • Conceptual Models
  • Capture and Analysis • Web archive crawl and analysis tools • Metadata extraction and creation • Algorithmic tools for fixity and identification
  • Redefining Discovery • How can archival materials be found and used without losing context?
  • Re-Creations • Emulating Environments • “Enhanced Curation” - John, Jeremy Leighton. “The future of saving our past.” Nature 459, 775-776. http://www.nature.com/nature/journal/v459/n7248/full/459775a.html
  • Implications • Less Appraisal, more searching • Authority is waning, but so is bias, perhaps • Summary will be less important, but over-generalization will not be as much of a drawback • Incomplete conversion to new paradigm
  • THANKS! Gretchen Gueguen http://gretchengueguen.com gmg2n@virginia.edu