Approaching Authority: A Preliminary Implementation of Encoded Archival Context (EAC-CPF) at East Carolina University, by Mark Custer and Jennifer Joyner
This presentation discusses the background of the Encoded Archival Context standard (EAC-CPF) and its potential to enhance collaboration amongst archival institutions. The speakers focus on an early implementation of EAC-CPF at East Carolina University, but they also discuss other local efforts such as the groundbreaking NC-BHIO project.

  • EAC-CPF differs from library authority records in that it links collections concerning the same creator or entity and provides context for the archival materials. EAC records are separate from finding aids: they are independent resources for researchers and repositories.
  • In his article, McKim uses the Terry Sanford papers to justify the creation of the NCEAD project. Terry Sanford served as a state senator, governor of North Carolina, president of Duke, and a US Senator, and he has active collections at many institutions. This makes things difficult for researchers, who have to search multiple places and visit multiple repositories. The argument for NCEAD was that it would be a virtual repository for finding aids: a centralized resource for receiving information about dispersed papers. The same argument can be made for EAC. If it were implemented on a statewide (or larger) level, the location of collections would be more apparent. It would be the ultimate resource for researchers and other repositories.
  • New process: (1) Review the finding aid. (2) Decide the authorized form of the creator name by checking the LCNAF and our library catalog; if the name does not exist, we create an authorized form according to AACR2. (Cataloging staff have had NACO training, but we do not currently submit all of these names for approval.) (3) Assign new LC subject headings. (4) Enter the creator information and subject headings in the database using a web form. (5) Regenerate the HTML and index (this updates the EAD). (6) Upgrade the MARC record and submit it to OCLC. (7) Overlay the upgraded record in Symphony. (8) Update the EAD file on the server. This is a time-consuming process that leaves many collections un-cataloged. We typically do not update un-cataloged collections with the authorized form of a name until the finding aid is cataloged. The cataloger must research each name to confirm that it is indeed the same creator, and that research requires communication with Special Collections staff members. For EAC to be fully implemented at ECU, we would need to examine the process and make adjustments so that all collections could receive some cataloging treatment (at least authorized names) before an EAC record is created.
  • The Social Networks and Archival Context (SNAC) Project is processing EAD records from LoC, OAC, and the NWDA. Despite the nearly 124k de-duplicated names in its current database, a search for “Frances Renfrow Doak” returns no records. This was one of the primary motivations for our preliminary project.
  • We imported a lot of data into Google Refine, but for the purposes of this presentation, we will focus on our “normal name” column. Immediately you can see that we have some names listed here, like “Alex Albright”, which are in need of being grouped into a single record. In this particular case, that is very easy to do since those strings are exact matches. One of the ways that Google Refine really shines, though, is in its ability to discover a variety of inexact matches with its “cluster and edit” feature.
  • Google Refine's “cluster and edit” feature offers five well-established and advanced string comparison techniques (fingerprint, metaphone, and PPM proved most useful for our data).
  • The fingerprint algorithm is almost always going to produce meaningful results, especially if you don’t have many names with special character encoding issues. Most importantly, it will highlight things that you won’t discover if you’re just attempting to match exact strings, or even exact strings after the values have been converted to lower case.
  • The default settings in Refine turned out to be pretty good. I did attempt to use the PPM method with its most precise options, and this took an extremely long time (and didn’t yield any useful results that we hadn’t already uncovered). Since we had 1842 strings to analyze, for instance, reducing “block chars” down to 1 meant that our computer had to execute nearly 1.7 million different computations (and it just didn’t seem to matter for strings this short). However, in the case of our data, increasing the radius to 1.8 uncovered every permutation of inaccuracies in our names that we needed to ferret out (so I would suggest experimenting with that variable if using the PPM method on a similar dataset). The Levenshtein function, on the other hand, just doesn’t make much sense for name comparisons, especially when some of the authorized names will have years associated with them (i.e., a lot of extra characters) and others won’t.
  • Another way that Google Refine helps in a project such as this is in its ability to add to your data without having to do any extra scripting outside of the software package.
  • We queried the VIAF database by name, since we haven’t yet recorded any LCCNs.
  • Adding those unique IDs into our EAC records. This is our record for Frances Renfrow Doak.
  • This is the HTML view of the Frances Renfrow Doak EAC record. If we implement EAC into our current EAD database at ECU, we would write our own stylesheet for display (especially to take advantage of some of the extra data that we’ve added to our initial records). For the purposes of this trial, however, I have used a slightly modified XSLT stylesheet that Brian Tingle has created for the SNAC project (and graciously shared online at https://bitbucket.org/btingle/cpf2html).
  • The “related identities” listed here, for instance, could be added as extra “cpfRelations” to this particular EAC record.
  • Right now, however, the only cpfRelations that have been added to this record are the names that you previously saw listed under “Autograph entries” in the online view of this EAD collection. And so, that’s where the 45 people listed here have come from.
  • And if you scroll down, you’ll see here that “Terry Sanford” also has an external link…
  • … which takes you to a very bare-bones record already created in the process of the SNAC project (due to an EAD record for a collection housed at Stanford). However, there is another EAC record for Terry Sanford – a much more robust record – which was created over five years ago in 2006.
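
The fingerprint (key collision) method mentioned in the notes above can be sketched roughly as follows. This is a minimal approximation written for illustration, not Google Refine's own code: the normalization steps (trim, lower-case, fold to ASCII, strip punctuation, sort unique tokens) are an assumption about how such a fingerprint is typically built, and the name variants in `sample` are hypothetical apart from “Alex Albright” and “Terry Sanford”.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a key-collision fingerprint for a name string."""
    # Trim, lower-case, and fold accented characters to plain ASCII.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation so "Albright, Alex" and "Albright Alex" agree.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, de-duplicate the tokens, and sort them.
    return " ".join(sorted(set(text.split())))


def cluster(names):
    """Group names by fingerprint; keep groups with 2+ distinct spellings."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


# Hypothetical variants of a name from the "normal name" column.
sample = ["Alex Albright", "Albright, Alex", "albright alex ", "Terry Sanford"]
clusters = cluster(sample)
print(clusters)
```

Because token order and punctuation are discarded, inverted and direct forms of a name collide on the same key, which is why this method catches matches that exact-string or lower-case comparison would miss.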
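
The figure of nearly 1.7 million computations quoted in the notes is consistent with comparing every pair among 1842 strings, and the remark about Levenshtein and dates can be checked with a small worked example. The sketch below assumes an all-pairs comparison (our reading of what a “block chars” value of 1 implies) and uses a textbook edit-distance function rather than Refine's implementation; the name strings are illustrative only.

```python
from math import comb


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


# Number of unordered pairs among 1842 strings: 1842 * 1841 / 2.
pair_count = comb(1842, 2)
print(pair_count)  # 1695561

# A name with and without life dates looks far apart...
with_dates = levenshtein("Sanford, Terry", "Sanford, Terry, 1917-1998")
# ...while a genuine typo is only one edit away.
typo = levenshtein("Sanford, Terry", "Sanford, Tery")
print(with_dates, typo)  # 11 1
```

Any distance threshold loose enough to join the dated and undated forms would also join many unrelated short names, which illustrates why Levenshtein was a poor fit for this data.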

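Querying VIAF by name, as described in the notes above, amounts to building one URL per row from the name value. The sketch below only constructs the URL and makes no network request. The endpoint shown (VIAF's AutoSuggest service) is an assumption: the presentation does not say which VIAF service was queried, and the address should be checked against current VIAF documentation before use.

```python
from urllib.parse import quote

# Assumed endpoint, not confirmed by the presentation.
VIAF_AUTOSUGGEST = "https://viaf.org/viaf/AutoSuggest?query="


def viaf_query_url(name: str) -> str:
    """Return a name-lookup URL with the name percent-encoded."""
    return VIAF_AUTOSUGGEST + quote(name.strip())


url = viaf_query_url("Frances Renfrow Doak")
print(url)
```

In Google Refine the same string-building step would be written as the expression for “Add column by fetching URLs”, so no scripting outside the tool is needed.
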
    1. Slide 1. Approaching Authority: A Preliminary Implementation of Encoded Archival Context (EAC-CPF) at East Carolina University, by Mark Custer and Jennifer Joyner. 2011-03-31. Society of North Carolina Archivists / South Carolina Archival Association 2011 Joint Meeting.
    2. Slide 2. Outline of our talk
         • An introduction to EAC-CPF
         • Why use it
         • Our current workflows
         • Creating EAC records at ECU
         • Conclusion
    3. Slide 3. What is EAC-CPF?
         • Encoded Archival Context - Corporate Bodies, Persons, and Families
         • XML-based encoding standard for creators of archival records/collections
         • Used in conjunction with EAD
         • Authority record for creators
         • Links collections concerning the same creator and provides context of the archival materials
         • Separate from finding aids
    4. Slide 5. Why use EAC-CPF?
         • Context is essential to the understanding and use of archival materials
         • Establishes preferred name headings for individuals, families, and corporate bodies
         • Helps identify dispersed materials with the same creator
         • Saves time and money
         • Contributes to the establishment of creator descriptions that trace the complex relationships between creators, their activities, and their records
    5. Slide 6. Metadata at Joyner Library
         • Old process: Finding aids contained in-house subject headings
         • New process: Finding aids assigned new Library of Congress Subject Headings
         • Authority work with creator names
             • Library of Congress Name Authority File
             • Library catalog
             • Name created according to AACR2
         • Currently:
             • 493 have been upgraded (out of 1864)
         • Workflow issues
             • Un-cataloged collections
             • All collections would need to receive some cataloging treatment before creating an EAC record for the creator
    6. Slide 14. Our steps for creating EAC
         • Created a master EAD XML file
         • Created an XSL stylesheet that would examine all of our EAD files
         • Imported this single file into Google Refine
         • Used this data to create separate EAC files
         • Translated these EAC files for web display
    7. Slide 17. Cluster and Edit
         • Fingerprint [key collision]
         • N-grams [key collision]
         • Double Metaphone [key collision]
         • Levenshtein [nearest neighbor]
         • PPM [nearest neighbor]
    8. Slide 20. Add columns
         • Add columns by fetching URLs
             • Virtual International Authority File
             • WorldCat Identities
    9. Slide 39. Questions? A Preliminary Implementation of Encoded Archival Context (EAC-CPF) at East Carolina University, by Mark Custer and Jennifer Joyner. 2011-03-31. Society of North Carolina Archivists / South Carolina Archival Association 2011 Joint Meeting.
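
The first steps on the “Our steps for creating EAC” slide (examining the EAD files and producing a single file for Google Refine) can be illustrated with a short sketch. The presenters used an XSL stylesheet; Python is swapped in here purely for illustration. The sample EAD fragment, the collection title, and the two-column output layout are hypothetical, and a real run would loop over the EAD files on disk.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD fragment.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Frances Renfrow Doak Papers</unittitle>
      <origination label="Creator">
        <persname>Doak, Frances Renfrow</persname>
      </origination>
    </did>
  </archdesc>
</ead>"""

NAME_TAGS = {"persname", "corpname", "famname"}


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so DTD and schema EAD both work."""
    return tag.rsplit("}", 1)[-1]


def creators(ead_xml: str):
    """Return (entity_type, name) pairs found under <origination>."""
    found = []
    for element in ET.fromstring(ead_xml).iter():
        if local_name(element.tag) != "origination":
            continue
        for child in element:
            kind = local_name(child.tag)
            if kind in NAME_TAGS:
                # Collapse internal whitespace left over from XML layout.
                name = " ".join("".join(child.itertext()).split())
                found.append((kind, name))
    return found


def to_csv(rows) -> str:
    """Serialize rows as CSV text suitable for import into Refine."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entity_type", "name"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = creators(SAMPLE_EAD)
table = to_csv(rows)
print(table)
```

Keeping the entity type alongside the name means the later EAC step can tell persons, corporate bodies, and families apart without re-reading the finding aids.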

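The step “Used this data to create separate EAC files”, together with the notes on adding unique IDs and cpfRelations, can be sketched as below. This is a fragment for illustration only, not the presenters' stylesheet output: it omits the maintenance elements that a schema-valid EAC-CPF record requires in `control`, the record ID and identifier values are placeholders, and the `associative` relation type is an assumption about how autograph entries might be encoded.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # EAC-CPF namespace
ET.register_namespace("", EAC_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the EAC-CPF namespace."""
    return f"{{{EAC_NS}}}{tag}"


def build_record(record_id, name, entity_ids, related_names):
    """Build a skeletal EAC-CPF record for one person."""
    root = ET.Element(q("eac-cpf"))
    control = ET.SubElement(root, q("control"))
    ET.SubElement(control, q("recordId")).text = record_id

    description = ET.SubElement(root, q("cpfDescription"))
    identity = ET.SubElement(description, q("identity"))
    # External identifiers (e.g. fetched from VIAF) go in entityId.
    for entity_id in entity_ids:
        ET.SubElement(identity, q("entityId")).text = entity_id
    ET.SubElement(identity, q("entityType")).text = "person"
    name_entry = ET.SubElement(identity, q("nameEntry"))
    ET.SubElement(name_entry, q("part")).text = name

    # One cpfRelation per related name.
    relations = ET.SubElement(description, q("relations"))
    for related in related_names:
        relation = ET.SubElement(
            relations, q("cpfRelation"), {"cpfRelationType": "associative"}
        )
        ET.SubElement(relation, q("relationEntry")).text = related
    return root


record = build_record(
    record_id="RECORD-ID-PLACEHOLDER",
    name="Frances Renfrow Doak",
    entity_ids=["VIAF-ID-PLACEHOLDER"],
    related_names=["Terry Sanford"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The “related identities” shown in the HTML view could be fed in through `related_names` in the same way as the autograph entries.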