Achieving Thresholds for Discovery

http://www.oclc.org/research/presentations.html

Presented as an OCLC TAI-CHI Webinar by Merrilee Proffitt and Dan Santamaria, 5 December 2013.

http://www.oclc.org/research/events/2013/12-05.html

Published in: Education, Technology

Transcript of "Achieving Thresholds for Discovery"

  1. Achieving Thresholds for Discovery: Addressing Issues with EAD to Increase Discovery and Access
     5 December 2013, OCLC TAI-CHI webinar series, #oclcr
     Merrilee Proffitt, Senior Program Officer, OCLC Research
     Dan Santamaria, Assistant University Archivist for Technical Services, Seeley G. Mudd Manuscript Library, Princeton University
  2. Achieving Thresholds for Discovery: Issues with EAD
     Merrilee Proffitt, Senior Program Officer, OCLC Research
     5 December 2013, OCLC TAI-CHI webinar series, #oclcr
  3. http://journal.code4lib.org/articles/8956
  4. EAD analysis
     • Based on an April 2013 harvest of EAD-encoded finding aids for ArchiveGrid
     • Analysis of elements that would support five dimensions of a discovery system:
       1. Search
       2. Browse
       3. Display
       4. Sort
       5. Limit
  5. EAD analysis
     • Focus on support for discovery, not standards or best practices (although the two are not mutually exclusive).
  6. A Review of Discovery Options
  7. Methodology
     • Recreated analysis* done by Wisser and Dean
       – XPath queries across the data set
     • Considered which elements would (or could) be used to “power” various aspects of discovery
     *not all tables reproduced
  8. Methodology
     The distribution of element usage was roughly divided into 4 groups:
     • Low -- between 0% - 50%
     • Medium -- between 51% - 80%
     • High -- between 81% - 95%
     • Complete -- between 96% - 100%
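The methodology on slides 7 and 8 can be illustrated with a minimal sketch. It is not the script used for the ArchiveGrid analysis: the sample documents, the function names `usage_percent` and `usage_band`, and the choice to treat each upper bound as inclusive are all assumptions made for illustration. It uses the limited XPath support in Python's standard library and assumes finding aids without XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature "harvest": four tiny finding aids without namespaces.
SAMPLE_EADS = [
    "<ead><archdesc><did><unittitle>A</unittitle><unitdate>1900</unitdate>"
    "<physdesc><extent>2 boxes</extent></physdesc></did></archdesc></ead>",
    "<ead><archdesc><did><unittitle>B</unittitle><unitdate>1910</unitdate></did></archdesc></ead>",
    "<ead><archdesc><did><unittitle>C</unittitle><unitdate>1920</unitdate></did></archdesc></ead>",
    "<ead><archdesc><did><unittitle>D</unittitle></did></archdesc></ead>",
]


def usage_percent(docs, path):
    """Percentage of documents in which `path` matches at least one element."""
    hits = sum(1 for doc in docs if ET.fromstring(doc).find(path) is not None)
    return 100.0 * hits / len(docs)


def usage_band(percent):
    """Map a usage percentage to the four groups named on slide 8.

    Assumption: each upper bound is inclusive, so fractional values such
    as 50.5 fall into the next band up.
    """
    if percent <= 50:
        return "Low"
    if percent <= 80:
        return "Medium"
    if percent <= 95:
        return "High"
    return "Complete"


def usage_report(docs, paths):
    """Return {path: (percent, band)} for each element path of interest."""
    report = {}
    for path in paths:
        percent = usage_percent(docs, path)
        report[path] = (percent, usage_band(percent))
    return report


if __name__ == "__main__":
    paths = [
        "archdesc/did/unittitle",
        "archdesc/did/unitdate",
        "archdesc/did/physdesc/extent",
    ]
    for path, (percent, band) in usage_report(SAMPLE_EADS, paths).items():
        print(f"{path}: {percent:.0f}% ({band})")
```

Counting presence per document, rather than total occurrences, matches the question of whether an element could power a discovery feature across the whole data set. It says nothing about whether the element's content is usable.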
  9. Findings
     • Lots of “medium,” few “high” or “complete”
     • Even when an element is accounted for, the content may make it difficult to use (unitdate and extent are two examples)
     • Most “complete” elements are administrative in nature, or are required by the DTD/schema
     • In short, EAD encoding may not (now) give a lot of bang for the discovery buck.
  10. Is hope on the horizon?
      • Finding aids in ArchiveGrid may represent legacy encoding
      • New focus on shared authoring tools may help
      • EAD3 may help
      • Tools and techniques for improving finding aids (with an emphasis on discovery) may help
  11. Over to Dan...
  12. Finding Aids and Thresholds for Discovery at Princeton
      OCLC Research Webinar
      Dan Santamaria, Seeley G. Mudd Manuscript Library
  13. Discovery: Profession-Wide Challenges
      • The reluctance to embrace archival standards
      • EAD and document-centric description
      • Most of all, the persistence of backlogs
  14. Challenges: Backlogs
      – AN INTERNET ACCESSIBLE FINDING AID EXISTS FOR 44% OF ARCHIVAL COLLECTIONS
        » OCLC “Taking Our Pulse Survey”
  15. Discovery: Institution-Specific Challenges
      • Backlogs
        – Princeton University Archives had no finding aids as late as 1990.
        – 2005: 2/3 of University Archives lacked descriptive records of any kind.
      • Little structured data for “Finding Aids” from any division.
      • Most arrangement and description work done by staff on short-term and soft money positions.
  16. Thresholds for Discovery: Phase 1
      • Efficient backlog reduction
      • DACS compliance
      • Collection-level and series-level focus
      • Make sure all of our collections were represented online
  17. Phase 1: Our Approach
      Punting on idiosyncratic legacy description
      TMs, pp. numbered 1-62, (pp. numbered 1-23 are photocopies of the original), ANs and holograph corrections 215 pages (pages 19 and 20 are missing). Dates and locations, 1975 March 26-1976 June 29; Princeton, N.J. (1-26, 31-34) Madison, Wis. (26-30) . Hanover, N.H. (34-38) . Sitges, Spain (39-215). Notebook on Casa de campo. Preoccupation with plot details, characterization, chapter transitions. After a long period away from home and from the novel (1-52), the author resumes work on it by re-evaluating each chapter. By the end of the notebook he has completed a second draft of the novel's first part (chs. 1-7) and the first chapter of the second part. The notebook contains a variety of personal comments about the author and those around him.
  18. Phase 1: Our Approach
      • Stated goals
        – Provide minimum level of online access to collections (collection-level records).
        – Gain acceptable level of intellectual control over collections.
        – Provide a centralized entry point for researchers and staff.
  19. Phase 1: Our Approach
      • Survey entire holdings and record holdings/location information and very basic descriptive data
      • Create collection-level records for all collections
        – MARC
        – DACS single-level optimum
  20. Collection-Level EAD
  21. Phase 1: Results
      • All collections encoded in EAD and MARC by end of 2007
      • DACS single-level and multi-level optimum
      • Processing and retro-conversion happening concurrently
        – More than 800 finding aids encoded, 2006-2007
        – More than 2500 linear feet processed/described in 2006-2007
  22. Thresholds for Discovery: Phase 2
  23. Phase 2: Requirements and Goals
  24. Principles
      • User focus
        – Find
        – Identify
        – Select
        – Obtain
      • Data not documents
  25. Data Analysis
  26. Search/Browse/Sort/Display/Limit
  27. Search/Browse/Sort/Display/Limit
  28. Search/Browse/Sort/Display/Limit
  29. Beyond Collection-Level
      Sort by title
      Sort by date
  30. Data Enhancement
      • Specific Elements
        – Dates
        – Extent
        – Titles
        – Creators
        – “Access Points”
        – Digital Content
      • ALL EADs
        – Minimize mixed content
        – Unnumber <c0X>
        – Denest <unittitle> and <unitdate>
        – Remove <head> and @label
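Two of the "ALL EADs" clean-up steps on slide 30, unnumbering `<c0X>` components and removing `<head>`, can be sketched with the standard library. This is an illustrative sketch only, not Princeton's normalization tool: the function names and the sample fragment are assumptions, and it assumes EAD without XML namespaces.

```python
import re
import xml.etree.ElementTree as ET

# EAD 2002 allows numbered components c01 through c12.
NUMBERED_C = re.compile(r"^c(0[1-9]|1[0-2])$")


def unnumber_components(root):
    """Rename every numbered component (<c01>..<c12>) to a plain <c>."""
    for elem in root.iter():
        if NUMBERED_C.match(elem.tag):
            elem.tag = "c"
    return root


def remove_heads(root):
    """Drop every <head> child; any text trailing the element is dropped too."""
    # Snapshot the tree first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for head in parent.findall("head"):
            parent.remove(head)
    return root


# Hypothetical fragment for demonstration.
SAMPLE = (
    "<dsc><head>Contents List</head>"
    "<c01 level='series'><did><unittitle>Series 1</unittitle></did>"
    "<c02 level='file'><did><unittitle>Folder 1</unittitle></did></c02>"
    "</c01></dsc>"
)

cleaned = remove_heads(unnumber_components(ET.fromstring(SAMPLE)))

if __name__ == "__main__":
    print(ET.tostring(cleaned, encoding="unicode"))
```

Attributes such as `level` are left untouched, so the hierarchy remains recoverable from nesting and `@level` rather than from the element name.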
  31. Dates
      Collection-Level
      • Virtually all present
      • Virtually all normalized
      • Little work required
      Component-Level
      • WORK REQUIRED!
      • 2 months
  32. Extent
      Collection-level
      • Virtually all present
      • Little structure
      • Effective for display
      • Ineffective for sorting; reporting; analysis
      Component-level
      • Consistently present at series/subseries level
      • Infrequently present at lower component levels
      • Little structure
  33. Coming Soon: <physdescstructured>
      • Attributes:
        – @coverage = whole or part
        – @physdescstructuredtype = carrier, materialtype, or spaceoccupied
      • Required Elements
        – <quantity>
        – <unittype>
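To make the contrast between unstructured extent statements and `<physdescstructured>` concrete, the sketch below turns a simple "number plus unit" extent string into the structure listed on slide 33. The function name `structure_extent`, the regular expression, and the default attribute values are assumptions for illustration; real extent statements are far messier, and anything that does not match the simple pattern is left for manual review.

```python
import re
import xml.etree.ElementTree as ET

# Matches simple statements such as "12.5 linear feet" or "1 folder".
EXTENT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(.+?)\s*$")


def structure_extent(text, coverage="whole", kind="spaceoccupied"):
    """Build a <physdescstructured> element from a free-text extent.

    Returns None when the text does not start with a plain number,
    so that the caller can route it to manual review.
    """
    match = EXTENT.match(text)
    if match is None:
        return None
    elem = ET.Element(
        "physdescstructured",
        {"coverage": coverage, "physdescstructuredtype": kind},
    )
    ET.SubElement(elem, "quantity").text = match.group(1)
    ET.SubElement(elem, "unittype").text = match.group(2)
    return elem


if __name__ == "__main__":
    for raw in ["12.5 linear feet", "1 folder", "circa 3 boxes"]:
        result = structure_extent(raw)
        if result is None:
            print(f"{raw!r}: needs manual review")
        else:
            print(ET.tostring(result, encoding="unicode"))
```

With quantity and unit separated, extents become usable for sorting, reporting, and analysis, which slide 32 notes unstructured extents do not support.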
  34. Access Points: Subjects and “Topics”
      EAD
          <subject rules="local" source="local" encodinganalog="690" authfilenumber="t9">
            American literature
          </subject>
      SKOS
  35. Indexing
  36. Component Identifiers
          <c id="C0041_c0070" level="series">
            <did>
              <unittitle>
                Series 3: Correspondence
              </unittitle>
              <unitdate normal="1951-08-21/1978-12-31" type="inclusive">
                1951 August 21-1978
              </unitdate>
              <physdesc>
                <extent type="computed">1 folder</extent>
              </physdesc>
            </did>
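The component on slide 36 carries a normalized date in `@normal`, which is what makes "sort by date" below the collection level feasible. A minimal sketch of such a sort follows. The function names and the two extra components (ids `C0041_c0001` and `C0041_c0099`) are hypothetical, and the sketch assumes `@normal` holds ISO 8601 values in `start/end` form.

```python
import xml.etree.ElementTree as ET

# First component follows slide 36; the other two are hypothetical.
SAMPLE = """<dsc>
  <c id="C0041_c0070" level="series">
    <did>
      <unittitle>Series 3: Correspondence</unittitle>
      <unitdate normal="1951-08-21/1978-12-31" type="inclusive">1951 August 21-1978</unitdate>
    </did>
  </c>
  <c id="C0041_c0001" level="series">
    <did>
      <unittitle>Series 1: Writings</unittitle>
      <unitdate normal="1924/1950" type="inclusive">1924-1950</unitdate>
    </did>
  </c>
  <c id="C0041_c0099" level="series">
    <did>
      <unittitle>Series 4: Undated material</unittitle>
    </did>
  </c>
</dsc>"""


def start_date(component):
    """Return the start of @normal, or a high sentinel when no date exists."""
    unitdate = component.find("did/unitdate")
    normal = unitdate.get("normal") if unitdate is not None else None
    # ISO 8601 strings sort chronologically as plain text.
    return normal.split("/")[0] if normal else "9999"


def sort_by_date(dsc):
    """Return the direct <c> children ordered by normalized start date."""
    return sorted(dsc.findall("c"), key=start_date)


if __name__ == "__main__":
    for c in sort_by_date(ET.fromstring(SAMPLE)):
        print(c.get("id"), c.findtext("did/unittitle"))
```

Components without a normalized date sort last here. That is a design choice for the sketch; a discovery interface might instead list them separately.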
  37. Data Management
      • RelaxNG schema
        – Loose
        – Strict
      • Normalization tool
  38. Lessons Learned: Iterative Description Works
  39. Lessons Learned: Content Standards
  40. Lessons Learned: Usability
  41. Lessons Learned: Discovery Happens Elsewhere
      Traffic Sources (chart). Sources listed: google / organic; (direct) / (none); princeton.edu / referral; en.wikipedia.org / referral; library.princeton.edu / referral; bing / organic; catalog.princeton.edu / referral; yahoo / organic. Segment values shown: 55%, 19%, 10%, 8%, 4%, 2%, 1%, 1%.
  42. Lessons Learned
      Think beyond EAD: Monitor developments with conceptual models and linked data.
      http://www.ica.org/13799/the-experts-group-on-archival-description/
  43. Where to Start
      1. DACS
      2. Structure
      3. Iterate
      Tools that support all three
  44. Credits
      Archival Description Working Group (2011-2013)
      • Maureen Callahan
      • John Delaney
      • Shaun Ellis
      • Regine Heberlein
      • Dan Santamaria
      • Jon Stroop
      • Don Thornbury
  45. findingaids.princeton.edu
      Questions: dsantam@princeton.edu
  46. Thank You!
      Merrilee Proffitt, proffitm@oclc.org
      Dan Santamaria, dsantam@princeton.edu
      ©2013 OCLC. This work is licensed under a Creative Commons Attribution 3.0 Unported License. Suggested attribution: “This work uses content from “Achieving Thresholds for Discovery” © OCLC & Dan Santamaria, used under a Creative Commons Attribution license: http://creativecommons.org/licenses/by/3.0/”