Transcript

  • 1. EAD without XSLT: a practical approach to archival finding aids
      Trevor Thornton, Senior Applications Developer, NYPL Labs, The New York Public Library
  • 2. Project goals
      • Enable multiple presentations of the same data
      • Support dynamic web applications
      • Cross-collection search with component-level specificity in results, and faceting on common access points
  • 3. System overview
      Ruby on Rails + MySQL + Solr
      Key functionality:
      • Data import
      • Search index
      • API
  • 4. Core models
  • 5. Collection model
      Each collection:
      • must have one description
      • may have one or more components
      • may be associated with one or more access terms
  • 6. Component model
      Each component:
      • must belong to one collection
      • must have one description
      • may have one parent component
      • may have one or more child components
      • may be associated with one or more access terms
  • 7. Component hierarchy attributes
      • collection_id (id of root collection)
      • parent_id (id of parent component)
      • sib_seq (sibling sequence)
      • level_num (numeric level within hierarchy)
      • level_text (series, sub-series, file, etc.)
      • has_children, max_levels, top_component_id: computed after initial data import; provided as a convenience for finding aid UIs and to streamline formulation of API responses
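The post-import computation of the derived attributes could look roughly like the sketch below. It is not taken from the NYPL code: the `Component` struct and `compute_derived_attributes` are hypothetical, and the meanings given to `max_levels` (deepest level in the collection) and `top_component_id` (id of the top-level ancestor) are assumptions based only on the attribute names.

```ruby
# Hypothetical in-memory stand-in for a component record. Field names follow
# the slide; the Struct itself is an assumption made for illustration.
Component = Struct.new(:id, :collection_id, :parent_id, :sib_seq, :level_num,
                       :has_children, :max_levels, :top_component_id,
                       keyword_init: true)

# Fill in the derived attributes for all components of one collection.
def compute_derived_attributes(components)
  by_id = components.to_h { |c| [c.id, c] }
  parent_ids = components.map(&:parent_id).compact.uniq

  # Assumption: max_levels is the deepest level_num found in the collection.
  deepest = components.map(&:level_num).max

  components.each do |c|
    c.has_children = parent_ids.include?(c.id)
    c.max_levels = deepest

    # Assumption: top_component_id is the id of the top-level ancestor;
    # a top-level component is its own top component.
    top = c
    top = by_id.fetch(top.parent_id) while top.parent_id
    c.top_component_id = top.id
  end
end

components = [
  Component.new(id: 1, collection_id: 10, parent_id: nil, sib_seq: 1, level_num: 1),
  Component.new(id: 2, collection_id: 10, parent_id: 1,   sib_seq: 1, level_num: 2),
  Component.new(id: 3, collection_id: 10, parent_id: 2,   sib_seq: 1, level_num: 3),
  Component.new(id: 4, collection_id: 10, parent_id: nil, sib_seq: 2, level_num: 1)
]
compute_derived_attributes(components)
components.each { |c| puts [c.id, c.has_children, c.top_component_id].inspect }
```

Storing these values rather than deriving them per request means a finding aid UI can decide whether to draw an expand control without a second query.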
  • 8. Description model
      Elements of description organized (roughly) based on ISAD(G):
      • Descriptive identity: ISAD(G) 3.1
      • Context: ISAD(G) 3.2.1 - 3.2.3
      • Acquisition & processing: ISAD(G) 3.2.4, 3.3.2 - 3.3.3
      • Content and structure: ISAD(G) 3.3.1, 3.3.4
      • Access and use: ISAD(G) 3.4
      • Related material: ISAD(G) 3.5
      • Notes: ISAD(G) 3.6
  • 9. Description model: basic EAD mapping
  • 10. Description model: JSON format
      {
        "unitid": [
          { "value": "3283", "type": "local_mss" }
        ],
        "unittitle": [
          { "value": "David Ames Wells papers" }
        ],
        "unitdate": [
          { "type": "inclusive", "normal": "1847/1895", "value": "1847-1895" }
        ],
        "physdesc_extent": [
          { "value": ".5 linear feet", "unit": "linear feet" },
          { "value": "2 boxes", "unit": "containers" }
        ],
        "abstract": [
          { "value": "David Ames Wells was an engineer, economist, textbook author, and advocate for lower tariff rates. This collection contains correspondence with Gordon L. Ford, Worthington C. Ford, and others; clippings; a manuscript draft of Protection: The Poor Mans Friend; and a lecture Wells delivered on free trade in 1882" }
        ],
        "prefercite": [
          { "value": "<p>David Ames Wells papers, Manuscripts and Archives Division, The New York Public Library</p>" }
        ]
      }
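As a small illustration of why every element in this format is an array of objects, here is a sketch that reads a trimmed copy of the description with Ruby's standard `json` library. The `element_values` helper is hypothetical and is not part of the NYPL code.

```ruby
require 'json'

# A trimmed copy of the description shown on the slide.
description = JSON.parse(<<~DESCRIPTION)
  {
    "unitid": [ { "value": "3283", "type": "local_mss" } ],
    "unittitle": [ { "value": "David Ames Wells papers" } ],
    "unitdate": [ { "type": "inclusive", "normal": "1847/1895", "value": "1847-1895" } ],
    "physdesc_extent": [
      { "value": ".5 linear feet", "unit": "linear feet" },
      { "value": "2 boxes", "unit": "containers" }
    ]
  }
DESCRIPTION

# Hypothetical helper: because each element is an array of hashes, repeated
# elements (the two extents) and single ones are read the same way, and a
# missing element simply yields an empty list.
def element_values(description, element)
  description.fetch(element, []).map { |entry| entry["value"] }
end

title   = element_values(description, "unittitle").first
extents = element_values(description, "physdesc_extent").join(", ")
puts "#{title} (#{extents})"
```

The uniform array shape keeps display code free of special cases for repeatable EAD elements.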
  • 11. EAD as a guide for data storage
      • EAD elements that allow only CDATA are stored as plain strings
      • EAD elements that require content to be structured in <p> or other block elements are stored as HTML
      • Rules established for converting EAD to HTML when necessary
      • HTML conversion designed to support re-conversion back to EAD
  • 12. Special handling for dates
      • Dates are hard
          o Inclusive dates and bulk dates
          o Multiple date formats
          o Ranges, lists and both
      • Special data structure for dates:
          o date_statement (original text)
          o inclusive_start / inclusive_end
          o bulk_start / bulk_end
          o keydate (for ordering query response; earliest inclusive date or earliest bulk date when present)
          o index_dates (for search faceting; every year included in range/list)
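A minimal sketch of how such a date structure might be built is shown below. It assumes the input is an EAD `normal` attribute value of the form "YYYY" or "YYYY/YYYY", which is far narrower than the formats real finding aids contain. The helper names are hypothetical, and reading `keydate` as "bulk start when present, otherwise inclusive start" is one interpretation of the slide, not a confirmed rule.

```ruby
# Hypothetical helper: turn "YYYY" or "YYYY/YYYY" into [start_year, end_year].
def year_range(normal)
  return nil if normal.nil?

  first, last = normal.split("/").map { |part| Integer(part[0, 4], 10) }
  [first, last || first]
end

# Build the date structure from the slide for one description.
def build_dates(date_statement, inclusive_normal, bulk_normal = nil)
  inclusive = year_range(inclusive_normal)
  bulk = year_range(bulk_normal)
  {
    date_statement: date_statement,
    inclusive_start: inclusive&.first,
    inclusive_end: inclusive&.last,
    bulk_start: bulk&.first,
    bulk_end: bulk&.last,
    # Assumed reading: prefer the earliest bulk date when one is present.
    keydate: bulk ? bulk.first : inclusive&.first,
    # Every year in the inclusive range, so each year can act as a facet value.
    index_dates: inclusive ? (inclusive.first..inclusive.last).to_a : []
  }
end

dates = build_dates("1847-1895", "1847/1895")
puts dates[:keydate]
puts dates[:index_dates].length
```

Expanding the range into individual years costs some index space but lets a plain field facet answer questions like "which collections include material from 1860".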
  • 13. Access Term model
  • 14. Refinement of Access Term/Access Term Association models
  • 15. Data import
      • It’s messy business
      • Bulk of work has focused on EAD; Nokogiri used extensively for parsing XML
      • Basic process for EAD import:
          1. Create collection record
          2. Extract collection-level data, create/save description
          3. Extract access terms, and for each:
              a. Save if it doesn’t already exist
              b. Save collection/term association
          4. Extract top-level components, and for each:
              a. Create component record
              b. Extract component-level data, create/save description
              c. Extract/save access terms & associations
              d. Extract child components and repeat for each
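The recursive shape of the import steps above can be sketched as follows. This is an illustration only: it uses REXML (bundled with Ruby) instead of the Nokogiri parser named on the slide so that it runs without extra gems, it stores records in a plain hash instead of the database, and the sample EAD fragment and all method names are made up.

```ruby
require 'rexml/document'

# Made-up EAD fragment, just large enough to exercise the recursion.
SAMPLE_EAD = <<~XML
  <ead>
    <archdesc level="collection">
      <did><unittitle>Sample papers</unittitle></did>
      <controlaccess><persname>Doe, Jane</persname></controlaccess>
      <dsc>
        <c01 level="series">
          <did><unittitle>Series A</unittitle></did>
          <controlaccess><persname>Doe, Jane</persname></controlaccess>
          <c02 level="file"><did><unittitle>File 1</unittitle></did></c02>
        </c01>
        <c01 level="series"><did><unittitle>Series B</unittitle></did></c01>
      </dsc>
    </archdesc>
  </ead>
XML

COMPONENT_TAG = /\Ac(\d\d)?\z/ # matches <c> and <c01> through <c12>

def child_components(element)
  element.elements.to_a.select { |el| el.name =~ COMPONENT_TAG }
end

def title_of(element)
  node = element.elements["did/unittitle"]
  node ? node.text.to_s.strip : nil
end

# Step 3: save each term only if it is new, then record the association.
def save_terms(store, element, owner)
  element.elements.to_a("controlaccess/*").each do |el|
    key = [el.name, el.text.to_s.strip]
    store[:terms][key] ||= store[:terms].size + 1
    store[:associations] << { owner: owner, term_id: store[:terms][key] }
  end
end

# Step 4: create the component, save its terms, then repeat for its children.
def import_component(store, element, parent_id, sib_seq, level_num)
  id = store[:components].size + 1
  store[:components] << { id: id, parent_id: parent_id, sib_seq: sib_seq,
                          level_num: level_num,
                          level_text: element.attributes["level"],
                          title: title_of(element) }
  save_terms(store, element, [:component, id])
  child_components(element).each_with_index do |child, index|
    import_component(store, child, id, index + 1, level_num + 1)
  end
end

def import_ead(xml)
  archdesc = REXML::Document.new(xml).root.elements["archdesc"]
  store = { collection: { title: title_of(archdesc) },
            terms: {}, associations: [], components: [] }
  save_terms(store, archdesc, [:collection, 1])
  dsc = archdesc.elements["dsc"]
  top_level = dsc ? child_components(dsc) : []
  top_level.each_with_index { |c, i| import_component(store, c, nil, i + 1, 1) }
  store
end

result = import_ead(SAMPLE_EAD)
result[:components].each { |c| puts "#{'  ' * (c[:level_num] - 1)}#{c[:title]}" }
```

Passing `parent_id`, `sib_seq` and `level_num` down the recursion means the hierarchy attributes are known at the moment each record is created.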
  • 16. Integration with NYPL digital repository
      • Fedora repository + custom metadata creation/digitization workflow system + API to query repository data
      • All records in repository identified with UUID
      • UUID of digital object associated with a given component is stored locally in archives data system
      • Best case scenario: common identifiers appear in archival description and in Fedora
  • 17. Apache Solr
      • Inter- and intra-collection search
      • Collocation via faceting and filter queries
      • Using RSolr to facilitate interaction with Solr (for both search and index)
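To make the faceting and filter-query idea concrete, the sketch below builds a Solr request parameter hash in plain Ruby. The parameter names (`q`, `fq`, `facet`, `facet.field`, `rows`) are standard Solr parameters, but the index field names, the `search_params` helper, and the Solr URL in the comment are assumptions, not NYPL's schema or code.

```ruby
# Hypothetical index field names; the real schema is not shown in the slides.
FACET_FIELDS = %w[subject name index_dates].freeze

# Build Solr parameters: restricting to one collection gives intra-collection
# search, omitting collection_id gives inter-collection search.
def search_params(query, collection_id: nil, selected_facets: {})
  fq = []
  fq << "collection_id:#{collection_id}" if collection_id
  selected_facets.each do |field, value|
    escaped = value.to_s.gsub('"') { '\"' } # keep quotes from ending the phrase
    fq << %(#{field}:"#{escaped}")
  end
  { "q" => query, "fq" => fq, "facet" => true,
    "facet.field" => FACET_FIELDS, "rows" => 20 }
end

params = search_params("free trade", collection_id: 3283,
                       selected_facets: { "subject" => "Tariff" })
puts params.inspect

# With the rsolr gem installed and a Solr server running (URL assumed):
#   solr = RSolr.connect(url: "http://localhost:8983/solr/archives")
#   response = solr.get("select", params: params)
```

Each facet a user selects becomes one more `fq` entry, which narrows the result set without changing how the main query is scored.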
  • 18. API
      • API development is proceeding in step with finding aid development; available requests added as needed
      • Basic requests:
          o Collection-level data
          o Components of a collection, or sub-components of a component
          o Includes all component-level descriptive data
          o Max. depth can be specified
          o Digital assets associated with a component
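One way a components request with a maximum depth might assemble its response is sketched below. The record layout reuses the hierarchy attributes from slide 7, while `component_tree`, its parameters and the sample records are hypothetical and not the actual NYPL API.

```ruby
require 'json'

# Nest flat component records into a tree, stopping at max_depth levels
# below the starting parent (nil means the top level of the collection).
def component_tree(components, parent_id: nil, max_depth: nil, depth: 1)
  return [] if max_depth && depth > max_depth

  components
    .select { |c| c[:parent_id] == parent_id }
    .sort_by { |c| c[:sib_seq] }
    .map do |c|
      c.merge(children: component_tree(components, parent_id: c[:id],
                                       max_depth: max_depth, depth: depth + 1))
    end
end

# Made-up records for illustration.
records = [
  { id: 2, parent_id: nil, sib_seq: 2, title: "Series B", has_children: false },
  { id: 1, parent_id: nil, sib_seq: 1, title: "Series A", has_children: true },
  { id: 3, parent_id: 1,   sib_seq: 1, title: "File 1",   has_children: false }
]

puts JSON.pretty_generate(component_tree(records, max_depth: 1))
```

When the depth limit cuts a branch off, the stored `has_children` flag still tells the client that a follow-up request for that component's sub-components would return something.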
  • 19. Finding aid prototype
  • 20. Finding aid prototype
  • 21. Front-end system overview
  • 22. Considerations for future development
      • Separate API from data management?
          o Data management app to handle all create/update/destroy operations, while API (Sinatra?) is read-only
          o Open API to public? Security/load considerations…
      • ArchivesSpace
          o NYPL is considering it as a possible replacement for our existing ‘home-grown’ system
          o How would this system integrate with ArchivesSpace API?
      • Upcoming EAD revision
  • 23. Some code to look at and/or borrow from: github.com/nypl/archives_data_public
      Finding aid prototype: archives.nypl.org
      Me: trevorthornton@nypl.org
      NYPL Labs: nypl.org/labs