Tthornton code4lib
 


Usage Rights

© All Rights Reserved

    Presentation Transcript

    • EAD without XSLT: a practical approach to archival finding aids
      Trevor Thornton, Senior Applications Developer, NYPL Labs, The New York Public Library
    • Project goals
      • Enable multiple presentations of the same data
      • Support dynamic web applications
      • Cross-collection search with component-level specificity in results, and faceting on common access points
    • System overview
      Ruby on Rails + MySQL + Solr
      Key functionality:
      • Data import
      • Search index
      • API
    • Core models
    • Collection model
      Each collection:
      • must have one description
      • may have one or more components
      • may be associated with one or more access terms
    • Component model
      Each component:
      • must belong to one collection
      • must have one description
      • may have one parent component
      • may have one or more child components
      • may be associated with one or more access terms
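The two sets of rules above can be sketched in plain Ruby. This is only an illustration, not the project's Rails code: the class names follow the slides, the constructors and validations are assumptions, and the "Correspondence" and "Letters, 1882" titles are made-up sample data.

```ruby
# Plain-Ruby stand-ins for the models on the slides (not the actual Rails code).
Description = Struct.new(:data)
AccessTerm  = Struct.new(:term, :type)

class Collection
  attr_reader :description, :components, :access_terms

  def initialize(description)
    # A collection must have one description.
    raise ArgumentError, "a collection must have a description" if description.nil?
    @description  = description
    @components   = []   # zero or more components
    @access_terms = []   # zero or more access terms
  end
end

class Component
  attr_reader :collection, :description, :parent, :children, :access_terms

  def initialize(collection:, description:, parent: nil)
    # A component must belong to one collection and have one description.
    raise ArgumentError, "a component must belong to a collection" if collection.nil?
    raise ArgumentError, "a component must have a description" if description.nil?
    @collection   = collection
    @description  = description
    @parent       = parent   # at most one parent component
    @children     = []       # zero or more child components
    @access_terms = []
    collection.components << self
    parent.children << self if parent
  end
end

papers = Collection.new(Description.new({ "unittitle" => "David Ames Wells papers" }))
series = Component.new(collection: papers,
                       description: Description.new({ "unittitle" => "Correspondence" }))
file   = Component.new(collection: papers,
                       description: Description.new({ "unittitle" => "Letters, 1882" }),
                       parent: series)
```

In a Rails application these rules would more likely be expressed as associations and validations on the models; the sketch only makes the cardinalities explicit.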
    • Component hierarchy attributes
      • collection_id (id of root collection)
      • parent_id (id of parent component)
      • sib_seq (sibling sequence)
      • level_num (numeric level within hierarchy)
      • level_text (series, sub-series, file, etc.)
      • has_children
      • max_levels
      • top_component_id
      The last three (has_children, max_levels, top_component_id) are computed after initial data import; they are provided as a convenience for finding aid UIs and to streamline formulation of API responses.
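A minimal sketch of how such attributes might be derived once the parent/child links are known. The input rows are hypothetical, and the reading of max_levels as "depth of the deepest component" is an assumption, since the slide does not define it.

```ruby
# Hypothetical rows as they might look right after import: only id and
# parent_id are known, listed in document order.
rows = [
  { id: 1, parent_id: nil },
  { id: 2, parent_id: 1 },
  { id: 3, parent_id: 1 },
  { id: 4, parent_id: 3 },
  { id: 5, parent_id: nil }
]

def compute_hierarchy(rows)
  by_id = rows.to_h { |r| [r[:id], r] }
  sibling_counts = Hash.new(0)

  rows.each do |row|
    # sib_seq: 1-based position among components sharing the same parent.
    row[:sib_seq] = (sibling_counts[row[:parent_id]] += 1)
    row[:has_children] = false
  end

  rows.each do |row|
    # Walk up to the root to find the level and the top-level ancestor.
    level = 1
    top = row
    while top[:parent_id]
      top = by_id.fetch(top[:parent_id])
      level += 1
    end
    row[:level_num] = level
    row[:top_component_id] = top[:id]
    by_id[row[:parent_id]][:has_children] = true if row[:parent_id]
  end

  # Assumption: max_levels is the depth of the deepest component.
  { rows: rows, max_levels: rows.map { |r| r[:level_num] }.max || 0 }
end

result = compute_hierarchy(rows)
```

Precomputing these values trades a little work at import time for simpler queries later, which matches the slide's stated purpose of streamlining UI and API code.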
    • Description model
      Elements of description organized (roughly) based on ISAD(G):
      • Descriptive identity: ISAD(G) 3.1
      • Context: ISAD(G) 3.2.1 - 3.2.3
      • Acquisition & processing: ISAD(G) 3.2.4, 3.3.2 - 3.3.3
      • Content and structure: ISAD(G) 3.3.1, 3.3.4
      • Access and use: ISAD(G) 3.4
      • Related material: ISAD(G) 3.5
      • Notes: ISAD(G) 3.6
    • Description model: basic EAD mapping
    • Description model: JSON format
      {
        "unitid": [ { "value": "3283", "type": "local_mss" } ],
        "unittitle": [ { "value": "David Ames Wells papers" } ],
        "unitdate": [ { "type": "inclusive", "normal": "1847/1895", "value": "1847-1895" } ],
        "physdesc_extent": [
          { "value": ".5 linear feet", "unit": "linear feet" },
          { "value": "2 boxes", "unit": "containers" }
        ],
        "abstract": [
          { "value": "David Ames Wells was an engineer, economist, textbook author, and advocate for lower tariff rates. This collection contains correspondence with Gordon L. Ford, Worthington C. Ford, and others; clippings; a manuscript draft of Protection: The Poor Mans Friend; and a lecture Wells delivered on free trade in 1882" }
        ],
        "prefercite": [
          { "value": "<p>David Ames Wells papers, Manuscripts and Archives Division, The New York Public Library</p>" }
        ]
      }
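Because every element in this format is an array of objects with a "value" key, reading it can be reduced to one small helper. The sketch below uses a trimmed copy of the JSON from the slide; the values_for helper is hypothetical and not part of the project's code.

```ruby
require "json"

description = JSON.parse(<<~JSON)
  {
    "unittitle": [ { "value": "David Ames Wells papers" } ],
    "unitdate": [ { "type": "inclusive", "normal": "1847/1895", "value": "1847-1895" } ],
    "physdesc_extent": [
      { "value": ".5 linear feet", "unit": "linear feet" },
      { "value": "2 boxes", "unit": "containers" }
    ]
  }
JSON

# Hypothetical helper: collect the "value" of each entry for an element,
# optionally keeping only entries whose other keys match the given filters.
def values_for(description, element, filters = {})
  description.fetch(element, [])
             .select { |entry| filters.all? { |key, expected| entry[key] == expected } }
             .map { |entry| entry["value"] }
end

titles     = values_for(description, "unittitle")
containers = values_for(description, "physdesc_extent", { "unit" => "containers" })
```

Keeping repeatable elements as arrays, even when only one value is present, means presentation code never has to check whether it received a single value or a list.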
    • EAD as a guide for data storage
      • EAD elements that allow only CDATA are stored as plain strings
      • EAD elements that require content to be structured in <p> or other block elements are stored as HTML
      • Rules established for converting EAD to HTML when necessary
      • HTML conversion designed to support re-conversion back to EAD
    • Special handling for dates
      • Dates are hard
        o Inclusive dates and bulk dates
        o Multiple date formats
        o Ranges, lists and both
      • Special data structure for dates:
        o date_statement (original text)
        o inclusive_start / inclusive_end
        o bulk_start / bulk_end
        o keydate (for ordering query response: earliest inclusive date or earliest bulk date when present)
        o index_dates (for search faceting: every year included in range/list)
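A rough sketch of how the date structure above could be filled from normalized values such as "1847/1895". The function names are hypothetical, only year precision and simple ranges are handled, and keydate is read here as "earliest bulk date when present, otherwise earliest inclusive date", which is one interpretation of the slide.

```ruby
# Parse a normalized value such as "1847/1895" or "1882" into
# [start_year, end_year]. Only year precision is handled in this sketch.
def year_range(normal)
  return nil if normal.nil? || normal.strip.empty?
  years = normal.split("/").map { |part| Integer(part[0, 4], 10) }
  [years.first, years.last]
end

def build_date(date_statement, inclusive:, bulk: nil)
  inc_start, inc_end = year_range(inclusive)
  bulk_start, bulk_end = year_range(bulk)
  {
    date_statement: date_statement,   # original text, kept for display
    inclusive_start: inc_start,
    inclusive_end: inc_end,
    bulk_start: bulk_start,
    bulk_end: bulk_end,
    # Assumed reading of the slide: prefer the earliest bulk date when present.
    keydate: bulk_start || inc_start,
    # Every year covered by the inclusive range, for search faceting.
    index_dates: inc_start ? (inc_start..inc_end).to_a : []
  }
end

wells = build_date("1847-1895", inclusive: "1847/1895")
```

Storing every covered year in index_dates lets a year facet match a collection even when the year falls in the middle of its range.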
    • Access Term model
    • Refinement of Access Term/Access Term Association models
    • Data import
      • It’s messy business
      • Bulk of work has focused on EAD; Nokogiri used extensively for parsing XML
      • Basic process for EAD import:
        1. Create collection record
        2. Extract collection-level data, create/save description
        3. Extract access terms, and for each:
           a. Save if it doesn’t already exist
           b. Save collection/term association
        4. Extract top-level components, and for each:
           a. Create component record
           b. Extract component-level data, create/save description
           c. Extract/save access terms & associations
           d. Extract child components and repeat for each
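The import steps above can be sketched as a short recursive routine. To keep the example dependency-free it uses REXML, which ships with Ruby, as a stand-in for Nokogiri, and an in-memory hash as a stand-in for the database. The EAD fragment is invented and heavily trimmed, all method names are hypothetical, and only unnumbered <c> components are handled.

```ruby
require "rexml/document"

# Invented, heavily trimmed EAD fragment for illustration only.
EAD = <<~XML
  <ead>
    <archdesc level="collection">
      <did><unittitle>David Ames Wells papers</unittitle></did>
      <controlaccess>
        <persname>Wells, David Ames</persname>
        <subject>Free trade</subject>
      </controlaccess>
      <dsc>
        <c level="series">
          <did><unittitle>Correspondence</unittitle></did>
          <controlaccess><subject>Free trade</subject></controlaccess>
          <c level="file"><did><unittitle>Letters</unittitle></did></c>
        </c>
        <c level="series"><did><unittitle>Clippings</unittitle></did></c>
      </dsc>
    </archdesc>
  </ead>
XML

# In-memory stand-in for the database tables.
store = { collections: [], components: [], terms: {}, associations: [] }

def extract_description(node)
  { "unittitle" => node.elements["did/unittitle"]&.text }
end

def save_terms(node, record, store)
  node.elements.each("controlaccess/*") do |el|
    key = [el.name, el.text]
    store[:terms][key] ||= { type: el.name, term: el.text }  # save if it doesn't already exist
    store[:associations] << { record: record, term: key }     # save record/term association
  end
end

def import_component(node, collection, parent, store)
  component = { collection: collection, parent: parent,
                level_text: node.attributes["level"],
                description: extract_description(node) }
  store[:components] << component
  save_terms(node, component, store)
  # Repeat for each child component.
  node.elements.each("c") { |child| import_component(child, collection, component, store) }
end

def import_ead(xml, store)
  archdesc = REXML::Document.new(xml).root.elements["archdesc"]
  collection = { description: extract_description(archdesc) }
  store[:collections] << collection
  save_terms(archdesc, collection, store)
  archdesc.elements.each("dsc/c") { |c| import_component(c, collection, nil, store) }
  collection
end

import_ead(EAD, store)
```

After the run, the store holds one collection, three components in document order, and two distinct access terms, with "Free trade" saved once but associated twice.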
    • Integration with NYPL digital repository
      • Fedora repository + custom metadata creation/digitization workflow system + API to query repository data
      • All records in repository identified with UUID
      • UUID of digital object associated with a given component is stored locally in archives data system
      • Best case scenario: common identifiers appear in archival description and in Fedora
    • Apache Solr
      • Inter- and intra-collection search
      • Collocation via faceting and filter queries
      • Using RSolr to facilitate interaction with Solr (for both search and index)
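As an illustration of faceting and filter queries, the sketch below builds the parameter hash for a Solr select request. The parameters q, fq, facet and facet.field are standard Solr parameters; the field names (collection_id, subject_facet, name_facet, date_facet) and the search_params method are assumptions, not the project's schema. With RSolr, a hash like this would typically be passed as the params option of a request to the select handler.

```ruby
# Build request parameters for Solr's select handler.
# Field names are hypothetical placeholders for the real schema.
def search_params(query, collection_id: nil, facets: {})
  # One filter query per selected facet value, e.g. subject_facet:"Free trade".
  filters = facets.flat_map do |field, values|
    Array(values).map do |value|
      escaped = value.to_s.gsub('"') { '\"' }
      %(#{field}:"#{escaped}")
    end
  end

  # Restricting to one collection turns a cross-collection search
  # into an intra-collection search.
  filters << "collection_id:#{Integer(collection_id)}" if collection_id

  {
    :q => query,
    :fq => filters,
    :facet => true,
    "facet.field" => %w[subject_facet name_facet date_facet]
  }
end

params = search_params("tariff",
                       collection_id: 1,
                       facets: { "subject_facet" => ["Free trade"] })
```

Filter queries narrow the result set without affecting relevance scoring, which is why facet selections are usually sent as fq rather than appended to q.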
    • API
      • API development is proceeding in step with finding aid development; available requests added as needed
      • Basic requests:
        o Collection-level data
        o Components of a collection, or sub-components of a component
        o Includes all component-level descriptive data
        o Max. depth can be specified
        o Digital assets associated with a component
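A minimal sketch of how a "components of a collection" response with an optional maximum depth might be assembled from flat rows. The row layout, the component_tree method and the sample titles are hypothetical; the actual API's response format is not shown in the slides.

```ruby
require "json"

# Hypothetical flat component rows, as they might come back from the database.
COMPONENTS = [
  { id: 1, parent_id: nil, sib_seq: 1, title: "Correspondence" },
  { id: 2, parent_id: 1,   sib_seq: 1, title: "Letters" },
  { id: 3, parent_id: 2,   sib_seq: 1, title: "Letter, 1882" },
  { id: 4, parent_id: nil, sib_seq: 2, title: "Clippings" }
].freeze

# Nest components under their parents, stopping after max_depth levels
# (nil means no limit).
def component_tree(rows, parent_id: nil, max_depth: nil, depth: 1)
  return [] if max_depth && depth > max_depth

  siblings = rows.select { |r| r[:parent_id] == parent_id }
                 .sort_by { |r| r[:sib_seq] }

  siblings.map do |r|
    children = component_tree(rows, parent_id: r[:id],
                                    max_depth: max_depth, depth: depth + 1)
    { id: r[:id], title: r[:title], components: children }
  end
end

# Top two levels of the collection, serialized as an API response body.
puts JSON.pretty_generate(component_tree(COMPONENTS, max_depth: 2))
```

Passing a component's id as parent_id returns only its sub-components, which covers the second form of the request described above.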
    • Finding aid prototype
    • Finding aid prototype
    • Front-end system overview
    • Considerations for future development
      • Separate API from data management?
        o Data management app to handle all create/update/destroy operations, while API (Sinatra?) is read-only
        o Open API to public? Security/load considerations…
      • ArchivesSpace
        o NYPL is considering it as a possible replacement for our existing ‘home-grown’ system
        o How would this system integrate with ArchivesSpace API?
      • Upcoming EAD revision
    • Some code to look at and/or borrow from: github.com/nypl/archives_data_public
      Finding aid prototype: archives.nypl.org
      Me: trevorthornton@nypl.org
      NYPL Labs: nypl.org/labs