Images from: http://www.ica.org and http://www.archivists.org
Top image (granules) from http://www.thehealthyhousecat.com
EAD Revision, EAC-CPF introduction
1. EAD Revision 2. EAC-CPF: an introduction. Timothy Ryan Mendenhall, Leo Baeck Institute, 2012 March 28
EAD Revision Timetable: • Currently: analyzing comments submitted during open comment period • December 2012: draft schema for revision and comment • August 2013: release of new schema
EAD Revision What to expect: Migration plan Interoperability: • better support for the semantics of relationships (cf. EAC-CPF, RDA) Interchange: • data interchange trumps presentation • promote uniform and predictable use to enable better interchange of data.
EAD Revision: the details. . . Schema only -- DTD will be deprecated Simplification: • Reduced number of tags • Deprecate presentation-oriented tags like <emph>, <head>, <table>
EAD Revision: the details. . . Simplification: Simplified header Simplified hierarchical structure • <c01>, <c02> etc merge into undifferentiated <c> tags • Wrapper and structural tags like <dsc> might be deprecated
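The merge of numbered components into undifferentiated <c> tags can be sketched in a few lines. This is only an illustration, assuming an un-namespaced EAD 2002 (DTD-style) fragment; the sample titles and the helper name flatten_components are hypothetical, and an official migration path may work differently.

```python
import re
import xml.etree.ElementTree as ET

# Matches the numbered component tags c01 through c12.
NUMBERED_C = re.compile(r"^c(0[1-9]|1[0-2])$")

def flatten_components(root):
    """Rename numbered <c01>..<c12> elements to <c> in place.

    The nesting of the elements is untouched, so the hierarchy level
    is still recoverable from depth alone.
    """
    for elem in root.iter():
        if NUMBERED_C.match(elem.tag):
            elem.tag = "c"
    return root

# Hypothetical sample data, not taken from a real finding aid.
sample = (
    "<dsc>"
    "<c01 level='series'><did><unittitle>Correspondence</unittitle></did>"
    "<c02 level='file'><did><unittitle>Letters</unittitle></did></c02>"
    "</c01>"
    "</dsc>"
)

root = flatten_components(ET.fromstring(sample))
flattened = ET.tostring(root, encoding="unicode")
print(flattened)
```

Only the tag names change; attributes such as level are carried over as they are.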
EAD Revision: the details. . . Make EAD more database-friendly: Less mixed content, more tagged data More specific, granular tags: e.g. forenames and surnames More flexibility for normalizing dates (multiple dates, ranges of dates, etc. Cf. EAC, RDA) Geo-tagging “Profiles” of tag sets for different types of repositories
EAD Revision: the details. . . Extend potential for language qualifications: <geogname language="ger">Köln (Deutschland)</geogname> and/or <geogname language="eng">Cologne (Germany)</geogname>
Data-centered model: Goals Improve machine-readability of finding aids Aid in the sharing of finding aid data across platforms, CMSs, languages, countries, and different aggregators Move away from the document model: finding aid as a fluid, malleable record, not a fixed document
Effect on CJH Likely minimal – migration paths will be made available Conversion from EAD-DTD to EAD-Schema Creation of task force? Resources, stylesheets available Creation of new EAD templates New possibilities!
Basics EAC-CPF: Encoded Archival Context – Corporate Bodies, Persons, Families XML vocabulary Based on ISAAR-CPF: int’l standard related to ISAD(G) Adopted by SAA in 2011: standard for archival authority data
Features Parallels many RDA changes Increased granularity of data • E.g. life dates split into birth and death dates Emphasis on relations • With other resources • With other corporate bodies, persons, families • With functions
Features Compatibility with existing authority data (LCSH, etc.) Wrapper elements allow wholesale inclusion of outside metadata, i.e. authority MARC-XML Great flexibility for alternate names, variant forms, local implementations
Features Accommodates 4 different types of “entities” Single identity Multiple identity • Many in one (single EAC-CPF instance) • One in many (multiple instances) Alternative sets (i.e. variant records)
Why EAC? Part of broader move towards semantic web, linked open data (LOD) Better end-user experience Improves capacity for faceted searching More intuitive web interfaces Standardization of authority data Sharing of authority data Eventually – saves time
Basic structure Like EAD (and MARC), divided into control and descriptive sections: <eac-cpf> <control> […] </control> <cpfDescription> [ALTERNATE <multipleIdentities><cpfDescription> . . .] </cpfDescription> </eac-cpf>
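The two-part structure can be sketched by building a skeleton record in Python. This is a minimal illustration, not a complete or valid record: the control section is left almost empty, the record id and name are hypothetical placeholders, and the namespace URI is the one used by the 2010 EAC-CPF schema.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # namespace of the 2010 EAC-CPF schema

def q(tag):
    """Return the namespace-qualified form of an EAC-CPF tag name."""
    return f"{{{EAC_NS}}}{tag}"

def build_skeleton(record_id, entity_type, name):
    """Build <eac-cpf> with its <control> and <cpfDescription> halves."""
    ET.register_namespace("", EAC_NS)
    root = ET.Element(q("eac-cpf"))

    # Control section: administrative data about the record itself.
    control = ET.SubElement(root, q("control"))
    ET.SubElement(control, q("recordId")).text = record_id

    # Descriptive section: who or what the record is about.
    description = ET.SubElement(root, q("cpfDescription"))
    identity = ET.SubElement(description, q("identity"))
    ET.SubElement(identity, q("entityType")).text = entity_type
    name_entry = ET.SubElement(identity, q("nameEntry"))
    ET.SubElement(name_entry, q("part")).text = name
    return root

# Hypothetical placeholder values.
skeleton = build_skeleton("example-0001", "person", "Example, Person")
xml_text = ET.tostring(skeleton, encoding="unicode")
print(xml_text)
```

A real record would also need the remaining control elements before it could validate against the schema.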
Basic structure : Control Administrative data about the record itself Required elements: • recordId • maintenanceAgency • maintenanceStatus • maintenanceHistory • languageDeclaration • sources
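A quick completeness check for the control section might look like the sketch below. The list of element names is copied from this slide, so it is an assumption about what is required; the published schema remains the authority on what is actually enforced. The helper name and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# Element names as listed on the slide (assumed, not verified against the schema).
REQUIRED_CONTROL = (
    "recordId",
    "maintenanceAgency",
    "maintenanceStatus",
    "maintenanceHistory",
    "languageDeclaration",
    "sources",
)

def missing_control_elements(control):
    """Return the listed element names that are absent from <control>."""
    # Strip any "{namespace}" prefix so the check works with or without one.
    present = {child.tag.split("}")[-1] for child in control}
    return [name for name in REQUIRED_CONTROL if name not in present]

# Hypothetical, deliberately incomplete control section.
sample_control = ET.fromstring(
    "<control>"
    "<recordId>example-0001</recordId>"
    "<maintenanceStatus>new</maintenanceStatus>"
    "</control>"
)

missing = missing_control_elements(sample_control)
print(missing)
```

A check like this is no substitute for schema validation, but it is cheap to run over a batch of converted records.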
Basic structure : Control Optional elements Allow for local customization Use of other identifiers for same entity (i.e. from other thesauri, other national libraries, etc.)
Basic structure : cpfDescription Descriptive section <cpfDescription> For most records: single <identity> For complex identities • many-in-one, corporate and compound entities • multiple <cpfDescription> elements wrapped in <multipleIdentities> tag
Basic structure : cpfDescription Required: <identity> Optional: <description> <relations> <alternativeSet> -- alternate records for the same entity imported from a different authority system, such as LCSH, VIAF, or a different national library.
Basic structure : cpfDescription Descriptive section: required <identity> Most complex element Parallels RDA changes: • Increased functionality for parallel and variant forms of names • Can distinguish between “authorized” and “preferred” forms of a name • Increased granularity (parts of names, dates) • Ability to qualify variant forms of names by “use dates”
Basic structure : cpfDescription Optional <description> Very similar to RDA, but encoded in XML <existDates> • <date>, <dateRange>, <dateSet> <places> • May be qualified by dates and roles • Place of residence, place of birth, place of death, etc.
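Splitting a single life-dates string into separate birth and death dates can be sketched as follows. This is a simplified illustration that only handles plain four-digit years in the form "YYYY-YYYY" (either side may be missing); the helper name exist_dates is hypothetical and namespaces are omitted for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Accepts "1873-1956", "1873-" or "-1956"; anything else is rejected.
LIFE_DATES = re.compile(r"^\s*(\d{4})?\s*-\s*(\d{4})?\s*$")

def exist_dates(life_dates):
    """Turn a 'YYYY-YYYY' string into <existDates> with a <dateRange>."""
    match = LIFE_DATES.match(life_dates)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized life dates: {life_dates!r}")
    birth, death = match.groups()

    exist = ET.Element("existDates")
    date_range = ET.SubElement(exist, "dateRange")
    if birth:
        # standardDate carries the machine-readable form of the date.
        ET.SubElement(date_range, "fromDate", {"standardDate": birth}).text = birth
    if death:
        ET.SubElement(date_range, "toDate", {"standardDate": death}).text = death
    return exist

both = exist_dates("1873-1956")
open_ended = exist_dates("1901-")
print(ET.tostring(both, encoding="unicode"))
print(ET.tostring(open_ended, encoding="unicode"))
```

Real authority data contains approximate dates, "fl." and "ca." qualifiers, and full dates, all of which would need extra handling.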
Basic structure : cpfDescription Optional <description> All may be qualified by dates: <occupations> <functions> <legalStatus> (corporate body) <mandates> (corporate body)
Basic structure : cpfDescription Optional <description> “Free text” descriptive sections: <biogHist> -- same as in EAD <generalContext> -- “general social and cultural context” <structureOrGenealogy> • Structure of corporate bodies • Genealogy of individuals, families
Basic structure : cpfDescription Relations section: <cpfRelation> -- relations to other “entities” <functionRelation> <resourceRelation> <objectXMLWrap> to include other records, portions of other records
Basic structure : cpfDescription Relations section: All have “relation type” attributes to help specify the type of relation: • cpf: Family, associative, hierarchical-child, hierarchical-parent, etc. • Functions: controls, owns, performs, etc. • Resources: creator, subject, etc. To include other records, portions of other records: • <objectXMLWrap> • <objectBinWrap>
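A typed relation to another entity can be sketched like this. The set of relation types is only the partial list given on the slide (lowercased), so the schema should be consulted for the full vocabulary and exact tokens; the helper name and the sample entries are hypothetical, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Partial list taken from the slide; the schema defines further values.
CPF_RELATION_TYPES = {
    "family",
    "associative",
    "hierarchical-child",
    "hierarchical-parent",
}

def add_cpf_relation(relations, relation_type, entry):
    """Append a <cpfRelation> with a type attribute and a display entry."""
    if relation_type not in CPF_RELATION_TYPES:
        raise ValueError(f"unknown cpfRelationType: {relation_type!r}")
    relation = ET.SubElement(
        relations, "cpfRelation", {"cpfRelationType": relation_type}
    )
    ET.SubElement(relation, "relationEntry").text = entry
    return relation

relations = ET.Element("relations")
# Hypothetical placeholder entities.
add_cpf_relation(relations, "family", "Example, Sibling")
add_cpf_relation(relations, "hierarchical-parent", "Example Parent Organization")
print(ET.tostring(relations, encoding="unicode"))
```

Rejecting unknown types up front keeps local typos out of records that will later be shared or aggregated.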
Implementation at CJH? Via Digitool? Similar to MARC to EAD • Wholesale batch conversion • Issues: • data cleanup • Skeletal data • Resolving differences in existing biographical notes, etc. • Digitool’s interface – not good for “active” records needing frequent syncing, updating
Implementation at CJH? Via Digitool? Steps required: • Batch export of authority data from Aleph • LCNAF is also available for download • MARC to EAC stylesheet • Google Refine: clean up data • Ingest to Digitool • EAC to HTML stylesheet • Google Refine: resolution with existing EADs
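The MARC to EAC conversion step would normally be an XSLT stylesheet; the Python sketch below stands in for it only to show the shape of the mapping. It assumes a MARC-XML authority record for a person with a 100 field, and the localType values "name" and "dates", the helper name, and the sample record are all hypothetical.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

def marc_to_identity(record):
    """Map the MARC 100 field (personal name heading) to an EAC <identity>."""
    field = record.find("marc:datafield[@tag='100']", MARC_NS)
    if field is None:
        raise ValueError("record has no 100 field")
    subfields = {
        sf.get("code"): (sf.text or "").strip()
        for sf in field.findall("marc:subfield", MARC_NS)
    }

    identity = ET.Element("identity")
    ET.SubElement(identity, "entityType").text = "person"
    name_entry = ET.SubElement(identity, "nameEntry")
    # $a is the name; drop the trailing comma MARC punctuation leaves behind.
    name = subfields.get("a", "").rstrip(",")
    ET.SubElement(name_entry, "part", {"localType": "name"}).text = name
    # $d holds the dates associated with the name, when present.
    if "d" in subfields:
        ET.SubElement(name_entry, "part", {"localType": "dates"}).text = subfields["d"]
    return identity

# Hypothetical sample record, not exported from Aleph.
sample_marc = ET.fromstring(
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="100" ind1="1" ind2=" ">'
    '<subfield code="a">Example, Person,</subfield>'
    '<subfield code="d">1873-1956</subfield>'
    "</datafield>"
    "</record>"
)

identity = marc_to_identity(sample_marc)
print(ET.tostring(identity, encoding="unicode"))
```

Corporate bodies (110) and families would need their own mappings, and the cleanup step would still be required for skeletal or inconsistent headings.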
Implementation at CJH? Via Digitool? Potential for *labor-intensive* edits • Roles within collection • Center-wide agreement on relator terms (RDA?), manually updating EAD “role” attributes • Expansion of biogHist, structureOrGenealogy, etc. Custom database outside of Digitool? Eventually – ArchivesSpace?
Future potential Crowd-sourcing Relationship data Function data (“Is correspondent”, “Is subject” etc) Genealogical data Harvesting biographical, historical and genealogical data DBPedia JewishGen
Resources MARC-XML to EAC stylesheets Entire LCSH, LCNAF available for download (MADS/RDF): http://id.loc.gov/download/ EAC-Pages: http://eac.staatsbibliothek-berlin.de/ EAC listserv = EAD listserv