
Creating and Consuming Metadata from Transcribed Historical Vital Records for Ingestion in a Long-Term Digital Preservation Platform

Dolores Grant, Christophe Debruyne, Rebecca Grant, Sandra Collins:
Creating and Consuming Metadata from Transcribed Historical Vital Records for Ingestion in a Long-Term Digital Preservation Platform - (Short Paper). OTM Workshops 2015: 445-450


  1. IRL: Irish Record Linkage, 1864 - 1913. Creating and Consuming Metadata from Transcribed Historical Vital Records for Ingestion in a Long-term Digital Preservation Platform. Dolores Grant (a), Christophe Debruyne (b), Rebecca Grant (a), and Sandra Collins (a). (a) Digital Repository of Ireland, Royal Irish Academy, Dublin, Ireland. (b) ADAPT @ Trinity College Dublin, Dublin, Ireland. October 27, 2015 @ META4eS
  2. Developing a platform applying semantic technologies to historical birth, death and marriage certificates. Answering questions such as: “How accurate are historic maternal mortality rates (MMR) and infant mortality rates (IMR) for Dublin?” The team consists of researchers (historians), digital archivists, and knowledge engineers.
     Diagram labels: Knowledge and Linked Data Engineers; Historians; Digital Archivists.
  3. General Register Office (GRO)
     • Vital registration data: birth certificates, death certificates and marriage records.
     • Digitised TIFF images of hardcopy indexes and registers.
     • 2 TB of data.
     • Database describing the digitised records, allowing searches on some fields.
     ©General Records Office of Ireland 2014
  4. In prior work (see [1]), we created a Linked Data platform that allowed Digital Archivists to transcribe register pages, which were then transformed into RDF. That RDF was then used to populate other triplestores to analyze that data. Part of the project, however, was also to investigate the digital long-term preservation of the digitized register pages and the corresponding RDF.
     Diagram labels: Creation of IRL Knowledge Base; Relational Database; GRO Triplestore; Transformation; Vital Records Ontology; Separation of Concerns; Historical Events Ontology; IRL Triplestore; Data Analytics; Digital Archivist; Historian; LOD Cloud.
  5. Related work
     • Related work on the preservation of harvested metadata exists, e.g., in the context of GLAMs.
     • Little work was to be found in the context of historical (vital) records. It was limited to integration problems and to the problem of record linking in databases.
     • We also wanted to focus on research-project-agnostic transcription of historical vital records (separation of concerns).
  6. Method: Creating RDF Documents
     • Register pages are identified by a stamp number (e.g. “4646439”). We collect the triples around a page and its related records with the following query to create an RDF document.
     • PREFIX rec: <http://purl.org/net/irish-record-linkage/records#>
       DESCRIBE * {
         ?page rec:stampNumber "4646439";
               rec:withRecord ?record.
       }
     • We also add a foaf:primaryTopic statement to the document.
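The per-page query on this slide could be generated for any stamp number along the following lines. This is a minimal sketch, not the project's code: the function names and the digits-only check are assumptions, and the document and page IRIs passed to `primary_topic_triple` are placeholders, since the slides do not show how those IRIs are minted.

```python
import re

# Namespace taken from the query on the slide.
REC_NS = "http://purl.org/net/irish-record-linkage/records#"
# Standard FOAF property IRI for foaf:primaryTopic.
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def build_describe_query(stamp_number: str) -> str:
    """Return the slide's DESCRIBE query for one register page."""
    # Assumption: stamp numbers are digit strings, as in the slide's example.
    # Rejecting anything else also keeps stray quotes out of the query text.
    if not re.fullmatch(r"\d+", stamp_number):
        raise ValueError(f"unexpected stamp number: {stamp_number!r}")
    return (
        f"PREFIX rec: <{REC_NS}>\n"
        "DESCRIBE * {\n"
        f'  ?page rec:stampNumber "{stamp_number}";\n'
        "        rec:withRecord ?record.\n"
        "}\n"
    )


def primary_topic_triple(document_iri: str, page_iri: str) -> str:
    """Return one N-Triples line linking an RDF document to its page.

    Both IRIs are hypothetical inputs supplied by the caller.
    """
    return f"<{document_iri}> <{FOAF_PRIMARY_TOPIC}> <{page_iri}> ."
```

Calling `build_describe_query("4646439")` reproduces the query text shown on the slide; the result would still have to be sent to a SPARQL endpoint, which is outside this sketch.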
  7. Method: Creating Qualified Dublin Core Metadata
     • Following the guidelines formulated in [2], we used XSPARQL [3] to transform RDF documents into Qualified Dublin Core (QDC) metadata documents. We thus have an RDF file and a QDC file for each register page.
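The slide names XSPARQL as the transformation language. As an illustration of the output side only, the sketch below swaps in Python's standard `xml.etree.ElementTree` to serialize already-extracted values as a small QDC-style XML document. It is not the authors' XSPARQL transformation; the root element name `qualifieddc` and the function name are assumptions, while the two namespace IRIs are the standard Dublin Core ones.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core namespaces.
DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS_NS = "http://purl.org/dc/terms/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_qdc(fields):
    """Serialize (namespace, term, value) tuples as one XML string.

    The root element name is an assumption; the repository's own
    QDC guidelines define the actual document shape.
    """
    root = ET.Element("qualifieddc")
    for namespace, term, value in fields:
        # ElementTree addresses namespaced tags as "{namespace}localname".
        child = ET.SubElement(root, f"{{{namespace}}}{term}")
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

A call such as `build_qdc([(DC_NS, "identifier", "4646439")])` yields one document per register page, matching the one-RDF-file, one-QDC-file pairing described on the slide.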
  8. Mapping of transcribed fields to Qualified Dublin Core (diagram; bracketed labels give the target element)
     Register Page
     • District/Union/County [SPATIAL COVERAGE]
     • Superintendent registrar's district
     • Date certified as true copy by superintendent registrar [ISSUED]
     • Date certified by registrar [CREATED]
     • Forename/surname registrar on page
     • Forename/surname superintendent registrar [CREATOR]
     • Page number/Volume/Quarter
     • Stamp number [IDENTIFIER / used in TITLE]
     • Year registered [TEMPORAL COVERAGE]
     Record
     • Date of registration
     • Title/forename/surname registrar
     • Amendments
     • Number in register
     Certificate
     • Forename/surname (of subject) [PART OF DESCRIPTION]
     • Address (of subject)
     • Sex (of subject) [PART OF DESCRIPTION]
     • Forename/surname informant
     • Qualification of informant
     • Relationship of informant
     • Residence of informant
     Death Record
     • Forename/surname of registrar
     • Date of death [PART OF DESCRIPTION]
     • Cause of death and duration of illness
     • Condition
     • Age last birthday
     • Place of residence
     • Rank, profession or occupation
     Diagram multiplicity labels: 1; 0..10
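The bracketed labels on this slide amount to a lookup from transcribed fields to Dublin Core terms. A minimal sketch of that lookup follows. The dictionary keys, the function name, the title wording and the comma-joined description are all assumptions made for illustration; reading [SPATIAL COVERAGE] and [TEMPORAL COVERAGE] as `dcterms:spatial` and `dcterms:temporal` is likewise an interpretation of the diagram, not something the slides state.

```python
# Hypothetical field keys mapped to the terms suggested by the slide's labels.
PAGE_FIELD_TO_TERM = {
    "district_union_county": "dcterms:spatial",
    "date_certified_by_superintendent": "dcterms:issued",
    "date_certified_by_registrar": "dcterms:created",
    "superintendent_registrar_name": "dc:creator",
    "stamp_number": "dc:identifier",
    "year_registered": "dcterms:temporal",
}

# Record-level fields the slide marks as [PART OF DESCRIPTION].
DESCRIPTION_FIELDS = ("subject_name", "sex", "date_of_death")


def map_page(page, records):
    """Return (term, value) pairs for one register page and its records."""
    statements = []
    for field, term in PAGE_FIELD_TO_TERM.items():
        value = page.get(field)
        if value:
            statements.append((term, str(value)))
    # The slide says the stamp number is "used in TITLE"; this wording is assumed.
    if page.get("stamp_number"):
        statements.append(("dc:title", f"Register page {page['stamp_number']}"))
    # One description per record, built from the marked fields that are present.
    for record in records:
        parts = [record.get(key) for key in DESCRIPTION_FIELDS]
        text = ", ".join(part for part in parts if part)
        if text:
            statements.append(("dc:description", text))
    return statements
```

Keeping the mapping in a plain dictionary means a change to the metadata guidelines touches one table rather than the transformation logic.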
  10. Pipeline diagram (part of the IRL Platform). Labels: Relational Database; GRO Triplestore; Transformation; Vital Records Ontology; Digital Archivist; RDF File 1, 2, … n; Qualified Dublin Core XML 1, 2, … n; Register Page 1, 2, … n; transform; ingestion; Digital long-term preservation platform.
  11. Method: Bulk Ingestion into a Digital Long-Term Repository
     • We adopted the Digital Repository of Ireland: http://repository.dri.ie/
     • It provides item-by-item ingestion, or bulk ingestion via command-line tools.
     • Files (digitized register pages, RDF and QDC) are named in a certain way to relate each QDC file to the digitized asset and the RDF transcription.
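The slides do not spell out the naming scheme, so the sketch below assumes a purely hypothetical one: the three files for a page share a base name and differ only in extension (`.tif`, `.rdf`, `.xml`). Under that assumption, a pre-ingestion check can group files by base name and report pages with a missing file. The function names and the extension table are illustrative, and nothing here calls the repository's actual command-line tools.

```python
from collections import defaultdict
from pathlib import PurePath

# Assumed role of each extension; the real convention is not given in the slides.
ROLE_BY_SUFFIX = {".tif": "asset", ".rdf": "rdf", ".xml": "qdc"}


def group_by_stem(filenames):
    """Group file names by shared base name, keyed by their assumed role."""
    groups = defaultdict(dict)
    for name in filenames:
        path = PurePath(name)
        role = ROLE_BY_SUFFIX.get(path.suffix.lower())
        if role is not None:  # ignore files with unrelated extensions
            groups[path.stem][role] = name
    return dict(groups)


def incomplete_groups(groups):
    """Return {base name: sorted missing roles} for groups lacking a file."""
    required = set(ROLE_BY_SUFFIX.values())
    missing = {}
    for stem, files in groups.items():
        absent = required - set(files)
        if absent:
            missing[stem] = sorted(absent)
    return missing
```

Running the check before a bulk upload would surface any register page whose image, RDF transcription or QDC file is absent, so that only complete triples of files are submitted.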
  13. Conclusions and Future Work
     • We created an automated process for creating and uploading assets, RDF transcriptions and associated metadata into a long-term preservation platform.
     • Evaluation, both of discoverability on the repository via faceted search and of the suitability of the metadata via expert feedback, is limited by the data sharing agreements.
     • A comparison of Qualified Dublin Core with Encoded Archival Description (EAD) is to be conducted as well.
  14. References
     [1] Debruyne, C., Beyan, O.D., Grant, R., Collins, S., Decker, S.: On a Linked Data Platform for Irish Historical Vital Records. TPDL 2015: 99-110
     [2] Bustillo, M., Collins, S., Gallagher, D., Grant, R., Harrower, N., Kenny, S., Ní Cholla, R., O’Carroll, A., Redmond, S., Webb, S.: Qualified Dublin Core and the Digital Repository of Ireland (Grant, R. ed.). Tech. rep., Maynooth: Maynooth University; Dublin: Trinity College Dublin; Dublin: Royal Irish Academy; Galway: National University of Ireland, Galway (2015)
     [3] Dell’Aglio, D., Polleres, A., Lopes, N., Bischof, S.: Querying the Web of Data with XSPARQL 1.1. In: Verborgh, R., Mannens, E. (eds.) Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014. CEUR Workshop Proceedings, vol. 1268, pp. 113–118. CEUR-WS.org (2014)
  15. Questions? More information
     • Twitter: @IRL_Project
     • Project website: http://irishrecordlinkage.wordpress.com/
