Rebecca Grant & Dolores Grant - Data Archiving for the Irish Record Linkage Project

Presentation given by Rebecca Grant, DRI Digital Archivist, and Dolores Grant, IRL-DRI Digital Archivist, at the Irish Record Linkage workshop held at the University of Limerick, 10th February 2016. It gives an overview of the Irish Research Council-funded Irish Record Linkage project, focusing on how digital data archiving was undertaken by the partners at the Digital Repository of Ireland.

Published in: Data & Analytics

  1. Data archiving for the Irish Record Linkage project
     Rebecca Grant, Digital Archivist, Digital Repository of Ireland
     Dolores Grant, IRL-DRI Digital Archivist, Digital Repository of Ireland
  2. Irish Record Linkage project 1864-1913
     • Irish Record Linkage is an Irish Research Council-funded project running from 2014 to June 2016.
     • Collaboration between the University of Limerick (historians), the Digital Repository of Ireland at the Royal Irish Academy (archivists), and Insight@NUI Galway (knowledge engineers, Linked Data experts).
     • Constructing a Knowledge Platform: Linked Data based on Vital Registration Data (digitised registers of Births, Marriages and Deaths) in order to answer research questions around infant and maternal mortality.
  3. Irish Record Linkage project 1864-1913
     • The Linked Data concept and the project’s dataset
     • Extracting data from the vital records
     • Approaches to archival authenticity
     • Preservation of the records
  4. The Digital Repository of Ireland
     • DRI is a trusted digital repository for Humanities and Social Sciences data, launched in June 2015 and based at the Royal Irish Academy.
     • Linking and preserving the rich collections held by Irish institutions (archives, museums, libraries, galleries, universities, research projects etc.).
     • Focal point for the development of national guidelines and policy for digital preservation and access.
     • repository.dri.ie
  5. INSIGHT@NUI Galway
     • Insight is a joint initiative between University College Dublin, the National University of Ireland at Galway, University College Cork, and Dublin City University.
     • Insight was established in 2013 by Science Foundation Ireland with funding of €75m.
     • Research areas: the Semantic Web, Sensors and the Sensor Web, Social network analysis, Decision Support and Optimization, and Connected Health.
  6. Irish Record Linkage and Linked Data Queries
     • How many women died within 42 days following childbirth due to complications related to labour, and how does that figure correspond with the official reports?
     • Which women died of causes that can be attributed to maternal death, but for which no corresponding birth certificate exists?
     • How did various socio-economic conditions affect maternal and infant mortality rates?
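Once a mother's death record has been linked to a birth record, the time condition in the first question reduces to a date-window test. The following is a minimal Python sketch of that test only; it does not address cause of death. The function name, the inclusive treatment of day 42, and the dates are assumptions made for illustration (the dates are invented placeholders, not project data).

```python
from datetime import date

POSTPARTUM_WINDOW_DAYS = 42  # window named in the research question


def died_within_window(birth: date, death: date,
                       window_days: int = POSTPARTUM_WINDOW_DAYS) -> bool:
    """True if the death falls on the day of the birth or up to window_days after it."""
    elapsed = (death - birth).days
    return 0 <= elapsed <= window_days


# Hypothetical (child's birth date, mother's death date) pairs; invented placeholders.
linked_pairs = [
    (date(1880, 3, 1), date(1880, 3, 20)),  # 19 days after the birth
    (date(1880, 3, 1), date(1880, 6, 1)),   # 92 days after the birth
]

within = [pair for pair in linked_pairs if died_within_window(*pair)]
print(len(within))
```

Whether day 42 itself counts, and how a death recorded before the birth registration should be handled, would need to follow the definition the historians adopt.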
  7. General Register Office records
     • The General Register Office (GRO) is the civil registry responsible for recording information on births, deaths and marriages.
     • Records of 5,847,323 births (from 1864 to 1912), 4,236,922 deaths (from 1864 to 1912) and 1,160,546 marriages (from 1845 to 1912) were transferred to the project team under strict terms and conditions.
     • Events were captured on register pages (up to 10 for births and deaths, and up to 4 for marriages), divided by district, and sent to the GRO, where volumes were then created and an index compiled.
     • Supplied as a database dump of the GRO's database, with digitised versions of the register pages and indexes (TIFFs).
  8. The Linked Data Concept
     • The slide's example describes the subject (James Joyce) and his relationship (predicate) to an object (Dublin).
     • By semantically separating the elements of the information (that James Joyce was born in Dublin), datasets stored in this way can be easily queried.
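The subject, predicate and object structure can be sketched in a few lines of Python. This is an illustrative in-memory model, not the project's triplestore: the `Triple` type, the `match` helper and the `ex:` identifiers are all hypothetical names standing in for real URIs and a real query language.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers; the "ex:" prefix stands in for real URIs.
triples = [
    Triple("ex:JamesJoyce", "ex:bornIn", "ex:Dublin"),
    Triple("ex:Dublin", "ex:locatedIn", "ex:Ireland"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Who was born in Dublin?" becomes a pattern with the subject left open.
born_in_dublin = [t.subject for t in match(triples, predicate="ex:bornIn", obj="ex:Dublin")]
print(born_in_dublin)
```

Because each element is stored separately, the same store answers "where was James Joyce born?" by leaving the object open instead of the subject.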
  10. Birth Records
      Columns: Register TIFF; Index TIFF; System Pre 1900; System Post 1900
      Fields (table flattened in the transcript; cell alignment not recoverable): Superintendent Registrar’s District; Registrar’s District; Registration district; District; District; Union; County; County; County; Province; Province; Number in register; Entry number; Date & place of birth; Year of event; Date of birth, year of event; Name (if any); Name; Forename, Surname; Forename, Surname; Sex; Sex; Name, surname & dwelling place of father; Name & surname & maiden surname of mother; Mother’s maiden name; Rank or profession of father; Signature, qualification, and residence of informant; When Registered; Returns year; Returns year; Returns quarter; Returns quarter; Signature of Registrar
  11. Archival principles
      • The principle of provenance: Provenance means the history of ownership related to a group of records or an individual item in a collection. Preserving information on these relationships is essential, as it provides evidence of how and by whom the records were created and used before they became part of the archives. Provenance provides essential contextual information for understanding the content and history of an archival collection.
      • The principle of original order: Archives are kept in the order in which they were originally created or used. This original order allows custodians to protect the authenticity of the records and provides essential information as to how they were created, kept and used.
  12. Terms of transfer
      • Data (e.g. database records and TIFFs) are only stored for the duration of the project, and must be destroyed following its completion.
      • Data can only be accessed by the IRL project team after an access agreement has been signed.
      • Records cannot be duplicated, downloaded or brought off-site.
      • Personal, identifying information cannot be published.
      • Copyright and related rights remain vested in the General Register Office.
  13. Archival authenticity
      • The quality of being genuine, not a counterfeit, and free from tampering, and is typically inferred from internal and external evidence, including its physical characteristics, structure, content, and context.
      • The presence of a signature serves as a fundamental test for authenticity; the signature identifies the creator and establishes the relationship between the creator and the record.
      • The style and language of the document must be consistent with other, related documents that are accepted as authentic.
      Source: Society of American Archivists, http://www2.archivists.org/glossary/terms/a/authenticity
  14. Archival authenticity
      • Only records that are complete can ensure accountability and protect personal rights […] Individual records must be complete; they must contain all the information they had when they were created. They must also maintain their original structure and context. (Hirtle)
      • An authentic record is one that is what it purports to be and has not been tampered with or otherwise corrupted. (InterPARES 2)
      • For a record to be considered trustworthy […] it must accurately reflect the event it records and be uncontaminated by the distorting influence of time, bias, interpretation, or unwarranted opinion on the part of the record-maker. (MacNeil)
  15. Initial data preparation
      • The final dataset comprises birth, marriage and death records from 2 districts in Dublin (South City no. 1 and South City no. 3).
      • A separate database was constructed to enable the encoding of the IRL records.
      • Tables represent both the register pages and the records (“record” = historical event).
      • Each event links back to the register page.
      • The fields created reflect the original record information, and the structure enables transformation to RDF.
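The two-table arrangement described here, with every event pointing back to its register page, can be sketched with Python's built-in `sqlite3` module. The table names, column names and the file name `page_0001.tif` are hypothetical; the slides do not give the project's actual schema or database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the page link
conn.executescript("""
CREATE TABLE register_page (
    page_id   INTEGER PRIMARY KEY,
    district  TEXT NOT NULL,
    tiff_file TEXT NOT NULL          -- digitised image of the page
);
CREATE TABLE event (
    event_id     INTEGER PRIMARY KEY,
    page_id      INTEGER NOT NULL REFERENCES register_page(page_id),
    entry_number INTEGER NOT NULL,   -- position of the entry on the page
    event_type   TEXT NOT NULL CHECK (event_type IN ('birth', 'marriage', 'death'))
);
""")

# Placeholder rows: one register page carrying two birth entries.
conn.execute("INSERT INTO register_page VALUES (1, 'South City no. 1', 'page_0001.tif')")
conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "birth"), (2, 1, 2, "birth")])


def page_for_event(event_id: int):
    """Return (district, tiff_file) of the register page an event was recorded on."""
    return conn.execute(
        "SELECT p.district, p.tiff_file FROM event e "
        "JOIN register_page p ON p.page_id = e.page_id WHERE e.event_id = ?",
        (event_id,),
    ).fetchone()


print(page_for_event(2))
```

Keeping the page as its own table means the context of creation (district, image, neighbouring entries) stays attached to every event rather than being copied into each row.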
  16. Data challenges
      • Whole, authentic record maintained to represent the original record and preserve context of creation
      • Every database record linked to the TIFF image; TIFFs stored in semi-meaningful arrangement
      • Consistent cataloguing practices (dates, square brackets, [sic], notes field to capture anomalies)
      • Paleography
      • Controlled vocabulary of death terms and professions
      • Archiving databases: preserving content, structure and processes (RODA toolkit (Repository of Authentic Digital Objects), SIARD (Software Independent Archiving of Relational Databases))
  17. Separation of concerns: transcription vs interpretation
      • Variance in how subject names and places were recorded (initials, shorthand, name of a building versus street name) might imply something of which we are currently unaware.
      • Transcription of the register pages records exactly what was written down.
      • Some interpretation is necessary in order to use the data, however, e.g. street names changing over time, new insights into medical conditions, adoption of new social theory (e.g. class distinctions).
      • Data is captured in two separate ontologies: one for transcription, one for interpretation. For example, a death recorded in days in the first database can be interpreted/queried as hours in the second.
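The days-to-hours example can be sketched as two layers: an immutable transcription that stores what was written, and a separate interpretation function that derives a comparable value on demand. The class name, the unit labels and the conversion table are assumptions made for illustration, not the project's ontologies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscribedAge:
    """Age at death exactly as written on the register page (never modified)."""
    value: int
    unit: str


# Interpretation layer: hypothetical unit labels mapped to hours.
HOURS_PER_UNIT = {"hours": 1, "days": 24, "weeks": 24 * 7}


def age_in_hours(age: TranscribedAge) -> int:
    """Derive a comparable value without altering the transcription."""
    try:
        return age.value * HOURS_PER_UNIT[age.unit]
    except KeyError:
        raise ValueError(f"no interpretation rule for unit {age.unit!r}") from None


recorded = TranscribedAge(value=3, unit="days")
print(age_in_hours(recorded), recorded)
```

If an interpretation rule later changes, only the second layer is revised; the transcription remains a faithful copy of the register.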
  18. Separation of concerns (diagram; labels: GRO, Triplestore, Triplestore 2, Data Analysis)
  19. Register page as EAD (database crosswalk)
  21. Authenticity of the dataset: archival authenticity and preservation
      • Archivist encoded entire register pages rather than lines of data regarding an individual (e.g. a single life event such as a death).
      • Database records refer back to digitised TIFFs created by the General Register Office.
      • Interpretation of the dataset occurs separately; all records are transcribed exactly, including typos, blank fields, details crossed out, Xs etc.
      • TIFFs can be preserved with EAD or QDC metadata, and associated databases preserved separately and linked.
      • Querying of the data occurs only on an obfuscated dataset with personal names excluded; linked data can contain outbound links but is protected by a firewall.
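Excluding personal names before querying can be pictured as a field filter applied to a copy of each record. The slides do not describe how the project produced its obfuscated dataset, so the field names and the `obfuscate` helper below are hypothetical, and the record values are placeholders.

```python
from typing import Dict, FrozenSet

# Hypothetical names of fields that hold personal names.
PERSONAL_FIELDS: FrozenSet[str] = frozenset(
    {"forename", "surname", "mothers_maiden_name"}
)


def obfuscate(record: Dict[str, str],
              personal_fields: FrozenSet[str] = PERSONAL_FIELDS) -> Dict[str, str]:
    """Return a copy of the record without personal-name fields."""
    return {field: value for field, value in record.items()
            if field not in personal_fields}


# Placeholder record; the values are not project data.
record = {
    "forename": "[forename]",
    "surname": "[surname]",
    "sex": "F",
    "returns_year": "1880",
    "district": "South City no. 1",
}

print(obfuscate(record))
```

The original record is left untouched, which matches the separation between the preserved transcription and the dataset exposed for querying.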
  22. Bibliography
      • Hirtle, Peter. “Archival Authenticity in a Digital Age”. Authenticity in a Digital Environment, 2000.
      • Lee, Brent. Authenticity, Accuracy and Reliability: Reconciling Arts-related and Archival Literature, 2005.
      • MacNeil, Heather. “Trusting Records in a Postmodern World”. Archivaria 51, 2001.
      • Pearce-Moses, Richard. A Glossary of Archival and Records Terminology, 2005.
      • SIARD Suite: http://www.bar.admin.ch/dienstleistungen/00823/01911/index.html?lang=en
  23. Contact
      • @beck_grant
      • @IRL_project
      • r.grant@ria.ie
      • http://repository.dri.ie
      • https://irishrecordlinkage.wordpress.com/
      The content of this presentation is licensed as CC-BY. Please attribute to Rebecca Grant, Digital Archivist, Digital Repository of Ireland, 2015.
