This document describes the Irish Record Linkage project, which links historic vital registration data from 1864 to 1913 into a knowledge platform. The project is a collaboration between the Digital Repository of Ireland, the University of Limerick and Insight@NUI Galway. It aims to address research questions on infant and maternal mortality in Dublin by applying semantic technologies to digitised birth, death and marriage records provided by the General Register Office. Researchers are creating ontologies and linked data from the records to support detailed analysis while maintaining the authenticity and security of the sensitive personal data.
1. Reusing Legacy Data: Irish Historic Vital Registration Data, 1864-1913
Dolores Grant
Dr Ciara Breathnach, Dr Sandra Collins, Rebecca Grant
Irish Record Linkage 1864-1913
2. Irish Record Linkage project 1864-1913
Irish Record Linkage is an Irish Research Council-funded project running until December 2015
To construct a Knowledge Platform by applying semantic technologies to vital-registration data generously shared by the Office of the Registrar General
To address research queries concerning infant and maternal mortality rates and patterns in Dublin
3. Irish Record Linkage project 1864-1913
Collaboration between the Digital Repository of Ireland at the Royal Irish Academy, the University of Limerick and Insight@NUI Galway
Principal Investigators: Dr Ciara Breathnach (UL), Dr Sandra Collins (DRI), Dr Stefan Decker (Insight)
Project Team: Dr Brian Gurrin (UL), Dr Christophe Debruyne (Insight/DRI), Dr Oya Beyan (Insight), Rebecca Grant (DRI), Dolores Grant (DRI)
4. University of Limerick
Its mission is to promote and advance learning and knowledge through teaching, research and scholarship in an environment which encourages innovation and upholds the principles of free enquiry and expression.
The Faculty of Arts, Humanities and Social Sciences prides itself on the quality of its teaching and its commitment to research, and places a strong emphasis on the role of debate and discussion in the development of knowledge and analytical skills.
5. The Digital Repository of Ireland
Based in the Royal Irish Academy (Ireland's Academy for the Sciences and Humanities)
DRI is a trusted digital repository for Humanities and Social Sciences data
Linking and preserving the rich data held by Irish institutions, providing a central internet access point and multimedia tools
Focal point for the development of national guidelines and policy for digital preservation and access
6. Insight@NUI Galway
Insight brings together leading Irish academics from five of the country's top research centres (DERI, CLARITY, CLIQUE, 4C, TRIL) in key areas of priority research, including:
The Semantic Web,
Sensors and the Sensor Web,
Social network analysis,
Decision Support and Optimization, and
Connected Health.
7. Irish Historic Vital Registration Data
1845: a marriage registration act was introduced to gather official statistics on marriages of the established Church of Ireland
1864: the first year in which births, deaths and marriages (including Catholic marriages) were registered, following the establishment of a complete Irish civil registration system in 1863
Records | Ireland, 1864-1912 | Dublin, 1864-1912
Births | 2.9 million | 609,720
Deaths | 4.9 million | 537,635
Marriages | 3.18 million | 330,605 (1845-1913)
8. The Linked Data Concept
A method of publishing structured data on the Web, allowing it to be connected and enriched, and facilitating linking between related resources.
A key principle of Linked Data is that HTTP URIs are used to name the semantic elements of the dataset.
Linked Data standards such as RDF allow semantic definitions to be applied to information, using statements called ‘triples’ in the form subject, predicate, object.
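To make the triple idea concrete, here is a minimal Python sketch. It assumes made-up `example.org` URIs and a hypothetical `bornIn` predicate, neither of which is part of the project's vocabulary. It names each element of one statement with an HTTP URI and writes it out as a single N-Triples line, one of the W3C serialisations of RDF.

```python
from typing import NamedTuple

# Hypothetical base URIs for illustration only; not the project's namespaces.
EX = "http://example.org/resource/"
VOCAB = "http://example.org/vocab#"


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    object: str


def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose three parts are URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.object}> ."


joyce_born_in_dublin = Triple(EX + "James_Joyce", VOCAB + "bornIn", EX + "Dublin")
print(to_ntriples(joyce_born_in_dublin))
```

Because every element is a URI, another dataset can refer to exactly the same `Dublin` resource, which is what makes linking between datasets possible.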
9. The Linked Data Concept
This example describes the subject (James Joyce) and his relationship (predicate) to an object (Dublin). Because the elements of the information (that James Joyce was born in Dublin) are semantically separated, datasets stored in this way can be easily queried.
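The querying benefit can be sketched with a toy in-memory store in Python. This is not a triplestore or a SPARQL engine: the `match` helper and the short predicate names are illustrative assumptions. Leaving a position as `None` plays the role of a SPARQL variable, so "who was born in Dublin?" becomes a pattern over the predicate and object.

```python
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Tiny in-memory dataset. The Joyce/Dublin statement is the slide's example;
# the second triple exists only so the query has something to filter out.
TRIPLES = [
    ("James Joyce", "bornIn", "Dublin"),
    ("Dublin", "locatedIn", "Ireland"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple fitting the pattern; None acts as a wildcard."""
    pattern = (s, p, o)
    for triple in TRIPLES:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# "Who was born in Dublin?"  roughly  SELECT ?who WHERE { ?who :bornIn :Dublin }
print([who for who, _, _ in match(p="bornIn", o="Dublin")])  # ['James Joyce']
```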
10. General Register Office Data
Vital registration data: birth, death, marriage records for Dublin
TIFF images of previously digitised indexes and registers of birth, death and marriage
General Register Office database for these records
11.
12. Marriage Records
Register TIFF | Index TIFF | System 1845-1901 | System 1902-c.1912
Registrar’s District | Registration District | District | District
Marriage solemnised at
Parish
Union
County | County | County
Province | Province
Number in register | Entry number
When married | Year of event | Year of event, Date of marriage
When registered | Returns year | Returns year
Returns quarter | Returns quarter
Name and surname | Name | Forename, Surname | Forename, Surname
Partner’s surname
Age
Sex
Condition
Rank or profession
Residence at the time of marriage
Father’s name and surname
Rank or profession of father
Celebrant
Witnesses
Signature of Registrar
Signature of Superintendent Registrar and date
Stamp number | Stamp number | Stamp number
Volume number | Returns volume number | Returns volume number
Page number | Page number | Returns page number | Returns page number
Stamped number | Page ID | Page ID
2nd Stamped number
Index entry number | Index entry number
Index page number
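The System columns split the register's single "Name and surname" field into Forename and Surname. A naive Python sketch of that split, assuming (hypothetically) that the last word is always the surname, shows both the idea and its limits. The function name `split_register_name` is illustrative only.

```python
from __future__ import annotations


def split_register_name(full_name: str) -> tuple[str, str]:
    """Split a register "Name and surname" into (forename, surname).

    Naive assumption: the last word is the surname and any earlier words are
    forenames. Multi-word surnames (e.g. "Mac Carthy") are split wrongly.
    """
    parts = full_name.split()
    if not parts:
        return "", ""
    if len(parts) == 1:
        return "", parts[0]  # a single word is treated as a surname
    return " ".join(parts[:-1]), parts[-1]


print(split_register_name("Mary Anne  Murphy"))  # ('Mary Anne', 'Murphy')
```

A production rule would need a list of known surname prefixes or manual review, which is one reason the original register text is worth keeping alongside the split fields.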
13. Birth Records
Register TIFF | Index TIFF | System Pre 1900 | System Post 1900
Superintendent Registrar’s District
Registrar’s District | Registration district | District | District
Union
County | County | County
Province | Province
Number in register | Entry number
Date & place of birth | Year of event | Date of birth, year of event
Name (if any) | Name | Forename, Surname | Forename, Surname
Sex | Sex
Name, surname & dwelling place of father
Name & surname & maiden surname of mother | Mother’s maiden name
Rank or profession of father
Signature, qualification, and residence of informant
When registered | Returns year | Returns year
Returns quarter | Returns quarter
Signature of Registrar
Signature of Superintendent Registrar and date
Baptismal name, if added after registration of birth, and date
Stamp number | Stamp number | Stamp number
Volume number | Returns volume number | Returns volume number
Page number | Page number | Returns page number | Returns page number
Stamped number | Page ID
2nd Stamped number
Index entry number
Index page number
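The register's "When registered" date corresponds to the Returns year and Returns quarter fields in the System columns. The sketch below assumes, purely for illustration, that returns quarters follow calendar quarters; the helper name `returns_period` is hypothetical and the slides do not define the quarter boundaries.

```python
from __future__ import annotations

from datetime import date


def returns_period(registered: date) -> tuple[int, int]:
    """Map a registration date to (returns year, returns quarter).

    Assumption for illustration: quarter 1 = Jan-Mar, 2 = Apr-Jun,
    3 = Jul-Sep, 4 = Oct-Dec.
    """
    return registered.year, (registered.month - 1) // 3 + 1


print(returns_period(date(1900, 8, 20)))  # (1900, 3)
```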
14. Death Records
Register TIFF | Index TIFF | System
Superintendent Registrar’s District
Registrar’s District | Registration District | District
Union
County | County
Province
Number in register
Date and place of death | Year of event
Name and surname | Name | Forename, Surname
Sex
Condition
Age last birthday | Age | Age at death
Rank, profession or occupation
Certified cause of death and duration of illness
Signature, qualification and residence of informant
When registered | Returns year
Returns quarter
Signature of Registrar
Signature of Superintendent Registrar and date
Stamp number | Stamp number
Volume number | Returns volume number
Page number | Page number | Returns page number
Stamped number | Page ID
2nd Stamped number
Index entry number
Index page number
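Death registers record "Age last birthday" rather than a date of birth, so linking a death to a birth record means working back to a window of possible birth dates. A minimal Python sketch of that arithmetic; the function names are hypothetical, and people born on 29 February are not handled specially.

```python
from __future__ import annotations

from datetime import date, timedelta


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year - years, day=28)


def birth_date_range(death: date, age_last_birthday: int) -> tuple[date, date]:
    """Earliest and latest possible birth dates for a recorded age at death.

    Someone aged `a` at their last birthday was born after the day that is
    a+1 years before death, and on or before the day a years before death.
    """
    earliest = years_before(death, age_last_birthday + 1) + timedelta(days=1)
    latest = years_before(death, age_last_birthday)
    return earliest, latest


print(birth_date_range(date(1901, 3, 15), 30))
# (datetime.date(1870, 3, 16), datetime.date(1871, 3, 15))
```

Recorded ages in historic registers may be approximate, so a linkage step would probably widen this window rather than treat it as a hard filter.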
15. Research Questions
Identifying the record fields that are necessary to maintain the archival authenticity of the records and to answer the research questions:
• How many women died within 42 days following childbirth due to complications related to labour, and how does that figure compare with the official reports?
• Which women died of causes that can be attributed to maternal death, but for which no corresponding birth certificate exists?
• How did various socio-economic conditions affect maternal and infant mortality rates?
16. Competency questions to construct the Ontology
ID | Competency Question
C01 | Women who died within 42 days after giving birth (the date of birth is counted as day 1 and day 42 is included)
C02 | Women who died within 42 days after giving birth AND whose death certificate mentions ‘complication 1’
C03 | Women who died within 42 days after giving birth AND whose death certificate mentions ‘complication 2’
C04 | Women having official maternal death reports including “XXXX”
C05 | Women having official maternal death reports including “cause 1”
C06 | Women having official maternal death reports including “cause 2” and “cause 3” together
C07 | For each record in C04, find those with a corresponding birth record (the date of death is counted as day 1 and day 42 is included)
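C01 and C07 both hinge on an inclusive 42-day window. If "day 1" is the starting date itself and day 42 is inside the window, the gap between the two dates can be at most 41 days. A minimal Python sketch of that check follows; the function name is hypothetical. Read as counting backwards from the death, C07's wording selects the same window.

```python
from datetime import date


def within_42_days(birth: date, death: date) -> bool:
    """True if death falls on days 1-42, counting the birth date as day 1.

    Day 1 is the birth date itself and day 42 is included, so the
    difference between the two dates can be at most 41 days.
    """
    return 0 <= (death - birth).days <= 41


print(within_42_days(date(1900, 1, 1), date(1900, 2, 11)))  # True  (day 42)
print(within_42_days(date(1900, 1, 1), date(1900, 2, 12)))  # False (day 43)
```

Writing the bound as `<= 41` rather than `< 42` keeps the off-by-one decision visible next to the counting convention it encodes.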
17. Creation of RDF triples
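Diagram: data is extracted from the GRO Database, transformed and loaded into the GRO Triplestore. The triplestore is described by the GRO Ontology, which is consulted by the transformation and amended/curated by the Digital Archivist.
Storage model: metadata that can be queried declaratively with a W3C standard.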
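The Transform step can be pictured as mapping each database row to RDF statements. This Python sketch assumes a hypothetical row layout and column names; the `irl:` properties are the ones used in the birth-record example in this presentation. A real Turtle file would also need an `@prefix irl:` declaration, whose namespace URI the slides do not give.

```python
# Hypothetical row from the GRO database; the real schema is not shown in the slides.
source_row = {"id": "B000-001", "date": "1900-08-08", "name": "James",
              "mother": "Mary Murphy", "place": "Castle Road"}

# Hypothetical column names mapped to the irl: properties used in the slides.
PROPERTY_FOR_COLUMN = {"date": "irl:on", "name": "irl:name",
                       "mother": "irl:mother", "place": "irl:place"}


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def row_to_turtle(record: dict) -> str:
    """Transform one birth-record row into a Turtle description."""
    lines = [f"<#{record['id']}> a irl:BirthRecord"]
    for column, prop in PROPERTY_FOR_COLUMN.items():
        if record.get(column):  # skip empty fields instead of emitting blank literals
            lines.append(f"    {prop} {turtle_literal(record[column])}")
    return " ;\n".join(lines) + " ."


print(row_to_turtle(source_row))
```

Keeping the column-to-property mapping in one table means the archivist can revise the ontology terms without touching the transformation logic.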
18. GRO Records annotation vs. Data Analysis
Diagram: GRO Triplestore → Triplestore 2 (Data Analysis)
Transformation from one model to another:
• SPIN (SPARQL Inferencing Notation)
• SWRL / RuleML
• SPARQL CONSTRUCT
• …
SEPARATION OF CONCERNS
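Separation of concerns means the analysis model is derived into its own graph rather than written back into the GRO annotations. Below is a plain-Python sketch of a CONSTRUCT-style rule, with a roughly equivalent SPARQL 1.1 query in the docstring. The `ex:` properties and the `-mother` IRI scheme are hypothetical.

```python
# Annotation graph for the GRO records: (subject, predicate, object) triples.
gro_graph = {
    ("#B000-001", "irl:name", "James"),
    ("#B000-001", "irl:mother", "Mary Murphy"),
    ("#B010-022", "irl:name", "Patrick"),
    ("#B010-022", "irl:mother", "Mary Murphy"),
}


def construct_mothers(graph: set) -> set:
    """Derive mother entities into a NEW set of triples; `graph` is only read.

    Roughly equivalent SPARQL 1.1:
      CONSTRUCT { ?mother ex:name ?m ; ex:motherOf ?rec }
      WHERE { ?rec irl:mother ?m .
              BIND(IRI(CONCAT(STR(?rec), "-mother")) AS ?mother) }
    """
    derived = set()
    for rec, pred, value in graph:
        if pred == "irl:mother":
            mother = f"{rec}-mother"  # hypothetical IRI scheme: one entity per record
            derived.add((mother, "ex:name", value))
            derived.add((mother, "ex:motherOf", rec))
    return derived


analysis_graph = construct_mothers(gro_graph)  # stored apart, e.g. in "Triplestore 2"
```

Because the source graph is never modified, the analysis rules can be changed or rerun without risking the archival record.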
19.
<#B000-001> a irl:BirthRecord;
    irl:on "1900-08-08";
    irl:name "James";
    irl:mother "Mary Murphy";
    irl:place "Castle Road"; …
<#B010-022> a irl:BirthRecord;
    irl:on "1902-04-19";
    irl:name "Patrick";
    irl:mother "Mary Murphy";
    irl:place "Castle Road"; ...
<#B022-051> a irl:BirthRecord;
    irl:on "1904-09-20";
    irl:name "Agnes";
    irl:mother "Mary Murphy";
    irl:place "Convent Hill"; ...
<#B050-003> a irl:BirthRecord;
    irl:on "1905-02-18";
    irl:name "Michael";
Diagram (TRANSFORMATION, ONTOLOGY MATCHING): the mother in each record becomes an entity, #1 to #4, each named "Mary Murphy"; entities judged to be the same person are linked with owl:sameAs.
All generated triples are stored separately for data analytics ...
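Ontology matching can be sketched as a pairwise rule that emits owl:sameAs links, followed by grouping the linked entities. Because owl:sameAs is symmetric and transitive, the groups are the connected components of the links. The matching rule below (identical name and identical place) is an assumption for illustration, not the project's rule. The data are the first three records above, since the fourth record's mother and place are not shown.

```python
from itertools import combinations

# Mother descriptions from the first three birth records on this slide.
mothers = {
    "#1": {"record": "B000-001", "name": "Mary Murphy", "place": "Castle Road"},
    "#2": {"record": "B010-022", "name": "Mary Murphy", "place": "Castle Road"},
    "#3": {"record": "B022-051", "name": "Mary Murphy", "place": "Convent Hill"},
}


def same_person(a: dict, b: dict) -> bool:
    """Hypothetical matching rule: identical name AND identical place."""
    return a["name"] == b["name"] and a["place"] == b["place"]


same_as = [(x, "owl:sameAs", y)
           for x, y in combinations(sorted(mothers), 2)
           if same_person(mothers[x], mothers[y])]


def clusters(nodes, links):
    """Connected components of the owl:sameAs links, via union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for x, _, y in links:
        parent[find(x)] = find(y)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


print(same_as)                     # [('#1', 'owl:sameAs', '#2')]
print(clusters(mothers, same_as))  # [['#1', '#2'], ['#3']]
```

A real rule would likely need to tolerate spelling variants and check that birth dates are biologically plausible for one mother.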
20. Data analysis on the generated triples
Mother: #1 Mary Murphy
Children: James (1900-08-08), Patrick (1902-04-19), Michael (1905-02-18)
Birth intervals: 619 days (James to Patrick), 1036 days (Patrick to Michael)
Average sibship interval = (619 + 1036) / 2 = 827.5 days
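The interval arithmetic can be checked with Python's standard `datetime` module, using the three births attributed to mother #1. Sorting the dates first means the intervals come out in chronological order regardless of how the records were retrieved.

```python
from datetime import date

# Birth dates of the three children attributed to mother #1.
births = {
    "James": date(1900, 8, 8),
    "Patrick": date(1902, 4, 19),
    "Michael": date(1905, 2, 18),
}

ordered = sorted(births.values())
# Days between each pair of consecutive births.
intervals = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
average = sum(intervals) / len(intervals)

print(intervals)  # [619, 1036]
print(average)    # 827.5
```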
21. Data Challenges
• Data security: transfer, storage and use by authorised parties
• Data protection best practice
• Quantity of data
• Varying levels of detail, e.g. causes of death
• Establishing maternal death, e.g. fever
• Archaic medical terms
• Variances in record subject names and places
• Place-name changes over time
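For variant spellings of names, one common approach (not necessarily the project's) is to normalise strings and score them with a similarity ratio. This sketch uses the standard library's `difflib.SequenceMatcher`; the normalisation steps and function names are assumptions.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_accents.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """difflib ratio in [0, 1] on the normalised strings; 1.0 means identical."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


print(normalise("Seán  Ó Briain"))                          # sean o briain
print(similarity("Catherine Murphy", "Katherine Murphy"))  # 0.9375
```

String similarity does not address renamed places, such as Kingstown becoming Dún Laoghaire; those need a historical gazetteer or lookup table.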
22. The Irish Record Linkage Knowledge Platform
• State-of-the-art linked data and ontology-based analysis platform for historical 'big data'
• Platform within a secure, closed system
• Designed to allow formulation of the specific research queries
• Query interface to allow for the historical analysis of the data
• Potential expansion to include additional contextualising datasets
@IRL_Project http://irishrecordlinkage.wordpress.com/
DRI Presentation
Editor's Notes
The resulting platform will provide a powerful research resource to enable historians to study Irish infant and maternal mortality rates and patterns during this period of Irish history. The project aims to provide a comprehensive map of infant and maternal mortality for Dublin.
Our project team is cross-disciplinary and team members include knowledge engineers, historians and digital archivists.
A marriage register page from 1900
The research questions were set by Dr Breathnach. Identifying, tracking and interlinking individuals across the registers, through place and time, allows for a granular analysis of reconstructed virtual households, thus enabling the analysis of historic Irish infant and maternal mortality rates, which have yet to receive thorough treatment from historians.
Some context around the records chosen for the project