Presentation given by Rebecca Grant, DRI Digital Archivist, and Dolores Grant, IRL-DRI Digital Archivist, at the Irish Record Linkage workshop held at the University of Limerick, 10th February 2016. It gives an overview of the Irish Research Council funded Irish Record Linkage project, focusing on how digital data archiving was undertaken by the partners at the Digital Repository of Ireland.
Rebecca Grant & Dolores Grant - Data Archiving for the Irish Record Linkage Project
1. Data archiving for the Irish Record Linkage
project
Rebecca Grant, Digital Archivist, Digital Repository of Ireland
Dolores Grant, IRL-DRI Digital Archivist, Digital Repository of Ireland
2. Data archiving for the Irish Record Linkage project
Irish Record Linkage project 1864-1913
Irish Record Linkage is an Irish Research Council funded project running from
2014 – June 2016
Collaboration between the University of Limerick (historians), the Digital
Repository of Ireland at the Royal Irish Academy (archivists), and Insight@NUI
Galway (knowledge engineers, Linked Data experts)
Constructing a Knowledge Platform – Linked Data based on Vital Registration
Data (digitised registers of Births, Marriages and Deaths) in order to answer
research questions around infant and maternal mortality
3. Data archiving for the Irish Record Linkage project
Irish Record Linkage project 1864-1913
The Linked Data concept and the project’s dataset
Extracting data from the vital records
Approaches to archival authenticity
Preservation of the records
4. Data archiving for the Irish Record Linkage project
The Digital Repository of Ireland
DRI is a trusted digital repository for Humanities
and Social Sciences data, launched in June 2015 and
based at the Royal Irish Academy
Linking and preserving the rich collections held by
Irish institutions (archives, museums, libraries,
galleries, universities, research projects etc.)
Focal point for the development of national guidelines
and policy for digital preservation and access.
repository.dri.ie
5. Data archiving for the Irish Record Linkage project
INSIGHT@NUI Galway
Insight is a joint initiative between University College Dublin,
the National University of Ireland at Galway, University College
Cork, and Dublin City University. Insight was established in 2013
by Science Foundation Ireland with funding of €75m.
The Semantic Web,
Sensors and the Sensor Web,
Social network analysis,
Decision Support and Optimization, and
Connected Health.
6. Data archiving for the Irish Record Linkage project
Irish Record Linkage and Linked Data Queries
• How many women died within 42 days following childbirth due to
complications related to labour and how does that figure correspond
with the official reports?
• Which women died of causes that can be attributed to maternal death,
but for which no corresponding birth certificate exists?
• How did various socio-economic conditions affect maternal and infant
mortality rates?
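The first two questions can be pictured as a linkage between birth and death records. The following is a minimal sketch in Python, not the project's method: the records are invented, and the shared identifiers `mother_id` and `person_id` are hypothetical stand-ins for whatever links the project's Linked Data provides. It checks only the 42-day window and the absence of a birth record; filtering on cause of death is left out.

```python
from datetime import date, timedelta

# Hypothetical, invented sample records; not project data.
births = [
    {"mother_id": "m1", "date": date(1870, 3, 1)},
    {"mother_id": "m2", "date": date(1870, 5, 10)},
]
deaths = [
    {"person_id": "m1", "date": date(1870, 3, 20), "cause": "puerperal fever"},
    {"person_id": "m2", "date": date(1871, 1, 2), "cause": "phthisis"},
    {"person_id": "m3", "date": date(1870, 6, 1), "cause": "puerperal fever"},
]

WINDOW = timedelta(days=42)


def deaths_within_window(births, deaths, window=WINDOW):
    """Return deaths occurring 0..window days after a birth to the same woman."""
    births_by_mother = {}
    for b in births:
        births_by_mother.setdefault(b["mother_id"], []).append(b["date"])
    matched = []
    for d in deaths:
        for birth_date in births_by_mother.get(d["person_id"], []):
            if timedelta(0) <= d["date"] - birth_date <= window:
                matched.append(d)
                break  # one matching birth is enough
    return matched


def deaths_without_birth(births, deaths):
    """Return deaths of women for whom no birth record is linked at all."""
    mothers = {b["mother_id"] for b in births}
    return [d for d in deaths if d["person_id"] not in mothers]
```

In this invented sample, `deaths_within_window(births, deaths)` picks out the death linked to `m1`, while `deaths_without_birth(births, deaths)` picks out `m3`, whose death has no linked birth record.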
7. Data archiving for the Irish Record Linkage project
General Register Office records
The General Register Office (GRO) is the civil registry responsible for recording
information on births, deaths and marriages.
Records of 5,847,323 births (from 1864 to 1912), 4,236,922 deaths (from 1864 to
1912) and 1,160,546 marriages (from 1845 to 1912) transferred to the project
team with strict terms and conditions.
Events were captured on register pages (up to 10 for births and deaths, and up to 4
for marriages) divided by district and sent to the GRO where volumes were then
created and an index compiled.
Database dump of the GRO's database with digitised versions of the
register pages and indexes (TIFFs)
8. Data archiving for the Irish Record Linkage project
The Linked Data Concept
The example above describes the subject (James Joyce) and his
relationship (predicate) to an object (Dublin). By semantically
separating the elements of the information (that James Joyce was
born in Dublin) datasets stored in this way can be easily queried.
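The subject, predicate and object structure can be sketched in a few lines of Python. This is an illustration only: the predicate names (`bornIn`, `occupation`) and the `query` helper are hypothetical, and a real Linked Data store would use URIs and a query language such as SPARQL.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


triples = [
    Triple("James Joyce", "bornIn", "Dublin"),
    Triple("James Joyce", "occupation", "Writer"),  # extra triple, for illustration
]


def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match every element that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Who was born in Dublin?" becomes a pattern with the subject left open.
born_in_dublin = [t.subject for t in query(triples, predicate="bornIn", obj="Dublin")]
```

Because each element is stored separately, the same pattern matching answers "where was James Joyce born?" by leaving the object open instead.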
9. Data archiving for the Irish Record Linkage project
10. Birth Records
Register TIFF Index TIFF System Pre 1900 System Post 1900
Superintendent Registrar’s District
Registrar’s District Registration district District District
Union
County County County
Province Province
Number in register Entry number
Date & place of birth Year of event Date of birth, year of event
Name (if any) Name Forename, Surname Forename, Surname
Sex Sex
Name, surname & dwelling place of father
Name & surname & maiden surname of mother
Mother’s maiden name
Rank or profession of father
Signature, qualification, and residence of informant
When Registered Returns year Returns year
Returns quarter Returns quarter
Signature of Registrar
11. Data archiving for the Irish Record Linkage project
Archival principles
The principle of provenance: Provenance means the history of
ownership related to a group of records or an individual item in a
collection. Preserving information on these relationships is essential as
they provide evidence of how and by whom the records were created and used
before they became part of the archives. Provenance provides essential
contextual information for understanding the content and history of an
archival collection.
The principle of original order: Archives are kept in the order in which
they were originally created or used. This original order allows
custodians to protect the authenticity of the records and provides
essential information as to how they were created, kept and used.
12. Data archiving for the Irish Record Linkage project
Terms of transfer
Data (e.g. database records and TIFFs) are only stored for the duration of the
project, and must be destroyed following its completion
Data can only be accessed by the IRL project team after an access agreement has
been signed
Records cannot be duplicated, downloaded or brought off-site
Personal, identifying information cannot be published
Copyright and related rights remain vested in the General Register Office.
13. Data archiving for the Irish Record Linkage project
Archival authenticity
The quality of being genuine, not a counterfeit, and free from tampering,
and is typically inferred from internal and external evidence, including its
physical characteristics, structure, content, and context.
The presence of a signature serves as a fundamental test for authenticity;
the signature identifies the creator and establishes the relationship between
the creator and the record.
The style and language of the document must be consistent with
other, related documents that are accepted as authentic.
Society of American Archivists
http://www2.archivists.org/glossary/terms/a/authenticity
14. Data archiving for the Irish Record Linkage project
Archival authenticity
Only records that are complete can ensure accountability and protect
personal rights […] Individual records must be complete; they must contain
all the information they had when they were created. They must also
maintain their original structure and context. (Hirtle)
An authentic record is one that is what it purports to be and has not
been tampered with or otherwise corrupted. (InterPARES 2)
For a record to be considered trustworthy […] it must accurately
reflect the event it records and be uncontaminated by the distorting
influence of time, bias, interpretation, or unwarranted opinion on
the part of the record-maker. (MacNeil)
15. Data archiving for the Irish Record Linkage project
Initial data preparation
The final dataset comprises birth, marriage and death records from two districts
in Dublin (South City no. 1 and South City no. 3)
A separate database was constructed to enable the encoding of the IRL records
Tables represent both the register pages and the records (“record” =
historical event)
Each event links back to the register page
The fields created reflect the original record information, and the structure enables
transformation to RDF
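The two-table arrangement described on this slide, with every event pointing back to its register page, might look roughly like the following. This is a sketch under stated assumptions, not the project's schema: all table names, column names, file names and predicate names are hypothetical, and the triples are plain tuples rather than real RDF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE register_page (
    page_id    INTEGER PRIMARY KEY,
    district   TEXT NOT NULL,
    tiff_file  TEXT NOT NULL      -- hypothetical file name of the page scan
);
CREATE TABLE event_record (
    record_id  INTEGER PRIMARY KEY,
    page_id    INTEGER NOT NULL REFERENCES register_page(page_id),
    event_type TEXT NOT NULL,     -- 'birth', 'marriage' or 'death'
    entry_no   INTEGER
);
""")
# Invented example rows.
conn.execute("INSERT INTO register_page VALUES (1, 'South City no. 1', 'page_0001.tif')")
conn.execute("INSERT INTO event_record VALUES (10, 1, 'death', 3)")


def to_triples(conn):
    """Turn each event row into subject-predicate-object tuples."""
    rows = conn.execute(
        "SELECT record_id, page_id, event_type, entry_no "
        "FROM event_record ORDER BY record_id"
    )
    triples = []
    for record_id, page_id, event_type, entry_no in rows:
        subject = f"record/{record_id}"
        triples.append((subject, "eventType", event_type))
        triples.append((subject, "entryNumber", str(entry_no)))
        # The link back to the register page survives the transformation.
        triples.append((subject, "onRegisterPage", f"page/{page_id}"))
    return triples


triples = to_triples(conn)
```

Keeping one column per original field means each column maps onto one predicate, which is what makes the later conversion to RDF mechanical.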
16. Data archiving for the Irish Record Linkage project
Data challenges
• Whole, authentic record maintained to represent the original
record and preserve context of creation
• Every database record linked to the TIFF image – TIFFs stored in
semi-meaningful arrangement
• Consistent cataloguing practices (dates, square brackets, [sic],
notes field to capture anomalies)
• Paleography
• Controlled vocabulary of death terms and professions
• Archiving databases: preserving content, structure and processes
(RODA toolkit (Repository of Authentic Digital Objects), SIARD
(Software Independent Archiving of Relational Databases))
17. Data archiving for the Irish Record Linkage project
Separation of concerns: transcription vs interpretation
Variance in how subject names and places were recorded (initials,
shorthand, name of a building versus street name) might imply
something of which we are currently unaware.
Transcription of the register pages records exactly what was written down.
Some interpretation is necessary in order to use the data, however, e.g. street names
changing over time, new insights into medical conditions, adoption of new
social theory (e.g. class distinctions)
Data captured in two separate ontologies: one for transcription, one for
interpretation. For example, a death recorded in days in the first database can be
interpreted/queried as hours in the second.
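The days-to-hours example can be sketched as two layers: an immutable transcription and a derived interpretation. This is an illustrative sketch in Python rather than the project's ontologies. It assumes the recorded quantity is a duration written as a number plus a unit, and the names `Transcription`, `HOURS_PER_UNIT` and `interpret_as_hours` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcription:
    """Exactly what the register page says; never modified after capture."""
    value: str  # kept as text, e.g. "2"
    unit: str   # as written, e.g. "days"


# Assumed conversion table for the interpretation layer.
HOURS_PER_UNIT = {"hours": 1, "days": 24, "weeks": 24 * 7}


def interpret_as_hours(t):
    """Derive a queryable value in hours, or None if it cannot be interpreted."""
    unit = t.unit.strip().lower()
    try:
        return float(t.value) * HOURS_PER_UNIT[unit]
    except (KeyError, ValueError):
        return None  # leave uninterpretable entries to the transcription layer


recorded = Transcription(value="2", unit="days")
interpreted = interpret_as_hours(recorded)
```

The transcription stays untouched whatever the interpretation layer does, so a revised interpretation (a new unit table, say) can be rerun without going back to the register pages.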
18. Data archiving for the Irish Record Linkage project
Separation of concerns
[Diagram: GRO Triplestore, Triplestore 2, Data Analysis]
19. Data archiving for the Irish Record Linkage project
Register page as EAD (database crosswalk)
20. Data archiving for the Irish Record Linkage project
21. Data archiving for the Irish Record Linkage project
Archival authenticity and preservation
The archivist encoded entire register pages rather than lines of data regarding an
individual (e.g. a single life event such as a death)
Database records refer back to digitised TIFFs created by the General Register Office
Interpretation of the dataset occurs separately: all records are transcribed exactly,
including typos, blank fields, details crossed out, Xs etc.
TIFFs can be preserved with EAD or QDC metadata, and associated databases
preserved separately and linked
Querying of the data occurs only on an obfuscated dataset with personal names
excluded; linked data can contain outbound links but is protected by a firewall
Authenticity of the dataset
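Excluding personal names before querying, as described on this slide, can be sketched as a filter that copies a record without its name fields. This is a minimal sketch in Python: the field names in `NAME_FIELDS` and the sample record are hypothetical, since the slides do not show how the project produced its obfuscated dataset.

```python
# Hypothetical field names; the project's actual schema is not shown in the slides.
NAME_FIELDS = {"forename", "surname", "mother_maiden_name", "informant"}


def obfuscate(record):
    """Return a copy of the record with personal-name fields left out."""
    return {key: value for key, value in record.items() if key not in NAME_FIELDS}


# Invented example record.
record = {
    "forename": "Mary",
    "surname": "Example",
    "district": "South City no. 1",
    "cause_of_death": "phthisis",
}
safe = obfuscate(record)
```

The original record is left unchanged, so the full transcription can stay behind the firewall while only the filtered copy is exposed to queries.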
22. Data archiving for the Irish Record Linkage project
Bibliography
Hirtle, Peter. “Archival Authenticity in a Digital Age”. Authenticity in a digital
environment, 2000.
Lee, Brent. Authenticity, Accuracy and Reliability: Reconciling Arts-related and
Archival Literature, 2005.
MacNeil, Heather. “Trusting Records in a Postmodern World”. Archivaria 51,
2001.
Pearce-Moses, Richard. A Glossary of Archival and Records Terminology, 2005.
SIARD Suite:
http://www.bar.admin.ch/dienstleistungen/00823/01911/index.html?lang=en
23. Data archiving for the Irish Record Linkage project
@beck_grant
@IRL_project
r.grant@ria.ie
http://repository.dri.ie
The content of this presentation is licensed as CC-BY. Please attribute to Rebecca Grant, Digital
Archivist, Digital Repository of Ireland, 2015.
https://irishrecordlinkage.wordpress.com/