This presentation was given by Rebecca Grant, Digital Archivist at the Digital Repository of Ireland, at the annual Archives and Records Association (UK and Ireland) conference in Dublin on Wednesday 26th August 2015. It discusses the issues of archival authenticity that arose during the Irish Record Linkage project and how they were addressed.
Co-authors: Dolores Grant, Dr Sharon Webb, Dr Sandra Collins
Rebecca Grant - Approaching Archival Authenticity: when 'Records' become 'Data'.
1. Approaching Archival Authenticity when 'Records' become 'Data'
Rebecca Grant, Digital Archivist, Digital Repository of Ireland
Dolores Grant, DRI-IRL Digital Archivist, Digital Repository of Ireland
Dr. Sharon Webb, Knowledge Transfer Manager, Digital Arts & Humanities PhD
Programme
Dr. Sandra Collins, Director, Digital Repository of Ireland
2. The Digital Repository of Ireland
DRI is a trusted digital repository for Humanities and
Social Sciences data - launched June 2015
Linking and preserving the rich collections held by Irish
institutions (archives, museums, libraries, galleries,
universities, research projects etc)
Focal point for the development of national guidelines and
policy for digital preservation and access.
repository.dri.ie
3. Irish Record Linkage project 1864-1913
Irish Record Linkage is an Irish Research Council funded project running
from 2014 to September 2015
Collaboration between the University of Limerick (medical historians),
the Digital Repository of Ireland at the Royal Irish Academy (archivists!),
and Insight@NUI Galway (knowledge engineers, Linked Data experts)
Constructing a Knowledge Platform - Linked Data based on Vital
Registration Data (digitised registers of Births, Marriages and Deaths) in
order to answer research questions around infant and maternal
mortality
4. Irish Record Linkage and Linked Data Queries
• How many women died within 42 days following childbirth due to
complications related to labour and how does that figure correspond
with the official reports?
• Which women died of causes that can be attributed to maternal death,
but for which no corresponding birth certificate exists?
• How did various socio-economic conditions affect maternal and infant
mortality rates?
5. General Register Office records
The General Register Office (GRO) - civil registry responsible for recording
information on births, deaths and marriages.
Records of 6,009,781 births (from 1864 to 1912), 4,314,963 deaths (from 1864 to
1912) and 1,443,110 marriages (from 1845 to 1912) transferred to the project
team with strict terms and conditions.
Events were captured on register pages (up to 10 for births and deaths, and up to 4
for marriages) divided by district and sent to the GRO where volumes were then
created and an index compiled.
Database dump of the GRO's database with digitised versions of the
register pages and indexes (TIFFs)
6. Terms of transfer
Data (e.g. database records and TIFFs) are only stored for the duration of the
project, and must be destroyed following its completion
Data can only be accessed by the IRL project team after an access agreement has
been signed
Records cannot be duplicated, downloaded or brought off-site
Personal, identifying information cannot be published
Copyright and related rights remain vested in the General Register Office.
7. Birth records with redactions
The IRL project team are not the data
owners.
The security and authenticity of the
dataset were critical to the success
of the project.
8. The Linked Data Concept
A method of publishing structured data on the Web,
allowing it to be connected and enriched, and facilitating
linking between related resources.
Linked Data standards such as RDF allow semantic
definitions to be applied to information, using statements
called 'triples' in the form subject, predicate, object.
A key principle of Linked Data is that HTTP URIs are used to
name the semantic elements of the dataset.
9. The Linked Data Concept
The example above describes the subject (James Joyce) and his
relationship (predicate) to an object (Dublin). By semantically
separating the elements of the information (that James Joyce was
born in Dublin) datasets stored in this way can be easily queried.
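As a minimal sketch of the triple idea, the statement that James Joyce was born in Dublin can be held as a (subject, predicate, object) tuple of HTTP URIs and queried by pattern. The example.org URIs, the Triple type and the match helper are all hypothetical and are not the identifiers used by the project; the second triple is invented only to show that the query filters.

```python
from typing import List, NamedTuple, Optional

# Hypothetical URIs for illustration only (not the project's identifiers).
JOYCE = "http://example.org/person/James_Joyce"
DUBLIN = "http://example.org/place/Dublin"
BIRTH_PLACE = "http://example.org/ontology/birthPlace"
OCCUPATION = "http://example.org/ontology/occupation"
WRITER = "http://example.org/occupation/Writer"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


DATASET = [
    Triple(JOYCE, BIRTH_PLACE, DUBLIN),
    Triple(JOYCE, OCCUPATION, WRITER),
]


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Who was born in Dublin?" expressed as a pattern over the triples.
born_in_dublin = [t.subject for t in match(DATASET, predicate=BIRTH_PLACE, obj=DUBLIN)]
```

Because each element of the statement is named separately, the same pattern matching answers questions about people, places or relationships without changing the stored data.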
10. Competency questions for ontology construction
ID   Competency Question
C01  Women died within 41 days after giving birth (the date of birth counted as day 1 and day 41 is included)
C02  Women died within 41 days after giving birth AND in their death certificate 'complication 1' is mentioned.
C03  Women died within 41 days after giving birth AND in their death certificate 'complication 2' is mentioned.
C04  Women having official maternal death reports including 'XXXX'
C05  Women having official maternal death reports including 'cause 1'
C06  Women having official maternal death reports including 'cause 2 and cause 3 together'
C07  For each record in C04 find the ones with corresponding birth record (the date of death counted as day 1 and day 41 is included)
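The day-counting rule in C01 can be sketched as follows. This is an illustration under stated assumptions rather than the project's query: the mother identifiers and dates are invented, and the births and deaths are assumed to be already linked by a hypothetical mother_id.

```python
from datetime import date

WINDOW_DAYS = 41  # window as stated in C01


def day_number(birth: date, death: date) -> int:
    """Count days with the date of birth as day 1."""
    return (death - birth).days + 1


def died_within_window(birth: date, death: date, window: int = WINDOW_DAYS) -> bool:
    """True when the death falls on day 1..window (inclusive) after the birth."""
    return 1 <= day_number(birth, death) <= window


# Invented sample data: hypothetical mother_id -> date of the event.
births = {"m1": date(1880, 1, 1), "m2": date(1880, 3, 5)}
deaths = {"m1": date(1880, 2, 10), "m2": date(1880, 4, 15), "m3": date(1881, 6, 1)}

# C01: mothers with a linked birth whose death falls inside the window.
c01 = sorted(
    mother for mother, died in deaths.items()
    if mother in births and died_within_window(births[mother], died)
)
```

Mother m3 has a death record but no linked birth, so she is excluded here; records of that kind are the subject of a separate research question.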
13. Register TIFF / Index TIFF / System Pre 1900 / System Post 1900
Superintendent Registrar's District
Registrar's District / Registration district / District / District
Union
County / County / County
Province / Province
Number in register / Entry number
Date & place of birth / Year of event / Date of birth, year of event
Name (if any) / Name / Forename, Surname / Forename, Surname
Sex / Sex
Name, surname & dwelling place of father
Name & surname & maiden surname of mother
Mother's maiden name
Rank or profession of father
Signature, qualification, and residence of informant
When Registered / Returns year / Returns year
Returns quarter / Returns quarter
Signature of Registrar
14. DRI Presentation
Archival authenticity
The quality of being genuine, not a counterfeit, and free from tampering,
and is typically inferred from internal and external evidence, including its
physical characteristics, structure, content, and context.
The presence of a signature serves as a fundamental test for authenticity;
the signature identifies the creator and establishes the relationship between
the creator and the record.
The style and language of the document must be consistent with
other, related documents that are accepted as authentic.
Society of American Archivists
http://www2.archivists.org/glossary/terms/a/authenticity
15. Archival authenticity
Only records that are complete can ensure accountability and protect
personal rights [...] Individual records must be complete; they must contain
all the information they had when they were created. They must also
maintain their original structure and context. (Hirtle)
An authentic record is one that is what it purports to be and has not
been tampered with or otherwise corrupted. (InterPARES 2)
For a record to be considered trustworthy [...] it must accurately
reflect the event it records and be uncontaminated by the distorting
influence of time, bias, interpretation, or unwarranted opinion on
the part of the record-maker (MacNeil)
16. Approaching authenticity for the IRL project
The dataset cannot provide evidence of structure, context, standardised
style, signatures - therefore the data 'record' must always be linked to
the TIFF
The 'records' transcribed must be complete - all data must be
transcribed, even if it is not currently used to answer our research
questions
The 'records' should not be biased by interpretation - each piece of
data should be transcribed faithfully.
17. Initial data preparation
Final dataset comprises death records from 2 districts in Dublin (South
City no. 1 and South City no. 3)
Separate database constructed to enable the encoding of the IRL records
Tables represent both the register pages and the records ('record' =
historical event)
The register page and record are linked to the index page
Fields created reflect the original record information, and the structure
enables transformation to RDF
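One possible shape for such a database is sketched below with SQLite. The table names, column names and file paths are assumptions made for illustration; the slides do not give the actual schema. Each record row points to its register page, and both point to an index page, so every transcribed entry can be traced to a TIFF.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative, not the project's.
SCHEMA = """
CREATE TABLE index_page (
    id        INTEGER PRIMARY KEY,
    tiff_path TEXT NOT NULL
);
CREATE TABLE register_page (
    id            INTEGER PRIMARY KEY,
    tiff_path     TEXT NOT NULL,
    district      TEXT NOT NULL,
    index_page_id INTEGER REFERENCES index_page(id)
);
CREATE TABLE record (
    id               INTEGER PRIMARY KEY,
    register_page_id INTEGER NOT NULL REFERENCES register_page(id),
    index_page_id    INTEGER REFERENCES index_page(id),
    entry_number     TEXT,
    notes            TEXT
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def tiff_for_record(conn: sqlite3.Connection, record_id: int):
    """Return the register-page TIFF path a record was transcribed from."""
    row = conn.execute(
        "SELECT p.tiff_path FROM record r "
        "JOIN register_page p ON p.id = r.register_page_id "
        "WHERE r.id = ?",
        (record_id,),
    ).fetchone()
    return row[0] if row else None


conn = build_database()
conn.execute("INSERT INTO index_page (id, tiff_path) VALUES (1, 'index/deaths_0001.tif')")
conn.execute(
    "INSERT INTO register_page (id, tiff_path, district, index_page_id) "
    "VALUES (1, 'deaths/south_city_no_1/page_0001.tif', 'South City no. 1', 1)"
)
conn.execute(
    "INSERT INTO record (id, register_page_id, index_page_id, entry_number, notes) "
    "VALUES (1, 1, 1, '101', '')"
)
```

Keeping the page and the record as separate tables mirrors the register layout, which is what makes a later mapping of rows to RDF resources straightforward.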
18. Data challenges
• Whole, authentic record maintained to represent the original
record and preserve context of creation
• Every database record linked to the TIFF image - TIFFs stored in
semi-meaningful arrangement
• Consistent cataloguing practices (dates, square brackets, [sic],
notes field to capture anomalies)
• Palaeography
• Controlled vocabulary of death terms and professions
• Archiving databases: preserving content, structure and processes
(RODA toolkit (Repository of Authentic Digital Objects), SIARD
(Software Independent Archiving of Relational Databases))
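A controlled vocabulary can sit beside the transcription without replacing it. The sketch below is an assumption about how that might look: the vocabulary entries, field names and the catalogue helper are invented for illustration and are not the project's term lists.

```python
# Hypothetical vocabulary: normalised variant spelling -> preferred term.
PROFESSION_VOCAB = {
    "labourer": "Labourer",
    "laborer": "Labourer",
    "labr.": "Labourer",
}


def normalise(term: str) -> str:
    """Lower-case and collapse whitespace for lookup only."""
    return " ".join(term.lower().split())


def catalogue(transcribed: str) -> dict:
    """Keep the verbatim term and attach the controlled term when one exists."""
    controlled = PROFESSION_VOCAB.get(normalise(transcribed))
    return {
        "transcribed": transcribed,  # exactly as written in the register
        "controlled": controlled,    # None when the term is not in the vocabulary
        "notes": "" if controlled else "term not in controlled vocabulary",
    }


entry = catalogue("Laborer")
unknown = catalogue("Huxter")
```

The verbatim value is never overwritten, and terms missing from the vocabulary are flagged in the notes field instead of being guessed.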
19. GRO Triplestore
Triplestore 2 Data Analysis
Transformation from one model to
another
• SPIN - SPARQL Inferencing Notation
• SWRL / RuleML
• SPARQL Construct
• ...
SEPARATION OF CONCERNS
GRO Records annotation vs. Data Analysis
20. Separation of concerns - transcription vs interpretation
Variance in how subject names and places were recorded (initials,
shorthand, name of a building versus street name) may carry
meaning of which we are currently unaware.
The register pages are transcribed exactly as written.
Some interpretation is necessary in order to use the data, however - e.g. street
names changing over time, new insights into medical conditions, adoption
of new social theory (e.g. class distinctions)
Data captured in two separate ontologies - one for transcription, one for
interpretation. For example, a death recorded in days in the first database
can be interpreted/queried as hours in the second.
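The two-layer idea can be sketched as two separate structures, one holding the verbatim transcription and one holding a derived value that points back to it. The class names, the 'duration' field and the unit table are hypothetical; the project used ontologies and the transformation techniques listed on the previous slide, not this code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hours per unit word; an illustrative table, not the project's mapping.
HOURS_PER_UNIT = {"hour": 1, "hours": 1, "day": 24, "days": 24, "week": 168, "weeks": 168}


@dataclass(frozen=True)
class Transcription:
    """Transcription layer: text exactly as written on the register page."""
    record_id: str
    field: str
    text: str


@dataclass(frozen=True)
class Interpretation:
    """Interpretation layer: derived value plus a link back to its source."""
    source: Transcription
    hours: Optional[int]


def interpret_duration(t: Transcription) -> Interpretation:
    """Derive hours from text such as '3 days'; leave None when unsure."""
    m = re.fullmatch(r"\s*(\d+)\s+([A-Za-z]+)\s*", t.text)
    if m is None or m.group(2).lower() not in HOURS_PER_UNIT:
        return Interpretation(source=t, hours=None)
    return Interpretation(source=t, hours=int(m.group(1)) * HOURS_PER_UNIT[m.group(2).lower()])


verbatim = Transcription(record_id="d-0001", field="duration", text="3 days")
derived = interpret_duration(verbatim)
unclear = interpret_duration(Transcription("d-0002", "duration", "abt. 3 dys"))
```

The transcription object is immutable and is only referenced by the interpretation, so a later change in how values are interpreted never alters what was transcribed.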
23. Thinking about archival authenticity
Archivist encoded entire register pages rather than lines of data regarding an
individual (e.g. a single life event such as a death)
Database records refer back to digitised TIFFs created by General Register Office
Interpretation of the dataset occurs separately - all records are transcribed exactly
including typos, blank fields, details crossed out, Xs etc.
TIFFs can be preserved with EAD metadata, and associated databases preserved
separately and linked
Querying of the data occurs only on an obfuscated dataset with personal names
excluded; linked data can contain outbound links but is protected by a firewall
Authenticity of the dataset
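Querying only an obfuscated dataset can be sketched as a copy of each record with the personal-name fields left out. The field names and sample values below are invented, and the slides do not describe how the project's obfuscation was implemented.

```python
# Hypothetical list of fields that hold personal names.
PERSONAL_NAME_FIELDS = frozenset({"forename", "surname", "informant_name"})


def obfuscate(record: dict) -> dict:
    """Return a copy of the record without personal-name fields."""
    return {key: value for key, value in record.items() if key not in PERSONAL_NAME_FIELDS}


# Invented sample record; the full version stays with the project team.
full_record = {
    "record_id": "d-0001",
    "forename": "A.",
    "surname": "B.",
    "informant_name": "C. D.",
    "district": "South City no. 1",
    "cause_of_death": "Phthisis",
}

query_record = obfuscate(full_record)
```

The record identifier is retained so that a query result can still be traced back to the complete record and its TIFF by those with access.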
24. Bibliography
Hirtle, Peter. "Archival Authenticity in a Digital Age". Authenticity in a digital
environment, 2000.
Lee, Brent. Authenticity, Accuracy and Reliability: Reconciling Arts-related and
Archival Literature, 2005.
MacNeil, Heather. "Trusting Records in a Postmodern World". Archivaria 51,
2001.
Pearce-Moses, Richard. A Glossary of Archival and Records Terminology, 2005.
SIARD Suite:
http://www.bar.admin.ch/dienstleistungen/00823/01911/index.html?lang=en