An Identifier Scheme for the
Digitising Scotland Project
Alasdair J G Gray
Department of Computer Science,
Heriot-Watt University,
Edinburgh
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
Özgür Akgün, Uni. of St Andrews
Ahmad Alsadeeqi, Heriot-Watt Uni.
Peter Christen, Australian National Uni.
Tom Dalton, Uni. of St Andrews
Alan Dearle, Uni. of St Andrews
Chris Dibben, Uni. of Edinburgh
Eilidh Garrett, Uni. of Essex
Graham Kirby, Uni. of St Andrews
Alice Reid, Uni. of Cambridge
Lee Williamson, Uni. of Edinburgh
Digitising Scotland Project
Large-scale family reconstruction studies and pedigrees
• Transcription of data
• Linking of data
Performed at scale
• Whole nation
• Large timeframe
1 June 2017 ADRN Conference 2
Project Team
Backgrounds
• Demographers • Historians • Computer Scientists
Distributed team
St Andrews Cambridge Edinburgh Edinburgh Australia
Transcribing Scotland’s Vital Records: 1855 – 1974
• 24M records (birth, marriage, death)
• 18M individuals
Data Linkage Challenges
Low quality data
Probabilistic matches
Scalability
Skewed name distributions
Example (figure): similar entries on different records
• John Grant (Fisherman), Fiona Sinclair, Ian Grant (Smithy, born 1861)
• Stuart Adam (Wheelwright), Morag Scott, Flora Adam (Seamstress, born 1866)
• Married: 1884
• John Grant (Farmer), Fiona Sinclaire, Iain Grant (born 1860)
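The near-duplicate names in the example above (Ian/Iain, Sinclair/Sinclaire) are why matches end up probabilistic rather than exact. A minimal sketch of field-by-field comparison follows, using the standard library's `difflib` as a stand-in similarity measure. This is not the project's linkage method; the field names `father`, `mother` and `child` and the one-year tolerance are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Names and years come from the slide; the role assigned to each name is assumed.
record_a = {"father": "John Grant", "mother": "Fiona Sinclair",
            "child": "Ian Grant", "born": 1861}
record_b = {"father": "John Grant", "mother": "Fiona Sinclaire",
            "child": "Iain Grant", "born": 1860}


def compare(a: dict, b: dict) -> dict:
    """Score each field separately; a linkage step would combine these scores."""
    scores = {field: similarity(a[field], b[field])
              for field in ("father", "mother", "child")}
    # Treat birth years within one year of each other as agreeing (assumed tolerance).
    scores["born"] = 1.0 if abs(a["born"] - b["born"]) <= 1 else 0.0
    return scores


scores = compare(record_a, record_b)
for field, score in scores.items():
    print(f"{field}: {score:.2f}")
```

The scores are high but not perfect, so a threshold or a probabilistic model has to decide whether the two records describe the same family.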
Linking Skye Data
Discussing records
Peter: Eilidh, I’m having problems with the Skye record B-BABY-8293.
Eilidh: Peter, which transcribed certificate is that?
Peter: It is the record for Chris Dibben, born 18 March 1893.
Eilidh: That is the child on record 5457. It should link to the death on record 5754, 4 December 1959.
Peter: Thanks, found it now. It is record D-DEATH-2182.
Existing Identifier Schemes
Historians (example: 5457)
• Incremental integer
• Easily confused with other record types
• Identifies the certificate, not the actors
• Based on order of transcription
• Not derived from data
• Unique only within a file (Excel spreadsheet)
Record Linkage (example: B-BABY-8293)
• Encodes type of certificate and actor on certificate
• Four digits generated by linkage process
• Different from those used by the historians
• Different for each run of linkage pre-processing
Desiderata for Identifiers
1. Identifier for each actor on a certificate
2. Exchangeable between researchers
3. Unique generation process from the data
4. Immutable to data changes, e.g. typo discovered in data
5. Human derivable from data records
6. Human interpretable
7. Compact to enable efficient computation
8. Susceptible to blocking
9. Globally unique
10. Consistent approach for all record types
11. Compatible with pre-existing NRS approach
12. Compatibility with Open Data Standards
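Desideratum 8 asks that identifiers be susceptible to blocking, i.e. that a cheap key can group records so only records within a group are compared. A minimal sketch follows, assuming underscore-separated identifiers whose second component is the registration district. The choice of district as the block key and every identifier in the list are hypothetical.

```python
from collections import defaultdict


def blocking_key(identifier: str) -> str:
    """Return the district component (second underscore-separated field)."""
    return identifier.split("_")[1]


def build_blocks(identifiers):
    """Group identifiers by blocking key so comparisons stay within a block."""
    blocks = defaultdict(list)
    for ident in identifiers:
        blocks[blocking_key(ident)].append(ident)
    return dict(blocks)


# Hypothetical identifiers, made up for illustration only.
identifiers = [
    "B1903_164_00_12_baby",
    "B1903_164_00_12_mother",
    "M1884_164_00_5_groom_father",
    "D1959_164_00_7_deceased",
    "B1903_165_01_4_baby",
]

blocks = build_blocks(identifiers)
for district, members in sorted(blocks.items()):
    print(district, len(members))
```

Because the key is read straight off the identifier, blocking needs no lookup into the underlying record.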
Identifier Scheme
typeYear_district_subdistrict_entryNumber_role
Example: B1903_164_00_baby
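The pattern above can be composed and parsed mechanically. A minimal sketch follows, assuming five underscore-separated components, a four-digit year, and the type letters B, M and D for birth, marriage and death. Only B appears in the slides, so M and D are assumptions, as are the class name `ActorId` and the entry numbers used below.

```python
import re
from typing import NamedTuple


class ActorId(NamedTuple):
    """One actor on one certificate, following the five-part pattern."""
    cert_type: str      # assumed: "B", "M" or "D"
    year: int
    district: str
    subdistrict: str
    entry_number: str
    role: str

    def __str__(self) -> str:
        return (f"{self.cert_type}{self.year}_{self.district}_"
                f"{self.subdistrict}_{self.entry_number}_{self.role}")


# The role is matched last and may itself contain underscores (e.g. groom_father).
_PATTERN = re.compile(
    r"^(?P<cert_type>[BMD])(?P<year>\d{4})"
    r"_(?P<district>[^_]+)_(?P<subdistrict>[^_]+)"
    r"_(?P<entry_number>[^_]+)_(?P<role>.+)$"
)


def parse(identifier: str) -> ActorId:
    """Split an identifier into its components, or raise ValueError."""
    match = _PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a five-part identifier: {identifier!r}")
    return ActorId(match["cert_type"], int(match["year"]), match["district"],
                   match["subdistrict"], match["entry_number"], match["role"])


# Entry number "12" is hypothetical.
baby = ActorId("B", 1903, "164", "00", "12", "baby")
print(baby)
print(parse("M1884_164_00_5_groom_father").role)
```

Keeping the components as strings preserves leading zeros such as the `00` subdistrict.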
Certificate Roles
Birth
• baby
• mother
• father
• registrar
• informant
Marriage
• groom
• groom_father
• groom_mother
• bride
• bride_father
• bride_mother
• witness1
• witness2
• officiant
• registrar
Death
• deceased
• mother
• father
• spouse1…spousen
• informant
• doctor
• registrar
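The role lists above map naturally onto a lookup table that can validate the final component of an identifier. A minimal sketch follows; the dictionary keys `birth`, `marriage` and `death` and the function name are assumptions, and the open-ended `spouse1…spousen` entry is handled with a numbered pattern.

```python
import re

# Roles per certificate type, as listed on the slide.
CERTIFICATE_ROLES = {
    "birth": {"baby", "mother", "father", "registrar", "informant"},
    "marriage": {"groom", "groom_father", "groom_mother",
                 "bride", "bride_father", "bride_mother",
                 "witness1", "witness2", "officiant", "registrar"},
    "death": {"deceased", "mother", "father",
              "informant", "doctor", "registrar"},
}

# spouse1 ... spouseN: any positive integer suffix is accepted (assumed format).
_SPOUSE = re.compile(r"^spouse[1-9]\d*$")


def is_valid_role(certificate: str, role: str) -> bool:
    """Return True if the role can appear on the given certificate type."""
    if certificate == "death" and _SPOUSE.match(role):
        return True
    return role in CERTIFICATE_ROLES.get(certificate, set())


print(is_valid_role("marriage", "bride_mother"))
print(is_valid_role("death", "spouse2"))
print(is_valid_role("birth", "groom"))
```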
Conclusions
• Agreed identifier scheme
typeYear_district_subdistrict_entryNumber_role
• Meets desiderata
• Reliant on “clean” parts of certificate
• Compatible with NRS
• Improved team communications
Alasdair Gray
www.macs.hw.ac.uk/~ajg33/
A.J.G.Gray@hw.ac.uk
@gray_alasdair
Acknowledgements:
• Julia Jennings
• Christine Jones
• Diego Ramiro-Farinas


Editor's Notes

  1. Outline: background of the Digitising Scotland project and its aims; the need for an identifier scheme; the agreed-upon scheme.
  2. Large-scale family reconstruction studies and pedigrees.
  3. Different backgrounds, different expertise, different terminology. We have run a series of workshops to bring us together to understand our approaches and terminologies.
  4. Civil registration of births, deaths and marriages in Scotland began on 1 January 1855. All historical vital events records have been converted into digital image format with a supporting index. Modern vital events data (from 1974 onwards) are available electronically. The DS project will digitise the 24 million Scottish vital events record images (births, marriages and deaths) since 1855. This will allow research access to individual-level information on some 18 million individuals, a large proportion of those who have lived in Scotland. Transcription is outsourced (Queens Centre for Data Digitisation and Analysis), and we are now starting to receive data.
  5. Data is of low quality: transcription errors, no occupation standards, etc. Skewed name distributions (a large proportion of people in a village or town have the same name). Scalability (linking 24M records).
  6. Low-quality linkage due to the challenges, e.g. skewed name distributions. Need to regularly discuss records within the team.
  7. Different teams use their own identifier schemes. There is no relationship between an identifier scheme and the original record.
  8. Existing approaches are reliant on order of transcription. CS hash-based approaches are reliant on data content: the ID changes if the data changes.
  9. Use the information on the registration book. The registration district is on the book or on the microfiche image. Rathven, Banff has no subdistrict. Need to capture the different roles.
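Note 8 observes that a content-hash identifier changes whenever the data changes, while an identifier built from registration details does not. A minimal sketch of that contrast follows, assuming a record held as a dictionary. The field names, the entry number `12` and the misspelt forename are hypothetical, and SHA-256 stands in for whatever hash a linkage pipeline might use.

```python
import hashlib


def content_hash_id(record: dict) -> str:
    """Identifier derived from every field, so any edit changes it."""
    canonical = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def registration_id(record: dict, role: str) -> str:
    """Identifier derived only from registration details and the role."""
    return (f"{record['type']}{record['year']}_{record['district']}_"
            f"{record['subdistrict']}_{record['entry']}_{role}")


# Hypothetical transcription with a typo in the forename, then its correction.
transcribed = {"type": "B", "year": 1903, "district": "164",
               "subdistrict": "00", "entry": "12", "forename": "Iain"}
corrected = dict(transcribed, forename="Ian")

print(content_hash_id(transcribed) == content_hash_id(corrected))
print(registration_id(transcribed, "baby") == registration_id(corrected, "baby"))
```

Correcting the typo alters the hash-based identifier but leaves the registration-based one untouched, which is the immutability asked for in desideratum 4.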