Ruben Schalk
r.schalk@uu.nl
Working with (lots of) historical persons:
LINKS/ Clariah WP4
HackaLOD
05-02-2021
Clariah WP4
• Online and (as much as possible) open access to all Dutch birth,
marriage, and death certificates from 1810
• Add relations between persons on all certificates using record linking
• Reconstitute families and family trees
• Facilitate linking with external datasets containing historical person
observations
Clariah WP4 (2)
• Optimal data archiving through Linked Data: standardized sets
of variable names, new ways to estimate quality of matches,
and intuitive storage of linking quality.
• Framework to organize and store information on inequality
(and hopefully more in the future)
• Datahub for social and economic history
Importance: new research
• Multigenerational studies: social mobility, heritability.
• Add deep family relations to topics such as asset ownership
(Memories van Successie), strikes, business, mortality, fertility,
anthropometrics etc.
• Even conventional research questions may require a larger N than current
micro datasets can provide, for example on longevity, birth
spacing, or sex-specific effects.
• Large geographic scope: migration and environmental effects.
Dataset(s): indexed civil registry (Burgerlijke Stand)
LINKS
- IISG project
- Data from CBG
- A2A XML > MS Access > CSV
- Full names removed
- Data not (yet) public
Openarch
- Data from archives directly
- Available in A2A XML / CSV / TTL
- Open access!
- Not cleaned
- Potential duplicates:
- Duplicates within archives
- Different archives indexing the
same certificates
Data pipeline
LINKS
- Cleaning by IISG, similar to the Openarch steps
- Documentation (almost)
available
- Scripts not (yet) available
- Lookup tables for occupations
and georeferencing (Amsterdam
code) available
Openarch
- Deduplication
- Adding Amco codes to place names
- Inferring sex from first names
- Standardizing ages
- Coding occupations
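Two of the Openarch cleaning steps above, deduplication and age standardization, can be illustrated with a minimal Python sketch. It rests on several assumptions:

- The record fields (`event`, `date`, `place`, `name`, `archive`) are hypothetical.
- The Dutch age units handled here are hypothetical.
- Neither reflects the actual Openarch or IISG schema or scripts.

Real deduplication across archives would likely need fuzzier matching than the exact normalized key used here.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical conversion factors for Dutch age units as they might appear on certificates.
UNIT_IN_YEARS = {
    "": 1.0,  # a bare number is assumed to be in years
    "jaar": 1.0, "jaren": 1.0,
    "maand": 1 / 12, "maanden": 1 / 12,
    "week": 7 / 365.25, "weken": 7 / 365.25,
    "dag": 1 / 365.25, "dagen": 1 / 365.25,
}
AGE_PART = re.compile(r"(\d+)\s*([a-z]*)")


def standardize_age(raw: str) -> Optional[float]:
    """Convert strings such as '25 jaar en 3 maanden' to an age in years."""
    parts = AGE_PART.findall(raw.lower())
    if not parts:
        return None  # e.g. 'onbekend' (unknown): no number at all
    total = 0.0
    for number, unit in parts:
        if unit not in UNIT_IN_YEARS:
            return None  # unrecognized unit: leave the age missing rather than guess
        total += int(number) * UNIT_IN_YEARS[unit]
    return round(total, 2)


def dedup_key(rec: Dict[str, str]) -> Tuple[str, str, str, str]:
    """Key that ignores which archive indexed the record and case/whitespace differences."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return (rec["event"], rec["date"], norm(rec["place"]), norm(rec["name"]))


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record per key, e.g. when two archives index the same certificate."""
    seen = set()
    unique = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


records = [
    {"event": "birth", "date": "1855-03-02", "place": "Middelburg", "name": "Jan  Jansen", "archive": "A"},
    {"event": "birth", "date": "1855-03-02", "place": "middelburg", "name": "Jan Jansen", "archive": "B"},
]
print(len(deduplicate(records)))  # 1
print(standardize_age("25 jaar en 3 maanden"))  # 25.25
```

Returning `None` for unrecognized formats keeps doubtful ages missing instead of silently wrong. That matters when missing ages are already a known issue in the data.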
Data issues
- Many occupations missing
(white regions)
- Many ages missing as well
- Duplicates remain an issue
- Not everything is indexed yet
- Emigration blind spot
Datasets compared (births)
Some results: excess mortality during the Spanish Flu (Openarch)
https://stories.datalegend.net/spanishFluNetherlands/
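Excess mortality is commonly estimated as observed deaths minus an expected baseline. This is a minimal sketch, assuming the baseline is the mean number of deaths in a set of preceding years. The baseline choice, the years, and the counts below are illustrative assumptions. They are not the method or the figures behind the data story.

```python
from statistics import mean
from typing import Dict, Sequence


def excess_mortality(deaths_by_year: Dict[int, int], year: int,
                     baseline_years: Sequence[int]) -> Dict[str, float]:
    """Observed minus expected deaths, with expected = mean of the baseline years."""
    expected = mean(deaths_by_year[y] for y in baseline_years)
    observed = deaths_by_year[year]
    excess = observed - expected
    return {
        "expected": expected,
        "observed": observed,
        "excess": excess,
        "excess_pct": round(100 * excess / expected, 1),
    }


# Made-up death counts for one hypothetical municipality.
deaths = {1913: 100, 1914: 110, 1915: 90, 1916: 105, 1917: 95, 1918: 150}
result = excess_mortality(deaths, 1918, range(1913, 1918))
print(result)
```

With municipality-level counts from the civil registry, the same calculation could be repeated per place to map where the excess was largest.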
Record linking results: Zeeland
• Newborns linked to their own marriages
• Parents of newborns linked to their own marriages as bride and groom
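The slides do not describe the LINKS matching procedure itself. This is a minimal sketch under stated assumptions:

- A newborn is linked to a later marriage where he appears as groom.
- Five names must each differ by at most one edit (Levenshtein distance): own first name, surname, father's first name, mother's first name and mother's surname.
- The age at marriage must be plausible.
- Only unambiguous (single-candidate) matches are kept.

The record layout and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PersonRecord:
    """Hypothetical flattened view of a person and their parents on one certificate."""
    cert_id: str
    year: int  # year of the event (birth or marriage)
    first: str
    surname: str
    father_first: str
    mother_first: str
    mother_surname: str

    def names(self) -> Tuple[str, ...]:
        return (self.first, self.surname, self.father_first,
                self.mother_first, self.mother_surname)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def link_newborns_to_grooms(births: List[PersonRecord], grooms: List[PersonRecord],
                            max_dist: int = 1, min_age: int = 16,
                            max_age: int = 70) -> List[Tuple[str, str]]:
    """Return (birth_id, marriage_id) pairs where exactly one groom matches."""
    links = []
    for b in births:
        candidates = [
            g for g in grooms
            if min_age <= g.year - b.year <= max_age
            and all(levenshtein(x.lower(), y.lower()) <= max_dist
                    for x, y in zip(b.names(), g.names()))
        ]
        if len(candidates) == 1:  # skip ambiguous matches
            links.append((b.cert_id, candidates[0].cert_id))
    return links


births = [PersonRecord("b1", 1850, "Jan", "Jansen", "Pieter", "Maria", "Smit")]
grooms = [
    PersonRecord("m1", 1875, "Jan", "Janssen", "Pieter", "Maria", "Smidt"),
    PersonRecord("m2", 1876, "Jan", "Jansen", "Kees", "Maria", "Smit"),
]
print(link_newborns_to_grooms(births, grooms))  # [('b1', 'm1')]
```

Matching on parents' names as well as the person's own name helps distinguish men who share common first names and surnames. Allowing one edit absorbs spelling variants such as Jansen/Janssen.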
RDF data model: retrieve relations
• Query from relation to relation with SPARQL
• Retrieve all associated information (locations, occupations, etc.)
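Querying "from relation to relation" amounts to following a chain of predicates. SPARQL expresses this as a property path, such as `?child ex:father/ex:father/ex:occupation ?occ`. The sketch below mimics that traversal over an in-memory list of triples in plain Python. The `ex:` predicates, person identifiers and occupation values are hypothetical and do not reflect the actual LINKS vocabulary.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Index = DefaultDict[Tuple[str, str], List[str]]


def build_index(triples: Iterable[Triple]) -> Index:
    """Index objects by (subject, predicate) for fast forward traversal."""
    index: Index = defaultdict(list)
    for s, p, o in triples:
        index[(s, p)].append(o)
    return index


def follow_path(index: Index, start: str, path: List[str]) -> Set[str]:
    """Follow a sequence of predicates from `start`, like a SPARQL property path p1/p2/..."""
    frontier = {start}
    for predicate in path:
        frontier = {o for s in frontier for o in index.get((s, predicate), [])}
    return frontier


triples = [
    ("person:1", "ex:father", "person:2"),
    ("person:2", "ex:father", "person:3"),
    ("person:2", "ex:occupation", "arbeider"),
    ("person:3", "ex:occupation", "landbouwer"),
]
index = build_index(triples)
print(follow_path(index, "person:1", ["ex:father", "ex:occupation"]))  # {'arbeider'}
print(follow_path(index, "person:1", ["ex:father", "ex:father", "ex:occupation"]))  # {'landbouwer'}
```

A triple store does the same work at scale, with indexes and query optimization that this toy version lacks.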
Results: social mobility (LINKS Zeeland)
• Question: what was the average socio-economic occupational score of
fathers and grandfathers of newborns between 1850 and 1920?
• Occupations taken from birth certificates
• https://druid.datalegend.net/LINKS/-/queries/social-mobility-births/2
Birth certificate 1
- Newborn
- Father + occupation
Birth certificate 2
- Father
- Grandfather + occupation
Mean HISCAM score of fathers and grandfathers
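Once the father on birth certificate 1 is linked to the newborn on birth certificate 2, the comparison reduces to averaging two columns. This is a sketch with the following assumptions:

- Each linked pair is flattened to a tuple: (newborn's birth year, father's HISCAM score, grandfather's HISCAM score).
- Results are grouped by decade of birth.

The tuple layout, the decade grouping and the numbers in the example are illustrative assumptions, not the LINKS Zeeland results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[int, Optional[float], Optional[float]]  # (birth year, father score, grandfather score)


def mean_scores_by_decade(rows: Iterable[Row], start: int = 1850,
                          end: int = 1920) -> Dict[int, Tuple[float, float]]:
    """Mean father and grandfather HISCAM scores per decade of the newborn's birth."""
    fathers: Dict[int, List[float]] = defaultdict(list)
    grandfathers: Dict[int, List[float]] = defaultdict(list)
    for year, father, grandfather in rows:
        if father is None or grandfather is None or not start <= year <= end:
            continue  # keep only complete pairs inside the study period
        decade = year // 10 * 10
        fathers[decade].append(father)
        grandfathers[decade].append(grandfather)
    return {d: (mean(fathers[d]), mean(grandfathers[d])) for d in sorted(fathers)}


rows = [(1851, 50.0, 48.0), (1855, 54.0, 52.0), (1862, 60.0, None),
        (1864, 56.0, 50.0), (1930, 70.0, 70.0)]
print(mean_scores_by_decade(rows))  # {1850: (52.0, 50.0), 1860: (56.0, 50.0)}
```

Dropping pairs with a missing occupation keeps both means on the same set of families. Given how many occupations are missing, this can shrink the sample considerably.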
Results: social mobility 2 (LINKS Zeeland)
• Question: what was the average socio-economic occupational score of
grooms and their fathers, each at his own marriage, between 1850 and
1920?
• https://druid.datalegend.net/LINKS/-/queries/social-mobility-marriage/1
Marriage certificate 1
- Groom + occupation
- Father of the groom (+ occupation)
Marriage certificate 2
- Father of the groom + occupation
Mean HISCAM score of grooms and their fathers (at marriage)
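For the marriage-based comparison, the son's marriage (certificate 1) must be linked to the father's own marriage (certificate 2). The scores are then paired. This is a minimal sketch with the following assumptions:

- Groom scores are keyed by a hypothetical marriage certificate id.
- The link result is a plain dict from the son's certificate to the father's certificate.

The ids and scores are made up for illustration.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def groom_father_pairs(groom_scores: Dict[str, Optional[float]],
                       links: Dict[str, str]) -> List[Tuple[float, float]]:
    """Pair each groom's score with his father's score at the father's own marriage.

    groom_scores: marriage certificate id -> groom's HISCAM score (None if occupation missing)
    links: son's marriage id (certificate 1) -> father's marriage id (certificate 2)
    """
    pairs = []
    for son_cert, father_cert in links.items():
        son = groom_scores.get(son_cert)
        father = groom_scores.get(father_cert)
        if son is not None and father is not None:
            pairs.append((son, father))
    return pairs


scores = {"m1": 55.0, "m2": 50.0, "m3": 60.0, "m4": None, "m5": 45.0}
links = {"m1": "m2", "m3": "m5", "m4": "m2"}
pairs = groom_father_pairs(scores, links)
print(mean(s for s, _ in pairs), mean(f for _, f in pairs))  # 57.5 47.5
```

Measuring both men at their own marriage compares them at a similar stage of life. This is presumably why the question is framed this way rather than using the father's occupation as stated on the son's certificate.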
Future plans
• Small-scale private releases in 2020/2021, public releases in 2022.
• Add income/wealth data at the individual level
• Filling gaps in the civil registry: more info on wealth associated with
occupations
• Substantial challenges remain due to scale of data (e.g. hosting triples)
• Add as many interesting datasets as possible
Useful links
• Team page: http://www.datalegend.net/
• Datasets: https://druid.datalegend.net/
• Data stories: https://stories.datalegend.net/
• CSV to Linked Data conversion: https://github.com/CLARIAH/CoW/wiki
• Online SPARQL tutorial:
https://programminghistorian.org/en/lessons/intro-to-linked-data
Data: progress
• Indexed records compared to known
birth/death totals.
• Noord-Holland (Amsterdam!)
and Zuid-Holland are the
biggest gaps in the data, but
indexing is in progress.
• The Amsterdam archives are
interested in completing their
civil registries (but this requires funding).
Province       Births   Deaths
Drenthe        100.0%   114.5%
Friesland      101.9%   114.5%
Gelderland     103.9%   120.0%
Groningen      100.5%   115.3%
Limburg        105.1%   116.3%
Noord-Brabant  114.3%   149.4%
Noord-Holland   82.2%    61.8%
Overijssel      61.2%   113.5%
Utrecht        111.9%   126.5%
Zeeland        113.9%   121.9%
Zuid-Holland    74.2%    80.0%
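The percentages in the table express indexed records relative to known totals. This is a minimal sketch of that ratio plus a rough flag. The thresholds are assumptions chosen for illustration. Reading values above 100% as possible duplicates is one plausible interpretation, given that duplicates remain an issue; the slides do not say this.

```python
from typing import Dict


def coverage_pct(indexed: int, known: int) -> float:
    """Indexed certificates as a percentage of the known total."""
    if known <= 0:
        raise ValueError("known total must be positive")
    return round(100 * indexed / known, 1)


def flag(pct: float, low: float = 90.0, high: float = 105.0) -> str:
    """Hypothetical thresholds: below `low` suggests gaps, above `high` possible duplicates."""
    if pct < low:
        return "gap"
    if pct > high:
        return "possible duplicates"
    return "ok"


# Birth coverage for a few provinces, taken from the table.
births: Dict[str, float] = {"Drenthe": 100.0, "Noord-Brabant": 114.3, "Noord-Holland": 82.2,
                            "Overijssel": 61.2, "Zuid-Holland": 74.2}
for province, pct in births.items():
    print(province, pct, flag(pct))
```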
