2. CLARIAH WP4
• Online and (as much as possible) open access to all Dutch birth,
marriage, and death certificates from 1810
• Add relations between persons on all certificates using record linking
• Reconstitute families and family trees
• Facilitate linking with external datasets with historical person
observations
3. CLARIAH WP4 (2)
• Optimal data archiving through Linked Data: standardized sets
of variable names, new ways to estimate quality of matches,
and intuitive storage of linking quality.
• Framework to organize and store information on inequality
(and hopefully more in the future)
• Datahub for social and economic history
4. Importance: new research
• Multigenerational studies: social mobility, heritability.
• Add deep family relations to topics such as asset ownership
(Memories van Successie), strikes, business, mortality, fertility,
anthropometrics etc.
• Conventional research might require larger N than current
micro datasets can provide, for example longevity, birth
spacing, or sex-specific effects.
• Large geographic scope: migration and environmental effects.
5. Dataset(s): indexed civil registry (Burgerlijke Stand)
LINKS
- IISG project
- Data from CBG
- A2A XML > ACCESS > CSV
- Full names removed
- Data not (yet) public
Openarch
- Data from archives directly
- Available in A2A XML / CSV / TTL
- Open access!
- Not cleaned
- Potential duplicates:
  - Doubles within archives
  - Different archives indexing the same certificates
6. Data pipeline
LINKS
- Cleaning by IISG, similar to the steps listed under Openarch
- Documentation (almost) available
- Scripts not (yet) available
- Lookup tables for occupations and georeferencing (Amsterdam code) available
Openarch
- Deduplication
- Adding Amsterdam codes (AMCOs) to place names
- Inferring sex from first names
- Standardize ages
- Coding occupations
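Two of the Openarch steps above, deduplication and age standardization, can be sketched as follows. This is a minimal illustration, not the project's actual scripts: the record fields (`type`, `place`, `year`, `number`, `age`) and the rule that a certificate is identified by type, place, year, and certificate number are assumptions made for the example.

```python
import re


def standardize_age(raw):
    """Return an age in whole years, or None if no number can be found.

    Assumption: ages are free text such as "32 jaar", "6 maanden" or "3 weken".
    """
    if raw is None:
        return None
    text = str(raw).strip().lower()
    match = re.search(r"\d+", text)
    if not match:
        return None
    value = int(match.group())
    # Convert sub-year units (Dutch or English) to completed years.
    if "maand" in text or "month" in text:
        return value // 12
    if "week" in text or "weken" in text:
        return value // 52
    if "dag" in text or "day" in text:
        return value // 365
    return value


def deduplicate(records):
    """Keep the first record for each (type, place, year, number) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["type"], rec["place"].strip().lower(), rec["year"], rec["number"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Made-up records: the first two describe the same certificate.
records = [
    {"type": "birth", "place": "Middelburg", "year": 1875, "number": 12, "age": "32 jaar"},
    {"type": "birth", "place": "middelburg ", "year": 1875, "number": 12, "age": "32"},
    {"type": "death", "place": "Goes", "year": 1880, "number": 3, "age": "6 maanden"},
]

cleaned = deduplicate(records)
ages = [standardize_age(rec["age"]) for rec in cleaned]
print(len(cleaned), ages)
```

An exact-key match like this only catches the simplest doubles. Certificates indexed differently by different archives would need fuzzier comparison.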
7. Data issues
- Many occupations are missing (white regions)
- Many ages are missing as well
- Doubles remain an issue
- Not everything is indexed yet
- Emigration blind spot
11. RDF data model: retrieve relations
• Query from relation to relation with SPARQL
• Retrieve all associated information (locations, occupations, etc.)
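The idea of querying from relation to relation can be illustrated in plain Python on a handful of triples. This is only a sketch of the traversal that a SPARQL property path would express. The identifiers and predicate names (`childOf`, `occupation`, `location`) are invented for the example and are not the project's vocabulary.

```python
# Triples as (subject, predicate, object); all identifiers are made up.
TRIPLES = [
    ("person:1", "childOf", "person:2"),
    ("person:2", "childOf", "person:3"),
    ("person:2", "occupation", "farmer"),
    ("person:3", "occupation", "labourer"),
    ("person:3", "location", "Goes"),
]


def objects(subject, predicate):
    """All objects linked to a subject through one predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def follow(subject, path):
    """Follow a chain of predicates, hopping from relation to relation."""
    current = [subject]
    for predicate in path:
        current = [o for s in current for o in objects(s, predicate)]
    return current


def describe(subject):
    """Collect the non-kinship information attached to a subject."""
    return {p: o for s, p, o in TRIPLES if s == subject and p != "childOf"}


grandfathers = follow("person:1", ["childOf", "childOf"])
details = [describe(person) for person in grandfathers]
print(grandfathers, details)
```

Here `follow` plays the role of the relation-to-relation step and `describe` the retrieval of associated information such as occupation and location.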
12. Results: social mobility (LINKS Zeeland)
• Question: what was the average socio-economic occupational score of
fathers and grandfathers of newborns between 1850 and 1920?
• Occupation from birth certificates
• https://druid.datalegend.net/LINKS/-/queries/social-mobility-births/2
Birth certificate 1
- Newborn
- Father + occupation
Birth certificate 2
- Father
- Grandfather + occupation
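The linkage between the two birth certificates can be sketched in Python as follows. The rows, person identifiers, and occupational scores are invented for illustration, and the published query runs in SPARQL on the real data. The sketch assumes each certificate already carries a numeric score for the father's occupation.

```python
from statistics import mean

# Hypothetical linked birth certificates with a made-up occupational score
# for the father named on each certificate.
births = [
    {"child": "p1", "father": "p2", "father_score": 50.0, "year": 1880},
    {"child": "p2", "father": "p3", "father_score": 40.0, "year": 1852},
    {"child": "p4", "father": "p5", "father_score": 60.0, "year": 1900},
    {"child": "p5", "father": "p6", "father_score": 80.0, "year": 1870},
    {"child": "p7", "father": "p8", "father_score": 45.0, "year": 1910},
]


def father_grandfather_pairs(rows, start=1850, end=1920):
    """Pair each father's score with the grandfather's score.

    Certificate 1 is the newborn's birth certificate; certificate 2 is the
    father's own birth certificate, on which the grandfather is the father.
    Newborns whose father has no linked birth certificate are skipped.
    """
    by_child = {row["child"]: row for row in rows}
    pairs = []
    for row in rows:
        if not (start <= row["year"] <= end):
            continue
        fathers_birth = by_child.get(row["father"])
        if fathers_birth is None:
            continue
        pairs.append((row["father_score"], fathers_birth["father_score"]))
    return pairs


pairs = father_grandfather_pairs(births)
avg_father = mean(f for f, _ in pairs)
avg_grandfather = mean(g for _, g in pairs)
print(pairs, avg_father, avg_grandfather)
```

Only newborns with both certificates linked contribute to the averages, so missing links and missing occupations can bias the result.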
14. Results: social mobility 2 (LINKS Zeeland)
• Question: what was the average socio-economic occupational score of
grooms and their fathers - at their own marriage - between 1850 and
1920?
• https://druid.datalegend.net/LINKS/-/queries/social-mobility-marriage/1
Marriage certificate 1
- Groom + occupation
- Father of groom (+ occupation)
Marriage certificate 2
- Father of groom + occupation
16. Future plans
• Small-scale private releases in 2020/2021, public releases in 2022.
• Add income/wealth data on individual level
• Filling gaps in the civil registry: more info on wealth associated with
occupations
• Substantial challenges remain due to scale of data (e.g. hosting triples)
• Add as many interesting datasets as possible
17. Useful links
• Team page: http://www.datalegend.net/
• Datasets: https://druid.datalegend.net/
• Data stories: https://stories.datalegend.net/
• CSV to Linked Data conversion: https://github.com/CLARIAH/CoW/wiki
• Online SPARQL tutorial: https://programminghistorian.org/en/lessons/intro-to-linked-data
18. Data: progress
• Indexed certificates compared to known birth/death totals.
• Noord-Holland (Amsterdam!) and Zuid-Holland are the biggest gaps in the
data, but indexing is in progress.
• Amsterdam archives are interested in completing their civil registries
(but it is costly).
Province        Birth     Death
Drenthe         100.0%    114.5%
Friesland       101.9%    114.5%
Gelderland      103.9%    120.0%
Groningen       100.5%    115.3%
Limburg         105.1%    116.3%
Noord-Brabant   114.3%    149.4%
Noord-Holland    82.2%     61.8%
Overijssel       61.2%    113.5%
Utrecht         111.9%    126.5%
Zeeland         113.9%    121.9%
Zuid-Holland     74.2%     80.0%
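The comparison behind this table can be sketched in Python. The percentages are copied from the table. The helper names and the formula (indexed certificates divided by the known total) are an assumed reading of "comparing to known birth/death totals", not the project's documented method.

```python
# (birth %, death %) per province, copied from the table above.
COVERAGE = {
    "Drenthe": (100.0, 114.5),
    "Friesland": (101.9, 114.5),
    "Gelderland": (103.9, 120.0),
    "Groningen": (100.5, 115.3),
    "Limburg": (105.1, 116.3),
    "Noord-Brabant": (114.3, 149.4),
    "Noord-Holland": (82.2, 61.8),
    "Overijssel": (61.2, 113.5),
    "Utrecht": (111.9, 126.5),
    "Zeeland": (113.9, 121.9),
    "Zuid-Holland": (74.2, 80.0),
}


def coverage_pct(indexed, known):
    """Indexed certificates as a percentage of the known total (assumed formula)."""
    return round(100.0 * indexed / known, 1)


def under_covered(coverage, threshold=100.0):
    """Provinces where births or deaths fall below the threshold."""
    return sorted(
        province
        for province, (birth, death) in coverage.items()
        if birth < threshold or death < threshold
    )


gaps = under_covered(COVERAGE)
print(gaps)
```

Values above 100% would be consistent with the duplicates mentioned under data issues, although the slides do not state the cause.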