Spatial Aspects of Collaborative
Demography via Genealogical Data
Arthur Charpentier
(CREM UMR CNRS 6211, Université de Rennes 1 & Actinfo Chair)
Ewen Gallic
(CREM UMR CNRS 6211, Université de Rennes 1 & Actinfo Chair)
GEOMED 2017
i3S, Porto, 7–9 September 2017
Introduction Data preparation Preliminary results Conclusion
Historical demography
What is historical demography?
the description, analysis and understanding of past populations
using quantitative methods
What is historical demography used for? (Dupâquier, 1981)
Biology: understanding the structures of families and households
Sociology: studying fertility, birth and death rates; disentangling hereditary from environmental characteristics
Demography: reconstructing the population
Economics: studying population migration
A. Charpentier & E. Gallic Données généalogiques - 2/22
Historical demography: a vast literature
A pioneering analysis in historical demography: Henry (1956)
Followed by many articles exploiting longitudinal data, e.g.:
Matthijs and Moreels (2010) (COR* database): Antwerp, Belgium, 1846–1920, ≈ 57k obs.
Mandemakers (2000): The Netherlands, 1812–1922, ≈ 77k obs.
Bouchard et al. (1989) (BALSAC): Québec, Canada, since the 17th century, ≈ 2M events, ≈ 575k individuals
Bean et al. (1978): mainly Utah, USA, since the 18th century, ≈ 1.2M individuals
Historical demography: limits
These studies face some issues:
often limited to a small sample of individuals in particular geographic areas (possible bias)
gathering information from available sources is expensive and time-consuming
Historical demography and Big Data: collaborative data
New perspectives are emerging with the big data era
Collaborative genealogy data might overcome some issues:
less costly for researchers
may cover wider geographical areas
no need for sampling
Question: Can we use these data to study populations?
Big Data and Genealogy: (short) literature
A promising new strand in the literature uses big data to study:
lifespan: Fire and Elovici (2015) with WikiTree.com online data (+1M rows)
exceptional longevity: Gavrilova and Gavrilov (2007) with online genealogy data (+75M deceased individuals)
Cummins (2017) with family trees from FamilySearch.org (402,204 rows)
Kaplanis et al. (2017) with family trees from Geni.com (86M profiles)
Big Data and Genealogy: Issues
Some issues remain:
possible representativeness bias
unknown quality of records
heterogeneity in information collected
Outline
1 Introduction
2 Data preparation
3 Preliminary results
4 Concluding remarks
Geneanet
http://www.geneanet.org/
Since 1996
First European genealogy website:
+2 million members
+4 billion individuals
Geographic distribution:
40% in France
30% in the rest of Europe
25% in the US
5% in the rest of the world
Raw data
Here, we focus on:
people born between 1800 and 1804 in "Maine-et-Loire" (France)
and their offspring
Number of rows: 100,081
Raw data
  ID_user         ID_ns            ID_num  Name     Surname  Sex  date_b    date_m
1 daage           besnard|jean|1   575     BESNARD  Jean     1    18000227  18280121
2 denisgallienne  besnard|louis|1  22771   BESNARD  Louis    1    18040603  18251110
3 domiassi        besnard|jean|    1748    BESNARD  Jean     1    18000227  18280121
4 dutheilfr       besnard|pierre|  729     BESNARD  Pierre   1    18001221  18270703
5 dvivier1        besnard|louis|1  65196   BESNARD  Louis    1    18001215  18291027

  date_d    type  place             Lat       Long      ID_num_m  ID_num_f
1 16810000  NM    Longué, 0180      47.37806  -0.10806  4457      574
2 18831027  ND    Cunault, 49350    47.30833  -0.15389  994       1620
3 18560000  NM    Longué, 49180     47.37806  -0.10806
4           N     Gennes, 49350     47.34083  -0.23278  99        59
5 18490717  N     Pommeraye, 49244  47.35528  -0.86028  43116     4063
Row: event(s) for an individual (Birth, Marriage, Death)
date of event
place of event (name, latitude, longitude)
Individuals identified by (ID_user, ID_ns)
Possible to link parents
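The date columns in the raw extract look like YYYYMMDD values in which an unknown month or day is written as 00 (e.g. 18560000). A minimal Python sketch of how such values could be split into components, assuming that reading of the format; the names PartialDate and parse_partial_date are hypothetical and not taken from the authors' pipeline:

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: Optional[int]  # None when the source has "00"
    day: Optional[int]    # None when the source has "00"


def parse_partial_date(raw: str) -> Optional[PartialDate]:
    """Split a YYYYMMDD string; return None for empty or malformed input."""
    raw = raw.strip()
    if len(raw) != 8 or not (raw.isascii() and raw.isdigit()):
        return None
    year, month, day = int(raw[:4]), int(raw[4:6]), int(raw[6:])
    # A value of 0 is treated as "unknown" (assumption about the encoding).
    return PartialDate(year, month or None, day or None)
```

Keeping the unknown parts as None, rather than defaulting to 1 January, avoids inventing precision that the record does not have.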
Individuals
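Individuals might appear multiple times in the raw data (e.g., rows 1 and 3 of the raw extract share the same name, birth date and marriage date but come from two different users)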
merged in successive steps
close names (e.g., Jean or Jehan) accounted for using a string
distance metric
We started with 100,081 rows and finally obtained 58,541 individuals in Maine-et-Loire

ID     ID_m  ID_f  Name     Surname  Sex  Date_b    Long_b  Lat_b
16194  NA    NA    besnard  jean     1    18000226  -0.11   47.38
73508  NA    NA    besnard  jean     1    18000228  -0.23   47.34
70834  NA    NA    besnard  jean     1    18000307  0.02    47.47
13915  NA    NA    besnard  pierre   1    18000418  -0.15   47.31
2128   NA    NA    besnard  jeanne   2    18000530  -0.83   47.27

ID     Date_m    Long_m    Lat_m     Date_d    Long_d  Lat_d
16194  18280121  -0.10806  47.37806  18711003  -0.11   47.38
73508  NA        NA        NA        18130131  -0.23   47.34
70834  18330206  0.01694   47.46667  18560107  0.02    47.47
13915  18270703  -0.15389  47.30833  NA        NA      NA
2128   18280708  -0.82583  47.26694  18850820  -0.83   47.27
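The slides mention a string distance metric for close names (Jean or Jehan) without naming it. As an illustration only, the sketch below uses the Levenshtein edit distance in plain Python; the choice of metric, the threshold of 1 and the helper name same_first_name are assumptions, not the authors' settings:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def same_first_name(a: str, b: str, max_distance: int = 1) -> bool:
    """Treat two first names as variants if they are within max_distance edits."""
    return levenshtein(a.lower(), b.lower()) <= max_distance
```

With this threshold "Jean" and "Jehan" are treated as variants. A name match alone would not be enough to merge two records; other fields such as sex, dates and places would presumably have to agree as well.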
Genealogy
Using the parents’ IDs, it is possible to construct family trees
Family tree
We start with an individual: François Menard, born on 1801-01-24 in Maine-et-Loire
Then we look up the individuals whose recorded father is François Menard, born on 1801-01-24
And then retrieve these individuals’ mother(s)
We then look up the grandchildren
And complete with the second parent
Family tree (diagram nodes, in extraction order):
François Menard, 1801-01-24 (Feneu, 49460)
Julie Poulard, 1802-05-31 (Feneu, 49460)
François Menard1, 1843-12-12 (Feneu, 49460)
?
?
Renée Menard, 1837-09-29 (Feneu, 49460)
Marie Hery, 1874-07-27 (Feneu, 49460)
Louise Hery, 1873-04-28 (Feneu, 49460)
Pierre Jolie, 1837-06-19 (Soulaire-et-Bourg, 49460)
Julie Menard, 1835-11-04 (Feneu, 49460)
François Jolie, 1874-06-19 (Feneu, 49460)
Jean Jolie, 1870-09-15 (Feneu, 49460)
Joseph Jolie, 1868-09-18 (Feneu, 49460)
Pierre Jolie, 1866-04-07 (Feneu, 49460)
Julie Jolie, 1864-07-05 (Feneu, 49460)
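The lookup steps above (children of a given parent, then the other parent, then grandchildren) can be sketched as follows in Python. The Person fields (id, father_id, mother_id) and the integer ids in the toy data are hypothetical and only illustrate the traversal; they are not records from the Geneanet extract:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Person:
    id: int
    father_id: Optional[int] = None  # None when the parent is unknown
    mother_id: Optional[int] = None


def children_of(parent_id: int, people: Dict[int, Person]) -> List[Person]:
    """Everyone who lists parent_id as father or mother."""
    return [p for p in people.values()
            if parent_id in (p.father_id, p.mother_id)]


def other_parent(child: Person, known_parent_id: int) -> Optional[int]:
    """Id of the second parent, or None for the "?" nodes of the tree."""
    if child.father_id == known_parent_id:
        return child.mother_id
    if child.mother_id == known_parent_id:
        return child.father_id
    return None


def descendants_by_generation(root_id: int, people: Dict[int, Person],
                              depth: int) -> List[List[int]]:
    """Return [[root], [children], [grandchildren], ...] down to `depth`."""
    generations = [[root_id]]
    for _ in range(depth):
        next_gen = sorted({child.id
                           for pid in generations[-1]
                           for child in children_of(pid, people)})
        generations.append(next_gen)
    return generations


# Toy data (made-up ids): 1 and 2 are a couple, 3 and 4 their children,
# 5 is a child of 3 with an unknown father, 6 a child of 4 and 7.
people = {p.id: p for p in [
    Person(1), Person(2), Person(7),
    Person(3, father_id=1, mother_id=2),
    Person(4, father_id=1, mother_id=2),
    Person(5, mother_id=3),
    Person(6, father_id=7, mother_id=4),
]}
generations = descendants_by_generation(1, people, depth=2)
```

Scanning all records for each lookup keeps the sketch short; with tens of thousands of individuals, an index from parent id to children would be the more natural choice.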
Migration
With these family trees, we are able to study the migration of people from generation to generation, using the places recorded for the events of their lives
Individuals from Maine-et-Loire
Table 1: Number of observations in each generation
Generation  No. of obs.  Birth coord. available (%)  Marriage coord. available (%)  Death coord. available (%)
0           25,421       99.74                       27.78                          29.17
1           17,237       97.40                       48.20                          39.06
2           15,071       98.08                       49.19                          37.11
Migration
Are children born next to the birthplace of their parents?
We compute the distance between the places of birth of a
child and his or her parents, and keep the closest (if available)
Table 2: Distances between children and their parents (in km)
Generation  Mother mean  Mother min  Mother max  Father mean  Father min  Father max
1           5.86         0           8,091.12    5.16         0           767.85
2           13.67        0           3,334.21    19.83        0           11,850.38
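A minimal Python sketch of the child-to-parent distance described above. It assumes great-circle (haversine) distances between (latitude, longitude) pairs and a mean Earth radius of 6371 km; the slides do not state which distance formula was used, and the function names are illustrative:

```python
import math
from typing import Iterable, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0     # mean Earth radius, an assumption of this sketch


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km between two (lat, long) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against rounding pushing the square root just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def closest_parent_distance(child: Optional[Coord],
                            parents: Iterable[Optional[Coord]]) -> Optional[float]:
    """Smallest child-to-parent birthplace distance, or None if not computable."""
    if child is None:
        return None
    distances = [haversine_km(child, p) for p in parents if p is not None]
    return min(distances) if distances else None
```

As a rough check, the coordinates given for Longué (47.37806, -0.10806) and Gennes (47.34083, -0.23278) in the raw extract are about 10 km apart under this formula.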
Migration
We then construct three dummy variables:
<10km : TRUE if the closest distance is lower than 10km
<50km : TRUE if the closest distance is lower than 50km
<100km : TRUE if the closest distance is lower than 100km
Table 3: Distances between children and their parents
Gen.  Nb non-missing  <10km   <10km (%)  <50km   <50km (%)  <100km  <100km (%)  Nb NAs  Total
1     16,751          15,301  91.34      16,514  98.59      16,612  99.17       486     17,237
2     14,565          11,927  81.89      14,014  96.22      14,236  97.74       506     15,071
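The three indicators and the columns of Table 3 can be sketched as follows in Python, assuming each individual has either a closest distance in km or a missing value (None). The function name and the dictionary keys are illustrative:

```python
from typing import Dict, Iterable, Optional

THRESHOLDS_KM = (10, 50, 100)


def summarize_distances(distances: Iterable[Optional[float]]) -> Dict[str, float]:
    """Counts and shares of distances below each threshold, NAs kept apart."""
    values = list(distances)
    known = [d for d in values if d is not None]
    summary: Dict[str, float] = {
        "total": len(values),
        "not_missing": len(known),
        "missing": len(values) - len(known),
    }
    for t in THRESHOLDS_KM:
        count = sum(d < t for d in known)  # the "<t km" dummy, summed
        summary[f"<{t}km"] = count
        # Shares are relative to non-missing distances only.
        summary[f"<{t}km (%)"] = (round(100 * count / len(known), 2)
                                  if known else float("nan"))
    return summary
```

The percentages in Table 3 are consistent with this convention: for generation 1, 15,301 of the 16,751 non-missing distances are below 10 km, i.e. 91.34%.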
Migration: from birth to death
We can also look at the spatial movements between birth and
death for an individual
Table 4: Distances between places of birth and death
Gen.  Nb non-missing  <10km   <10km (%)  <50km   <50km (%)  <100km  <100km (%)  Nb NAs  Total
0     7,414           7,384   99.60      7,404   99.87      7,410   99.95       18,007  25,421
1     6,547           5,396   82.42      6,213   94.90      6,310   96.38       10,690  17,237
2     5,480           4,063   74.14      4,961   90.53      5,124   93.50       9,591   15,071
What comes next?
Identify a path in space during an individual’s lifespan
birth
marriage(s)
births of children
death
Enlarge the dataset (only people born between 1800 and 1875
here)
Consider more regions
References
Bean, L. L., May, D. L., and Skolnick, M. (1978). The Mormon historical demography project. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 11(1):45–53. doi:10.1080/01615440.1978.9955216.
Bouchard, G., Roy, R., Casgrain, B., and Hubert, M. (1989). Fichier de population et structures de gestion de base de données : le fichier-réseau BALSAC et le système INGRES/INGRID. Histoire & Mesure, 4(1):39–57. doi:10.3406/hism.1989.874.
Cummins, N. (2017). Lifespans of the European elite, 800–1800. The Journal of Economic History, 77(02):406–439. doi:10.1017/s0022050717000468.
Dupâquier, J. (1981). Une grande enquête sur la mobilité géographique et sociale aux XIXe et XXe siècles. Population, 36(6):1164–1167. doi:10.2307/1532329.
Fire, M. and Elovici, Y. (2015). Data mining of online genealogy datasets for revealing lifespan patterns in human population. ACM Transactions on Intelligent Systems and Technology, 6(2):1–22. doi:10.1145/2700464.
Gavrilova, N. S. and Gavrilov, L. A. (2007). Search for predictors of exceptional human longevity. North American Actuarial Journal, 11(1):49–67. doi:10.1080/10920277.2007.10597437.
Henry, L. (1956). Anciennes familles genevoises. Étude démographique: XVIme - XXme siècle. Population, 11(2):334. doi:10.2307/1524668.
Kaplanis, J., Gordon, A., Wahl, M., Gershovits, M., Markus, B., Sheikh, M., Gymrek, M., Bhatia, G., MacArthur, D. G., Price, A., and Erlich, Y. (2017). Quantitative analysis of population-scale family trees using millions of relatives. doi:10.1101/106427.
Mandemakers, K. (2000). Historical sample of the Netherlands. Handbook of international historical microdata for population research, pages 149–177.
Matthijs, K. and Moreels, S. (2010). The Antwerp COR*-database: A unique Flemish source for historical-demographic research. The History of the Family, 15(1):109–115. doi:10.1016/j.hisfam.2010.01.002.

More Related Content

More from Arthur Charpentier (20)

Family History and Life Insurance (UConn actuarial seminar)
Family History and Life Insurance (UConn actuarial seminar)Family History and Life Insurance (UConn actuarial seminar)
Family History and Life Insurance (UConn actuarial seminar)
 
Control epidemics
Control epidemics Control epidemics
Control epidemics
 
STT5100 Automne 2020, introduction
STT5100 Automne 2020, introductionSTT5100 Automne 2020, introduction
STT5100 Automne 2020, introduction
 
Family History and Life Insurance
Family History and Life InsuranceFamily History and Life Insurance
Family History and Life Insurance
 
Machine Learning in Actuarial Science & Insurance
Machine Learning in Actuarial Science & InsuranceMachine Learning in Actuarial Science & Insurance
Machine Learning in Actuarial Science & Insurance
 
Reinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and FinanceReinforcement Learning in Economics and Finance
Reinforcement Learning in Economics and Finance
 
Optimal Control and COVID-19
Optimal Control and COVID-19Optimal Control and COVID-19
Optimal Control and COVID-19
 
Slides OICA 2020
Slides OICA 2020Slides OICA 2020
Slides OICA 2020
 
Lausanne 2019 #3
Lausanne 2019 #3Lausanne 2019 #3
Lausanne 2019 #3
 
Lausanne 2019 #4
Lausanne 2019 #4Lausanne 2019 #4
Lausanne 2019 #4
 
Lausanne 2019 #2
Lausanne 2019 #2Lausanne 2019 #2
Lausanne 2019 #2
 
Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
 
Side 2019 #10
Side 2019 #10Side 2019 #10
Side 2019 #10
 
Side 2019 #11
Side 2019 #11Side 2019 #11
Side 2019 #11
 
Side 2019 #12
Side 2019 #12Side 2019 #12
Side 2019 #12
 
Side 2019 #9
Side 2019 #9Side 2019 #9
Side 2019 #9
 
Side 2019 #8
Side 2019 #8Side 2019 #8
Side 2019 #8
 
Side 2019 #7
Side 2019 #7Side 2019 #7
Side 2019 #7
 
Side 2019 #6
Side 2019 #6Side 2019 #6
Side 2019 #6
 
Side 2019 #5
Side 2019 #5Side 2019 #5
Side 2019 #5
 

Recently uploaded

The AES Investment Code - the go-to counsel for the most well-informed, wise...
The AES Investment Code -  the go-to counsel for the most well-informed, wise...The AES Investment Code -  the go-to counsel for the most well-informed, wise...
The AES Investment Code - the go-to counsel for the most well-informed, wise...AES International
 
2024 Q1 Crypto Industry Report | CoinGecko
2024 Q1 Crypto Industry Report | CoinGecko2024 Q1 Crypto Industry Report | CoinGecko
2024 Q1 Crypto Industry Report | CoinGeckoCoinGecko
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证rjrjkk
 
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...Amil baba
 
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)ECTIJ
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHenry Tapper
 
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证jdkhjh
 
Tenets of Physiocracy History of Economic
Tenets of Physiocracy History of EconomicTenets of Physiocracy History of Economic
Tenets of Physiocracy History of Economiccinemoviesu
 
Uae-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
Uae-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Uae-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
Uae-NO1 Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Stock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfStock Market Brief Deck FOR 4/17 video.pdf
Stock Market Brief Deck FOR 4/17 video.pdfMichael Silva
 
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Call Girls Near Delhi Pride Hotel, New Delhi|9873777170
Call Girls Near Delhi Pride Hotel, New Delhi|9873777170Sonam Pathan
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managmentfactical
 
NO1 Certified Best Amil In Rawalpindi Bangali Baba In Rawalpindi jadu tona ka...
NO1 Certified Best Amil In Rawalpindi Bangali Baba In Rawalpindi jadu tona ka...NO1 Certified Best Amil In Rawalpindi Bangali Baba In Rawalpindi jadu tona ka...
NO1 Certified Best Amil In Rawalpindi Bangali Baba In Rawalpindi jadu tona ka...Amil baba
 
212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technology212MTAMount Durham University Bachelor's Diploma in Technology
212MTAMount Durham University Bachelor's Diploma in Technologyz xss
 
《加拿大本地办假证-寻找办理Dalhousie毕业证和达尔豪斯大学毕业证书的中介代理》
《加拿大本地办假证-寻找办理Dalhousie毕业证和达尔豪斯大学毕业证书的中介代理》《加拿大本地办假证-寻找办理Dalhousie毕业证和达尔豪斯大学毕业证书的中介代理》
《加拿大本地办假证-寻找办理Dalhousie毕业证和达尔豪斯大学毕业证书的中介代理》rnrncn29
 
Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Economic Risk Factor Update: April 2024 [SlideShare]
Economic Risk Factor Update: April 2024 [SlideShare]Commonwealth
 
Role of Information and technology in banking and finance .pptx
Role of Information and technology in banking and finance .pptxRole of Information and technology in banking and finance .pptx
Role of Information and technology in banking and finance .pptxNarayaniTripathi2
 
Market Morning Updates for 16th April 2024
Market Morning Updates for 16th April 2024Market Morning Updates for 16th April 2024
Market Morning Updates for 16th April 2024Devarsh Vakil
 
NO1 Certified kala jadu karne wale ka contact number kala jadu karne wale bab...
NO1 Certified kala jadu karne wale ka contact number kala jadu karne wale bab...NO1 Certified kala jadu karne wale ka contact number kala jadu karne wale bab...
NO1 Certified kala jadu karne wale ka contact number kala jadu karne wale bab...Amil baba
 
Stock Market Brief Deck for "this does not happen often".pdf
Stock Market Brief Deck for "this does not happen often".pdfStock Market Brief Deck for "this does not happen often".pdf
Stock Market Brief Deck for "this does not happen often".pdfMichael Silva
 

Recently uploaded (20)

The AES Investment Code - the go-to counsel for the most well-informed, wise...
The AES Investment Code -  the go-to counsel for the most well-informed, wise...The AES Investment Code -  the go-to counsel for the most well-informed, wise...
The AES Investment Code - the go-to counsel for the most well-informed, wise...
 
2024 Q1 Crypto Industry Report | CoinGecko
2024 Q1 Crypto Industry Report | CoinGecko2024 Q1 Crypto Industry Report | CoinGecko
2024 Q1 Crypto Industry Report | CoinGecko
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
 
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
 
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)Economics, Commerce and Trade Management: An International Journal (ECTIJ)
Economics, Commerce and Trade Management: An International Journal (ECTIJ)
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview document
 
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
原版1:1复刻堪萨斯大学毕业证KU毕业证留信学历认证
 
Tenets of Physiocracy History of Economic
Tenets of Physiocracy History of EconomicTenets of Physiocracy History of Economic
Tenets of Physiocracy History of Economic
 

Slides cg geneanet_geomed(1)

  • 1. Spatial Aspects of Collaborative Demography via Genealogical Data Arthur Charpentier (CREM UMR CNRS 6211, Université de Rennes 1 & Actinfo Chair) Ewen Gallic (CREM UMR CNRS 6211, Université de Rennes 1 & Actinfo Chair) GEOMED 2017 i3S, Porto, 7–9 September 2017
  • 2. Introduction Data preparation Preliminary results Conclusion Historical demography What is historical demography? The description, analysis and understanding of populations in the past, using quantitative methods. What is historical demography used for? (Dupâquier, 1981) Biology: understanding the structures of families and households; Sociology: studying fertility, birth and death rates, and disentangling hereditary from environmental characteristics; Demography: rebuilding the population; Economics: studying population migration. A. Charpentier & E. Gallic Données généalogiques - 2/22
  • 3. Historical demography: a vast literature. A pioneering analysis of historical demography: Henry (1956), followed by many articles exploiting longitudinal data, e.g.: Matthijs and Moreels (2010) (COR* database): Antwerp, Belgium, 1846–1920, ≈ 57k obs.; Mandemakers (2000): The Netherlands, 1812–1922, ≈ 77k obs.; Bouchard et al. (1989) (BALSAC): Québec, Canada, since the 17th century, ≈ 2M events, ≈ 575k individuals; Bean et al. (1978): mainly Utah, USA, since the 18th century, ≈ 1.2M individuals.
  • 4. Historical demography: limits. These studies face some issues: they are often limited to a small sample of individuals in particular geographic areas (possible bias); gathering information from available sources is expensive and time-consuming.
  • 5. Historical demography and Big Data: collaborative data. New perspectives are emerging with the big data era. Collaborative genealogy data might overcome some issues: less costly for researchers; may cover wider geographical areas; no need for sampling. Question: can we use these data to study populations?
  • 6. Big Data and Genealogy: (short) literature. A promising new strand in the literature uses big data to study: lifespan: Fire and Elovici (2015) with WikiTree.com online data (+1M rows); exceptional longevity: Gavrilova and Gavrilov (2007) with online genealogy data (+75M deceased individuals); Cummins (2017) with family trees from FamilySearch.org (402,204 rows); Kaplanis et al. (2017) with family trees from Geni.com (86M profiles).
  • 7. Big Data and Genealogy: issues. Some issues remain: possible representativeness bias; unknown quality of records; heterogeneity in the information collected.
  • 8. Outline: 1 Introduction; 2 Data preparation; 3 Preliminary results; 4 Concluding remarks.
  • 9. Geneanet (http://www.geneanet.org/). Since 1996. First European genealogy website: +2 million members, +4 billion individuals. Geographic distribution: 40% in France; 30% in the rest of Europe; 25% in the US; 5% in the rest of the world.
  • 10. Raw data. Here, we focus on people born between 1800 and 1804 in "Maine-et-Loire" (France), and their offspring. Number of rows: 100,081.
  • 11. Raw data
         ID_user         ID_ns            ID_num  Name     Surname  Sex  date_b    date_m
      1  daage           besnard|jean|1      575  BESNARD  Jean     1    18000227  18280121
      2  denisgallienne  besnard|louis|1   22771  BESNARD  Louis    1    18040603  18251110
      3  domiassi        besnard|jean|      1748  BESNARD  Jean     1    18000227  18280121
      4  dutheilfr       besnard|pierre|     729  BESNARD  Pierre   1    18001221  18270703
      5  dvivier1        besnard|louis|1   65196  BESNARD  Louis    1    18001215  18291027

         date_d    type  place             Lat       Long      ID_num_m  ID_num_f
      1  16810000  NM    Longué, 0180      47.37806  -0.10806  4457      574
      2  18831027  ND    Cunault, 49350    47.30833  -0.15389  994       1620
      3  18560000  NM    Longué, 49180     47.37806  -0.10806
      4            N     Gennes, 49350     47.34083  -0.23278  99        59
      5  18490717  N     Pommeraye, 49244  47.35528  -0.86028  43116     4063
  • 14. Raw data. Each row records the event(s) for an individual (birth, marriage, death), with the date of each event and the place of each event (name, latitude, longitude).
  • 15. Raw data. Individuals are identified by the pair (ID_user, ID_ns).
  • 16. Raw data. It is possible to link individuals to their parents (columns ID_num_m and ID_num_f).
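The date columns in the raw table look like YYYYMMDD values, and entries such as 18560000 suggest an unknown month and day. Below is a minimal Python sketch of how such fields might be parsed. It assumes, although the slides do not say so, that "00" marks a missing component and that an empty cell means no date at all. The names `PartialDate` and `parse_partial_date` are hypothetical.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """A date whose month and/or day may be unknown."""
    year: int
    month: Optional[int]
    day: Optional[int]


def parse_partial_date(raw: str) -> Optional[PartialDate]:
    """Parse a YYYYMMDD string; '00' month or day is read as unknown (assumption)."""
    raw = raw.strip()
    if len(raw) != 8 or not raw.isdigit():
        return None  # empty or malformed cell: no usable date
    year, month, day = int(raw[:4]), int(raw[4:6]), int(raw[6:8])
    # A zero month or day becomes None instead of an invalid calendar value.
    return PartialDate(year, month or None, day or None)


full = parse_partial_date("18000227")       # date_b of row 1 in the slide
year_only = parse_partial_date("18560000")  # date_d of row 3 in the slide
missing = parse_partial_date("")            # empty date_d of row 4
```

Keeping the unknown parts as `None` avoids silently turning a year-only record into 1 January of that year.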
  • 17. Individuals. Individuals might appear multiple times in the raw data.
  • 18. Individuals. Individuals that appear multiple times are merged in successive steps.
         ID_user         ID_ns            ID_num  Name     Surname  Sex  date_b    date_m
      1  daage           besnard|jean|1      575  BESNARD  Jean     1    18000227  18280121
      2  denisgallienne  besnard|louis|1   22771  BESNARD  Louis    1    18040603  18251110
      3  domiassi        besnard|jean|      1748  BESNARD  Jean     1    18000227  18280121
      5  dvivier1        besnard|louis|1   65196  BESNARD  Louis    1    18001215  18291027

         date_d    type  place             Lat       Long      ID_num_m  ID_num_f
      1  16810000  NM    Longué, 0180      47.37806  -0.10806  4457      574
      2  18831027  ND    Cunault, 49350    47.30833  -0.15389  994       1620
      3  18560000  NM    Longué, 49180     47.37806  -0.10806
      5  18490717  N     Pommeraye, 49244  47.35528  -0.86028  43116     4063
  • 19. Individuals. Close names (e.g., Jean or Jehan) are accounted for using a string distance metric.
  • 20. Individuals. We started with 100,081 rows and finally obtained 58,541 individuals in Maine-et-Loire.
      ID     ID_m  ID_f  Name     Surname  Sex  Date_b    Long_b  Lat_b
      16194  NA    NA    besnard  jean     1    18000226  -0.11   47.38
      73508  NA    NA    besnard  jean     1    18000228  -0.23   47.34
      70834  NA    NA    besnard  jean     1    18000307  0.02    47.47
      13915  NA    NA    besnard  pierre   1    18000418  -0.15   47.31
      2128   NA    NA    besnard  jeanne   2    18000530  -0.83   47.27

      ID     Date_m    Long_m    Lat_m     Date_d    Long_d  Lat_d
      16194  18280121  -0.10806  47.37806  18711003  -0.11   47.38
      73508  NA        NA        NA        18130131  -0.23   47.34
      70834  18330206  0.01694   47.46667  18560107  0.02    47.47
      13915  18270703  -0.15389  47.30833  NA        NA      NA
      2128   18280708  -0.82583  47.26694  18850820  -0.83   47.27
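The slides do not name the string distance metric used to match close names. As an illustration only, the sketch below uses the Levenshtein edit distance. The matching rule (same sex, same birth date, names within one edit) and the field names `family_name` and `given_name` are assumptions, not the authors' procedure.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def likely_same_person(rec_a: dict, rec_b: dict, max_name_distance: int = 1) -> bool:
    """Hypothetical rule: same sex and birth date, names within a small edit distance."""
    if rec_a["sex"] != rec_b["sex"] or rec_a["date_b"] != rec_b["date_b"]:
        return False
    return all(
        levenshtein(rec_a[key].lower(), rec_b[key].lower()) <= max_name_distance
        for key in ("family_name", "given_name")
    )


# Rows 1 and 3 of the raw table (two users recording the same Jean Besnard).
row_1 = {"family_name": "BESNARD", "given_name": "Jean", "sex": 1, "date_b": "18000227"}
row_3 = {"family_name": "BESNARD", "given_name": "Jean", "sex": 1, "date_b": "18000227"}
# Hypothetical spelling variant, not taken from the data.
variant = {"family_name": "BESNARD", "given_name": "Jehan", "sex": 1, "date_b": "18000227"}
```

Requiring identical birth dates is a deliberately strict choice here: the cleaned table lists several distinct Jean Besnard born only days apart, so a loose date tolerance would risk merging different people.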
  • 21. Genealogy. Using the parents’ IDs, it is possible to construct family trees.
  • 22. Family tree. We start with an individual: François Menard, born on 1801-01-24 in Maine-et-Loire (Feneu, 49460).
  • 23. Family tree. Then we look up individuals who have François Menard, born on 1801-01-24, as "father": François Menard1 (1843-12-12), Renée Menard (1837-09-29) and Julie Menard (1835-11-04), all born in Feneu, 49460.
  • 24. Family tree. And then retrieve these individuals’ mother(s): Julie Poulard, 1802-05-31 (Feneu, 49460).
  • 25. Family tree. We then look up the grandchildren: Marie Hery (1874-07-27) and Louise Hery (1873-04-28); François Jolie (1874-06-19), Jean Jolie (1870-09-15), Joseph Jolie (1868-09-18), Pierre Jolie (1866-04-07) and Julie Jolie (1864-07-05), all born in Feneu, 49460.
  • 26. Family tree. And complete with the second parent.
      François Menard, 1801-01-24 (Feneu, 49460) and Julie Poulard, 1802-05-31 (Feneu, 49460)
        François Menard1, 1843-12-12 (Feneu, 49460)
        Renée Menard, 1837-09-29 (Feneu, 49460) and ? (second parent unknown)
          Marie Hery, 1874-07-27 (Feneu, 49460)
          Louise Hery, 1873-04-28 (Feneu, 49460)
        Julie Menard, 1835-11-04 (Feneu, 49460) and Pierre Jolie, 1837-06-19 (Soulaire-et-Bourg, 49460)
          François Jolie, 1874-06-19 (Feneu, 49460)
          Jean Jolie, 1870-09-15 (Feneu, 49460)
          Joseph Jolie, 1868-09-18 (Feneu, 49460)
          Pierre Jolie, 1866-04-07 (Feneu, 49460)
          Julie Jolie, 1864-07-05 (Feneu, 49460)
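The tree-building steps above (find the children of an individual, retrieve the other parent, then repeat for the grandchildren) could be sketched in Python as follows. The integer IDs are hypothetical, and the parent links encode one reading of the example tree. The sketch also assumes that `id_m` holds the mother's ID and `id_f` the father's ID, which the slides do not state explicitly.

```python
from collections import defaultdict

# Hypothetical IDs; names follow the example tree in the slides.
# Assumption: id_m is the mother's ID and id_f the father's ID.
people = {
    1: {"name": "François Menard", "id_m": None, "id_f": None},
    2: {"name": "Julie Poulard", "id_m": None, "id_f": None},
    3: {"name": "Julie Menard", "id_m": 2, "id_f": 1},
    4: {"name": "Renée Menard", "id_m": 2, "id_f": 1},
    5: {"name": "Pierre Jolie", "id_m": None, "id_f": None},
    6: {"name": "Julie Jolie", "id_m": 3, "id_f": 5},
    7: {"name": "Marie Hery", "id_m": 4, "id_f": None},  # second parent unknown
}


def children_index(people):
    """Map each parent ID to the IDs of the individuals listing it as a parent."""
    index = defaultdict(list)
    for person_id, record in people.items():
        for parent_id in (record["id_m"], record["id_f"]):
            if parent_id is not None:
                index[parent_id].append(person_id)
    return index


def generations(root_id, people, depth=2):
    """Return [[root], children, grandchildren, ...] down to the given depth."""
    index = children_index(people)
    levels = [[root_id]]
    for _ in range(depth):
        next_level = sorted({child for parent in levels[-1]
                             for child in index.get(parent, [])})
        levels.append(next_level)
    return levels


def second_parents(child_ids, known_parent_id, people):
    """IDs of the other recorded parent(s) of the given children."""
    found = set()
    for child_id in child_ids:
        for parent_id in (people[child_id]["id_m"], people[child_id]["id_f"]):
            if parent_id is not None and parent_id != known_parent_id:
                found.add(parent_id)
    return sorted(found)


levels = generations(1, people)
```

Building the parent-to-children index once keeps each lookup cheap, which matters when the same search is repeated for tens of thousands of root individuals.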
  • 27. Migration. With these family trees, we are able to study the migration of people from generation to generation, using the information given on the places of the events of their lives.
  • 28. Individuals from Maine-et-Loire. Table 1: Number of observations in each generation
      Generation  No. obs.  Available coord. birth (%)  Available coord. marriage (%)  Available coord. death (%)
      0           25,421    99.74                       27.78                          29.17
      1           17,237    97.40                       48.20                          39.06
      2           15,071    98.08                       49.19                          37.11
  • 30. Migration. Are children born next to the birthplace of their parents? We compute the distance between the places of birth of a child and his or her parents, and keep the closest (if available). Table 2: Distances between children and their parents (in km)
                  Mother                    Father
      Generation  Mean   Min  Max           Mean   Min  Max
      1           5.86   0    8,091.12      5.16   0    767.85
      2           13.67  0    3,334.21      19.83  0    11,850.38
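The slides do not say how the distances are computed from the coordinates. A minimal sketch, assuming the great-circle (haversine) formula on a spherical Earth of radius 6371 km, is given below. The function names are hypothetical, and the coordinates are those of Longué, Cunault and Gennes from the raw table, used here as stand-ins for a child's and the parents' birthplaces.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_parent_distance(child, mother=None, father=None):
    """Smallest child-to-parent birthplace distance; None when nothing is available."""
    if child is None:
        return None
    distances = [haversine_km(*child, *parent)
                 for parent in (mother, father) if parent is not None]
    return min(distances) if distances else None


longue = (47.37806, -0.10806)
cunault = (47.30833, -0.15389)
gennes = (47.34083, -0.23278)

# Illustrative pairing only: treat Longué as the child's birthplace.
closest = closest_parent_distance(longue, mother=cunault, father=gennes)
```

Returning `None` when no parent coordinate is known mirrors the "if available" condition on the slide and keeps missing distances out of the summary statistics.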
  • 34. Migration. We then construct three dummy variables: <10km: TRUE if the closest distance is lower than 10km; <50km: TRUE if the closest distance is lower than 50km; <100km: TRUE if the closest distance is lower than 100km. Table 3: Distances between children and their parents
      Gen.  Nb not missing distance  <10km   <10km (%)  <50km   <50km (%)  <100km  <100km (%)  Nb NAs  Total
      1     16,751                   15,301  91.34      16,514  98.59      16,612  99.17       486     17,237
      2     14,565                   11,927  81.89      14,014  96.22      14,236  97.74       506     15,071
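As a sketch of how the three dummy variables and the shares in Table 3 could be derived from a list of closest distances, consider the Python below. The function names and the sample distances are illustrative. The shares are computed over non-missing distances, which matches the table (for generation 1, 15,301 / 16,751 gives 91.34%).

```python
THRESHOLDS_KM = (10, 50, 100)


def distance_flags(distance_km):
    """Dummy variables for one individual; None (NA) when the distance is missing."""
    if distance_km is None:
        return {f"<{t}km": None for t in THRESHOLDS_KM}
    return {f"<{t}km": distance_km < t for t in THRESHOLDS_KM}


def summarise(distances):
    """Counts and shares per threshold, with shares relative to non-missing distances."""
    known = [d for d in distances if d is not None]
    summary = {
        "not_missing": len(known),
        "missing": len(distances) - len(known),
        "total": len(distances),
    }
    for t in THRESHOLDS_KM:
        count = sum(d < t for d in known)
        summary[f"<{t}km"] = count
        summary[f"<{t}km (%)"] = round(100 * count / len(known), 2) if known else None
    return summary


# Illustrative distances in km (None marks a missing distance).
sample = [0.0, 5.86, 13.67, 75.0, 250.0, None]
summary = summarise(sample)
```

Note that the thresholds are nested, so any individual flagged below 10 km is also flagged below 50 km and 100 km.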
  • 36. Migration: from birth to death. We can also look at the spatial movements between birth and death for an individual. Table 4: Distances between places of birth and death
      Gen.  Nb not missing distance  <10km  <10km (%)  <50km  <50km (%)  <100km  <100km (%)  Nb NAs  Total
      0     7,414                    7,384  99.60      7,404  99.87      7,410   99.95       18,007  25,421
      1     6,547                    5,396  82.42      6,213  94.90      6,310   96.38       10,690  17,237
      2     5,480                    4,063  74.14      4,961  90.53      5,124   93.50       9,591   15,071
  • 38. What comes next? Identify a path in space during an individual’s lifespan: birth, marriage(s), births of children, death. Enlarge the dataset (only people born between 1800 and 1875 here). Consider more regions.
  • 39. References. Bibliography I
      Bean, L. L., May, D. L., and Skolnick, M. (1978). The Mormon historical demography project. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 11(1):45–53. doi:10.1080/01615440.1978.9955216.
      Bouchard, G., Roy, R., Casgrain, B., and Hubert, M. (1989). Fichier de population et structures de gestion de base de données : le fichier-réseau BALSAC et le système INGRES/INGRID. Histoire & Mesure, 4(1):39–57. doi:10.3406/hism.1989.874.
      Cummins, N. (2017). Lifespans of the European elite, 800–1800. The Journal of Economic History, 77(02):406–439. doi:10.1017/s0022050717000468.
      Dupâquier, J. (1981). Une grande enquête sur la mobilité géographique et sociale aux XIXe et XXe siècles. Population, 36(6):1164–1167. doi:10.2307/1532329.
      Fire, M. and Elovici, Y. (2015). Data mining of online genealogy datasets for revealing lifespan patterns in human population. ACM Transactions on Intelligent Systems and Technology, 6(2):1–22. doi:10.1145/2700464.
      Gavrilova, N. S. and Gavrilov, L. A. (2007). Search for predictors of exceptional human longevity. North American Actuarial Journal, 11(1):49–67. doi:10.1080/10920277.2007.10597437.
      Henry, L. (1956). Anciennes familles genevoises. Étude démographique : XVIme - XXme siècle. Population, 11(2):334. doi:10.2307/1524668.
  • 40. References. Bibliography II
      Kaplanis, J., Gordon, A., Wahl, M., Gershovits, M., Markus, B., Sheikh, M., Gymrek, M., Bhatia, G., MacArthur, D. G., Price, A., and Erlich, Y. (2017). Quantitative analysis of population-scale family trees using millions of relatives. doi:10.1101/106427.
      Mandemakers, K. (2000). Historical sample of the Netherlands. Handbook of international historical microdata for population research, pages 149–177.
      Matthijs, K. and Moreels, S. (2010). The Antwerp COR*-database: A unique Flemish source for historical-demographic research. The History of the Family, 15(1):109–115. doi:10.1016/j.hisfam.2010.01.002.