3.
Statistically, if your ancestors are predominantly
European, it's virtually impossible not to be
But if you are not satisfied with that likelihood and wish
to demonstrate kinship, we must consult reliable sources
and historical records that support our assumption
Genealogy is the study of families and the tracing of
their lineages and history
4. But we would like automated
genealogy research...
[Diagram labels: primary sources; online resources; data from users' applications; data processing and knowledge inference; family tree]
5. … in any case relying on
recognised sources
[Diagram labels: primary sources; online resources; data from users' applications; data processing and knowledge inference; family tree; "supported by" links]
6. A common conceptual model of the domain
will make things easier
Modeling genealogical domain:
an open problem
Joan Campanyà Artés
Jordi Conesa Caralt
Enric Mayol Sarroca
KEOD 2012 - Barcelona
7. Index
Genealogy: a very complex domain
State of the art. Standards and Specifications to
share genealogical data.
Genealogical knowledge processing. "Open
World Assumption" (OWA) versus "Closed World
Assumption" (CWA)
Our proposal. Sources and statements
Modeling entities and relationships
Challenges for future work
Conclusions
8. Is modeling genealogy a problem?
Intrinsic complexity of the domain
Syntactic variants: names of individuals and
locations often appear with lexical variants that
hinder proper recognition. (Examples: Joan Campanyà /
Juan Campañá, Vic / Vich, Viella / Vielha)
Structural heterogeneity: the family pattern and
roles of individuals depend on the temporal and cultural
context in which they occur. (Examples: paternal or maternal
family name according to cultural context, blood relatives, ...)
Data entry errors: these may be transcription errors or
errors of interpretation. (Examples: erroneous birth or death
dates, inaccurate records due to forced translations for political reasons or
ignorance, ...)
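The syntactic-variant problem above can be illustrated with a minimal sketch. It is not part of the authors' proposal: it assumes that stripping diacritics and comparing strings with Python's standard difflib is an acceptable first approximation, and the function names and the 0.8 threshold are hypothetical choices made only for this example.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and strip diacritics (for example 'Campañá' becomes 'campana')."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_variant_candidate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag two spellings as possible variants; the threshold is an assumption."""
    return similarity(a, b) >= threshold


# The pairs below are the examples given on the slide.
pairs = [
    ("Joan Campanyà", "Juan Campañá"),
    ("Vic", "Vich"),
    ("Viella", "Vielha"),
]
for left, right in pairs:
    print(left, "/", right, "->", is_variant_candidate(left, right))
```

A string-similarity check like this only proposes candidates; deciding whether two records denote the same person or place still needs context such as dates and sources.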
9. Agree on a model, an opportunity!
Distributed and independent data structures
Primary sources adopt heterogeneous data structures
Online and semantic web services provide access to
specific data repositories
Private applications lack common and recognized
standards (entities/relationships)
[Diagram labels: primary sources; online resources; data from users' applications]
10. GEDCOM
Difficult evolution: it's a proprietary format
Family-centered: this does not facilitate the search for
ancestors, which is much of the work of genealogists
Ambiguity: the specification does not set limits on its
hierarchical structure, so we can find incompatibilities
between different implementations of the standard
Lack of source references: there is no tracking of
data connected to the research process, which makes
subsequent verification or reuse of sources difficult
Inconsistencies may occur due to data duplication
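To make the "family-centered" point concrete, here is a minimal sketch of reading a GEDCOM fragment. The sample records and the names in them are invented for illustration, and the parser is a deliberately simplified assumption (it keeps only level-0 records and their level-1 lines), not a complete GEDCOM reader. The tags INDI, FAM, FAMC, FAMS, HUSB and CHIL are standard GEDCOM tags.

```python
# Invented sample data: a father (@I1@), a child (@I2@) and the family linking them.
SAMPLE = """\
0 @I1@ INDI
1 NAME Pere /Campanyà/
1 FAMS @F1@
0 @I2@ INDI
1 NAME Joan /Campanyà/
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 CHIL @I2@
"""


def parse_records(text):
    """Return {xref: {"tag": ..., "fields": [(tag, value), ...]}} for level-0 records."""
    records = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        if level == 0 and len(parts) == 3 and parts[1].startswith("@"):
            current = {"tag": parts[2], "fields": []}
            records[parts[1]] = current
        elif level == 0:
            current = None  # records without a cross-reference id are skipped here
        elif level == 1 and current is not None:
            value = parts[2] if len(parts) > 2 else ""
            current["fields"].append((parts[1], value))
    return records


def father_of(records, person_xref):
    """Find a person's father by going through the family record (FAMC, then HUSB)."""
    for tag, family_xref in records[person_xref]["fields"]:
        if tag == "FAMC":
            for family_tag, value in records[family_xref]["fields"]:
                if family_tag == "HUSB":
                    return value
    return None


records = parse_records(SAMPLE)
print(father_of(records, "@I2@"))
```

Note that the parent link is never stated on the individual: every ancestor lookup has to pass through a FAM record, which is the indirection the slide refers to.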
11. GENTECH
Interesting features:
All genealogical data are broken down into a series of
short, formal genealogical statements
Introduces key concepts: Events (anything that happened
in someone’s life) and Relationships (between two
people)
Drawbacks:
Restrictive predefined categories of DataTypes,
TypeValues and Collections
The model assumes its implementation on relational
databases
12. Modeling with ontologies
Zandhuis, 2005. Genealogical data modeled with OWL/RDF.
Enables the potential use of the Semantic Web. Did not
develop much beyond the class structure.
Campbell, 2006. Open network data, scalable, extensible,
based on open standards and understandable by machines.
Genealogical data fragmented in the form of
subject-predicate-object sentences, in OWL/RDF files.
Woodbury, 2010. Information system based on individuals
and events. Textual data is analyzed using ontological
patterns and regular expressions, complemented with SWRL
rules for integrity constraints.
… other interesting works must be considered
13. Limitations of existing standards
and systems
There is no recognized, unified genealogical model that serves as a
standard. In this void, the GEDCOM file format is widely used to
exchange genealogical data
Most genealogical information systems presuppose a closed
world (CWA), in the sense that everything that is not reflected in
the form of tuples (i.e., not declared in the extension) is false or
nonexistent.
Then, where to start?
We are interested in the semantic value of attributes and
roles, not in the explicit record syntax or types. We need to
transform implicit semantic knowledge into explicit knowledge,
moving toward an open world assumption (OWA)
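The difference between the two assumptions can be sketched in a few lines. This is an illustrative simplification, not the authors' system: the fact base reuses the identifiers from the deck's own example triple, the function names are hypothetical, and "unknown" is represented here by None.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Only one fact has been declared.
KNOWN_FACTS: Set[Triple] = {("Person_10", "father", "Person_30")}


def holds_cwa(fact: Triple, facts: Set[Triple]) -> bool:
    """Closed world: whatever is not declared is taken to be false."""
    return fact in facts


def holds_owa(fact: Triple, facts: Set[Triple]) -> Optional[bool]:
    """Open world: whatever is not declared is unknown (None), not false."""
    return True if fact in facts else None


query = ("Person_10", "mother", "Person_31")  # hypothetical, undeclared fact
print(holds_cwa(query, KNOWN_FACTS))
print(holds_owa(query, KNOWN_FACTS))
```

For genealogy the open-world reading matters because a missing record usually means the evidence has not been found yet, not that the relationship never existed.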
14. Our proposal
[Diagram labels: primary sources; online resources; data from users' applications; data processing and knowledge inference; "supported by" links]
Any statement of genealogical
facts must be supported by
recognized sources
15. Overall view
Formalize knowledge through ontologies
Agree on a reference domain model, flexible enough
to adapt to different contexts
Proceed with an ontological mapping between this
model and existing genealogy services and
applications
16. Sources and Statements
Assertions are annotations of genealogical interest, and
refer to one or more Statements. They are supported by
documentary primary Sources
The Statement class records concepts and their
relationships as atomic triples, in the form of
<subject, predicate, object>
Example: <Person "Person_10">, <GenealogicalPredicate "father">, <Person "Person_30">
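One possible way to picture these three concepts is with plain data classes. This is only a sketch under assumptions: the field names, the choice to attach Sources to the Assertion, and the sample citation text are hypothetical and are not taken from the authors' ontology.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Source:
    """A documentary primary source (the citation text is a placeholder)."""
    citation: str


@dataclass(frozen=True)
class Statement:
    """An atomic <subject, predicate, object> triple."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Assertion:
    """An annotation of genealogical interest referring to one or more Statements."""
    note: str
    statements: List[Statement]
    sources: List[Source] = field(default_factory=list)

    def is_supported(self) -> bool:
        """True when at least one Source backs the assertion."""
        return len(self.sources) > 0


# The triple from the slide's example.
statement = Statement("Person_10", "father", "Person_30")
assertion = Assertion(
    note="Parentage as read from a record",
    statements=[statement],
    sources=[Source("hypothetical parish baptism record")],
)
print(assertion.is_supported())
```

Keeping the Source on the same object as the Statements means a claim can always be traced back to its evidence, which is the requirement stated in the proposal.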
19. PersonaEvents ontology
[Diagram labels: Facts ontology; automatic population; PersonaEvents ontology]
Data extraction and knowledge inference will be executed
over the PersonaEvents ontology.
The Facts ontology will allow us to retrieve primary sources
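The division of labour between the two ontologies can be sketched as follows. This is a rough illustration under assumptions, using dictionaries instead of real ontologies: the source identifiers, the "birthPlace" predicate and the function name are hypothetical.

```python
from collections import defaultdict

# Fact-level statements: (subject, predicate, object, source id).
FACTS = [
    ("Person_10", "father", "Person_30", "source_A"),
    ("Person_10", "birthPlace", "Vic", "source_B"),
    ("Person_30", "birthPlace", "Vic", "source_A"),
]


def populate_person_view(facts):
    """Group statements per person while keeping the reference to each source."""
    persons = defaultdict(list)
    for subject, predicate, obj, source_id in facts:
        persons[subject].append(
            {"predicate": predicate, "value": obj, "source": source_id}
        )
    return dict(persons)


person_view = populate_person_view(FACTS)
for person, entries in person_view.items():
    print(person, entries)
```

Queries and inference would run over the person-centred view, while the retained source identifier lets any derived answer be traced back to the fact-level record.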
20. Challenges for future work
Instance identification and record (entity)
matching
Automatic population of the PersonaEvents
ontology from basic statements in Facts
ontologies, keeping references to Sources
Make knowledge inference from the PersonaEvents
ontology decidable (OWL-DL and SWRL
rules)
Refine the model, in particular Properties and
Attributes, to accommodate the widest possible
range of contexts
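As a rough idea of what rule-based inference over statements looks like, the sketch below applies one hand-written rule in plain Python. It does not use OWL-DL or a SWRL engine. It assumes the triple <x, father, y> reads "x has father y", and the "paternalGrandfather" predicate and the identifier Person_50 are hypothetical.

```python
def infer_paternal_grandfathers(statements):
    """Rule: father(x, y) and father(y, z) imply paternalGrandfather(x, z)."""
    father_of = {s: o for s, p, o in statements if p == "father"}
    inferred = set()
    for child, father in father_of.items():
        grandfather = father_of.get(father)
        if grandfather is not None:
            inferred.add((child, "paternalGrandfather", grandfather))
    return inferred


STATEMENTS = {
    ("Person_10", "father", "Person_30"),
    ("Person_30", "father", "Person_50"),
}

print(infer_paternal_grandfathers(STATEMENTS))
```

The inferred triple is implicit knowledge: it appears in no source, yet follows from two statements that do. Keeping such rules within a decidable fragment is the open challenge listed above.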
21. Conclusions
Sharing data between genealogical resources
would benefit from the existence of a reference
model
The GEDCOM data exchange format is widely
accepted, but recognition of family ties
between resources requires some expert
assistance
With ontologies we can model genealogical
domain entities, properties and constraints
Extracting implicit knowledge from source
statements is possible by logics and
22. Are you eager to confirm
that you are a descendant
of Charlemagne?
Editor's Notes
You can visit genealogy resources on the internet to see what is available. You will definitely want to find out whether others have already done research on your line. Check places like The Church of Jesus Christ of Latter-Day Saints (LDS) Family History Centers (their online site is FamilySearch). But it would not be surprising if the records you are looking for aren't online. In this case, other primary sources will be helpful in your search (land, probate, church, county records). You will very likely have to reconcile data from different sources, but names and dates may not match, or the records may contain errors or contradictions.