Genealogical domain

KEOD-2012 Conference (Barcelona)

  • You can visit genealogy resources on the internet to see what is available. You will definitely want to find out whether others have already researched your line. Check places like The Church of Jesus Christ of Latter-day Saints (LDS) Family History Centers (their online site is FamilySearch). It would not be surprising, though, if the records you are looking for are not online. In that case, other primary sources will help your search (land, probate, church, and county records). You will very likely have to reconcile data from different sources, and names or dates may not match, or the records may contain errors or contradictions.

    1. Modeling genealogical domain: an open problem. Joan Campanyà Artés, Jordi Conesa Caralt, Enric Mayol Sarroca. KEOD 2012, Barcelona.
    2. Could it be that you are a descendant of Charlemagne?
    3. Statistically, if your ancestors are predominantly European, it is virtually impossible not to be. But if you are not satisfied with that likelihood and wish to demonstrate kinship, you must consult reliable sources and historical records that support the assumption. Genealogy is the study of families and the tracing of their lineages and history.
    4. But we would like automated genealogy research... [Diagram labels: primary sources; online resources; data from users; applications; data processing and knowledge inference; family tree.]
    5. … in any case relying on recognised sources. [Same diagram, with "supported by" links added.]
    6. A common conceptual model of the domain will make things easier.
    7. Index
       • Genealogy: a very complex domain
       • State of the art. Standards and specifications to share genealogical data. Genealogical knowledge processing. "Open World Assumption" (OWA) versus "Closed World Assumption" (CWA)
       • Our proposal. Sources and statements. Modeling entities and relationships
       • Challenges for future work
       • Conclusions
    8. Is modeling genealogy a problem? Intrinsic complexity of the domain
       • Syntactic variants: names of individuals and locations often appear with lexical variants that hinder proper recognition. (Examples: Joan Campanyà / Juan Campañá, Vic / Vich, Viella / Vielha)
       • Structural heterogeneity: family patterns and the roles of individuals depend on the temporal and cultural context in which they occur. (Examples: paternal or maternal family name according to cultural context, blood relatives, ...)
       • Data entry errors: these may be transcription errors or erroneous interpretations. (Examples: erroneous birth or death dates, inaccurate records due to forced translations for political reasons or ignorance, ...)
    9. Agree on a model: an opportunity! Distributed and independent data structures. [Diagram labels: primary sources; online resources; data from users; applications.]
       • Primary sources adopt heterogeneous data structures
       • Online and semantic web services provide access to specific data repositories
       • Private applications lack common and recognized standards (entities/relationships)
    10. GEDCOM
       • Difficult evolution: it is a proprietary format
       • Family-centered. This does not facilitate the search for ancestors, which is much of the work of genealogists
       • Ambiguity: the specification does not set limits on its hierarchical structure, so different implementations of the standard can be incompatible
       • Lack of source references: there is no tracking of data connected to the research process, which makes subsequent verification or reuse of sources difficult
       • Inconsistencies may occur due to data duplication
    11. GENTECH
       • Interesting features: all genealogical data are broken down into a series of short, formal genealogical statements. Introduces key concepts: events (anything that happened in someone's life) and relationships (between two people)
       • Drawbacks: restrictive predefined categories of DataTypes, TypeValues and Collections. The model assumes its implementation on relational databases
    12. Modeling with ontologies
       • Zandhuis, 2005. Genealogical data modeled with OWL/RDF. Enables the potential use of the Semantic Web. Did not develop much beyond the class structure
       • Campbell, 2006. Open network data, scalable, extensible, based on open standards and understandable by machines. Genealogical data fragmented in the form of subject-predicate-object sentences, in OWL-RDF files
       • Woodbury, 2010. Information system based on individuals and events. Textual data is analyzed using ontological patterns and regular expressions, complemented with SWRL rules for integrity constraints
       • … other interesting works must be considered
    13. Limitations of existing standards and systems
       • We do not have a recognized and unified genealogical model as a standard. In this void, the GEDCOM file format is extensively used to exchange genealogical data
       • Most genealogical information systems presuppose a closed world (CWA), in the sense that everything that is not reflected in the form of tuples (i.e., not declared in the extension) is false or nonexistent
       • Then, where to start? We are interested in the semantic value of attributes and roles, not in the explicit record syntax or types. We need to transform implicit semantic knowledge into explicit knowledge, moving toward an open world assumption (OWA)
    14. Our proposal. Any statement of genealogical facts must be supported by recognized sources. [Diagram labels: primary sources; online resources; data from users; applications; data processing and knowledge inference; "supported by" links.]
    15. Overall view
       • Formalize knowledge through ontologies
       • Agree on a reference domain model, flexible enough to adapt to different contexts
       • Proceed with an ontological mapping between this model and existing genealogy services and applications
    16. Sources and Statements
       • Assertions are annotations of genealogical interest, and refer to one or more Statements. These are supported by documentary primary Sources
       • The Statement class records concepts and their relationships as atomic triples, in the form <subject, predicate, object>
       • Example: <Person "Person_10">, <GenealogicalPredicate "father">, <Person "Person_30">
    17. Modeling Entity and populating the Facts ontology
    18. Modeling Event, Place and Date
    19. PersonaEvents ontology. [Diagram labels: Facts ontology; automatic population; PersonaEvents ontology.] Data extraction and knowledge inference will be executed over the PersonaEvents ontology. The Facts ontology will allow us to retrieve primary sources.
    20. Challenges for future work
       • Instance identification and record (entity) matching
       • Automatic population of the PersonaEvents ontology from basic statements in Facts ontologies, keeping references to Sources
       • Make knowledge inference from the PersonaEvents ontology decidable (OWL-DL and SWRL rules)
       • Refine the model, in particular Properties and Attributes, to accommodate the widest possible range of contexts
    21. Conclusions
       • Sharing data between genealogical resources would benefit from the existence of a reference model
       • The GEDCOM data exchange format is widely accepted, but recognizing family ties between resources requires some expert assistance
       • With ontologies we can model genealogical domain entities, properties and constraints
       • Extracting implicit knowledge from source statements is possible by logics and
    22. Are you eager to confirm that you are a descendant of Charlemagne?
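
The statement triples of slide 16 and the CWA/OWA contrast of slide 13 can be illustrated with a small Python sketch. It is not the authors' ontology: the `Statement` class, the `source` field and the identifier `baptism_register_001` are hypothetical names chosen for illustration, and a plain list stands in for what the slides model in OWL. Under a closed world, a missing triple is read as false; under an open world, it is only unknown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """One atomic <subject, predicate, object> triple (hypothetical layout)."""
    subject: str
    predicate: str
    obj: str
    source: str  # assumed identifier of the supporting primary source


def holds_cwa(statements: List[Statement], s: str, p: str, o: str) -> bool:
    """Closed world: a triple that is not recorded is treated as false."""
    return any((st.subject, st.predicate, st.obj) == (s, p, o) for st in statements)


def holds_owa(statements: List[Statement], s: str, p: str, o: str) -> Optional[bool]:
    """Open world: a triple that is not recorded is unknown (None), not false."""
    return True if holds_cwa(statements, s, p, o) else None


def sources_for(statements: List[Statement], s: str, p: str, o: str) -> List[str]:
    """Return the sources that support a given triple."""
    return [st.source for st in statements
            if (st.subject, st.predicate, st.obj) == (s, p, o)]


# The example triple from slide 16, with an invented source identifier.
facts = [Statement("Person_10", "father", "Person_30", "baptism_register_001")]
```

Keeping the source on every statement is one simple way to honour the requirement that any genealogical fact be traceable to a recognized source; a real model would likely treat sources as first-class entities rather than strings.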

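The syntactic variants listed on slide 8 (Joan Campanyà / Juan Campañá, Vic / Vich, Viella / Vielha) suggest why record matching needs more than string equality. The following Python sketch shows one naive approach, assuming that stripping diacritics and comparing with the standard library's `difflib.SequenceMatcher` is acceptable as a first filter. The function names and the 0.8 threshold are arbitrary choices for illustration, not part of the presented model, and a real system would need language-aware rules to avoid false matches on short names.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between the two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag two names as candidate variants (threshold is an assumption)."""
    return similarity(a, b) >= threshold


# Variant pairs taken from slide 8.
pairs = [("Joan Campanyà", "Juan Campañá"), ("Vic", "Vich"), ("Viella", "Vielha")]
candidates = [pair for pair in pairs if likely_same(*pair)]
```

A match from `likely_same` should only be read as a candidate for review: deciding that two records refer to the same person or place still depends on dates, places and the supporting sources.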