Modeling genealogical domain:
      an open problem

        Joan Campanyà Artés
         Jordi Conesa Caralt
         Enric Mayol Sarroca


                          KEOD 2012 - Barcelona


                                                  1
Could it be that
  you are a
descendant of
Charlemagne?




                2

    Statistically, if your ancestors are predominantly
    European, it's virtually impossible not to be

     But if you are not
     satisfied with that
     likelihood and wish
     to demonstrate
     kinship, you must
     consult reliable
     sources and historical
     records that support the
     assumption

    Genealogy is the study of families and the tracing of
    their lineages and history
                                                         3
But we would like automated
      genealogy research...
    [Diagram: primary sources, online resources and data from
    user's applications feed data processing and knowledge
    inference, which produces the family tree]




                                                                  4
… in any case relying on
           recognised sources
    [Diagram: primary sources, online resources and data from
    user's applications feed data processing and knowledge
    inference, which produces the family tree; each step is
    labelled "supported by"]




                                                                    5
A common conceptual model of the domain
        will make things easier

    Modeling genealogical domain:
          an open problem

             Joan Campanyà Artés
              Jordi Conesa Caralt
              Enric Mayol Sarroca


                               KEOD 2012 - Barcelona


                                                       6
Index

    Genealogy: a very complex domain

    State of the art. Standards and Specifications to
    share genealogical data.

    Genealogical knowledge processing. "Open
    World Assumption" (OWA) versus "Closed World
    Assumption" (CWA)

    Our proposal. Sources and statements

    Modeling entities and relationships

    Challenges for future work

    Conclusions
                                                        7
Is modeling genealogy a problem?
Intrinsic complexity of the domain

    Syntactic variants: names of individuals and
    locations often appear with lexical variants that
    hinder proper recognition. (Examples: Joan Campanyà /
    Juan Campañá, Vic / Vich, Viella / Vielha)

    Structural heterogeneity: family patterns and the
    roles of individuals depend on the temporal and cultural
    context in which they occur. (Examples: paternal or maternal
    family name according to cultural contexts, blood relatives, ...)

    Data entry errors: these may be transcription errors or
    erroneous interpretations. (Examples: erroneous birth or death
    dates, inaccurate records due to forced translations for political reasons or
    ignorance, ...)
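
The syntactic-variant problem can be illustrated with a minimal Python sketch. It is not the authors' method: the normalization step, the use of the standard-library difflib ratio, and the 0.75 threshold are all assumptions chosen only to show how the example pairs above might be flagged as candidate matches.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Strip diacritics and case so 'Campañá' and 'Campana' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold().strip()


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same(a: str, b: str, threshold: float = 0.75) -> bool:
    """Flag two spellings as candidate variants (threshold is an assumption)."""
    return similarity(a, b) >= threshold


# Example pairs taken from the slide
pairs = [
    ("Joan Campanyà", "Juan Campañá"),
    ("Vic", "Vich"),
    ("Viella", "Vielha"),
]
for left, right in pairs:
    print(left, "/", right, "candidate match:", likely_same(left, right))
```

A string score like this only proposes candidates; deciding that two records describe the same person or place still needs context such as dates and sources.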
                                                                           8
Agree on a model, an opportunity!
    Distributed and independent data
                structures
    [Diagram: primary sources, online resources, data from
    user's applications]

    Primary sources adopt heterogeneous data structures

    Online and semantic web services provide access to
    specific data repositories

    Private applications lack common and recognized
    standards (entities/relationships)
                                                                      9
GEDCOM

    Difficult evolution: it's a proprietary format

    Family-centered. This does not facilitate the search for
    ancestors, which is much of the work of genealogists

    Ambiguity: the specification does not set limits on its
    hierarchical structure, so there can be incompatibilities
    between different implementations of the standard

    Lack of source references: there is no tracking of
    data connected to the research process, which makes
    subsequent verification or reuse of sources difficult

    Inconsistencies may occur due to data duplication
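
The "family-centered" point can be sketched in Python. The dictionaries below are a simplified, hypothetical stand-in for GEDCOM individual and family records, not a parser for the real format; all identifiers and names are invented. The sketch shows that finding ancestors means hopping from each individual to a family record and then to the spouses in it.

```python
# Hypothetical, simplified records: an individual points to the family in
# which they are a child; a family points to its two spouses.
individuals = {
    "I1": {"name": "Child", "child_of": "F1"},
    "I2": {"name": "Father", "child_of": "F2"},
    "I3": {"name": "Mother", "child_of": None},
    "I4": {"name": "Grandfather", "child_of": None},
    "I5": {"name": "Grandmother", "child_of": None},
}
families = {
    "F1": {"husband": "I2", "wife": "I3"},
    "F2": {"husband": "I4", "wife": "I5"},
}


def ancestors(person_id):
    """Collect all known ancestors by going individual -> family -> spouses."""
    found = []
    family_id = individuals[person_id]["child_of"]
    if family_id is None:
        return found  # no parental family recorded
    family = families[family_id]
    for parent_id in (family["husband"], family["wife"]):
        if parent_id is not None:
            found.append(parent_id)
            found.extend(ancestors(parent_id))
    return found


print(sorted(ancestors("I1")))
```

Every generation costs two lookups (individual to family, family to parents), which is one way to read the slide's remark that a family-centered layout does not make ancestor search easy.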
                                                           10
GENTECH
Interesting features:

    All genealogical data are broken down into a series of
    short, formal genealogical statements

    Introduces key concepts: Events (anything that happened
    in someone's life) and relationships (between two
    people)
Drawbacks:

    Restrictive predefined categories of DataTypes,
    TypeValues and Collections

    The model assumes implementation on relational
    databases
                                                      11
Modeling with ontologies

    Zandhuis, 2005. Genealogical data modeled with OWL/RDF.
    Enables the potential use of the Semantic Web. Did not
    develop much beyond the class structure

    Campbell, 2006. Open network data: scalable, extensible,
    based on open standards and understandable by machines.
    Genealogical data fragmented into subject-predicate-object
    sentences, in OWL-RDF files.

    Woodbury, 2010. Information system based on individuals
    and events. Textual data is analyzed using ontological
    patterns and regular expressions, complemented with SWRL
    rules for integrity constraints.

    … other interesting works must be considered

                                                        12
Limitations of existing standards
              and systems

    There is no recognized, unified genealogical model that serves as a
    standard. In this void, the GEDCOM file format is extensively used to
    exchange genealogical data

    Most genealogical information systems presuppose a closed
    world (CWA), in the sense that everything that is not reflected in
    the form of tuples (i.e., not declared in the extension) is false or
    nonexistent.

Then, where to start?
We are interested in the semantic value of attributes and
roles, not in the explicit record syntax or types. We need to
transform implicit semantic knowledge into explicit knowledge,
in a way that reaches an open world assumption (OWA)
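
The difference between the two assumptions can be shown with a small Python sketch. It is only an illustration under stated assumptions: the fact store, the function names and the three-valued answer type are hypothetical and do not come from any of the systems discussed.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical set of declared facts (the "extension")
known_facts = {("Person_10", "father", "Person_30")}


def ask_closed_world(fact):
    """CWA: whatever is not declared is taken to be false."""
    return Answer.TRUE if fact in known_facts else Answer.FALSE


def ask_open_world(fact, known_negations=frozenset()):
    """OWA: an undeclared fact is unknown unless it is explicitly denied."""
    if fact in known_facts:
        return Answer.TRUE
    if fact in known_negations:
        return Answer.FALSE
    return Answer.UNKNOWN


query = ("Person_10", "sibling", "Person_40")
print(ask_closed_world(query))  # Answer.FALSE
print(ask_open_world(query))    # Answer.UNKNOWN
```

For genealogy the open-world answer is usually the honest one: a sibling missing from the records may simply not have been found yet.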
                                                                  13
Our proposal
    [Diagram: primary sources, online resources and data from
    user's applications feed data processing and knowledge
    inference; each step is labelled "supported by"]
Any statement of genealogical
facts must be supported by
recognized sources

                                                                      14
Overall view





    Formalize knowledge through ontologies

    Agree on a reference domain model, flexible enough
    to adapt to different contexts

    Proceed with an ontological mapping between this
    model and existing genealogy services and
    applications
                                                     15
Sources and Statements

    Assertions are
    annotations of
    genealogical interest, and
    refer to one or more
    Statements. They are
    supported by
    documentary primary
    Sources

    The Statement class records
    concepts and their
    relationships as atomic
    triples, in the form of
    <subject, predicate,
    object>
Example: <Person "Person_10">, <GenealogicalPredicate "father">, <Person "Person_30">
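
The structure described on this slide can be sketched with Python dataclasses. This is an illustrative reading, not the ontology itself: the class and field names, the choice to attach Sources to the Assertion, and the citation text are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Source:
    """A documentary primary source (citation text is hypothetical)."""
    citation: str


@dataclass(frozen=True)
class Statement:
    """An atomic <subject, predicate, object> triple."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Assertion:
    """An annotation that refers to statements and is backed by sources."""
    note: str
    statements: List[Statement]
    sources: List[Source] = field(default_factory=list)

    def is_supported(self) -> bool:
        """True when at least one source backs the assertion."""
        return len(self.sources) > 0


# The triple from the slide's example
stmt = Statement("Person_10", "father", "Person_30")
assertion = Assertion(
    note="Paternity recorded at baptism",
    statements=[stmt],
    sources=[Source("Parish baptism register (hypothetical)")],
)
print(assertion.is_supported())
```

Keeping the triple immutable and separate from its supporting sources means the same Statement can be cited by several Assertions without being copied.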
                                                                           16
Modeling Entity and populating
       Facts ontology




                             17
Modeling Event, Place and Date




                             18
PersonaEvents ontology
    [Diagram: automatic population of the PersonaEvents ontology
    from the Facts ontology]

 Data extraction and knowledge inference will be executed
 over the PersonaEvents ontology.
 The Facts ontology will allow us to retrieve primary sources
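
A rough Python sketch of the two-layer idea follows. It is a sketch under assumptions, not the proposed ontologies: plain tuples stand in for Facts statements, a dictionary stands in for the PersonaEvents view, the predicate "father" is read as "subject has father object", and the derived predicate name and all identifiers are hypothetical.

```python
from collections import defaultdict

# Each fact: (subject, predicate, object, source_id). All values hypothetical.
facts = [
    ("Person_10", "father", "Person_30", "Source_A"),
    ("Person_30", "father", "Person_50", "Source_B"),
    ("Person_10", "birthPlace", "Vic", "Source_A"),
]


def populate_persona_view(fact_list):
    """Group facts per person, keeping the source of every entry."""
    view = defaultdict(list)
    for subject, predicate, obj, source_id in fact_list:
        view[subject].append(
            {"predicate": predicate, "value": obj, "sources": [source_id]}
        )
    return dict(view)


def infer_grandfathers(fact_list):
    """Derive paternal grandfathers from two chained 'father' facts.

    The derived entry carries the sources of both facts it was built from.
    """
    father_of = {s: (o, src) for s, p, o, src in fact_list if p == "father"}
    inferred = []
    for person, (father, src1) in father_of.items():
        if father in father_of:
            grandfather, src2 = father_of[father]
            inferred.append(
                (person, "paternalGrandfather", grandfather, [src1, src2])
            )
    return inferred


print(populate_persona_view(facts)["Person_10"])
print(infer_grandfathers(facts))
```

Because every inferred entry lists the sources of its premises, a conclusion reached on the person-centered layer can still be traced back to primary sources.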
                                                           19
Challenges for future work

    Instance identification and record (entity)
    matching

    Automatic population of the PersonaEvents
    ontology from basic statements in the Facts
    ontology, keeping references to Sources

    Make knowledge inference from the
    PersonaEvents ontology decidable (OWL-DL and
    SWRL rules)

    Refine the model, in particular Properties and
    Attributes, to accommodate the widest possible
    range of contexts
                                                20
Conclusions

    Sharing data between genealogical resources
    would benefit from the existence of a reference
    model

    The GEDCOM data exchange format is widely
    accepted, but recognition of family ties
    between resources requires some expert
    assistance

    With ontologies we can model genealogical
    domain entities, properties and constraints

    Extracting implicit knowledge from source
    statements is possible by logics and
                                               21
Are you eager to confirm
that you are a descendant
     of Charlemagne?




                            22

Editor's Notes

  1. You can visit genealogy resources on the internet to see what is available. You will definitely want to find out if others have already done research on your line. Check places like The Church of Jesus Christ of Latter-Day Saints (LDS) Family History Centers (their online site is Family Search). But it would not be surprising if the records you are looking for aren't online. In this case, other primary sources will be helpful in your search (land, probate, church, county records). You will very likely have to reconcile data from different sources, but names and dates may not match, or the records may contain errors or contradictions