Transcript

  • 1. Capturing emerging relations between schema ontologies on the Web of Data
    Andriy Nikolov
    Enrico Motta
  • 2. Public linked data
    “Linking” in the Linked Data cloud:
    References to instance URIs described in external sources
    Special case: identity links between equivalent resources
    Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 3. Motivation
    Schema heterogeneity is an obstacle both for creating and for utilising these links
    Extracting information on the same topic from different repositories
    Discovering equivalence links between individuals
    Motivation for our work: discovering instance-level links
    How to choose the repositories to connect a new one to?
    Which subsets of repositories contain co-referring instances?
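    [Figure: candidate repositories LinkedMDB, DBPedia, Freebase and MusicBrainz, with topic labels "movies", "TV programs" and "pieces of music"; a "?" marks the open choice of link targets]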
  • 4. Schema-level interlinks
    [Figure: schema-level layer (links marked "?") above the data-level layer]
  • 5. Matching approaches
    “Top-down”
    Analyzing schema ontologies and generating alignments (manually or automatically)
    UMBEL
    Using CYC as a “backbone”
    Mapping commonly used schema ontologies
    “Bottom-up”
    Inferring schema mappings based on instance-level information
  • 6. Our approach
    Constructing a large-scale network of schema mappings
    Applying a light-weight instance-based matcher
    Analysing the resulting network
    What does it tell us about the use of ontologies?
  • 7. Motivating factors
    Potential use case scenarios
    Discovering relevant sources for connection
    Discovering relevant subsets of comparable instances
    Tolerance to the quality of mappings
    A mapping between “strongly overlapping” classes is still useful even if there is no strict equivalence/subsumption
  • 8. Instance-based matching
    Use of instance-based matching
    Some implicit schema-level assumptions cannot be captured using only schema-level evidence
    Interpretation mismatches
    dbpedia:Actor = professional actor (film or stage)
    movie:actor = anybody who participated in a movie
    Class interpretation “as used” vs “as designed”
    FOAF: foaf:Person = any person
    DBLP: foaf:Person = computer scientist
  • 9. Instance set overlaps
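    Co-typing (within DBPedia)
    dbpedia:Ennio_Morricone is_a dbpedia:Artist
    dbpedia:Ennio_Morricone is_a yago:ItalianComposers
    Declared association (across LinkedMDB, DBPedia, MusicBrainz)
    movie:music_contributor/2490 is_a movie:music_contributor (LinkedMDB)
    dbpedia:Ennio_Morricone is_a dbpedia:Artist (DBPedia)
    music:artist/a16…9fdf is_a mo:MusicArtist (MusicBrainz)
    movie:music_contributor/2490 = dbpedia:Ennio_Morricone = music:artist/a16…9fdf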
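The two kinds of overlap on this slide can be illustrated with a minimal sketch. It assumes toy in-memory data rather than the real repositories; the function names and the identifier `music:artist/EXAMPLE` are hypothetical placeholders, and only direct identity links are counted (no composed links).

```python
from collections import defaultdict
from itertools import combinations

# Toy rdf:type assertions: instance -> set of classes (illustrative data only).
types = {
    "dbpedia:Ennio_Morricone": {"dbpedia:Artist", "yago:ItalianComposers"},
    "movie:music_contributor/2490": {"movie:music_contributor"},
    "music:artist/EXAMPLE": {"mo:MusicArtist"},  # hypothetical placeholder identifier
}

# Toy identity links between instances from different repositories.
same_as = [
    ("dbpedia:Ennio_Morricone", "movie:music_contributor/2490"),
    ("dbpedia:Ennio_Morricone", "music:artist/EXAMPLE"),
]

def co_typing_overlaps(types):
    """Count instances that are explicitly typed with both classes of a pair."""
    counts = defaultdict(int)
    for classes in types.values():
        for a, b in combinations(sorted(classes), 2):
            counts[(a, b)] += 1
    return dict(counts)

def association_overlaps(types, same_as):
    """Count identity links whose two endpoints belong to the two classes of a pair."""
    counts = defaultdict(int)
    for x, y in same_as:
        for a in types.get(x, ()):
            for b in types.get(y, ()):
                if a != b:
                    counts[tuple(sorted((a, b)))] += 1
    return dict(counts)

co_typed = co_typing_overlaps(types)
associated = association_overlaps(types, same_as)
```

Here `co_typed` holds the single pair (dbpedia:Artist, yago:ItalianComposers), while `associated` pairs each DBPedia class of the linked instance with movie:music_contributor and mo:MusicArtist.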
  • 10. Dataset
    Billion Triple Challenge 2009
    about 1.14 billion triples
    contains
    core LOD repositories (DBPedia, Freebase, Geonames, Musicbrainz, LinkedMDB,…)
    smaller semantic datasets retrieved by search servers (Falcon-S, Sindice)
    ≈3.6M co-typing-based overlapping pairs of classes
    ≈1M association-based pairs
  • 11. Inferring mappings
    Classification task
    Classes A, B: is there a mapping?
    Boolean classification
    type of mappings assigned based on comparing sizes of instance sets
    Features
    ns1, ns2: namespaces of the class URIs
    |e(A ∩ B)|: size of the overlap
    |e(A)|, |e(B)|: sizes of the instance sets
    |e(A ∩ B)| / |e(A)|, |e(A ∩ B)| / |e(B)|: ratios of the overlapping subset to each complete instance set
    direct/indirect: whether the classes have instances explicitly declared to be equivalent
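As a rough illustration of the feature list above, the following sketch builds a feature record for one candidate class pair. The function names, the dictionary keys, and the 0.9 threshold in `mapping_type` are assumptions made for this example; the slides only state that the mapping type is assigned by comparing instance-set sizes, not the exact rule.

```python
def namespace(class_name):
    """Return the prefix of a prefixed class name such as 'dbpedia:Actor'."""
    return class_name.split(":", 1)[0]

def pair_features(class_a, class_b, inst_a, inst_b, direct):
    """Build the feature record for a candidate class pair.

    inst_a, inst_b: sets of instance identifiers, assumed to be normalised so
    that co-referring instances share one identifier.
    direct: True if the overlap comes from explicitly declared equivalences.
    """
    overlap = len(inst_a & inst_b)
    return {
        "ns1": namespace(class_a),
        "ns2": namespace(class_b),
        "overlap": overlap,
        "size_a": len(inst_a),
        "size_b": len(inst_b),
        "ratio_a": overlap / len(inst_a) if inst_a else 0.0,
        "ratio_b": overlap / len(inst_b) if inst_b else 0.0,
        "direct": direct,
    }

def mapping_type(features, threshold=0.9):
    """Hypothetical rule: label an accepted mapping from the two overlap ratios."""
    a_in_b = features["ratio_a"] >= threshold
    b_in_a = features["ratio_b"] >= threshold
    if a_in_b and b_in_a:
        return "equivalent"
    if a_in_b:
        return "A subclass of B"
    if b_in_a:
        return "B subclass of A"
    return "overlap"

# Illustrative instance sets: every movie:actor instance is also a dbpedia:Actor.
features = pair_features("dbpedia:Actor", "movie:actor", {1, 2, 3, 4}, {3, 4}, True)
label = mapping_type(features)
```

The fall-through label "overlap" corresponds to the "strongly overlapping" case mentioned earlier, where neither strict equivalence nor subsumption holds.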
     
  • 12. Test
    Training
    Training set: 6000 overlapping pairs of classes
    Test: 10-fold cross-validation
    Applying
    2 networks of class mappings
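The evaluation protocol can be sketched in plain Python. The slides do not name the classifier, so `train_threshold` below is only a stand-in learner (an assumption) that picks a cut-off on a single ratio feature; `k_fold_indices` and `cross_validate` are likewise hypothetical helper names. The sketch assumes there are at least as many examples as folds.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(examples, labels, train_fn, k=10, seed=0):
    """Return the mean accuracy over k folds.

    train_fn(train_examples, train_labels) must return a predict(example) function.
    """
    accuracies = []
    for held_out in k_fold_indices(len(examples), k, seed):
        held = set(held_out)
        train_x = [examples[i] for i in range(len(examples)) if i not in held]
        train_y = [labels[i] for i in range(len(examples)) if i not in held]
        predict = train_fn(train_x, train_y)
        correct = sum(predict(examples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)

def train_threshold(train_x, train_y):
    """Stand-in learner: choose the cut-off that best separates the training labels."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(train_x)):
        acc = sum((x >= t) == y for x, y in zip(train_x, train_y)) / len(train_x)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: x >= best_t

# Synthetic data: overlap ratios, labelled as mappings when the ratio is at least 0.5.
ratios = [i / 100 for i in range(100)]
is_mapping = [r >= 0.5 for r in ratios]
mean_accuracy = cross_validate(ratios, is_mapping, train_threshold, k=10)
```

Each example is held out exactly once, so the reported figure is an average over predictions made on data the learner did not see during training.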
  • 13. Observations: class mappings
    Association-based network: classes involved in the largest number of mappings
    High-level classes representing concepts covered in many repositories
    … and describing categories with very fine-grained class decomposition
    Usually also the most populated ones
    geonames:Feature
    freebase:people.person
    yago:PhysicalEntity
    linkedmdb:film
    umbel:Person
    akt:Person
    akt:ArticleReference

    “under-linked” ones?
  • 14. Observations: class mappings
    Co-typing-based network: classes involved in the largest number of mappings
    Popular classes reused in many repositories
    … or in DBPedia
    … and describing categories with fine-grained class decomposition
    Usually also the most populated ones
    foaf:Person
    umbel:Person
    dbpedia:Person
    dbpedia:FootballPlayer
    wordnet:Person
    dbpedia:Album
    sioc:WikiArticle
    geonames:Feature

  • 15. Links between ontologies
    Aggregated network: connections between ontologies
    Mapping-based links between ontologies
    At least 1 mapping between corresponding classes must exist
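The aggregation step can be sketched as follows. It assumes that the prefix of a prefixed class name identifies its ontology and that mappings within one ontology are ignored; both are simplifications made for this example, and the function names are hypothetical.

```python
from collections import Counter

def ontology_of(class_name):
    """Treat the prefix of a prefixed class name as its ontology (a simplification)."""
    return class_name.split(":", 1)[0]

def aggregate(class_mappings):
    """Collapse class-level mappings into undirected ontology-level links.

    Returns a Counter: (ontology1, ontology2) -> number of supporting class mappings.
    A pair appears in the result only if at least one class mapping supports it.
    """
    links = Counter()
    for a, b in class_mappings:
        o1, o2 = sorted((ontology_of(a), ontology_of(b)))
        if o1 != o2:  # skip mappings inside a single ontology
            links[(o1, o2)] += 1
    return links

# Illustrative class mappings using class names that appear in the slides.
mappings = [
    ("dbpedia:Person", "foaf:Person"),
    ("foaf:Person", "dbpedia:Artist"),
    ("dbpedia:Person", "dbpedia:Artist"),
    ("umbel:Person", "akt:Person"),
]
ontology_links = aggregate(mappings)
```

Keeping the count of supporting mappings, rather than a plain set of pairs, leaves room to weight or filter ontology-level links later.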
  • 16. Association-based network
  • 17. Association-based network
    [Figure: association-based network of ontologies, with groups labelled "Generic" and "Domain-specific"]
  • Association-based network
    Main factor: topic coverage
    Popularity for linking is not reflected
    Data-level: DBPedia has more connections than Freebase
    Schema-level: no substantial difference
    Effect of exploiting composed links
  • 26. Co-typing-based network
    Main factor:
    Popularity for reuse
    FOAF and WordNet:
    the most popular
    DBPedia, YAGO, OpenCYC, UMBEL
    Reused for DBPedia instances
  • 27. Outcomes
    Possible usage scenarios for mappings
    Selecting suitable sources to connect
    “LinkedMDB contains more movies than DBPedia – more likely to cover all my instances”
    Selecting an ontology to reuse to structure new instances
    Which sources use this ontology? Do I want my data to be integrated with them?
    Other data-driven tasks
    E.g., exploratory search
    Generic challenges
    How to take into account task requirements in ontology matching?
    Recall vs precision, fuzzy vs exact
    How to capture changes in the data?
    BTC 2009 is almost obsolete by now
  • 28. Limitations and future work
    Limitations
    Light-weight matcher can lead to lower quality mappings
    OK for our scenario but not others
    Pre-existing instance-level mappings are not always available
    Future work
    Combining with schema-based ontology matching techniques
    Taking into account properties and complex correspondences
  • 29. Questions?
    Thanks for your attention
  • 30. Disjoint but overlapping
    Spurious owl:sameAs link
    dbpedia:Hippocrates (Hippocrates) = bookmashup:9004095748 (Hippocratic Lives and Legends (Studies in Ancient Medicine, Vol 4))
    Spurious rdf:type assignment
    dbpedia:Celtic_Frost (band) defined as Person in DBPedia (fixed in the current version of DBPedia)
    Modelling assumptions
    dbpedia:Masada describes both the geographical place and the battle