Capturing emerging relations between schema ontologies on the Web of Data
 

    Presentation Transcript

    • Capturing emerging relations between schema ontologies on the Web of Data
      Andriy Nikolov
      Enrico Motta
    • Public linked data
      “Linking” in the Linked Data cloud:
      References to instance URIs described in external sources
      Special case: identity links between equivalent resources
      Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
    • Motivation
      Schema heterogeneity is an obstacle both for creating and for utilising these links
      Extracting information on the same topic from different repositories
      Discovering equivalence links between individuals
      Motivation for our work: discovering instance-level links
      How to choose the repositories to which a new one should be connected?
      Which subsets of repositories contain co-referring instances?
      [Figure: candidate repositories LinkedMDB, DBPedia, Freebase and MusicBrainz; topics shown: TV programs, movies, pieces of music]
    • Schema-level interlinks
      [Figure: schema-level links (marked "?") versus data-level links]
    • Matching approaches
      “Top-down”
      Analyzing schema ontologies and generating alignments (manually or automatically)
      UMBEL
      Using CYC as a “backbone”
      Mapping commonly used schema ontologies
      “Bottom-up”
      Inferring schema mappings based on instance-level information
    • Our approach
      Constructing a large-scale network of schema mappings
      Applying a light-weight instance-based matcher
      Analysing the resulting network
      What does it tell us about the use of ontologies?
    • Motivating factors
      Potential use case scenarios
      Discovering relevant sources for connection
      Discovering relevant subsets of comparable instances
      Tolerance to the quality of mappings
      A mapping between “strongly overlapping” classes is still useful even if there is no strict equivalence/subsumption
    • Instance-based matching
      Use of instance-based matching
      Some implicit schema-level assumptions cannot be captured using only schema-level evidence
      Interpretation mismatches
      dbpedia:Actor = professional actor (film or stage)
      movie:actor = anybody who participated in a movie
      Class interpretation “as used” vs “as designed”
      FOAF: foaf:Person = any person
      DBLP: foaf:Person = computer scientist
    • Instance set overlaps
      Co-typing (within DBPedia)
      dbpedia:Ennio_Morricone is_a dbpedia:Artist
      dbpedia:Ennio_Morricone is_a yago:ItalianComposers
      Declared association (across LinkedMDB, DBPedia, MusicBrainz)
      movie:music_contributor/2490 is_a movie:music_contributor
      dbpedia:Ennio_Morricone is_a dbpedia:Artist
      music:artist/a16…9fdf is_a mo:MusicArtist
      movie:music_contributor/2490 = dbpedia:Ennio_Morricone = music:artist/a16…9fdf
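
The two kinds of overlap on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data, the function names `co_typing_overlaps` and `association_overlaps`, and the MusicBrainz URI `music:artist/EXAMPLE` are hypothetical, and only direct identity links are counted (no composed links).

```python
from collections import defaultdict
from itertools import combinations

# Toy data modelled on the slide. URIs are abbreviated; the MusicBrainz
# URI is a hypothetical placeholder, not the real identifier.
types = {
    "dbpedia:Ennio_Morricone": {"dbpedia:Artist", "yago:ItalianComposers"},
    "movie:music_contributor/2490": {"movie:music_contributor"},
    "music:artist/EXAMPLE": {"mo:MusicArtist"},
}
same_as = [
    ("dbpedia:Ennio_Morricone", "movie:music_contributor/2490"),
    ("dbpedia:Ennio_Morricone", "music:artist/EXAMPLE"),
]


def co_typing_overlaps(types):
    """Count instances that are explicitly typed with both classes of a pair."""
    counts = defaultdict(int)
    for classes in types.values():
        for a, b in combinations(sorted(classes), 2):
            counts[(a, b)] += 1
    return dict(counts)


def association_overlaps(types, same_as):
    """Count declared identity links whose two endpoints belong to the pair."""
    counts = defaultdict(int)
    for x, y in same_as:
        for a in types.get(x, ()):
            for b in types.get(y, ()):
                if a != b:
                    counts[tuple(sorted((a, b)))] += 1
    return dict(counts)


co_typed = co_typing_overlaps(types)
associated = association_overlaps(types, same_as)
```

Here `co_typed` holds the single pair (dbpedia:Artist, yago:ItalianComposers), while `associated` pairs each DBPedia class with the LinkedMDB and MusicBrainz classes of the linked instances.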
    • Dataset
      Billion Triple Challenge 2009
      about 1.14 billion triples
      contains
      core LOD repositories (DBPedia, Freebase, Geonames, Musicbrainz, LinkedMDB,…)
      smaller semantic datasets retrieved by search servers (Falcon-S, Sindice)
      ≈3.6M co-typing-based overlapping pairs of classes
      ≈1M association-based pairs
    • Inferring mappings
      Classification task
      Classes A, B: is there a mapping?
      Boolean classification
      type of mappings assigned based on comparing sizes of instance sets
      Features
      $ns_1$, $ns_2$: namespaces of class URIs
      $|e_{A \cap B}|$: size of the overlap
      $|e_A|$, $|e_B|$: sizes of instance sets
      $|e_{A \cap B}| / |e_A|$, $|e_{A \cap B}| / |e_B|$: ratios of the overlapping subset to each complete instance set
      direct/indirect: whether classes have instances explicitly declared to be equivalent
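
As a rough sketch of how the listed features could be computed for one pair of classes, the snippet below assumes each class is given as a Python set of canonical instance identifiers (so that equivalent instances already share one identifier) and that class names are written as prefixed names such as `dbpedia:Artist`. The names `PairFeatures`, `namespace` and `pair_features` are hypothetical, and the classifier that consumes the features is not shown.

```python
from dataclasses import dataclass


@dataclass
class PairFeatures:
    ns1: str          # namespace of class A
    ns2: str          # namespace of class B
    overlap: int      # number of shared instances
    size_a: int       # number of instances of A
    size_b: int       # number of instances of B
    ratio_a: float    # overlap / size_a
    ratio_b: float    # overlap / size_b
    direct: bool      # True if the overlap comes from explicit equivalence


def namespace(class_name):
    """Return the prefix of a prefixed name such as 'dbpedia:Artist'."""
    return class_name.split(":", 1)[0]


def pair_features(class_a, class_b, instances_a, instances_b, direct):
    """Build the feature record for the class pair (A, B)."""
    overlap = len(instances_a & instances_b)
    return PairFeatures(
        ns1=namespace(class_a),
        ns2=namespace(class_b),
        overlap=overlap,
        size_a=len(instances_a),
        size_b=len(instances_b),
        ratio_a=overlap / len(instances_a) if instances_a else 0.0,
        ratio_b=overlap / len(instances_b) if instances_b else 0.0,
        direct=direct,
    )


# Illustrative instance sets: 4 and 8 instances, 2 of them shared.
features = pair_features(
    "dbpedia:Artist", "mo:MusicArtist",
    {1, 2, 3, 4}, {3, 4, 5, 6, 7, 8, 9, 10},
    direct=True,
)
```

Keeping both ratios is what lets the sizes of the instance sets be compared afterwards: a large share on one side and a small share on the other points away from equivalence.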
       
    • Test
      Training
      Training set: 6000 overlapping pairs of classes
      Test: 10-fold cross-validation
      Applying
      2 networks of class mappings
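
For readers unfamiliar with the evaluation protocol, 10-fold cross-validation over the 6000 labelled pairs can be sketched as below. The splitter `k_fold_indices` and the fixed seed are assumptions for illustration; the slides do not say which classifier or tooling was used.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists; every item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 6000 labelled class pairs, as on the slide.
splits = list(k_fold_indices(6000, k=10))
```

Each of the ten rounds trains on 5400 pairs and tests on the remaining 600.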
    • Observations: class mappings
      Association-based network: classes involved in the largest number of mappings
      High-level classes representing concepts covered in many repositories
      … and describing categories with very fine-grained class decomposition
      Usually also the most populated ones
      geonames:Feature
      freebase:people.person
      yago:PhysicalEntity
      linkedmdb:film
      umbel:Person
      akt:Person
      akt:ArticleReference

      “under-linked” ones?
    • Observations: class mappings
      Co-typing-based network: classes involved in the largest number of mappings
      Popular classes reused in many repositories
      … or in DBPedia
      … and describing categories with fine-grained class decomposition
      Usually also the most populated ones
      foaf:Person
      umbel:Person
      dbpedia:Person
      dbpedia:FootballPlayer
      wordnet:Person
      dbpedia:Album
      sioc:WikiArticle
      geonames:Feature
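
Ranking classes by the number of mappings they take part in amounts to computing node degrees in the mapping network. A minimal sketch, assuming mappings are given as a list of unordered class pairs with each pair listed once; the sample mappings and the name `mapping_degree` are made up for illustration and are not results from the study.

```python
from collections import Counter


def mapping_degree(class_mappings):
    """Count, for every class, the number of mappings it is involved in."""
    degree = Counter()
    for a, b in class_mappings:
        degree[a] += 1
        degree[b] += 1
    return degree


# Illustrative mappings only.
sample_mappings = [
    ("foaf:Person", "dbpedia:Person"),
    ("foaf:Person", "umbel:Person"),
    ("foaf:Person", "wordnet:Person"),
    ("dbpedia:Person", "umbel:Person"),
]
degree = mapping_degree(sample_mappings)
top_class = degree.most_common(1)
```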

    • Links between ontologies
      Aggregated network: connections between ontologies
      Mapping-based links between ontologies
      At least 1 mapping between corresponding classes must exist
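
The aggregation rule above (two ontologies are linked if at least one mapping exists between their classes) could look as follows. This sketch assumes the namespace prefix of a class name identifies its ontology; the sample mappings and the name `ontology_links` are hypothetical.

```python
def ontology_links(class_mappings):
    """Link two ontologies if at least one mapping connects their classes."""
    links = set()
    for a, b in class_mappings:
        ns_a = a.split(":", 1)[0]
        ns_b = b.split(":", 1)[0]
        if ns_a != ns_b:  # mappings inside one ontology create no link
            links.add(frozenset((ns_a, ns_b)))
    return links


# Illustrative class-level mappings only.
class_mappings = [
    ("dbpedia:Artist", "mo:MusicArtist"),
    ("dbpedia:Album", "mo:Record"),
    ("dbpedia:Person", "foaf:Person"),
    ("dbpedia:Artist", "dbpedia:Person"),
]
links = ontology_links(class_mappings)
```

Using a set of unordered pairs means that many class mappings between the same two ontologies collapse into a single link.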
    • Association-based network
      Domain-specific
      Generic:
      • YAGO
      • Freebase
      • UMBEL
      • OpenCYC
      • DBPedia
    • Association-based network
      Main factor: topic coverage
      Popularity for linking is not reflected
      Data-level: DBPedia has more connections than Freebase
      Schema-level: no substantial difference
      Effect of exploiting composed links
    • Co-typing-based network
      Main factor:
      Popularity for reuse
      FOAF and WordNet:
      the most popular
      DBPedia, YAGO, OpenCYC, UMBEL
      Reused for DBPedia instances
    • Outcomes
      Possible usage scenarios for mappings
      Selecting suitable sources to connect
      “LinkedMDB contains more movies than DBPedia – more likely to cover all my instances”
      Selecting an ontology to reuse to structure new instances
      Which sources use this ontology? Do I want my data to be integrated with them?
      Other data-driven tasks
      E.g., exploratory search
      Generic challenges
      How to take into account task requirements in ontology matching?
      Recall vs precision, fuzzy vs exact
      How to capture changes in the data?
      BTC 2009 is almost obsolete by now
    • Limitations and future work
      Limitations
      Light-weight matcher can lead to lower quality mappings
      OK for our scenario but not others
      Pre-existing instance-level mappings are not always available
      Future work
      Combining with schema-based ontology matching techniques
      Taking into account properties and complex correspondences
    • Questions?
      Thanks for your attention
    • Disjoint but overlapping
      Spurious owl:sameAs link
      dbpedia:Hippocrates (Hippocrates) = bookmashup:9004095748 (Hippocratic Lives and Legends (Studies in Ancient Medicine, Vol 4))
      Spurious rdf:type assignment
      dbpedia:Celtic_Frost (band) defined as Person in DBPedia (fixed in the current version of DBPedia)
      Modelling assumptions
      dbpedia:Masada describes both the geographical place and the battle