Your SlideShare is downloading. ×
0
Extracting core knowledge     from Linked Data Valentina Presutti, Lora Aroyo, Alessandro Adamou, Balthasar Schopman, Aldo...
Our initial goal was to designrecommendation strategies basedon linked data, e.g. for TVprograms
Use cases   Dataset    Jamendo     JPeel        LMDB   nTriples     1,047,950      271,369   6,147,978   nProps           ...
Conclusion,                  ongoing and future workConclusion• Linked Data sets as connected knowledge components (KA)   ...
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Wai 2011
Upcoming SlideShare
Loading in...5
×

Wai 2011

419

Published on

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
419
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
10
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Using KPs and paths, we can provide prototypical ready-to-use queries for core and concealed knowledge to emerge, thus addressing requirement (2). Although the method applies to datasets whose logical structure is not known a priori, it is meant to analyse LD for serendipitous knowledge
  • Dataset knowledge is often not uniformly organized, thus it is generally unknown how to query for it. To deal with these issues, we propose a dataset analysis approach based on knowledge patterns, and show how the recognition of patterns can support querying datasets even if their vocabularies are previously unknown. Finally, we discuss results from experimenting on three multimedia-related datasets.
  • Using KPs and paths, we can provide prototypical ready-to-use queries for core and concealed knowledge to emerge, thus addressing requirement (2). Although the method applies to datasets whose logical structure is not known a priori, it is meant to analyse LD for serendipitous knowledge
  • A knowledge pattern (KP) for a type in an RDF graph includes: (i) the properties by which instances of this type relate to other individuals; (ii) the types of such individuals for each property. A KP is an invariance across observed data or objects that allows a formal or cognitive interpretation [4]. A KP embeds the key relations that describe a relevant piece of knowledge in a certain domain of interest, similar to linguistic frames and cognitive schemata.A path is an ordered type-property sequence that can be traversed in an RDF graph. Note the use of types in lieu of their instances, which instead denotemultiple occurrences of the same path. The length of a path is the number of properties involved (possibly even with repetitions)
  • Our approach uses a strategy aimed at modelling, inspecting, and summarizing Linked Data sets, thereby drawing what we call their knowledge architecture, which relies on the notions of paths and KPsIdentify the properties used in a dataset triples, and model them through the class Property; Identify the types (classes or literals) of the subject and object resources of such triples, and model them through the class Type; Identify the typical paths that connect triples in a dataset, and model them through the class Path.
  • This, in turn, relies on the notions of KPs and type-property paths. We identify the central properties and types, i.e. those able to capture most of the knowledge in a dataset, and extract KPs based on the central types. In other words, we extract the dataset vocabulary and analyse the way the data are used in terms of patterns. We also associate general-purpose measures, such as betweenness and centrality, to the knowledge architecture components of a dataset for performing empirical analysis.A procedure for building prototypical queries without prior knowledge of a dataset vocabulary
  • 1. Gather property usage statistics for a chosen dataset and store them as ABox of the knowledge architecture ontology;2. Query the dataset for extracting paths. Store all paths with length up to 4, with their usage statistics, in the knowledge architecture dataset3. Identify central types and central properties based on their frequencies in a key position in paths, i.e. betweenness, and number of instantiations;4. Extract emerging KPs based on the dataset's central types and properties;5. Select clustering factors among central properties, i.e. those properties occupying the same position in a set of paths, and construct path clusters;
  • Identify the properties used in a dataset triples, and model them through the class Property; Identify the types (classes or literals) of the subject and object resources of such triples, and model them through the class Type; Identify the typical paths that connect triples in a dataset, and model them through the class Path.
  • compute betweenness of types by counting the participation of types as subjects in paths of length 2 at position 2Compute betweenness of properties by counting the participation of properties in paths of length 3 in position 2.The combination of betweenness values and number of instances indicates the value of centrality of a type or property for a dataset. Central types andproperties are able to capture most of the knowledge expressed in a dataset.
  • Identify the properties used in a dataset triples, and model them through the class Property; Identify the types (classes or literals) of the subject and object resources of such triples, and model them through the class Type; Identify the typical paths that connect triples in a dataset, and model them through the class Path.We then define four properties describing PathElement: hasProperty, hasPosition, hasPathElementObjectType and hasPathElementSubjectType. Each Path is associated to an instance of PathOccurrencesInDataset, which indicates the observed number of occurrences of that path in a Dataset.We also define the concepts CentralType and CentralProperty, which identify the entities capturing most of the knowledge in a dataset. The type KnowledgePattern is used for storing the emerging knowledge patterns.
  • The # of pathsThe # of occurrences of paths of different lengthsIt was enough to compute paths of length 3 – for longer paths were concatanation of shorter pathsThere are a lot of multi typingWe don’t look only at the structurecompute betweenness of types by counting the participation of types as subjects in paths of length 2 at position 2Compute betweenness of properties by counting the participation of properties in paths of length 3 in position 2.The combination of betweenness values and number of instances indicates the value of centrality of a type or property for a dataset. Central types andproperties are able to capture most of the knowledge expressed in a dataset.
  • We want to build queries exploiting these propertiesThe central type based on value of betweennessThere are paths of length 2 where this type is in the center
  • We have to collect path elements
  • We start with an empty setWe identify a central TYPE mo:track
  • We take the KP – we take the max set of properties
  • We feed the set
  • So if there are any central properties we look at the cluster
  • So we take everything that is related to track – the result is that feed with
  • A procedure for building prototypical queries without prior knowledge of a dataset vocabulary
  • Transcript of "Wai 2011"

    1. 1. Extracting core knowledge from Linked Data Valentina Presutti, Lora Aroyo, Alessandro Adamou, Balthasar Schopman, Aldo Gangemi, Guus SchreiberSecond International Workshop on Consuming Linked Data (COLD2011) Bonn, Germany, 2011-10-23
    2. 2. Our initial goal was to designrecommendation strategies basedon linked data, e.g. for TVprograms
    3. 3. Use cases Dataset Jamendo JPeel LMDB nTriples 1,047,950 271,369 6,147,978 nProps 24 24 221 Jamendo nTypes 11 9 53 John Peel Sessions LinkedMDBDifferent in sizeSame or related domainsCustom/standard/popular vocabularies Measure Dataset L=2 L=3 L=4 nPath Jamendo 33 31 26 Jpeel 56 65 73 LMDB 546 1,665 3,757 nPathOcc Jamendo 999,052 1,452,645 2,259,097 Jpeel 1,948,999 14,447,400 1,240,815,607 LMDB 25,765,513 184,950,315 1,402,705,472 Property usage in paths Jamendo 1.000 0.917 0.834 Jpeel 0.917 0.834 0.792 LMDB 0.847 0.747 0.747
    4. 4. Conclusion, ongoing and future workConclusion• Linked Data sets as connected knowledge components (KA) http://www.ontologydesignpatterns.org/ont/lod-analysis-properties.owl• Empirical analysis on three LD datasets from the LD cloud http://stlab.istc.cnr.it/stlab/LOD-Analysis-Intro• A procedure for building prototypical queries when the ontologies are unknownOngoing and future work• KA has been used for extracting KPs from Wikipedia wikilinks – “Encyclopedic Knowledge Patterns from Wikipedia links” on Wednesday  – Aemoo: exploratory search demo based on KPs @SWC• Improving KA expressivity and effectiveness• Refining the procedure for extracting KPs by exploiting centrality• Refining and evaluating automatic query building based on cognitive-soundness• Comparing KA of a dataset with the “top-down” ontologies used in it• Aligning emerging KPs to general KPs: supporting alignment between datasets and error discovering
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.

    ×