
Slides SEMAPRO 2016 University of Oviedo


Slides used to present the work "Inference and Serialization of Latent Graph Schemata Using ShEx" in SEMAPRO 2016



  1. 1. Inference and Serialization of Latent Graph Schemata Using ShEx Speaker: Daniel Fernández-Álvarez Category: Idea Daniel Fernández-Álvarez* Jose Emilio Labra-Gayo* Herminio García-González* danifdezalvarez@gmail.com labra@uniovi.es herminiogg@gmail.com *Department of Computer Science WESO Research Group University of Oviedo Oviedo, Spain
  2. 2. Motivational example
  3. 3. Motivation: Torimbia Beach
  4. 4. Motivation: Torimbia Beach • Country: Spain • Region: Asturias • Council/city: Llanes • Lat/long: 43.44, -4.85 • Length: 500 m • Width: 100 m • Naturist: True
  5. 5. Motivation: Torimbia Beach *Batu Ferringhi, Horseshoe Bay, Manly Beach, Marina Beach, Playa Arcadia, Red Beach Region Lat/long Width X X X X X 6 different random but relevant beaches in DBpedia* The same happens with country, council/city, length and naturist
  6. 6. Motivation I would like to… check the concept of beach, not the instances make a single query/click to discover usual schemata be correct, coherent and exhaustive
  7. 7. Idea
  8. 8. Proposal • Analysis of the neighborhood of nodes that satisfy a certain condition to induce usual schemata: • Typical condition: rdf:type • Serialization of inferred schemata with ShEx (Shape Expressions). • Association to a type (class) • Management of trustworthiness • Handy for: • Documentation • Verification of quality • Discovering “hidden” entities
  9. 9. How?
  10. 10. Workflow ShEx <Person> { } Source graph: DBpedia, Wikidata… Inference Serialization Abstract schemata representation Textual schemata representation with ShEx
  11. 11. Schemata Inference: current approaches • Ontology integration to find shared core elements [Zhao,13] • Association rule mining (Apriori) • Rule-based classification (Decision Tables) • Logical axioms at ontology level [Völker,11] • Association rule mining (Apriori) • Axioms represented with OWL 2 EL • Graph schemata at class level [Christodoulou,15] • Clusters of similar individuals (ideally, cluster=class). • Results in an ad-hoc syntax.
  12. 12. Schemata Inference: our current status Some promising ideas: Instance clustering Association rule mining Some issues linked to the target graph: Noise management Adaptation to data model Graph size & complexity Completeness and coherence
  13. 13. Schemata Serialization I Need: Standard syntax to express constraints in RDF graphs at class level: • XML: RelaxNG, DTD, XML Schema • Relational databases: DDL • JSON: JSON Schema RDF candidates: ShEx Grammar-oriented Recursion Human-friendly syntax SHACL Constraint-oriented No recursion (for now) RDF syntax (for now)
  14. 14. 19% 59% 83% 83% 87% 69% 32% Schemata Serialization II Pure ShEx <Beach> { dbp:width xsd:integer, dbp:length xsd:integer, geo:lat xsd:long, geo:long xsd:long, dbo:isPartOf @<Place>* } Annotated ShEx <Beach> { dbp:width xsd:integer, dbp:length xsd:integer, geo:lat xsd:long, geo:long xsd:long, geo:geometry @<Point>, dbo:isPartOf @<Place>*, dbo:country @<Country> }
  15. 15. Use cases?
  16. 16. Context: Types of graphs Specific purpose Automatically built Managed by a single agent General purpose Manually built Managed by community Reality
  17. 17. Context: Collaborative graphs Key points: • Schemata are not planned, they just emerge • Schemata change over time Possibilities: • Schemata inference on users’ demand • What is associated to a type, instead of how a type should be • Freedom: ShEx as guide, not dogma
  18. 18. To summarize…
  19. 19. Conclusions and Future Work What we have done: Idea Inference of Latent Graph Schemata Serialization through ShEx syntax What we want to do: Prototype Selection of techniques Selection of target source/s Tests Usefulness in different domains Feasibility: reached trustworthiness User’s acceptance
  20. 20. References • [1] Zhao, L., & Ichise, R. (2013, May). Instance-based ontological knowledge acquisition. In Extended Semantic Web Conference (pp. 155-169). Springer Berlin Heidelberg. • [2] Völker, J., & Niepert, M. (2011, May). Statistical schema induction. In Extended Semantic Web Conference (pp. 124-138). Springer Berlin Heidelberg. • [3] Christodoulou, K., Paton, N. W., & Fernandes, A. A. (2015). Structure inference for linked data sources using clustering. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX (pp. 1-25). Springer Berlin Heidelberg.
  22. 22. Extra information for Torimbia example I Latlong* Naturist Batu Ferringhi dbp:latd, dbp:longd, georss:point, geo:geometry, geo:lat, geo:long X Horseshoe Bay geo:geometry, geo:lat, geo:long X Manly Beach georss:point, geo:geometry, geo:lat, geo:long X Marina Beach georss:point, geo:geometry, geo:lat, geo:long X Playa Arcadia georss:point, geo:geometry, geo:lat, geo:long X Red Beach dbp:latDeg, dbp:longDeg, georss:point, geo:geometry, geo:lat, geo:long X *Some lat/long properties have been omitted. Some of them work together in order to get a precise coordinate (total degrees + orientation N/S E/W)
  23. 23. Extra information for Torimbia example II Length Width Council Region Country Batu Ferringhi X X shared entity dbo:isPartOf dbo:country Horseshoe Bay X X description description rdf:type (BeachesOfBermuda) Manly Beach X X description dct:subject dbc:Beaches_of_New_South_Wales description Marina Beach dbp:height description dct:subject dct:subject Playa Arcadia X X dct:subject X dct:subject Red Beach X dbp:width dbp:city is dbp:south of description
  24. 24. Wikimedia Strategy: Templates and Mappings • Mappings • Designed to automatically import data from Wikipedia’s infoboxes and tables into DBpedia. • Wikipedia Templates define expected properties for certain types. Mappings define which property should be used to create a triple when finding an occurrence of an expected property. PROS • Preserves Wikipedia’s quality. • Handy as a guide for content represented in Wikipedia. • It may enrich both Wikipedia and DBpedia • Templates can evolve guided by the community CONS • Depends on Wikipedia’s quality. • It can only manage content represented in Wikipedia. • Not transposable to standalone RDF graph projects. • It assumes that the community is following the templates. It may not reflect the real graph.
  25. 25. ShEx vs SHACL ShEx <UserShape> { dbp:label xsd:string, ex:role ( ex:User ) ? } SHACL :UserShape a sh:Shape ; sh:property [ sh:predicate rdfs:label ; sh:datatype xsd:string ; sh:minCount 1 ; sh:maxCount 1 ; ] ; sh:property [ sh:predicate ex:role ; sh:hasValue ex:User ; sh:filterShape [ sh:property [ sh:predicate ex:role ; sh:minCount 1 ; ] ] ; sh:maxCount 1 ; ] .
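
The proposal on slide 8 (inspect the neighborhood of nodes selected by rdf:type, then serialize what is usual) and the annotated shape on slide 14 could be sketched roughly as follows. This is only an illustration under assumptions, not the authors' prototype, which the slides list as future work. The function names infer_shape and to_shex, the toy ex: triples and the 0.5 threshold are all hypothetical. The sketch counts how many instances of a class use each predicate; it does not infer datatypes or cardinalities, so every value is written as the ShEx wildcard "." and the output should be read as ShEx-like text using the comma separators seen in the slides.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def infer_shape(triples, target_class):
    """Return {predicate: share of target_class instances that use it}."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    if not instances:
        return {}
    used_by = defaultdict(set)  # predicate -> instances that use it at least once
    for s, p, o in triples:
        if s in instances and p != RDF_TYPE:
            used_by[p].add(s)
    return {p: len(subjects) / len(instances) for p, subjects in used_by.items()}


def to_shex(shape_name, ratios, threshold=0.5):
    """Serialize predicates whose share reaches the threshold as ShEx-like text."""
    kept = sorted(p for p, share in ratios.items() if share >= threshold)
    body = []
    for i, p in enumerate(kept):
        sep = "," if i < len(kept) - 1 else ""
        # "." accepts any value; the comment carries the trust annotation.
        body.append(f"  {p} .{sep}  # {ratios[p]:.0%} of instances")
    return "\n".join([f"<{shape_name}> {{", *body, "}"])


# Toy data (made up for illustration, not taken from DBpedia).
triples = [
    ("ex:b1", RDF_TYPE, "dbo:Beach"),
    ("ex:b1", "geo:lat", "43.44"),
    ("ex:b1", "dbp:width", "100"),
    ("ex:b1", "dbo:country", "ex:Spain"),
    ("ex:b2", RDF_TYPE, "dbo:Beach"),
    ("ex:b2", "geo:lat", "-33.79"),
    ("ex:b2", "dbo:country", "ex:Australia"),
    ("ex:b3", RDF_TYPE, "dbo:Beach"),
    ("ex:b3", "geo:lat", "13.05"),
    ("ex:c1", RDF_TYPE, "dbo:City"),
    ("ex:c1", "dbp:width", "5"),
]

ratios = infer_shape(triples, "dbo:Beach")
print(to_shex("Beach", ratios, threshold=0.5))
```

Lowering the threshold keeps rarer predicates such as dbp:width in the shape, which is one simple way to trade exhaustiveness against trustworthiness; a real system would also need the noise handling and scalability measures named on slide 12.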
