Slides SEMAPRO 2016 University of Oviedo

Inference and Serialization
of Latent Graph Schemata
Using ShEx
Speaker: Daniel Fernández-Álvarez
Category: Idea
Daniel Fernández-Álvarez* Jose Emilio Labra-Gayo* Herminio García-González*
danifdezalvarez@gmail.com labra@uniovi.es herminiogg@gmail.com
*Department of Computer Science
WESO Research Group
University of Oviedo
Oviedo, Spain
Motivational
example
Motivation: Torimbia Beach
Motivation: Torimbia Beach
• Country: Spain
• Region: Asturias
• Council/city: Llanes
• Lat/long: 43.44, -4.85
• Length: 500 m
• Width: 100 m
• Naturist: True
Motivation: Torimbia Beach
6 different random but relevant beaches in DBpedia*
Region | Lat/long | Width
X X X X X
The same happens with country, council/city, length and naturist
*Batu Ferringhi, Horseshoe Bay, Manly Beach, Marina Beach, Playa Arcadia, Red Beach
Motivation
I would like to…
check the concept of beach, not the instances
make a single query/click to discover usual schemata
be correct, coherent and exhaustive
Idea
Proposal
• Analysis of the neighborhood of nodes that satisfy a given condition
to induce usual schemata:
• Typical condition: rdf:type
• Serialization of inferred schemata with ShEx (Shape Expressions).
• Association to a type (class)
• Management of trustworthiness
• Handy for:
• Documentation
• Verification of quality
• Discovering “hidden” entities
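The neighborhood analysis above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: triples are plain tuples, the `ex:` identifiers and the helper name `property_frequencies` are hypothetical, and "neighborhood" is reduced to the outgoing predicates of each instance selected by `rdf:type`.

```python
from collections import Counter

# Toy triples (subject, predicate, object). Identifiers and values are
# hypothetical; only the Torimbia figures come from the slides.
TRIPLES = [
    ("ex:Torimbia", "rdf:type", "dbo:Beach"),
    ("ex:Torimbia", "geo:lat", "43.44"),
    ("ex:Torimbia", "geo:long", "-4.85"),
    ("ex:Torimbia", "dbp:length", "500"),
    ("ex:BeachB", "rdf:type", "dbo:Beach"),
    ("ex:BeachB", "geo:lat", "10.0"),
    ("ex:BeachB", "geo:long", "20.0"),
    ("ex:TownC", "rdf:type", "dbo:Town"),
    ("ex:TownC", "dbo:country", "ex:CountryD"),
]


def property_frequencies(triples, target_class):
    """Return {predicate: fraction of target_class instances that use it}."""
    # Step 1: select the nodes that satisfy the condition (here, rdf:type).
    instances = {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    if not instances:
        return {}
    # Step 2: count each predicate at most once per instance.
    counts = Counter()
    seen = set()
    for s, p, _ in triples:
        if s in instances and p != "rdf:type" and (s, p) not in seen:
            seen.add((s, p))
            counts[p] += 1
    return {p: n / len(instances) for p, n in counts.items()}


freqs = property_frequencies(TRIPLES, "dbo:Beach")
```

The resulting fractions are one possible raw input for the "management of trustworthiness" item: a property used by every instance is a stronger schema candidate than one used by half of them.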
How?
Workflow
ShEx
<Person> {
}
Source graph:
DBpedia,
Wikidata…
Inference Serialization
Abstract
schemata
representation
Textual schemata
representation
with ShEx
Schemata Inference: current approaches
• Ontology integration to find shared core elements [Zhao,13]
• Association rule mining (Apriori)
• Rule-based classification (Decision Tables)
• Logical axioms at ontology level [Völker,11]
• Association rule mining (Apriori)
• Axioms represented with OWL 2 EL
• Graph schemata at class level [Christodoulou,15]
• Clusters of similar individuals (ideally, cluster=class).
• Results in an ad-hoc syntax.
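To make the association rule mining idea concrete, here is a simplified sketch that computes support and confidence for rules of the form "instances using property A also use property B". It is a pairwise computation only, not the full Apriori algorithm used in the cited works, and the data, thresholds and function name `pairwise_rules` are assumptions made for illustration.

```python
from itertools import permutations

# Each entry is the set of properties used by one instance (hypothetical data).
INSTANCES = {
    "ex:b1": {"geo:lat", "geo:long", "dbp:length"},
    "ex:b2": {"geo:lat", "geo:long"},
    "ex:b3": {"geo:lat", "geo:long", "dbp:width"},
    "ex:b4": {"dbp:length"},
}


def pairwise_rules(instances, min_support=0.5, min_confidence=0.9):
    """Return (a, b, support, confidence) for every rule a -> b above both thresholds."""
    n = len(instances)
    props = set().union(*instances.values())
    rules = []
    for a, b in permutations(sorted(props), 2):
        both = sum(1 for ps in instances.values() if a in ps and b in ps)
        has_a = sum(1 for ps in instances.values() if a in ps)
        support = both / n                           # share of instances with a and b
        confidence = both / has_a if has_a else 0.0  # share of a-users that also use b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


rules = pairwise_rules(INSTANCES)
```

With this toy data only the two rules linking `geo:lat` and `geo:long` survive, which matches the intuition that coordinates come in pairs.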
Schemata Inference: our current status
Some promising ideas:
Instance clustering
Association rule mining
Some issues linked to the target graph:
Noise management
Adaptation to data model
Graph size & complexity
Completeness and coherence
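As a rough illustration of instance clustering, the sketch below groups instances whose property sets are similar, using Jaccard similarity and a greedy single pass. The similarity measure, the threshold and the names `jaccard` and `greedy_clusters` are assumptions chosen for brevity; the slides do not commit to a particular clustering technique.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_clusters(instances, threshold=0.5):
    """Assign each instance to the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative property set, member ids)
    for name, props in instances.items():
        for rep, members in clusters:
            if jaccard(rep, props) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append((set(props), [name]))
    return [members for _, members in clusters]


# Hypothetical instances described by the properties they use.
INSTANCES = {
    "ex:a": {"geo:lat", "geo:long", "dbp:length"},
    "ex:b": {"geo:lat", "geo:long"},
    "ex:c": {"foaf:name", "dbo:birthDate"},
    "ex:d": {"foaf:name", "dbo:birthDate", "dbo:birthPlace"},
}
clusters = greedy_clusters(INSTANCES)
```

A greedy pass depends on input order and is sensitive to noise, which is one reason noise management appears among the open issues listed above.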
Schemata Serialization I
Need: Standard syntax to express constraints in RDF graphs at class
level:
• XML: RelaxNG, DTD, XML Schema
• Relational databases: DDL
• JSON: JSON Schema
RDF candidates:
ShEx
Grammar-oriented
Recursion
Human-friendly syntax
SHACL
Constraint-oriented
No recursion (for now)
RDF syntax (for now)
[Figure: percentage annotations from the slide: 19%, 59%, 83%, 83%, 87%, 69%, 32%]
Schemata Serialization II
Pure ShEx
<Beach> {
dbp:width xsd:integer,
dbp:length xsd:integer,
geo:lat xsd:float,
geo:long xsd:float,
dbo:isPartOf @<Place>*
}
Annotated ShEx
<Beach> {
dbp:width xsd:integer,
dbp:length xsd:integer,
geo:lat xsd:float,
geo:long xsd:float,
geo:geometry @<Point>,
dbo:isPartOf @<Place>*,
dbo:country @<Country>
}
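The difference between the pure and the annotated serialization can be sketched as a small rendering function. This is a hypothetical helper, not the authors' serializer: the name `to_shex`, the frequency values, the idea of a cut-off threshold and the choice to emit annotations as `#` comments are all assumptions. The output follows the comma-separated style used on the slide.

```python
def to_shex(shape_name, constraints, min_freq=0.0, annotate=False):
    """Render (predicate, value expression, frequency) triples as a ShEx-like shape.

    Constraints below min_freq are dropped; with annotate=True each line
    carries its frequency as a trailing comment.
    """
    kept = [c for c in constraints if c[2] >= min_freq]
    lines = [f"<{shape_name}> {{"]
    for i, (pred, value, freq) in enumerate(kept):
        sep = "," if i < len(kept) - 1 else ""   # no comma after the last constraint
        note = f"  # {freq:.0%}" if annotate else ""
        lines.append(f"  {pred} {value}{sep}{note}")
    lines.append("}")
    return "\n".join(lines)


# Illustrative frequencies only; they are not the percentages from the slide.
CONSTRAINTS = [
    ("dbp:width", "xsd:integer", 0.9),
    ("dbo:country", "@<Country>", 0.3),
]

pure = to_shex("Beach", CONSTRAINTS, min_freq=0.5)
annotated = to_shex("Beach", CONSTRAINTS, annotate=True)
```

Keeping the low-frequency constraints but labelling them fits the "ShEx as guide, not dogma" stance: the reader decides how much weight a rarely used property deserves.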
Use cases?
Context: Types of graphs
Specific purpose
Automatically built
Managed by a single agent
General purpose
Manually built
Managed by community
Reality
Context: Collaborative graphs
Key points:
• Schemata are not planned, they just emerge
• Schemata change over time
Possibilities:
• Schemata inference on user demand
• What is associated to a type, instead of how a type should be
• Freedom: ShEx as guide, not dogma
To summarize…
Conclusions and Future Work
What we have done:
Idea
Inference of Latent Graph Schemata
Serialization through ShEx syntax
What we want to do:
Prototype
Selection of techniques
Selection of target source/s
Tests
Usefulness in different domains
Feasibility: achieved trustworthiness
User acceptance
References
• [1] Zhao, L., & Ichise, R. (2013, May). Instance-based ontological
knowledge acquisition. In Extended Semantic Web Conference (pp.
155-169). Springer Berlin Heidelberg.
• [2] Völker, J., & Niepert, M. (2011, May). Statistical schema induction.
In Extended Semantic Web Conference (pp. 124-138). Springer Berlin
Heidelberg.
• [3] Christodoulou, K., Paton, N. W., & Fernandes, A. A. (2015).
Structure inference for linked data sources using clustering.
In Transactions on Large-Scale Data- and Knowledge-Centered
Systems XIX (pp. 1-25). Springer Berlin Heidelberg.
Extra information for Torimbia example I
Beach | Lat/long* | Naturist
Batu Ferringhi | dbp:latd, dbp:longd, georss:point, geo:geometry, geo:lat, geo:long | X
Horseshoe Bay | geo:geometry, geo:lat, geo:long | X
Manly Beach | georss:point, geo:geometry, geo:lat, geo:long | X
Marina Beach | georss:point, geo:geometry, geo:lat, geo:long | X
Playa Arcadia | georss:point, geo:geometry, geo:lat, geo:long | X
Red Beach | dbp:latDeg, dbp:longDeg, georss:point, geo:geometry, geo:lat, geo:long | X
*Some lat/long properties have been omitted. Some of them work together to
give a precise coordinate (total degrees + orientation N/S, E/W).
Extra information for Torimbia example II
Beach | Length | Width | Council | Region | Country
Batu Ferringhi | X | X | shared entity | dbo:isPartOf | dbo:country
Horseshoe Bay | X | X | description | description | rdf:type (BeachesOfBermuda)
Manly Beach | X | X | description | dct:subject dbc:Beaches_of_New_South_Wales | description
Marina Beach | dbp:height | description | dct:subject | dct:subject
Playa Arcadia | X | X | dct:subject | X | dct:subject
Red Beach | X | dbp:width | dbp:city | is dbp:south of | description
Wikimedia Strategy: Templates and Mappings
• Mappings
• Designed to automatically import data from Wikipedia’s infoboxes and tables
into DBpedia.
• Wikipedia Templates define expected properties for certain types. Mappings
define which property should be used to create a triple when finding an
occurrence of an expected property.
PROS
• Preserves Wikipedia’s quality.
• Handy as guide for content
represented in Wikipedia.
• It may enrich both Wikipedia and
DBpedia
• Templates can evolve guided by
community
CONS
• Depends on Wikipedia’s quality.
• It can only manage content
represented in Wikipedia.
• Not transferable to standalone RDF
graph projects.
• It assumes that the community is
following the templates. It may not
reflect the real graph.
ShEx vs SHACL
ShEx
<UserShape> {
rdfs:label xsd:string,
ex:role ( ex:User ) ?
}
SHACL
:UserShape
a sh:Shape ;
sh:property [
sh:predicate rdfs:label ;
sh:datatype xsd:string ;
sh:minCount 1 ;
sh:maxCount 1 ;
] ;
sh:property [
sh:predicate ex:role ;
sh:hasValue ex:User ;
sh:filterShape [
sh:property [
sh:predicate ex:role ;
sh:minCount 1 ;
]
] ;
sh:maxCount 1 ;
] .
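Both listings aim to express the same two constraints: exactly one string label, and an optional role that, when present, must be `ex:User`. A minimal Python check of that reading is sketched below. It uses `rdfs:label` as in the SHACL listing, represents a node as a hypothetical dictionary from predicate to (value, datatype) pairs, and ignores everything else a real ShEx or SHACL processor would handle.

```python
def conforms_to_user_shape(node):
    """Check a node against the two UserShape constraints.

    node maps a predicate to a list of (value, datatype_or_None) pairs;
    datatype is None for IRIs. This layout is an assumption of the sketch.
    """
    labels = node.get("rdfs:label", [])
    roles = node.get("ex:role", [])
    # Exactly one label, typed as xsd:string.
    label_ok = len(labels) == 1 and labels[0][1] == "xsd:string"
    # At most one role, and it must be ex:User.
    role_ok = len(roles) <= 1 and all(value == "ex:User" for value, _ in roles)
    return label_ok and role_ok


valid = conforms_to_user_shape({
    "rdfs:label": [("Alice", "xsd:string")],
    "ex:role": [("ex:User", None)],
})
invalid = conforms_to_user_shape({
    "rdfs:label": [("Bob", "xsd:string")],
    "ex:role": [("ex:Admin", None)],
})
```

The comparison the slide makes is about notation: the same check takes four lines in ShEx compact syntax and roughly twenty in the SHACL draft syntax shown.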
