Knowledge graph construction with a façade
The SPARQL Anything Project
Enrico Daga - The Open University
Knowledge Graph Construction W3C Community Group
Invited talk, online, 14/02/2022
Luigi Asprino
University of Bologna
Enrico Daga
The Open University
Aldo Gangemi
University of Bologna
Justin Dowdy
https://github.com/justin2004
Paul Warren
The Open University
Paul Mulholland
The Open University
This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement GA101004746.
The communication reflects only the author’s view and the Research Executive Agency is not
responsible for any use that may be made of the information it contains.
Credits
Playing the soundtrack of our history
Preserving musical heritage through knowledge graphs
Managing musical heritage collections through knowledge graphs
Studying musical heritage through (interlinked) knowledge graphs
https://spice-h2020.eu/ https://polifonia-project.eu/
Rationale
• Data Integration is the dominant use case for KG - [Atkin, 2021, in Lassila et al,
2021].
• SPARQL is the language for RDF KGs.
• 42% of SPARQL users are from non-IT areas, including social sciences and the
humanities [Warren et al, 2018].
• Many SPARQL practitioners are end-user developers [Lieberman, 2006].
• Minimise the tools / languages that need to be learned.
• SPICE / Polifonia: parallelise KG construction (ontology design / data lifting)
• Enable data lifting while waiting for a domain ontology (to come).
• SPICE / Polifonia: data may come from anywhere!
• Support the addition of an open-ended set of formats.
Knowledge Graph Construction from structured resources
Iterative process:
• Observe: the resource (e.g. a CSV file)
• Design mappings to a target ontology
• Transform: execute the mappings
• Observe: compare / evaluate
Trial-and-error approach, many iterations
Knowledge Graph Construction
an opinionated approach
• Reengineering: what syntax/meta-model do we
want?
• We cannot know what structure our user
wants but we know the meta-model: RDF
• Remodelling: what semantics do we project?
• SPARQL is great for projecting semantics
(change namespaces, create entities from
literals, add types, sophisticated
relationships, composite structures, …)
• Can we use just SPARQL to do all of it?
@enridaga
Concept
Facade Design Pattern (GoF)
From Object Oriented Programming
A single abstraction on different, alternative interfaces
https://en.wikipedia.org/wiki/Facade_pattern
Research question: what RDF facade?
• A common RDF structure over diverse
formats
• Focusing on the meta-model (data structure)
• Leaving domain semantics as it is!
• applying the least possible “ontological
commitment”
• Problem Space: CSV, JSON, HTML, XML,
Binary (JPEG, PNG, …), Text
• Solution space: RDFS
CSV
• Resource
Facade: http://sparql.xyz/facade-x/ns/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fx: <http://sparql.xyz/facade-x/ns/>.
@prefix xyz: <http://sparql.xyz/facade-x/data/>.
rdf:Property a rdfs:Class .
rdfs:ContainerMembershipProperty
rdfs:subClassOf rdf:Property .
fx:root a rdfs:Class .
id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url
10093,"Abakanowicz, Magdalena",Female,born 1930,1930,,Polska,,http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093
…
https://github.com/tategallery/collection/blob/master/artist_data.csv
[ a fx:root ;
rdf:_1 [ xyz:dates "born 1930" ;
xyz:gender "Female" ;
xyz:id "10093" ;
xyz:name "Abakanowicz, Magdalena" ;
xyz:placeOfBirth "Polska" ;
xyz:placeOfDeath "" ;
xyz:url "http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093" ;
xyz:yearOfBirth "1930" ;
xyz:yearOfDeath ""
] ;
csv.headers=true|false
[ a fx:root ;
rdf:_1 [ rdf:_1 "id" ;
rdf:_2 "name" ;
rdf:_3 "gender" ;
rdf:_4 "dates" ;
rdf:_5 "yearOfBirth" ;
rdf:_6 "yearOfDeath" ;
rdf:_7 "placeOfBirth" ;
rdf:_8 "placeOfDeath" ;
rdf:_9 "url"
] ;
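The two CSV shapes above can be imitated outside SPARQL Anything. The following is a minimal Python sketch, not the project's implementation: the function name `csv_to_facade` and the nested-dict encoding are assumptions made for illustration. Each row becomes a container placed in an `rdf:_N` slot of the root; cells are keyed by `xyz:` column names when headers are enabled and by `rdf:_N` positions otherwise.

```python
import csv
import io


def csv_to_facade(text, headers=True):
    """Return a nested dict that mimics the Facade-X view of a CSV document.

    Hypothetical helper for illustration only; SPARQL Anything produces RDF,
    not Python dicts.
    """
    rows = list(csv.reader(io.StringIO(text)))
    root = {"rdf:type": "fx:root"}
    names = None
    if headers and rows:
        # First row supplies the slot names, the rest are data rows.
        names, rows = rows[0], rows[1:]
    for i, row in enumerate(rows, start=1):
        if names is not None:
            container = {f"xyz:{name}": value for name, value in zip(names, row)}
        else:
            container = {f"rdf:_{j}": value for j, value in enumerate(row, start=1)}
        root[f"rdf:_{i}"] = container
    return root


# Columns taken from the Tate artist_data.csv excerpt shown on the slide.
SAMPLE = 'id,name,gender\n10093,"Abakanowicz, Magdalena",Female\n'
with_headers = csv_to_facade(SAMPLE, headers=True)
without_headers = csv_to_facade(SAMPLE, headers=False)
```

With headers disabled the header line is treated as an ordinary row, which matches the second Turtle excerpt where `rdf:_1` holds the column names.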
CSV
JSON
HTML
XML
Binary (JPEG, PNG, …)
Text
JSON
Facade: http://sparql.xyz/facade-x/ns/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fx: <http://sparql.xyz/facade-x/ns/>.
@prefix xyz: <http://sparql.xyz/facade-x/data/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
rdf:Property a rdfs:Class .
rdfs:ContainerMembershipProperty
rdfs:subClassOf rdf:Property .
fx:root a rdfs:Class .
xsd:int a rdfs:Datatype.
xsd:string a rdfs:Datatype.
xsd:boolean a rdfs:Datatype.
xsd:decimal a rdfs:Datatype.
xsd:float a rdfs:Datatype.
xsd:double a rdfs:Datatype.
https://github.com/tategallery/collection/artworks/t/023/t02319-9205.json
[ a fx:root ;
xyz:acno "T02319" ;
xyz:acquisitionYear "1978"^^<http://www.w3.org/2001/XMLSchema#int> ;
xyz:all_artists "Kazimir Malevich" ;
xyz:catalogueGroup […] ;
xyz:classification "painting" ;
xyz:contributorCount "1"^^<http://www.w3.org/2001/XMLSchema#int> ;
…
{
"acno": "T02319",
"acquisitionYear": 1978,
"all_artists": "Kazimir Malevich",
"catalogueGroup": {},
"classification": "painting",
"contributorCount": 1,
"contributors": [
{
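The JSON mapping above can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than the project's code: `json_to_facade` is a hypothetical name, objects are assumed to become containers with `xyz:` keys, arrays containers with `rdf:_N` slots, and primitive values are left as typed Python values (standing in for the `xsd:` datatypes on the slide).

```python
import json


def json_to_facade(value, is_root=True):
    """Mimic the Facade-X view of a parsed JSON value as nested dicts.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, dict):
        # JSON object: one key-value slot per member.
        container = {f"xyz:{k}": json_to_facade(v, False) for k, v in value.items()}
    elif isinstance(value, list):
        # JSON array: one numbered slot per item, starting at rdf:_1.
        container = {
            f"rdf:_{i}": json_to_facade(v, False) for i, v in enumerate(value, start=1)
        }
    else:
        # Strings, numbers and booleans stay as literals.
        return value
    if is_root:
        container["rdf:type"] = "fx:root"
    return container


# Fields copied from the artwork excerpt on the slide.
ARTWORK = json.loads(
    '{"acno": "T02319", "acquisitionYear": 1978, "catalogueGroup": {}, '
    '"classification": "painting", "contributorCount": 1}'
)
facade = json_to_facade(ARTWORK)

# Hypothetical array input, only to show the rdf:_N numbering.
sequence = json_to_facade(json.loads('["a", "b"]'))
```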
DOM (HTML, XML, …)
Facade: http://sparql.xyz/facade-x/ns/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fx: <http://sparql.xyz/facade-x/ns/>.
@prefix xyz: <http://sparql.xyz/facade-x/data/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
rdf:Property a rdfs:Class .
rdfs:ContainerMembershipProperty
rdfs:subClassOf rdf:Property .
fx:root a rdfs:Class .
xsd:int a rdfs:Datatype.
xsd:string a rdfs:Datatype.
xsd:boolean a rdfs:Datatype.
xsd:decimal a rdfs:Datatype.
xsd:float a rdfs:Datatype.
xsd:double a rdfs:Datatype.
rdf:type rdf:type rdf:Property
https://imma.ie/artists/
[ a fx:root , xhtml:div ;
xhtml:id "az-group" ;
rdf:_1 [ a xhtml:div ;
rdf:_1 [ a xhtml:h4 ;
rdf:_1 "A" ;
<https://html.spec.whatwg.org/#innerHTML>
"A" ;
<https://html.spec.whatwg.org/#innerText>
"A"
] ;
…
html.selector=#az-group
@prefix xhtml: <http://www.w3.org/1999/xhtml#> .
Binary and Text
Facade: http://sparql.xyz/facade-x/ns/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fx: <http://sparql.xyz/facade-x/ns/>.
@prefix xyz: <http://sparql.xyz/facade-x/data/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
rdf:Property a rdfs:Class .
rdfs:ContainerMembershipProperty
rdfs:subClassOf rdf:Property .
fx:root a rdfs:Class .
xsd:int a rdfs:Datatype.
xsd:string a rdfs:Datatype.
xsd:boolean a rdfs:Datatype.
xsd:decimal a rdfs:Datatype.
xsd:float a rdfs:Datatype.
xsd:double a rdfs:Datatype.
xsd:base64Binary a rdfs:Datatype.
rdf:type rdf:type rdf:Property
[ <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "/9j/
4AAQSkZJRgABAQEASABIAAD/
4QmsRXhpZgAASUkqAAgAAAALAA8BAgAGAAAAkgAAABABAgAOAAAAmAAAABIBAw
ABAAAAAQAAABoBBQABAAAApgAAABsBBQABAAAArgAAACgBAwABAAAAAgAAADEB
AgALAAAAtgAAADIBAgAUAAAAwgAAABMCAwABAAAAAgAAAGmHBAABAAAA1gAAAC
WIBAABAAAA0gMAAOQDAABDYW5vbgBDYW5vbiBFT1MgNDBEAEgAAAABAAAASAAA
AAEAAABHSU1QIDIuNC41AAAyMDA4OjA3OjMxIDEwOjM4OjExAB4Am…"^^<http://www.w3.org/2001/XMLSchema#base64Binary> ] .
bin.encoding # BASE64
txt.regex # tokenise into a sequence
https://imma.ie/collection/freeing-the-voice/
Hello World!
[ <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "Hello World!" ] .
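The `txt.regex` option is described above as tokenising text into a sequence. A minimal Python sketch of that idea follows, assuming each regex match fills the next `rdf:_N` slot and that, without a pattern, the whole text lands in `rdf:_1` as in the "Hello World!" example. The name `text_to_facade` is hypothetical, and the real option's matching rules may differ.

```python
import re


def text_to_facade(text, regex=None):
    """Mimic the Facade-X view of plain text as a numbered sequence.

    Hypothetical helper for illustration only.
    """
    if regex is None:
        # No pattern: the whole document is a single value.
        tokens = [text]
    else:
        # One token per non-overlapping match, in document order.
        tokens = [match.group(0) for match in re.finditer(regex, text)]
    return {f"rdf:_{i}": token for i, token in enumerate(tokens, start=1)}


whole = text_to_facade("Hello World!")
words = text_to_facade("Hello World!", regex=r"\w+")
```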
Facade X
A simplified RDF meta-model, resembling a list-of-lists
Components: Containers (typed), slots (int / string), values
Intuitive, abstract notions: key-value, sequence, type
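[Diagram: a container is typed via rdf:type as fx:root or xyz:*; its slots are xyz:* keys (e.g. xyz:row1, …, xyz:row_n) or rdf:_N positions; each slot holds a value such as "String", 1, true, or another container]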
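The container / slot / value reading of Facade-X can be made concrete by flattening a nested structure into triples. The sketch below is an illustration only: the nested-dict encoding, the function name `to_triples` and the `_:bN` blank-node labels are assumptions, not part of SPARQL Anything.

```python
from itertools import count


def to_triples(container, ids=None):
    """Flatten a nested-dict container into (subject, slot, value) triples.

    Each container gets a fresh blank-node label; nested containers are
    linked from their parent slot. Hypothetical helper for illustration.
    """
    ids = count() if ids is None else ids
    subject = f"_:b{next(ids)}"
    triples = []
    for slot, value in container.items():
        if isinstance(value, dict):
            # Nested container: link to it, then emit its own triples.
            child, child_triples = to_triples(value, ids)
            triples.append((subject, slot, child))
            triples.extend(child_triples)
        else:
            # Plain value: becomes the object of the slot.
            triples.append((subject, slot, value))
    return subject, triples


root = {
    "rdf:type": "fx:root",
    "rdf:_1": {"xyz:id": "10093", "xyz:name": "Abakanowicz, Magdalena"},
}
root_id, triples = to_triples(root)
```

Here `rdf:type` plays the "type" role, `rdf:_1` the "sequence" role and the `xyz:` slots the "key-value" role named above.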
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
[Diagram: CSV, XML and JSON sources are exposed through Facade-X (FX) and queried with SPARQL]
Daga, Enrico; Asprino, Luigi; Mulholland, Paul and Gangemi, Aldo (2021). Facade-X: An Opinionated Approach to SPARQL Anything. In: Alam, Mehwish; Groth, Paul; de Boer, Victor; Pellegrini, Tassilo and Pandit, Harshvardhan J. eds. Further with Knowledge Graphs, Volume 53. IOS Press, pp. 58–73.
https://sparql-anything.cc/
Tate Gallery Open Data
* CSV listing artworks
* JSON with details
Task: build a SKOS taxonomy of artwork subjects
(1) Select input data
(2) Build entities
(3) Project on target ontology
https://github.com/SPARQL-Anything/showcase-tate
Polifonia Ontology Network
* Scenarios collected on GitHub as Markdown files
Task: extract a list of competency questions from any scenario
https://github.com/SPARQL-Anything/showcase-mei
XML->CSV: using SPARQL to extract the note sequence from an XML/MEI file
https://github.com/SPARQL-Anything/showcase-imma
https://imma.ie/collection/freeing-the-memory/
Features v0.6.0
https://sparql-anything.readthedocs.io/en/latest/
• XML, JSON, CSV, HTML, Excel, Text, Binary, EXIF, File System, Zip/Tar,
Markdown, YAML, BibTeX, DOCX
• Query templates / parameter queries (BASIL variables)
• Full-fledged HTTP client to query Web APIs (headers, auth, …)
• Helper functions for sequences: fx:anySlot, fx:before, fx:after, …
• Combine and nest SERVICE clauses (thanks to SPARQL)
• Use SPARQL Result Sets as input for parametric queries
• Large files (CSV): iterator-like execution style
• Function: fx:entity
• On-disk option (TDB2)
• Command line interface [or] Server (based on Apache Jena Fuseki)
• 100% open source, Apache Licence 2.0, relies on Apache Jena ARQ
https://sparql-anything.cc/
Benefits
Compared to RML-based solutions and SPARQL Generate
• Plain SPARQL 1.1 (vision: minimise the need for other formalisms)
• “Free lunch” data exploration and querying
• FX can express ANY format representable as BNF (article under review)
• Open-ended extensibility: no changes to user-facing code required
• No need to commit to a specific ontology, just query the data!
• Low cognitive complexity: fewer distinct tokens or variables (Halford et al. 2004;
Warren et al. 2015).
• Sustainable, measured in lines of Java code to maintain: SPARQL Generate 12280 (core module); RML
Mapper 7951; SPARQL Anything 3842 (all transformers) at v0.2.0 (v0.6.0 has 11115)
Work in progress
• No support for RDB or other databases yet (e.g. MongoDB)
• Facade-X is sufficient to represent relational data (article under review)
• User study on the cognitive process behind KGC, comparing RML mappings vs
Facade-X (Paul Warren)
• Performance scales linearly with the size of the input data; however, this may hit
memory limits:
• Triple-filtering: reduces the number of triples loaded in memory to only the
ones used in the query
• Can use an on-disk triple store (TDB2)
• Slicing: apply the query to an iteration of input data (currently only CSV)
• Inspired by iterator-based approaches (RML / SPARQL Generate)
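Slicing and triple-filtering can be pictured together with a small Python generator. This is a rough analogy rather than the engine's actual strategy: `sliced_rows` is a hypothetical name, and filtering by column name stands in for filtering by the triple patterns that appear in the query.

```python
import csv
import io


def sliced_rows(stream, used_columns=None):
    """Yield (row number, container) pairs one CSV row at a time.

    Only one row is materialised at any moment (slicing), and slots that the
    query never mentions are dropped before they are stored (a stand-in for
    triple-filtering). Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(stream)
    for index, row in enumerate(reader, start=1):
        if used_columns is not None:
            row = {key: value for key, value in row.items() if key in used_columns}
        yield index, {f"xyz:{key}": value for key, value in row.items()}


# Columns taken from the Tate artist_data.csv excerpt shown earlier.
SAMPLE = 'id,name,gender\n10093,"Abakanowicz, Magdalena",Female\n'
slices = list(sliced_rows(io.StringIO(SAMPLE), used_columns={"name"}))
```

Because the function is a generator, memory use depends on the size of a single row rather than on the size of the file.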
Research directions
• Properties of the Facade-X model
• Facade-X: one of (many possible) RDF profiles?
• More on execution strategies (e.g. query-rewriting to low-level structures)
• FX for Linked Data Wrappers / Virtual Knowledge Graphs on
heterogeneous sources (querying queries …)
• FX mappings reuse = falls back to SPARQL modularisation (open problem,
some work on federated querying / SPARQL + map reduce)
• Relational database (use Ontop with a configuration on-demand?)
• Support developers: how to help with new formats?
• User interfaces for FX query design
Get in touch!
SPARQL Anything is under active development!
https://sparql-anything.cc
GitHub: https://github.com/SPARQL-Anything/sparql.anything
enrico.daga@open.ac.uk
@enridaga
www.enridaga.net
References
• Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-X: an opinionated approach to SPARQL Anything. In: SEMANTiCS 2021: 17th International Conference on Semantic Systems (2021)
• Atkin, M., Deely, T., Scharffe, F.: Knowledge Graph Benchmarking Report 2021 (version 2.0). Zenodo, http://doi.org/10.5281/zenodo.4950097 (June 2021)
• Lassila, O., Schmidt, M., Bebee, B., Bechberger, D., Broekema, W., Khandelwal, A., Lawrence, K., Sharda, R., Thompson, B.: Graph? Yes! Which one? Help! In: 1st Squaring the Circle on Knowledge Graphs Workshop, SEMANTiCS (2021)
• Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021)
• Warren, P., Mulholland, P.: Using SPARQL – the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer (2018)
• Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020)
• Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of web APIs at web scale using linked data standards. In: Companion Proceedings of The 2019 World Wide Web Conference. pp. 883–892 (2019)
• Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: 7th Workshop on Linked Data on the Web (2014)
• García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: ShExML: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science 6, e318 (2020)
• Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state of the art in end-user software engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011)
• Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL extension for generating RDF from heterogeneous formats. In: European Semantic Web Conference. pp. 35–50. Springer (2017)
• Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: an emerging paradigm. In: End User Development, pp. 1–8. Springer (2006)
• Cyganiak, R.: Tarql (SPARQL for tables): turn CSV into RDF using SPARQL syntax. Technical report, http://tarql.github.io (2015)
Low (cognitive) complexity
• One measure of complexity is the number of
(distinct) items or variables (Halford et al. 2004;
Warren et al. 2015). 
• 8 CQ (vs SPARQL Generate)
• What are the titles of the artworks attributed to “ANONIMO”?
• What are the titles of the artworks created in 1935?
• …
• 4 transformations (vs RML and SPARQL Generate)
• Avg distinct tokens:
• SPARQL Anything: ~18
• SPARQL Generate: ~25 (∼39.72% more)
• RML: ~45 (∼150% more)
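The "% more" figures above are relative increases over the SPARQL Anything average. A short Python check using only the rounded averages from the slide (the helper name `percent_more` is an assumption for this example):

```python
def percent_more(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100


# Rounded average distinct-token counts as reported on the slide.
SPARQL_ANYTHING = 18
generate_vs_fx = percent_more(25, SPARQL_ANYTHING)
rml_vs_fx = percent_more(45, SPARQL_ANYTHING)
```

With these rounded inputs the SPARQL Generate figure comes out near 38.9% rather than 39.72%; the slide's value was presumably computed from the unrounded averages. The RML figure is exactly 150%.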
Practicable and sustainable
• Quantitative analysis of performance, to assess practicability
• In-Memory implementation (Naive)
• Execution time of q1-q12
• AVG on 10 executions
• Comparable to RML Mapper and SPARQL Generate on files
up to 1M JSON objects (~5M triples)
• In-Memory implementation scales linearly
• The approach is practicable
• Research on performance as future work
• Lines of Java code to maintain: SPARQL Generate 12280
(core module); RML Mapper 7951; SPARQL Anything: 3842
(all transformers) — v0.2.0 (v0.3.0 has ~11k)

Knowledge graph construction with a façade - The SPARQL Anything Project

  • 1. Knowledge graph construction with a façade The SPARQL Anything Project Enrico Daga - The Open University Knowledge Graph Construction W3C Community Group Invited talk, online, 14/02/2022
  • 2. Credits Luigi Asprino (University of Bologna), Enrico Daga (The Open University), Aldo Gangemi (University of Bologna), Justin Dowdy (https://github.com/justin2004), Paul Warren (The Open University), Paul Mulholland (The Open University). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement GA101004746. The communication reflects only the author’s view and the Research Executive Agency is not responsible for any use that may be made of the information it contains.
  • 3. Playing the soundtrack of our history • Preserving musical heritage through knowledge graphs • Managing musical heritage collections through knowledge graphs • Studying musical heritage through (interlinked) knowledge graphs https://spice-h2020.eu/ https://polifonia-project.eu/
  • 4. Rationale • Data Integration is the dominant use case for KG - [Atkin, 2021, in Lassila et al, 2021]. • SPARQL is the language for RDF KGs. • 42% of SPARQL users are from non-IT areas, including social sciences and the humanities [Warren et al, 2018]. • Many SPARQL practitioners are end-user developers [Lieberman, 2006]. • Minimise the tools / languages that need to be learned. • SPICE / Polifonia: parallelise KG construction (ontology design / data lifting) • Enable data lifting while waiting for a domain ontology (to come). • SPICE / Polifonia: data may come from anywhere! • Support the addition of an open-ended set of formats.
  • 5. Knowledge Graph Construction from structured resources Iterative process: • Observe: the resource (e.g. a CSV file) • Design mappings to a target ontology • Transform: execute the mappings • Observe: compare / evaluate Trial and error approach, many iterations
  • 6. Knowledge Graph Construction an opinionated approach • Reengineering: what syntax/meta-model do we want? • We cannot know what structure our user wants but we know the meta-model: RDF • Remodelling: what semantics do we project? • SPARQL is great for projecting semantics (change namespaces, create entities from literals, adding types, sophisticated relationships, composite structures, …) • Can we use just SPARQL to do all of it? @enridaga
  • 7. Concept Facade Design Pattern (GoF) From Object Oriented Programming A single abstraction on different, alternative interfaces https://en.wikipedia.org/wiki/Facade_pattern Research question: what RDF facade? • A common RDF structure over diverse formats • Focusing on the meta-model (data structure) • Leaving domain semantics as is! • Applying the least possible “ontological commitment” • Problem space: CSV, JSON, HTML, XML, Binary (JPEG, PNG, …), Text • Solution space: RDFS
  • 8. CSV • Facade: http://sparql.xyz/facade-x/ns/ @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>. @prefix fx: <http://sparql.xyz/facade-x/ns/>. @prefix xyz: <http://sparql.xyz/facade-x/data/>. rdf:Property a rdfs:Class . rdfs:ContainerMembershipProperty rdfs:subClassOf rdf:Property . fx:root a rdfs:Class . • Resource: https://github.com/tategallery/collection/blob/master/artist_data.csv id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url 10093,"Abakanowicz, Magdalena",Female,born 1930,1930,,Polska,,http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093 … • csv.headers=true: [ a fx:root ; rdf:_1 [ xyz:dates "born 1930" ; xyz:gender "Female" ; xyz:id "10093" ; xyz:name "Abakanowicz, Magdalena" ; xyz:placeOfBirth "Polska" ; xyz:placeOfDeath "" ; xyz:url "http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093" ; xyz:yearOfBirth "1930" ; xyz:yearOfDeath "" ] ; • csv.headers=false: [ a fx:root ; rdf:_1 [ rdf:_1 "id" ; rdf:_2 "name" ; rdf:_3 "gender" ; rdf:_4 "dates" ; rdf:_5 "yearOfBirth" ; rdf:_6 "yearOfDeath" ; rdf:_7 "placeOfBirth" ; rdf:_8 "placeOfDeath" ; rdf:_9 "url" ] ;
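
The CSV facade on this slide can be sketched in a few lines of Python. This is only an illustration of the mapping, not SPARQL Anything's implementation: the function name `csv_to_facade_x` and the container identifiers (`root`, `row1`, ...) are hypothetical stand-ins for the blank nodes on the slide, and column names are assumed to be usable in an IRI without escaping.

```python
import csv
import io

XYZ = "http://sparql.xyz/facade-x/data/"
FX_ROOT = "http://sparql.xyz/facade-x/ns/root"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def csv_to_facade_x(text, headers=True):
    """Return (subject, predicate, object) triples for a CSV string.

    "root" and "rowN" are hypothetical labels standing in for blank nodes.
    """
    rows = list(csv.reader(io.StringIO(text)))
    triples = [("root", RDF + "type", FX_ROOT)]
    names = rows[0] if headers else None
    data_rows = rows[1:] if headers else rows
    for i, row in enumerate(data_rows, start=1):
        row_id = f"row{i}"
        # Each row is a container placed in the i-th slot of the root.
        triples.append(("root", f"{RDF}_{i}", row_id))
        for j, value in enumerate(row, start=1):
            # Named slot when headers are used, numbered slot otherwise.
            slot = XYZ + names[j - 1] if headers else f"{RDF}_{j}"
            triples.append((row_id, slot, value))
    return triples


sample = 'id,name,gender\n10093,"Abakanowicz, Magdalena",Female\n'
for triple in csv_to_facade_x(sample, headers=True):
    print(triple)
```

Calling the function with `headers=False` treats the header line as an ordinary first row, so every cell lands in a numbered `rdf:_N` slot, as in the second output on the slide.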
  • 9. JSON • Facade: http://sparql.xyz/facade-x/ns/ @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>. @prefix fx: <http://sparql.xyz/facade-x/ns/>. @prefix xyz: <http://sparql.xyz/facade-x/data/>. @prefix xsd: <http://www.w3.org/2001/XMLSchema#>. rdf:Property a rdfs:Class . rdfs:ContainerMembershipProperty rdfs:subClassOf rdf:Property . fx:root a rdfs:Class . xsd:int a rdfs:Datatype. xsd:string a rdfs:Datatype. xsd:boolean a rdfs:Datatype. xsd:decimal a rdfs:Datatype. xsd:float a rdfs:Datatype. xsd:double a rdfs:Datatype. • Resource: https://github.com/tategallery/collection/artworks/t/023/t02319-9205.json { "acno": "T02319", "acquisitionYear": 1978, "all_artists": "Kazimir Malevich", "catalogueGroup": {}, "classification": "painting", "contributorCount": 1, "contributors": [ { • Result: [ a fx:root ; xyz:acno "T02319" ; xyz:acquisitionYear "1978"^^<http://www.w3.org/2001/XMLSchema#int> ; xyz:all_artists "Kazimir Malevich" ; xyz:catalogueGroup […] ; xyz:classification "painting" ; xyz:contributorCount "1"^^<http://www.w3.org/2001/XMLSchema#int> ; …
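
A rough Python sketch of the JSON facade follows. It is an illustration under stated assumptions rather than the tool's actual behaviour: the names `json_to_facade_x` and `literal` and the container labels `c1`, `c2`, ... are hypothetical, the top-level value is assumed to be an object or an array, `null` values are simply skipped, and non-integer numbers are tagged `xsd:double` (the slide lists several numeric datatypes without saying which one is chosen).

```python
import json

XYZ = "http://sparql.xyz/facade-x/data/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
FX_ROOT = "http://sparql.xyz/facade-x/ns/root"


def literal(value):
    """Return a (lexical form, datatype IRI) pair for a JSON scalar."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return (str(value).lower(), XSD + "boolean")
    if isinstance(value, int):
        return (str(value), XSD + "int")
    if isinstance(value, float):
        return (str(value), XSD + "double")  # assumption, see lead-in
    return (str(value), XSD + "string")


def json_to_facade_x(text):
    triples = []
    counter = 0

    def new_container():
        nonlocal counter
        counter += 1
        return f"c{counter}"  # hypothetical label for a blank node

    def visit(node, container):
        if isinstance(node, dict):
            # Object keys become named slots.
            items = [(XYZ + key, child) for key, child in node.items()]
        else:
            # Array positions become numbered slots.
            items = [(f"{RDF}_{i}", child) for i, child in enumerate(node, start=1)]
        for slot, child in items:
            if isinstance(child, (dict, list)):
                nested = new_container()
                triples.append((container, slot, nested))
                visit(child, nested)
            elif child is not None:
                triples.append((container, slot, literal(child)))

    root = new_container()
    triples.append((root, RDF + "type", FX_ROOT))
    visit(json.loads(text), root)
    return triples


sample = '{"acno": "T02319", "acquisitionYear": 1978, "catalogueGroup": {}}'
for triple in json_to_facade_x(sample):
    print(triple)
```

Objects and arrays both become containers; the only difference is whether their slots are named (`xyz:key`) or numbered (`rdf:_N`), which is what lets one query language cover both.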
  • 10. DOM (HTML, XML, …) • Facade: http://sparql.xyz/facade-x/ns/ @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>. @prefix fx: <http://sparql.xyz/facade-x/ns/>. @prefix xyz: <http://sparql.xyz/facade-x/data/>. @prefix xsd: <http://www.w3.org/2001/XMLSchema#>. @prefix xhtml: <http://www.w3.org/1999/xhtml#> . rdf:Property a rdfs:Class . rdfs:ContainerMembershipProperty rdfs:subClassOf rdf:Property . fx:root a rdfs:Class . xsd:int a rdfs:Datatype. xsd:string a rdfs:Datatype. xsd:boolean a rdfs:Datatype. xsd:decimal a rdfs:Datatype. xsd:float a rdfs:Datatype. xsd:double a rdfs:Datatype. rdf:type rdf:type rdf:Property • Resource: https://imma.ie/artists/ with html.selector=#az-group • Result: [ a fx:root , xhtml:div ; xhtml:id "az-group" ; rdf:_1 [ a xhtml:div ; rdf:_1 [ a xhtml:h4 ; rdf:_1 "A" ; <https://html.spec.whatwg.org/#innerHTML> "A" ; <https://html.spec.whatwg.org/#innerText> "A" ] ; …
  • 11. Binary and Text • Facade: http://sparql.xyz/facade-x/ns/ @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>. @prefix fx: <http://sparql.xyz/facade-x/ns/>. @prefix xyz: <http://sparql.xyz/facade-x/data/>. @prefix xsd: <http://www.w3.org/2001/XMLSchema#>. rdf:Property a rdfs:Class . rdfs:ContainerMembershipProperty rdfs:subClassOf rdf:Property . fx:root a rdfs:Class . xsd:int a rdfs:Datatype. xsd:string a rdfs:Datatype. xsd:boolean a rdfs:Datatype. xsd:decimal a rdfs:Datatype. xsd:float a rdfs:Datatype. xsd:double a rdfs:Datatype. xsd:base64Binary a rdfs:Datatype. rdf:type rdf:type rdf:Property [ <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "/9j/4AAQSkZJRgABAQEASABIAAD/4QmsRXhpZgAASUkqAAgAAAALAA8BAgAGAAAAkgAAABABAgAOAAAAmAAAABIBAwABAAAAAQAAABoBBQABAAAApgAAABsBBQABAAAArgAAACgBAwABAAAAAgAAADEBAgALAAAAtgAAADIBAgAUAAAAwgAAABMCAwABAAAAAgAAAGmHBAABAAAA1gAAACWIBAABAAAA0gMAAOQDAABDYW5vbgBDYW5vbiBFT1MgNDBEAEgAAAABAAAASAAAAAEAAABHSU1QIDIuNC41AAAyMDA4OjA3OjMxIDEwOjM4OjExAB4Am…"^^<http://www.w3.org/2001/XMLSchema#base64Binary> ] . bin.encoding # BASE64 txt.regex # tokenise into a sequence https://imma.ie/collection/freeing-the-voice/ Hello World! [ <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "Hello World!" ] .
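
The two cases on this slide can be sketched with Python's standard library. The function names and the `root` label are hypothetical, and the regex variant assumes the pattern matches the tokens themselves (rather than the separators between them); the slide only says that `txt.regex` tokenises the text into a sequence.

```python
import base64
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def binary_to_facade_x(data: bytes):
    """One container with a single slot holding the Base64-encoded content."""
    encoded = base64.b64encode(data).decode("ascii")
    return [("root", RDF + "_1", (encoded, XSD + "base64Binary"))]


def text_to_facade_x(text, regex=None):
    """Whole text in slot 1, or one slot per regex match when a pattern is given."""
    if regex is None:
        return [("root", RDF + "_1", text)]
    tokens = [m.group(0) for m in re.finditer(regex, text)]
    return [("root", f"{RDF}_{i}", token) for i, token in enumerate(tokens, start=1)]


print(text_to_facade_x("Hello World!"))
print(text_to_facade_x("Hello World!", r"\w+"))
# The first four bytes of a JPEG (JFIF) file.
print(binary_to_facade_x(b"\xff\xd8\xff\xe0"))
```

Without a pattern the text stays a single value, matching the "Hello World!" example; with a pattern it becomes a sequence that can be addressed slot by slot.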
  • 12. Facade X A simplified RDF meta-model, resembling a list-of-lists Components: Containers (typed), slots (int / string), values Intuitive, abstract notions: key-value, sequence, type “String”, 1, true,… “String”, 1, true xyz:row1, … xyz:row_n, xyz:… fx:root | xyz:* rdf:type xyz:* rdf:_N PREFIX fx: <http://sparql.xyz/facade-x/ns/> PREFIX xyz: <http://sparql.xyz/facade-x/data/> CSV XML JSON FX FX SPARQL
  • 13.
  • 14. Daga, Enrico; Asprino, Luigi; Mulholland, Paul and Gangemi, Aldo (2021). Facade-X: An Opinionated Approach to SPARQL Anything. In: Alam, Mehwish; Groth, Paul; de Boer, Victor; Pellegrini, Tassilo and Pandit, Harshvardhan J. eds. Volume 53: Further with Knowledge Graphs, Volume 53. IOS Press, pp. 58–73.
  • 15. https://sparql-anything.cc/ (1) Select input data (2) Build entities (3) Project on target ontology
  • 16. https://github.com/SPARQL-Anything/showcase-tate Tate Gallery Open Data * CSV listing artworks * JSON with details Task: build a SKOS taxonomy of artwork subjects @enridaga
  • 17. Polifonia Ontology Network * Scenarios collected on GitHub as Markdown files Task: extract a list of competency questions from any scenario
  • 18. https://github.com/SPARQL-Anything/showcase-mei XML->CSV: using SPARQL to extract the note sequence from an XML/MEI file
  • 20. Features v0.6.0 https://sparql-anything.readthedocs.io/en/latest/ • XML, JSON, CSV, HTML, Excel, Text, Binary, EXIF, File System, Zip/Tar, Markdown, YAML, Bibtex, DOCx • Query templates / parameter queries (BASIL variables) • Full-fledged HTTP client to query Web APIs (headers, auth, …) • Helper functions for sequences: fx:anySlot, fx:before, fx:after, … • Combine and nest SERVICE clauses (thanks to SPARQL) • Use SPARQL Result Sets as input for parametric queries • Large files (CSV): iterator-like execution style • Function: fx:entity • On-disk option (TDB2) • Command line interface [or] Server (based on Apache Jena Fuseki) • 100% open source, Apache Licence 2.0, relies on Apache Jena ARQ https://sparql-anything.cc/
  • 21. Benefits Compared to RML-based solutions and SPARQL Generate • Plain SPARQL 1.1 (vision: minimise the need for other formalisms) • “Free lunch” data exploration and querying • FX can express ANY format representable as BNF (article under review) • Open-ended extendibility: no changes to user-facing code required • No need to commit to a specific ontology, just query the data! • Low cognitive complexity — lower number of distinct tokens or variables (Halford et al. 2004; Warren et al. 2015). • Sustainable: lines of Java code to maintain: SPARQL Generate 12280 (core module); RML Mapper 7951; SPARQL Anything: 3842 (all transformers) — v0.2.0 (v0.6.0 has 11115)
  • 22. Work in progress • No support for RDB or other databases yet (e.g. MongoDB) • Facade-X is sufficient to represent relational data (article under review) • User study on the cognitive process behind KGC, comparing RML mappings vs Facade-X (Paul Warren) • Performance scales linearly with the size of the input data; however, this may hit memory limits: • Triple-filtering: reduces the number of triples loaded in memory to only the ones used in the query • Can use an on-disk triple store (TDB2) • Slicing: apply the query to an iteration of input data (currently only CSV) • Inspired by iterator-based approaches (RML / S. Generate)
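
The two memory-saving ideas on this slide, slicing and triple-filtering, can be pictured with a small Python sketch. It only shows the general idea and is not how SPARQL Anything implements them: all function names are hypothetical, the "query" is a plain Python function, and the second data row is made up for the example.

```python
import csv
import io

XYZ = "http://sparql.xyz/facade-x/data/"


def row_slices(stream):
    """Slicing: yield the triples of one CSV row at a time."""
    reader = csv.reader(stream)
    names = next(reader)
    for i, row in enumerate(reader, start=1):
        yield [(f"row{i}", XYZ + name, value) for name, value in zip(names, row)]


def filter_triples(triples, wanted_predicates):
    """Triple-filtering: keep only triples whose predicate the query mentions."""
    return [t for t in triples if t[1] in wanted_predicates]


def names_of_female_artists(stream):
    """A hypothetical 'query' that only needs the name and gender slots."""
    wanted = {XYZ + "name", XYZ + "gender"}
    for triples in row_slices(stream):
        kept = {p: o for _, p, o in filter_triples(triples, wanted)}
        if kept.get(XYZ + "gender") == "Female":
            yield kept[XYZ + "name"]


# First data row from the Tate CSV on slide 8; the second row is invented.
sample = (
    "id,name,gender\n"
    '10093,"Abakanowicz, Magdalena",Female\n'
    '0,"Example, Person",Male\n'
)
print(list(names_of_female_artists(io.StringIO(sample))))
```

Because each slice is discarded before the next row is read, memory use depends on the width of a row rather than the length of the file, which is the point of the iterator-like execution style.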
  • 23. Research directions • Properties of the Facade-X model • Facade-X: one of (many possible) RDF profiles? • More on execution strategies (e.g. query-rewriting to low-level structures) • FX for Linked Data Wrappers / Virtual Knowledge Graphs on heterogeneous sources (querying queries …) • FX mappings reuse = falls back to SPARQL modularisation (open problem, some work on federated querying / SPARQL + map reduce) • Relational database (use Ontop with a configuration on-demand?) • Support developers: how to help with new formats? • User interfaces for FX query design
  • 24. Get in touch! SPARQL Anything is under active development! https://sparql-anything.cc GitHub: https://github.com/SPARQL-Anything/sparql.anything enrico.daga@open.ac.uk @enridaga www.enridaga.net
  • 25. References • Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-X: an opinionated approach to SPARQL Anything. In: SEMANTiCS 2021: 17th International Conference on Semantic Systems (2021) • Atkin, M., Deely, T., Scharffe, F.: Knowledge Graph Benchmarking Report 2021 (version 2.0). Zenodo, http://doi.org/10.5281/zenodo.4950097 (June 2021) • Lassila, O., Schmidt, M., Bebee, B., Bechberger, D., Broekema, W., Khandelwal, A., Lawrence, K., Sharda, R., Thompson, B.: Graph? Yes! Which one? Help! In: 1st Squaring the Circle on Knowledge Graphs Workshop, SEMANTiCS (2021) • Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021) • Warren, P., Mulholland, P.: Using SPARQL: the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer (2018) • Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020) • Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of Web APIs at Web scale using linked data standards. In: Companion Proceedings of The 2019 World Wide Web Conference. pp. 883–892 (2019) • Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: 7th Workshop on Linked Data on the Web (2014) • García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: ShExML: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science 6, e318 (2020) • Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state of the art in end-user software engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011) • Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL extension for generating RDF from heterogeneous formats. In: European Semantic Web Conference. pp. 35–50. Springer (2017) • Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: An emerging paradigm. In: End User Development, pp. 1–8. Springer (2006) • Cyganiak, R.: Tarql (SPARQL for tables): Turn CSV into RDF using SPARQL syntax. Technical report, http://tarql.github.io (2015)
  • 26. Low (cognitive) complexity • One measure of complexity is the number of (distinct) items or variables (Halford et al. 2004; Warren et al. 2015). • 8 CQ (vs SPARQL Generate) • What are the titles of the artworks attributed to “ANONIMO”? • What are the titles of the artworks created in 1935? • … • 4 transformations (vs RML and SPARQL Generate) • Avg distinct tokens: • SPARQL Anything: ~18 • SPARQL Generate: ~25 (∼39.72% more) • RML: ~45 (∼150% more)
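
The "% more" figures on this slide are relative increases over the SPARQL Anything average. A short sketch of that arithmetic, using only the rounded averages shown on the slide (the helper name `percent_more` is hypothetical):

```python
def percent_more(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return (value - baseline) / baseline * 100


print(round(percent_more(45, 18), 2))  # RML vs SPARQL Anything
print(round(percent_more(25, 18), 2))  # SPARQL Generate vs SPARQL Anything
```

With the rounded inputs the RML figure comes out at exactly 150%, while the SPARQL Generate figure comes out slightly below the 39.72% on the slide, presumably because the slide's percentage was computed from unrounded averages.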
  • 27. Practicable and sustainable • Quantitative analysis of performance, to assess practicability • In-Memory implementation (Naive) • Execution time of q1-q12 • AVG on 10 executions • Comparable to RML Mapper and SPARQL Generate on files up to 1M JSON objects (~5M triples) • In-Memory implementation scales linearly • The approach is practicable • Research on performance as future work • Lines of Java code to maintain: SPARQL Generate 12280 (core module); RML Mapper 7951; SPARQL Anything: 3842 (all transformers) — v0.2.0 (v0.3.0 has ~11k)