Student profile product demonstration on grades, ability, well-being and mind...
The SPARQL Anything project
1. The SPARQL Anything project
Enrico Daga and Luigi Asprino
The Web Conference - Developers Track
22/04/2021 - online @enridaga
2. Background
• Semantic Web developers always concerned with methods to
“lift” legacy content to RDF:
• Targeting specific types/formats: SPARQL Microservices
[Michel, 2019], Tarql, Any23, JSON2RDF, CSV2RDF
• Mapping languages, several types of (e.g. RML,
ShexML): high learning demands. [Dimou, 2014]
[García-González, 2020]
• SPARQL Generate: learning demands, difficult to extend
to other formats. [Lefrançois, 2017]
• Solutions transfer data source complexity to the user (e.g.
know XPath for XML, JsonPath for JSON, …)
• End-user development [Lieberman, 2006]. Many SPARQL
users fall into the category of end-user developer. In a recent
survey, 42% SPARQL users are from non-IT areas,
including social sciences and the humanities, business and
economics, and biomedical, engineering or physical sciences.
3. SPICE
Social Cohesion, Participation and Inclusion
through Cultural Engagement
Polifonia
Digital Harmoniser of Musical Cultural Heritage
-
Cultural Heritage Knowledge Graphs
-
Sources in different formats
x
Multiple / unknown ontologies
=
Duplication of effort!!!
https://spice.kmi.open.ac.uk/
http://spice-h2020.eu
https://polifonia-project.eu/
This project has received funding from the European
Union’s Horizon 2020 research and innovation
programme
4. Knowledge Graph Construction
Composite process:
• Observe: the data source (e.g. a CSV file)
• Map: develop mappings to a target ontology
• Triplify: run the mappings and evaluate the result
• (many iterations)
KG construction is a twofold job:
• perform a syntax/structure conversion (e.g. from CSV to RDF)
• project semantics onto the data (applying a domain ontology)
5. Concept
… twofold job:
• perform a syntax/structure conversion -> Re-engineering
• We want to solve this problem once and for all
• project semantics onto the data (applying a domain ontology) -> Re-modelling
• We leave this to the end user, powered by SPARQL 1.1
• Approach: design a single RDF facade for any data format
• Re-engineering
• Focus on the syntax and the meta-model (data structure)
• Leave data as much as possible as-it-is!
• apply the least possible “ontological commitment”
https://en.wikipedia.org/wiki/Facade_pattern
6. An RDF Facade?
Problem Space
• CSV
• JSON
• HTML
• XML
• Binary (JPEG, PNG, …)
• Text
Solution Space
• https://www.w3.org/TR/rdf11-concepts/
• https://www.w3.org/TR/rdf-schema/
rdf:type, rdf:Property, rdfs:label,
rdfs:Resource, rdfs:Class, rdf:Bag,
rdfs:Container, rdf:List, RDF Dataset,
Graph, …
Facade-X: (to be filled by picking and mixing from the solution space)
Ups! We are facing the same old problem … only this time we don’t care about the content
(domain) and we only focus on the format and data structure (meta-model)
14. Preliminary feedback
• From 27 users, diverse SPARQL expertise
• Essential or very important
• the system should minimise the languages or syntaxes needed
• mappings should be easy to read and interpret
• the system must be easy to learn for a Semantic Web practitioner
• the system is able to support new types of data sources without changes to the mapping language
• How easy is this code to understand
(comparing equivalent mappings)?
• (a) RML
• (b) SPARQL Generate
• (c) SPARQL Anything
15. Benefits
• Transform / Query resources having heterogeneous formats
• Low learning demands (plain SPARQL 1.1)
• Minimise complexity of the mappings
• A single+consistent abstraction for any data format
• Enable data exploration in the absence of a domain ontology
• Integrate with a typical Semantic Web engineering workflow
• Flexible and adaptable (Facade-X can be extended, if needed)
• Easy to extend:
• new transformers just need to return the facade
• no major changes to the user experience
16. Challenges
• No commitment on the internal machinery! (It is a gift and a curse …)
• Current version v0.1.1 (we started Nov 2020):
• implemented on top of Apache Jena ARQ
• limited to files
• loads the triples in-memory and then performs the query
• A triple filtering strategy reduces in-memory dataset
• Very large files require very large memory
• Next: to develop strategies to cope with large resources (e.g. slicing)
• Next: to develop query-rewriting strategies, eventually rewriting mappings into efficient,
iterator-based transformers (mapping translation [Corcho 2020])
• Next: Relational Database, No-SQL (e.g. mongoDB)
• Reuse existing approaches (e.g. OBDA) but hide complexity to the user
17. Get in touch!
SPARQL Anything is under active development
https://github.com/SPARQL-Anything/sparql.anything
enrico.daga@open.ac.uk
@enridaga
www.enridaga.net
18. References
• Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-x: an opinionated approach to sparql anything (submitted). In: SEMANTiCS 2021:
17th International Conference on Semantic Systems (2021)
• Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021)
• Warren, P., Mulholland, P.: Using sparql–the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer
(2018)
• Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020)
• Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of web apis at web scale using linked data
standards. In: Companion Proceedings of The 2019 World Wide Web Conference. pp. 883–892 (2019)
• Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: Rml: a generic language for integrated rdf mappings
of heterogeneous data. In: 7th Workshop on Linked Data on the Web (2014)
• García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: Shexml: improving the usability of heterogeneous data
mapping languages for firsttime users. PeerJ Computer Science 6, e318 (2020)
• Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state
of the art in enduser software engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011)
• Lefrançois, M., Zimmermann, A., Bakerally, N.: A sparql extension for generating rdf from heterogeneous formats. In: European Semantic Web
Conference. pp. 35– 50. Springer (2017)
• Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: An emerging paradigm. In: End user development, pp. 1–8. Springer
(2006)
• Cyganiak, Richard. Tarql (sparql for tables): Turn csv into rdf using sparql syntax. Technical Report, 2015. http://tarql. github. io, 2015.