With the increasing adoption of NoSQL database systems such as MongoDB or CouchDB, more and more applications store structured data according to a non-relational, document-oriented model. Exposing this structured data as Linked Data is currently inhibited by a lack of standards and tools, and requires the implementation of custom solutions. While recent efforts aim at expressing transformations of such data models into RDF in a standardized manner, there is a lack of approaches that facilitate SPARQL execution over mapped non-relational data sources. With SparqlMap-M we show how dynamic SPARQL access to non-relational data can be achieved. SparqlMap-M is an extension of our SPARQL-to-SQL rewriter SparqlMap that performs a (partial) transformation of SPARQL queries by using a relational abstraction over a document store. Further, duplicate data in the document store is used to reduce the number of joins, and custom optimizations are introduced. Our showcase scenario employs the Berlin SPARQL Benchmark (BSBM) with different adaptations to a document data model. We use this scenario to demonstrate the viability of our approach and compare it to different MongoDB setups and native SQL.
Jörg Unbehauen | AKSW, Universität Leipzig
Presented at SEMANTiCS 2016 in Leipzig, in the context of the results of the LEDS project
Executing SPARQL Queries over Mapped Document Stores with SparqlMap-M
1. Executing SPARQL Queries over Mapped Document Stores with SparqlMap-M
J. Unbehauen M. Martin
IIS // AKSW // BIS // IfI
Leipzig University
SEMANTiCS 2016
J. Unbehauen, M. Martin (Leipzig Univ.) SPARQL over Document Stores: SparqlMap-M SEMANTiCS 2016 1 / 25
2. Outline
1 Motivation and Scope
2 Approach
3 Evaluation
4 Conclusions and Future Work
3. Scoping
[1] S. Auer, J. Lehmann, and A. Ngonga Ngomo. Introduction to Linked Data and Its Lifecycle on the Web. In Reasoning Web: Semantic Technologies for the Web of Data, LNCS 6848, 2011.
7. Motivation
NoSQL DBMS and document stores are thriving
Document stores used in Rapid Application Development Frameworks
Visit our poster: Adding Semantics to Model-Driven Software Development
Use cases in both research and industry
Current solutions support R2RML and relational databases
9. SparqlMap Architecture
[Figure: SparqlMap pipeline from SPARQL query to result. Stage labels: Query Parsing, Mapping Binding, Query Analysis, Binding, Translat., Exec.]
Query:
SELECT DISTINCT ?name {
  ?person foaf:name ?name.       #(tp1)
  ?person :inDepartment ?dep.    #(tp2)
  ?dep rdfs:label 'Research'     #(tp3)
}
Result:
?name
------------
'Mary R.'
'James T.'
[2] J. Unbehauen, C. Stadler, and S. Auer. Accessing relational data on the web with SparqlMap. In JIST, 2012.
[3] J. Unbehauen, C. Stadler, and S. Auer. Optimizing SPARQL-to-SQL rewriting. In IIWAS, 2013.
15. Data Models and Mapping
[Figure: SparqlMap-M architecture, built around the SparqlMap pipeline. Component labels: Query Parsing, Mapping Binding, Query Analysis, Selective Materialization, Mapping Deduplication, Union Decomposition, Materialized Execution, Binding, Translat., Exec.]
Query:
SELECT DISTINCT ?name {
  ?person foaf:name ?name.       #(tp1)
  ?person :inDepartment ?dep.    #(tp2)
  ?dep rdfs:label 'Research'     #(tp3)
}
Result:
?name
------------
'Mary R.'
'James T.'
16. Data Models and Mapping
Key-Value pairs
Nested documents
Schema-less
17. Data Models and Mapping
Goal: reuse existing (R2RML) concepts
A relational view on documents is obtained by:
Unnesting documents by joining them with their parent → flat structure
Naming attributes to reflect the hierarchy → key-value pairs treated as tuples
Imposing the schema through the mapping
#Department
{ id: 2, name: "Research",
  emp: [ { id: 1, name: "Mary R." },
         { id: 2, name: "James T." } ] }

id | name     | emp.id | emp.name
---+----------+--------+----------
2  | Research | 1      | Mary R.
2  | Research | 2      | James T.
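
The unnesting step above can be illustrated with a minimal sketch. It is not SparqlMap-M's implementation; the function name unnest and the handling of scalar arrays are assumptions made for illustration. It flattens the #Department document into the two rows of the relational view, using dotted attribute names to reflect the hierarchy.

```python
from typing import Any, Dict, Iterator


def unnest(doc: Dict[str, Any], prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield flat rows for a nested document (hypothetical helper).

    Scalar fields are copied into every row; each nested document or array
    element is joined with its parent, so arrays multiply the rows.
    An empty array yields no rows (inner-join behaviour).
    """
    scalars: Dict[str, Any] = {}
    nested = []
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, list):
            nested.append((name, value))
        elif isinstance(value, dict):
            nested.append((name, [value]))
        else:
            scalars[name] = value

    rows = [scalars]
    for name, children in nested:
        expanded = []
        for row in rows:
            for child in children:
                if isinstance(child, dict):
                    child_rows = unnest(child, prefix=f"{name}.")
                else:
                    # Assumption: scalar array elements become one column.
                    child_rows = iter([{name: child}])
                for child_row in child_rows:
                    expanded.append({**row, **child_row})
        rows = expanded
    yield from rows


department = {
    "id": 2,
    "name": "Research",
    "emp": [{"id": 1, "name": "Mary R."}, {"id": 2, "name": "James T."}],
}
rows = list(unnest(department))
for row in rows:
    print(row)
```

Each resulting row carries the parent attributes (id, name) together with one employee (emp.id, emp.name), which is the tuple shape an R2RML-style mapping can then refer to.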
20. Query Structure
SparqlMap
Recursive translation yields nested unions
Index hits require careful query design
Complex expressions for joins
SparqlMap-M / MongoDB
No direct equivalents for joins
No complex equivalence expression
22. Query Structure: Union Decomposition
Nested Unions:
σ[name=Research](trm3) ⋈[?dep=?dep] ((trm1 ∪ trm4) ⋈[?person=?person] (trm2 ∪ trm5))
Pushed Union:
(trm3 ⋈[?dep=?dep] (trm1 ⋈[?person=?person] trm2)) ∪ (trm3 ⋈[?dep=?dep] (trm4 ⋈[?person=?person] trm5))
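
Union decomposition can be sketched as turning a join over alternative term maps into a union of union-free joins. The sketch below is an illustration only: the term map names trm1 to trm5 come from the slide, while the SOURCE assignment and the pruning rule same_person_source are assumptions introduced to reproduce the two branches shown above, not the rule SparqlMap-M actually applies.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Branch = Tuple[str, ...]

# Assumption: which source each term map reads from (hypothetical names).
SOURCE = {
    "trm1": "person",
    "trm2": "person",
    "trm3": "department",
    "trm4": "department.emp",
    "trm5": "department.emp",
}


def push_unions(
    alternatives: Sequence[Sequence[str]],
    compatible: Callable[[Branch], bool] = lambda branch: True,
) -> List[Branch]:
    """Rewrite a join of unions into a union of joins.

    `alternatives` holds, per triple pattern, the term maps that may answer
    it. Every combination becomes one union-free join branch; branches
    rejected by `compatible` are pruned.
    """
    return [branch for branch in product(*alternatives) if compatible(branch)]


def same_person_source(branch: Branch) -> bool:
    """Assumed pruning rule: tp1 and tp2 must read ?person from one source."""
    _tp3_term, tp1_term, tp2_term = branch
    return SOURCE[tp1_term] == SOURCE[tp2_term]


# Alternatives in the order tp3, tp1, tp2.
alternatives = [["trm3"], ["trm1", "trm4"], ["trm2", "trm5"]]
branches = push_unions(alternatives, same_person_source)
for branch in branches:
    print(" JOIN ".join(branch))
```

Without the pruning predicate the same call yields four branches; each branch can then be translated on its own, which matters for a store without direct equivalents for joins.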
27. De-Duplication
Documents are nested for fast retrieval and filtering
Naive mapping introduces overhead
Declaratively label R2RML TriplesMaps as duplicated
Only use denormalized data in joins
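
One possible reading of the two rules above is sketched below. The TriplesMap class, its duplicated flag, and the selection function are hypothetical and only illustrate the idea: a map labeled as duplicated is preferred when the triple pattern takes part in a join (the nested copy avoids a cross-collection join), and is skipped otherwise so that the same triples are not produced twice.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TriplesMap:
    """Minimal stand-in for an R2RML TriplesMap (hypothetical fields)."""
    name: str
    collection: str
    duplicated: bool = False  # assumed declarative label from the mapping


def select_candidates(candidates: List[TriplesMap], in_join: bool) -> List[TriplesMap]:
    """Pick the maps to use for one triple pattern.

    In a join, prefer duplicated (denormalized) maps if any exist.
    Outside joins, use only the non-duplicated maps; fall back to all
    candidates if every map is labeled as duplicated.
    """
    duplicated = [c for c in candidates if c.duplicated]
    canonical = [c for c in candidates if not c.duplicated]
    if in_join and duplicated:
        return duplicated
    return canonical or list(candidates)


name_maps = [
    TriplesMap("person_name", "person"),
    TriplesMap("department_emp_name", "department", duplicated=True),
]
for_join = select_candidates(name_maps, in_join=True)
for_scan = select_candidates(name_maps, in_join=False)
print([m.name for m in for_join], [m.name for m in for_scan])
```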
29. Benchmark Setup
BSBM chosen because both an SQL and an RDF representation are available
SQL representation translated into MongoDB documents
Denormalization performed additionally
30. Benchmark Results
BSBM, 10 million triples
PostgreSQL: fastest
MongoDB-Naive / -Dup: duplication (Dup) required for performance
SparqlMap-M-Naive / -Dup / -DupAware: overhead from rewriting and materialization
31. Benchmark Results
BSBM Q4
Medium selectivity
Naive modes touch a lot of data
Performance gain from duplicate data (MongoDB, SparqlMap-M)
32. Benchmark Results
BSBM Q5
Low selectivity join
SparqlMap-M: expensive in-memory self-join dominates cost
MongoDB: self-join in the aggregation pipeline, slower than PostgreSQL
BSBM Q9
High selectivity join
SparqlMap-M-Dup(Aware): duplicates increase overhead
Unpushable join dominates cost
34. Future Work
Enable Updates
Integrate Caching
Evaluate join-capable query languages:
MongoDB left outer join ($lookup)
Multi-model databases: ArangoDB, OrientDB
DB virtualization: JBoss Teiid, Apache HAWQ
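
For the $lookup item, the following sketch shows what a pushed-down join for the example query could look like as a MongoDB aggregation pipeline. The collection name department, the fields dep_id and id, and the function name are assumptions for illustration, not taken from the BSBM mapping; the block only builds the pipeline and does not connect to a database.

```python
from typing import Any, Dict, List


def lookup_join_pipeline(department_name: str) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline joining persons to their department.

    Assumed (hypothetical) schema: `person` documents carry `dep_id`,
    `department` documents carry `id` and `name`.
    """
    return [
        # Left outer join: attach matching department documents as array `dep`.
        {"$lookup": {
            "from": "department",
            "localField": "dep_id",
            "foreignField": "id",
            "as": "dep",
        }},
        # One output document per joined department; persons without a
        # match are dropped here, turning the outer join into an inner join.
        {"$unwind": "$dep"},
        # Corresponds to the filter on the department label (tp3).
        {"$match": {"dep.name": department_name}},
        # Project only the person name (?name).
        {"$project": {"_id": 0, "name": 1}},
    ]


pipeline = lookup_join_pipeline("Research")
print(pipeline)
```

With a driver such as pymongo, the list would be passed to the aggregate method of the person collection.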
35. Conclusion
Architecture for a SPARQL execution layer over document stores
Harness duplicates to increase performance
Evaluated with BSBM on MongoDB