Parallel and incremental materialisation of RDF/DATALOG in RDFOX

Parallel and incremental materialisation of RDF/DATALOG in RDFOX by Boris Motik (University of Oxford)

  1. PARALLEL AND INCREMENTAL MATERIALISATION OF RDF/DATALOG IN RDFOX. Boris Motik, University of Oxford. March 20, 2015
  2. TABLE OF CONTENTS
      1 INTRODUCTION
      2 PARALLEL MATERIALISATION OF DATALOG
      3 HANDLING owl:sameAs VIA REWRITING
      4 INCREMENTAL MATERIALISATION MAINTENANCE
      5 CONCLUSION
      Boris Motik, Parallel and Incremental Materialisation of RDF/Datalog in RDFox, 0/13
  3. Introduction: section divider (table of contents repeated)
  4. Introduction: RDFOX SUMMARY
      RDFox: a new RDF store and reasoner, http://www.cs.ox.ac.uk/isg/tools/RDFox/
      Features:
      - RAM-based storage of RDF data
          - Currently centralised, but a distributed system is in the works
      - Datalog reasoning via materialisation
          - Can handle arbitrary (recursive) datalog rules, not just OWL 2 RL
          - Very effective parallelisation
      - Efficient reasoning with owl:sameAs via rewriting
          - Known and widely-used technique, but correctness not trivial
      - The Backward-Forward (B/F) incremental maintenance algorithm
          - Considerably improves on DRed
          - Compatible with rewriting of owl:sameAs
      - SPARQL query answering
          - Most of SPARQL 1.0 and some of SPARQL 1.1
  5. Parallel Materialisation of Datalog: section divider (table of contents repeated)
  6. Parallel Materialisation of Datalog: MAIN CHALLENGES
      1 Assign workload to threads evenly
          - Rules are generally not independent due to recursion
          - Static assignment of rule instances can be upset by data skew
          ⇒ Dynamic assignment with low overhead needed
      2 Efficiently interleave
          - querying (during evaluation of rule bodies)
          - updates (during updates of derived facts)
      3 Provide indexes for efficient rule body evaluation
          - Crucial for elimination of duplicate triples ⇒ ensures termination
          - Usually sorted (and clustered) to allow for merge joins
          - Hash indexes can also be used
          - Individual (i.e., not bulk) index updates are inefficient
      B. Motik, Y. Nenov, R. Piro, I. Horrocks, and D. Olteanu. Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF Systems. AAAI 2014, pages 129–137
  7. Parallel Materialisation of Datalog: SOLUTION PART I: ALGORITHM
      Table: R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g)
      Rule: A(x) ∧ R(x, y) → A(y)
      For each fact:
      - match the fact to all body atoms to obtain subqueries
      - evaluate subqueries w.r.t. all previous facts
      - add results to the table
      Current subquery: (empty)
  8. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ R(a,b). Current subquery: A(a). Table unchanged.
  9. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ R(a,c). Current subquery: A(a). Table unchanged.
  10. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ R(b,d). Current subquery: A(b). Table unchanged.
  11. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ R(b,e). Current subquery: A(b). Table unchanged.
  12. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ A(a). Current subquery: R(a,y). Added to table: A(b), A(c).
  13. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ R(c,f). Current subquery: A(c). Table unchanged.
  14. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ R(c,g). Current subquery: A(c). Table unchanged.
  15. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ A(b). Current subquery: R(b,y). Added to table: A(d), A(e).
  16. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ A(c). Current subquery: R(c,y). Added to table: A(f), A(g).
  17. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ A(d). Current subquery: R(d,y). Table unchanged.
  18. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ A(e). Current subquery: R(e,y). Table unchanged.
  19. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ A(f). Current subquery: R(f,y). Table unchanged.
  20. SOLUTION PART I: ALGORITHM (animation step). Current fact ⇒ A(g). Current subquery: R(g,y). Final table: R(a,b) R(a,c) R(b,d) R(b,e) A(a) R(c,f) R(c,g) A(b) A(c) A(d) A(e) A(f) A(g)
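
The walkthrough on slides 7 to 20 can be replayed with a short single-threaded sketch. It hard-codes the one rule from the slides; the function name `materialise` and the tuple encoding of facts are illustrative choices, not RDFox's API, and the linear scans stand in for the indexes a real store would use.

```python
def materialise(explicit_facts):
    """Fact-at-a-time evaluation of the rule A(x) AND R(x, y) -> A(y).

    Facts are tuples such as ("A", "a") or ("R", "a", "b"). The table is an
    append-only list, and the fact at position i is joined only with the
    facts that precede it, as on the slides.
    """
    table = list(explicit_facts)   # append-only fact table
    seen = set(table)              # duplicate elimination, needed for termination
    position = 0
    while position < len(table):
        fact = table[position]
        previous = table[:position]
        derived = []
        if fact[0] == "A":
            # The fact matches body atom A(x); the subquery is R(x, y).
            x = fact[1]
            derived = [("A", r[2]) for r in previous if r[0] == "R" and r[1] == x]
        elif fact[0] == "R":
            # The fact matches body atom R(x, y); the subquery is A(x).
            x, y = fact[1], fact[2]
            if ("A", x) in previous:
                derived = [("A", y)]
        for new_fact in derived:
            if new_fact not in seen:
                seen.add(new_fact)
                table.append(new_fact)
        position += 1
    return table


example = [("R", "a", "b"), ("R", "a", "c"), ("R", "b", "d"), ("R", "b", "e"),
           ("A", "a"), ("R", "c", "f"), ("R", "c", "g")]
result = materialise(example)
```

Because each pair of matching facts is joined exactly once, when the later of the two is processed, no derivation is repeated even though the table grows while it is being scanned.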
  21. Parallel Materialisation of Datalog: PARALLELISING COMPUTATION
      - Each thread extracts facts and evaluates subqueries independently
      - The number of subqueries is determined by the number of facts, which ensures in practice that threads are equally loaded
      - Requires no thread synchronisation
      ⇒ We partition rule instances dynamically and with little overhead
  22. Parallel Materialisation of Datalog: SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY
      The algorithm critically depends on:
      - matching atoms t1, t2, t3 with ti a constant or variable
      - continuous concurrent updates
      Our RDF storage data structure:
      - Hash-based indexes ⇒ naturally parallel data structure
      - 'Mostly' lock-free: at least one thread makes progress most of the time
          - compare: if a thread acquires a lock and dies, other threads are blocked
          - main benefit: performance is less susceptible to scheduling decisions
      Main technical challenge: reduce thread interference
      - When A writes to a location cached by B, the cache of B is invalidated
      - Our updates ensure that threads (typically) write to different locations
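
To make "matching atoms t1, t2, t3 with ti a constant or variable" concrete, here is a minimal single-threaded sketch of a hash-based triple index. The class `TripleIndex`, the use of `None` for a variable, and the choice to index every combination of bound positions are assumptions made for brevity; the sketch says nothing about RDFox's actual memory layout and is not lock-free.

```python
from collections import defaultdict
from itertools import product

# Every non-empty choice of bound positions among (t1, t2, t3).
MASKS = [m for m in product((False, True), repeat=3) if any(m)]


class TripleIndex:
    """Hash indexes answering triple patterns; None marks a variable."""

    def __init__(self):
        self.triples = []                 # insertion-ordered triples
        self.present = set()              # duplicate elimination
        self.by_key = defaultdict(list)   # (mask, bound values) -> triples

    def add(self, triple):
        """Insert a triple; return False if it was already present."""
        if triple in self.present:
            return False
        self.present.add(triple)
        self.triples.append(triple)
        for mask in MASKS:
            key = tuple(v for v, bound in zip(triple, mask) if bound)
            self.by_key[(mask, key)].append(triple)
        return True

    def match(self, pattern):
        """Return all triples agreeing with the constants in the pattern."""
        mask = tuple(t is not None for t in pattern)
        if not any(mask):
            return list(self.triples)
        key = tuple(t for t in pattern if t is not None)
        return list(self.by_key.get((mask, key), []))


index = TripleIndex()
index.add(("a", "R", "b"))
index.add(("a", "R", "c"))
index.add(("b", "R", "d"))
```

The boolean returned by `add` is what makes duplicate elimination cheap: a derived triple that is already present is simply dropped, which is the property the slides name as crucial for termination.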
  23. Parallel Materialisation of Datalog: EVALUATION I: PARALLELISATION SPEEDUP
      RDFox: an RDF store developed at Oxford University, http://www.cs.ox.ac.uk/isg/tools/RDFox/
      [Figure: two speedup plots; horizontal axis ticks 8 to 32, vertical axis ticks 2 to 20. First plot: ClarosL, ClarosLE, DBpediaL, DBpediaLE, LUBML 01K, LUBMU 01K. Second plot: UOBML 01K, UOBMU 010, LUBMLE 01K, LUBML 05K, LUBMLE 05K, LUBMU 05K.]
      - Small concurrency overhead; parallelisation pays off already with two threads
      - Speedup continues to increase after we exhaust all physical cores ⇒ hyperthreading and parallelism can compensate for CPU cache misses
  24. Parallel Materialisation of Datalog: EVALUATION II: ORACLE'S SPARC T5
      Machine specification:
      - 4 TB of RAM
      - 128 physical cores that support 1024 virtual cores via hyperthreading
      Name        Initial triples  Resulting triples  Time (s), 1 thread  Time (s), 1024 threads  Speedup  Inference rate (triples/s)
      ClarosLE    18.8 M           533.7 M            7484                74                      101      7.2 M
      LUBML-1K    133.6 M          182.4 M            511                 10                      51       4.9 M
      LUBMLE-1K   133.6 M          332.6 M            5267                37                      142      5.4 M
      LUBML-140k: 8 G triples, materialised to 10.9 G triples
      - 20 threads: 2000 s, inference rate 1.45 M triples/s
      - 128 threads: 599 s, inference rate 4.84 M triples/s
      - Materialised dataset used about half of RAM (2 TB)
      Thanks to Jay Banerjee, Brian Whitney, Hassan Chafi, and Zhe Wu @ Oracle
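
The slide does not define its derived columns. Assuming that speedup is single-thread time divided by 1024-thread time, and that inference rate is the number of derived triples (resulting minus initial) per second, the reported figures can be checked with a few lines. Both function names are illustrative. Times in the table are rounded, so not every entry need reproduce to the last digit.

```python
def speedup(seconds_one_thread, seconds_many_threads):
    """Ratio of single-threaded to multi-threaded wall-clock time."""
    return seconds_one_thread / seconds_many_threads


def inference_rate(initial_triples, resulting_triples, seconds):
    """Derived triples per second (assumed definition, see lead-in)."""
    return (resulting_triples - initial_triples) / seconds


# LUBML-140k: 8 G triples materialised to 10.9 G triples.
rate_20_threads = inference_rate(8e9, 10.9e9, 2000)
rate_128_threads = inference_rate(8e9, 10.9e9, 599)

# Speedup column of the table, from the two time columns.
speedups = {
    "ClarosLE": speedup(7484, 74),
    "LUBML-1K": speedup(511, 10),
    "LUBMLE-1K": speedup(5267, 37),
}
```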
  25. Handling owl:sameAs via Rewriting: section divider (table of contents repeated)
  26. Handling owl:sameAs via Rewriting: HANDLING owl:sameAs VIA REWRITING
      Rewriting: replace all equal constants with one representative
      - Well known approach; used in GraphDB, Oracle, WebPIE, . . .
      - Much more efficient than direct materialisation
      Open question I: Effective parallelisation
      - Lock-free maintenance of representatives
      - Care needed to ensure correctness and nonrepetition of derivations
      Open question II: Query evaluation
      - Could expand rewritten data before query evaluation, but that is inefficient
      - Better: evaluate queries on rewritten data and expand the answer
      - Such a straightforward approach is incorrect:
          - Result cardinalities might be wrong
          - The presence of FILTER and BIND can make results incorrect
      B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Handling owl:sameAs via Rewriting. AAAI 2015
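
As a rough illustration of "replace all equal constants with one representative", the sketch below keeps representatives in a union-find structure, rewrites triples, evaluates a simple pattern on the rewritten data, and expands the answer. All names (`Representatives`, `rewrite`, `subjects`) and the sample data are hypothetical. The expansion here is exactly the straightforward approach the slide warns about: it is adequate for this set-valued pattern, but it ignores result cardinalities, FILTER and BIND.

```python
class Representatives:
    """Union-find over constants; the smallest name in a class represents it."""

    def __init__(self):
        self.parent = {}

    def find(self, term):
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            # Path halving keeps later lookups short.
            self.parent[term] = self.parent[self.parent[term]]
            term = self.parent[term]
        return term

    def merge(self, x, y):
        """Record that x and y are equal (x owl:sameAs y)."""
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[max(rx, ry)] = min(rx, ry)

    def members(self, representative):
        return sorted(t for t in list(self.parent) if self.find(t) == representative)


def rewrite(triples, reps):
    """Replace every constant by its representative; duplicates collapse."""
    return {tuple(reps.find(t) for t in triple) for triple in triples}


def subjects(rewritten, predicate, obj, reps):
    """Answer '?s predicate obj' on rewritten data, then expand each answer."""
    p, o = reps.find(predicate), reps.find(obj)
    answers = []
    for s, tp, to in sorted(rewritten):
        if tp == p and to == o:
            answers.extend(reps.members(s))
    return answers


reps = Representatives()
reps.merge(":alice", ":a_smith")
data = [(":alice", ":worksAt", ":oxford"),
        (":a_smith", ":worksAt", ":oxford"),
        (":bob", ":worksAt", ":oxford")]
rewritten = rewrite(data, reps)
answers = subjects(rewritten, ":worksAt", ":oxford", reps)
```

The two triples about `:alice` and `:a_smith` collapse into one stored triple, which is where the saving over direct materialisation comes from.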
  27. Handling owl:sameAs via Rewriting: EVALUATION OF REWRITING
      [Table fragment, cut off on the slide:] UOBM 279 4 2.2M AX 36M 1.2 332M 16,152M REW 9.4M 9.7M 0.4 33.8M 4,256M 686 factor 3.2x 3.2x 9.9x 3.8x
      Table 3: Materialisation Times with Axiomatisation and Rewriting (sec = seconds, spd = speedup; last column per dataset is the ratio of AX to REW time)
      Threads  Claros                                    DBpedia                                   OpenCyc
               AX sec  AX spd  REW sec  REW spd  AX/REW  AX sec  AX spd  REW sec  REW spd  AX/REW  AX sec  AX spd  REW sec  REW spd  AX/REW
      1        2042.9  1.0     65.8     1.0      31.1    219.8   1.0     31.7     1.0      6.9     2093.7  1.0     119.9    1.0      17.5
      2        969.7   2.1     35.2     1.9      27.6    114.6   1.9     17.6     1.8      6.5     1326.5  1.6     78.3     1.5      16.9
      4        462.0   4.4     18.1     3.6      25.5    66.3    3.3     10.7     3.0      6.2     692.6   3.0     40.5     3.0      17.1
      8        237.2   8.6     9.9      6.7      24.1    36.1    6.1     5.2      6.0      6.9     351.3   6.0     23.0     5.2      15.2
      12       184.9   11.1    7.9      8.3      23.3    31.9    6.9     4.1      7.7      7.7     291.8   7.2     56.2     2.1      5.5
      16       153.4   13.3    6.9      9.6      22.3    27.5    8.0     3.6      8.8      7.7     254.0   8.2     52.3     2.3      4.9
      Threads  UniProt                                   UOBM
               AX sec  AX spd  REW sec  REW spd  AX/REW  AX sec  AX spd  REW sec  REW spd  AX/REW
      1        370.6   1.0     143.4    1.0      2.6     2696.7  1.0     1152.7   1.0      2.3
      2        232.3   1.6     86.7     1.7      2.7     1524.6  1.8     599.6    1.9      2.5
      4        129.2   2.9     46.5     3.1      2.8     813.3   3.3     318.3    3.6      2.6
      8        74.7    5.0     25.1     5.7      3.0     439.9   6.1     177.7    6.5      2.5
      12       61.0    6.1     19.9     7.2      3.1     348.9   7.7     152.7    7.6      2.3
      16       61.9    6.0     17.1     8.4      3.6     314.4   8.6     137.9    8.4      2.3
      [Paper excerpt, first column:] […] mode takes less than ten seconds, these results are difficult to measure and are susceptible to skew. Our results confirm that rewriting can significantly reduce materialisation times. RDFox was consistently faster in the REW mode than in the AX mode even on UniProt, where the reduction in the number of triples is negligible. This is due to the reduction in the number of derivations, mainly involving rules (≈1)–(≈5), which is still significant on UniProt. In all cases, the speedup of rewriting is typically much larger than […]
      [Paper excerpt, second column:] […] connected by the :hasSameHomeTownWith property. This property is also symmetric and transitive so, for each pair of connected resources, the number of times each triple is derived by the transitivity rule is quadratic in the number of connected resources. This leads to a large number of duplicate derivations that do not involve equality. Thus, although it is helpful, rewriting does not reduce the number of derivations in the same way as, for example, on Claros, which explains the relatively modest speedup of REW over AX.
      - Speedup is bigger than the reduction in the number of triples
      - The number of derivations is the determining factor
      - Contrary to popular belief!
  28. Incremental Materialisation Maintenance: section divider (table of contents repeated)
  29. Incremental Materialisation Maintenance: THE DRED ALGORITHM AT A GLANCE
      Delete/Rederive (DRed): state-of-the-art incremental maintenance algorithm
      EXAMPLE
      Rules: $C_0(x) \leftarrow A(x)$; $C_0(x) \leftarrow B(x)$; $C_i(x) \leftarrow C_{i-1}(x)$ for $1 \le i \le n$; $C_0(x) \leftarrow C_n(x)$
      Facts: $A(a)$, $B(a)$
  30. THE DRED ALGORITHM AT A GLANCE (animation step). Materialise initial facts: $C_0(a)$, $C_1(a)$, . . . , $C_n(a)$ are added to $A(a)$, $B(a)$
  31. THE DRED ALGORITHM AT A GLANCE (animation step). Delete $A(a)$ using DRed:
  32. THE DRED ALGORITHM AT A GLANCE (animation step). 1 Delete all facts with a derivation from $A(a)$:
      $C_0(x)^D \leftarrow A(x)^D$; $C_0(x)^D \leftarrow B(x)^D$; $C_i(x)^D \leftarrow C_{i-1}(x)^D$ for $1 \le i \le n$; $C_0(x)^D \leftarrow C_n(x)^D$
  33. THE DRED ALGORITHM AT A GLANCE (animation step). 2 Rederive facts that have an alternative derivation:
      $C_0(x) \leftarrow C_0(x)^D \wedge A(x)$; $C_0(x) \leftarrow C_0(x)^D \wedge B(x)$; $C_i(x) \leftarrow C_i(x)^D \wedge C_{i-1}(x)$ for $1 \le i \le n$; $C_0(x) \leftarrow C_0(x)^D \wedge C_n(x)$
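
The two DRed steps can be traced on the slide's example with a small sketch. It is restricted to rules with a single unary body atom, written as `(head, body)` pairs, which is all the example needs; the function names and this encoding are illustrative, and `n = 3` is an arbitrary choice.

```python
def materialise(explicit, rules):
    """Naive fixpoint for rules of the form head(x) <- body(x)."""
    facts = set(explicit)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, const in list(facts):
                if pred == body and (head, const) not in facts:
                    facts.add((head, const))
                    changed = True
    return facts


def dred_delete(explicit, materialised, rules, to_delete):
    """Return (updated materialisation, overdeleted set) after a deletion."""
    # Step 1: overdelete every fact with some derivation from a deleted fact.
    overdeleted = set(to_delete)
    frontier = list(to_delete)
    while frontier:
        pred, const = frontier.pop()
        for head, body in rules:
            fact = (head, const)
            if body == pred and fact in materialised and fact not in overdeleted:
                overdeleted.add(fact)
                frontier.append(fact)
    kept = set(materialised) - overdeleted
    # Explicit facts that were only overdeleted (not requested) still hold.
    kept |= (overdeleted - set(to_delete)) & set(explicit)
    # Step 2: rederive overdeleted facts that have an alternative derivation.
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, const in overdeleted:
                fact = (head, const)
                if pred == head and fact not in kept and (body, const) in kept:
                    kept.add(fact)
                    changed = True
    return kept, overdeleted


n = 3
rules = ([("C0", "A"), ("C0", "B")]
         + [(f"C{i}", f"C{i - 1}") for i in range(1, n + 1)]
         + [("C0", f"C{n}")])
explicit = {("A", "a"), ("B", "a")}
materialised = materialise(explicit, rules)
updated, overdeleted = dred_delete(explicit, materialised, rules, {("A", "a")})
```

On this input the first step overdeletes `A(a)` and all of `C0(a)` to `Cn(a)`, that is `n + 2` facts, and the second step puts every `Ci(a)` back via `B(a)`. Only `A(a)` is actually gone at the end.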
  34. Incremental Materialisation Maintenance: IMPROVEMENT: THE B/F ALGORITHM
      - In RDF, a fact often has many alternative derivations ⇒ Many facts get deleted in the first step
      - The Backward/Forward (B/F) algorithm: look for alternatives immediately
      Facts: $A(a)$, $B(a)$, $C_0(a)$, $C_1(a)$, . . . , $C_n(a)$
      B. Motik, Y. Nenov, R. Piro, and I. Horrocks. Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015
  35. IMPROVEMENT: THE B/F ALGORITHM (animation step). Delete $A(a)$ using B/F:
  36. IMPROVEMENT: THE B/F ALGORITHM (animation step). 1 Is $A(a)$ derivable in any other way? Marks: $A(a)$ ?
  37. IMPROVEMENT: THE B/F ALGORITHM (animation step). 2 No ⇒ delete. Marks: $A(a)$ ×
  38. IMPROVEMENT: THE B/F ALGORITHM (animation step). 3 As in DRed, identify $C_0(a)$ as derivable from $A(a)$. Marks: $A(a)$ ×, $C_0(a)$ ?
  39. IMPROVEMENT: THE B/F ALGORITHM (animation step). 4 Apply the rules to $C_0(a)$ 'backwards' ⇒ by $C_0(x) \leftarrow B(x)$, we get $B(a)$. Marks: $A(a)$ ×, $B(a)$ ?, $C_0(a)$ ?
  40. IMPROVEMENT: THE B/F ALGORITHM (animation step). 5 $B(a)$ is explicit so it is derivable. Marks: $A(a)$ ×, $C_0(a)$ ?
  41. IMPROVEMENT: THE B/F ALGORITHM (animation step). 6 So $C_0(a)$ is derivable too. Marks: $A(a)$ ×
  42. IMPROVEMENT: THE B/F ALGORITHM (animation step). 7 Stop propagation and terminate. Marks: $A(a)$ ×
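
The seven steps above can be imitated with a deliberately simplified sketch: before a fact is removed, a backward search checks whether it still has a derivation from the remaining explicit facts, and propagation stops as soon as one is found. This only illustrates the idea of "look for alternatives immediately"; it is not the published B/F algorithm, which interleaves backward and forward chaining with further bookkeeping. The rule encoding as `(head, body)` pairs, the function names, and `n = 3` are assumptions.

```python
def provable(fact, explicit, rules, path=frozenset()):
    """Backward search: does the fact follow from the explicit facts?

    `path` holds the facts currently being proved, so cyclic rules such as
    C0 <- Cn cannot make a fact support itself.
    """
    if fact in explicit:
        return True
    if fact in path:
        return False
    pred, const = fact
    return any(provable((body, const), explicit, rules, path | {fact})
               for head, body in rules if head == pred)


def backward_checked_delete(explicit, materialised, rules, to_delete):
    """Return (updated materialisation, number of facts examined)."""
    remaining = set(explicit) - set(to_delete)
    result = set(materialised)
    examined = 0
    frontier = list(to_delete)
    while frontier:
        fact = frontier.pop()
        if fact not in result:
            continue
        examined += 1
        if provable(fact, remaining, rules):
            continue  # alternative derivation found: stop propagating here
        result.discard(fact)
        pred, const = fact
        # Consequences of the deleted fact are examined next.
        frontier.extend((head, const) for head, body in rules if body == pred)
    return result, examined


n = 3
rules = ([("C0", "A"), ("C0", "B")]
         + [(f"C{i}", f"C{i - 1}") for i in range(1, n + 1)]
         + [("C0", f"C{n}")])
explicit = {("A", "a"), ("B", "a")}
materialised = explicit | {(f"C{i}", "a") for i in range(n + 1)}
updated, examined = backward_checked_delete(explicit, materialised, rules, {("A", "a")})
```

Only `A(a)` and `C0(a)` are examined here, regardless of `n`, whereas the overdeletion step of DRed touches `n + 2` facts on the same input.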
  43. Incremental Materialisation Maintenance: EVALUATION OF THE B/F ALGORITHM
      Table 2: Experimental results. Derivation counts: Fwd (Rematerialise); DR2, DR4, DR5 (DRed); Bwd, Sat, DelProp (B/F).
                          Rematerialise       DRed                                              B/F
      |E−|      |I \ I′|  Time (s)  Fwd       Time (s)  |D|       DR2       DR4       DR5       Time (s)  |C|       Bwd       Sat       DelProp
      LUBM-1k-L (|E| = 133.6M, |I| = 182.4M, Mt = 121.5s, Md = 212.5M)
      100       113       139.4     212.5M    0.0       1.0k      1.1k      0.8k      1.0k      0.0       0.5k      0.2k      0.3k      0.2k
      5.0k      5.5k      101.8     212.5M    0.2       55.5k     67.2k     46.9k     59.8k     0.2       23.0k     9.3k      13.7k     7.4k
      2.5M      2.7M      138.5     208.8M    39.4      10.3M     15.2M     6.6M      11.5M     32.8      10.0M     4.1M      5.6M      3.7M
      5.0M      5.5M      91.8      205.0M    54.8      17.8M     26.3M     10.5M     18.9M     62.3      18.8M     7.8M      10.1M     7.5M
      7.5M      8.3M      89.2      201.3M    71.5      24.3M     35.5M     13.6M     24.3M     85.4      26.7M     11.0M     14.0M     11.2M
      10.0M     11.0M     99.5      197.5M    127.9     30.0M     43.1M     15.9M     28.1M     102.2     34.1M     14.0M     17.4M     15.0M
      UOBM-1k-Uo (|E| = 254.8M, |I| = 2.2G, Mt = 5034.0s, Md = 3.6G)
      100       160       3482.0    3.6G      8797.6    1.8G      2.6G      53.2M     2.6G      5.4       0.8k      0.5k      1.3k      0.5k
      5.0k      85.2k     3417.8    3.6G      9539.3    1.8G      2.6G      53.2M     2.6G      28.2      105.9k    17.9k     42.1k     104.1k
      17.0M     130.9M    3903.1    3.4G      8934.3    1.8G      2.7G      63.7M     2.5G      988.8     175.8M    47.6M     104.0M    196.7M
      34.0M     269.0M    4084.1    3.2G      9492.5    1.9G      2.8G      68.4M     2.4G      1877.2    340.7M    87.5M     182.3M    401.1M
      51.0M     422.8M    4010.0    3.0G      10659.3   1.9G      2.9G      71.5M     2.2G      2772.7    513.7M    125.2M    246.8M    622.0M
      68.0M     581.4M    3981.9    2.8G      11351.6   1.9G      2.9G      73.3M     2.1G      3737.3    687.0M    162.5M    289.5M    848.6M
      Claros-L (|E| = 18.8M, |I| = 74.2M, Mt = 78.9s, Md = 128.6M)
      100       212       62.9      128.6M    0.0       0.8k      1.0k      0.2k      0.5k      0.0       0.6k      0.3k      0.7k      0.5k
      5.0k      11.3k     62.8      128.6M    0.4       37.8k     50.7k     10.9k     23.9k     0.4       29.1k     18.8k     35.3k     26.8k
      0.6M      1.3M      62.3      125.6M    32.3      4.1M      5.5M      1.1M      2.5M      14.9      3.1M      2.0M      3.6M      3.0M
      1.2M      2.6M      61.2      122.6M    53.2      7.8M      10.8M     2.0M      4.8M      33.6      6.1M      3.8M      6.7M      6.0M
      1.7M      4.0M      60.5      119.5M    73.6      11.4M     15.9M     2.8M      6.8M      47.8      8.9M      5.6M      9.5M      9.1M
      2.3M      5.5M      60.0      116.3M    91.0      14.8M     20.9M     3.6M      8.6M      60.6      11.7M     7.3M      12.0M     12.3M
      Claros-LE (|E| = 18.8M, |I| = 533.7M, Mt = 4024.5s, Md = 12.9G)
      100       0.5k      3992.8    12.6G     0.0       1.3k      2.0k      0.3k      0.9k      0.0       1.0k      0.7k      1.0k      1.1k
      2.5k      178.9k    5235.1    12.6G     8077.4    5.5M      11.7G     176.6k    11.7G     10.3      216.4k    161.2k    8.8M      320.0k
      5.0k      427.5k    4985.1    12.6G     7628.2    6.0M      11.7G     186.0k    11.7G     16.5      485.6k    369.0k    8.9M      769.3k
      7.5k      609.6k    4855.0    12.6G     7419.1    6.5M      11.7G     193.9k    11.7G     19.5      683.4k    516.8k    9.0M      1.1M
      10.0k     780.8k    5621.3    12.6G     7557.9    6.8M      11.7G     207.6k    11.7G     3907.2    6.0M      723.0M    11.7G     16.9M
      [Paper excerpt, first column:] Test Datasets. Table 2 summarises the properties of the four datasets we used in our tests. LUBM (Guo, Pan, and Heflin 2005) is a well-known RDF benchmark. We extracted the datalog fragment of the LUBM […]
      [Paper excerpt, second column:] […] a congruence relation: derivations with owl:sameAs tend to proliferate so an efficient incremental algorithm would have to treat this property directly; we leave developing such an extension to our future work. Please note that omitting the […]
  44. Conclusion: section divider (table of contents repeated)
  45. Conclusion: RESEARCH DIRECTIONS
      Add a data/query/reasoning distribution layer:
      - Initial results very promising
      - Implementation in progress
      Future work:
      - Investigate potential for data compression
      - Improve join cardinality estimation
      - Improve query planning
