
Decentralized Evolution and Consolidation of RDF Graphs

Presentation of the Paper „Decentralized Evolution and Consolidation of RDF Graphs“ by Natanael Arndt and Michael Martin at #ICWE2017 in Rome, Italy

Published in: Science

Decentralized Evolution and Consolidation of RDF Graphs

  1. 1. Decentralized Evolution and Consolidation of RDF Graphs, Natanael Arndt and Michael Martin, June 7, 2017, ICWE 2017, Rome. LEDS: Linked Enterprise Data Services
  2. 2. Introduction
  3. 3. Introduction 1 / 45
  4. 4. Introduction 2 / 45
  5. 5. Introduction: Pfarrerbuch content editor (Project Team) [protected zone] SPARQL Endpoint HTML GUI [stable] OntoWiki Persistency Layer Backup Model query, search add, edit, maintain 3 / 45
  6. 6. Introduction: Catalogus Professorum Lipsiensium synchronize Model Data (SPARUL) Linked Data Linked Data Partial RDF export Full RDF export Backup Model experienced web user content editor (Project Team) general web user SPARQL Endpoint HTML GUI [stable] OntoWiki Persistency Layer SPARQL Endpoint HTML GUI [experimental] OntoWiki Persistency Layer HTML GUI [stable] CPL Frontend Persistency Layer OCPY TOWEL configure configure query, search add, edit, maintain getData query, search browse, annotate, discuss synchronize Model Data synchronize Model Data browse, search [protected zone] [public zone] 4 / 45
  7. 7. Introduction • Central SPARQL endpoints • Single Point of Failure, Unavailability • One consolidated status of the data • Only trusted access allowed • Asynchronous collaboration leads to inconsistency 5 / 45
  9. 9. Introduction: From Software Engineering to Data Engineering • In Software Engineering the term Software Crisis was coined • Systems and problems became more and more complex • Software Engineering methods made the process of creating software more controllable • Configuration Management brought Source Code Management • CVS and SVN: central systems • Darcs, Mercurial, Git: decentralized • Git is widely used (even Microsoft switched the Windows development to Git) 6 / 45
  10. 10. Introduction: From Software Engineering to Data Engineering • The subjects of collaboration are RDF graphs rather than source code files • Consistency checks in the DSCM ecosystem are made using Continuous Integration 7 / 45
  11. 11. Introduction: From Software Engineering to Data Engineering 8 / 45
  11. 11. Related Work/State of the Art
  12. 12. Related Work/State of the Art
      Approach      storage  quad support  bnodes   branches  merge    push/pull
      TailR [4]     hybrid   no^a          yes      no^f      no       (yes)^h
      Eccrev [2]    delta    yes           yes      no^f      no       no
      R43ples [3]   delta    no^b,c        (yes)^d  yes       no       no
      R&W base [5]  delta    no^c          (yes)^e  yes       (yes)^g  no
      dat           chunks   n/a           n/a      no        no       yes
      a: Versioning is done at the granularity of repositories
      b: Only single graphs are put under version control
      c: The context is used to encode revisions
      d: Blank nodes are skolemized
      e: Blank nodes are addressed by internal identifiers
      f: Only linear change tracking is supported
      g: Naive merge implementation
      h: No pull requests, but history replication via the Memento API
      9 / 45
  13. 13. Preliminaries
  14. 14. Preliminaries • Atomic Graph • Atomic Partition • Difference • Change • Application of a Change 10 / 45
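  15. 15. Preliminaries: Atomic Graph [diagram: example graph with nodes A, B, C, D, E] 11 / 45
  16. 16. Preliminaries: Atomic Graph [diagram: example graph with nodes A, B, C, D, E] 12 / 45
  17. 17. Preliminaries: Atomic Graph [diagram: example graph with nodes A, B, C, D, E] 12 / 45
  18. 18. Preliminaries: Atomic Graph [diagram: example graph with nodes A, B, C, D, E] 12 / 45
  19. 19. Preliminaries: Atomic Graph [diagram: example graph with nodes A, B, C, D, E] 12 / 45
  20. 20. Preliminaries: Atomic Graph [diagram: example graph with nodes A, B, C, D, E] 12 / 45
  21. 21. Preliminaries: Atomic Partition [diagram: example graph with nodes A, B, C, D, E and its atomic partition] 13 / 45
  22. 22. Preliminaries: Atomic Partition [diagram: example graph with nodes A, B, C, D, E and its atomic partition] 14 / 45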
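  23. 23. Preliminaries: Difference $\Delta(G, G') := (C^+, C^-)$ [diagram: Δ between two example graphs with nodes A, B, C, D, E] 15 / 45
  24. 24. Preliminaries: Difference $C^- := \dot{\bigcup}\left(\breve{P}\left(P(G) \setminus P(G')\right)\right)$ [diagram: two example graphs with nodes A, B, C, D, E] 16 / 45
  25. 25. Preliminaries: Difference $C^- := \dot{\bigcup}\left(\breve{P}\left(P(G) \setminus P(G')\right)\right)$ [diagram: atomic partitions of both example graphs and the resulting graph $C^-$] 17 / 45
  26. 26. Preliminaries: Difference $C^+ := \dot{\bigcup}\left(\breve{P}\left(P(G') \setminus P(G)\right)\right)$ [diagram: atomic partitions of both example graphs and the resulting graph $C^+$] 18 / 45
  27. 27. Preliminaries: Difference resp. Change $C^+ := \dot{\bigcup}\left(\breve{P}\left(P(G') \setminus P(G)\right)\right)$, $C^- := \dot{\bigcup}\left(\breve{P}\left(P(G) \setminus P(G')\right)\right)$, $\Delta(G, G') := (C^+, C^-)$ [diagram: Δ between two example graphs and the resulting change] 19 / 45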
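  28. 28. Preliminaries: Application of a Change $\mathrm{Apl}(G, (C^+_G, C^-_G)) := \dot{\bigcup}\left(\breve{P}\left((P(G) \setminus P(C^-_G)) \cup P(C^+_G)\right)\right)$ [diagram: Apl takes an example graph and a change as input] 20 / 45
  29. 29. Preliminaries: Application of a Change $\mathrm{Apl}(G, (C^+_G, C^-_G)) := \dot{\bigcup}\left(\breve{P}\left((P(G) \setminus P(C^-_G)) \cup P(C^+_G)\right)\right)$ [diagram: atomic partitions of the example graph and of the change] 21 / 45
  30. 30. Preliminaries: Application of a Change $\mathrm{Apl}(G, (C^+_G, C^-_G)) := \dot{\bigcup}\left(\breve{P}\left((P(G) \setminus P(C^-_G)) \cup P(C^+_G)\right)\right)$ [diagram: atomic partitions of the example graph and of the change, and the resulting graph] 22 / 45
  31. 31. Preliminaries: Application of a Change $\mathrm{Apl}(G, (C^+_G, C^-_G)) := \dot{\bigcup}\left(\breve{P}\left((P(G) \setminus P(C^-_G)) \cup P(C^+_G)\right)\right)$ [diagram: Apl maps the example graph and the change to the resulting graph] 23 / 45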
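
The definitions of the atomic partition, Δ and Apl can be tried out on small triple sets. The following is only a minimal sketch under stated assumptions, not the QuitStore implementation: triples are plain Python tuples, blank nodes are strings starting with `_:` (a convention assumed here), blank nodes are compared by label instead of up to isomorphism, and the names `atomic_partition`, `diff` and `apply_change` are hypothetical.

```python
from collections import defaultdict


def is_bnode(term):
    # Assumption for this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def atomic_partition(graph):
    """Split a set of triples into atomic graphs.

    Ground triples form their own atomic graph; triples that are
    connected through shared blank nodes stay together.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for s, _, o in graph:
        bnodes = [t for t in (s, o) if is_bnode(t)]
        for b in bnodes:
            parent.setdefault(b, b)
        if len(bnodes) == 2:
            parent[find(bnodes[0])] = find(bnodes[1])

    grouped = defaultdict(set)
    parts = set()
    for triple in graph:
        s, _, o = triple
        bnodes = [t for t in (s, o) if is_bnode(t)]
        if bnodes:
            grouped[find(bnodes[0])].add(triple)
        else:
            parts.add(frozenset([triple]))
    parts.update(frozenset(g) for g in grouped.values())
    return parts


def diff(g_old, g_new):
    """Return the change (C+, C-) that leads from g_old to g_new."""
    p_old, p_new = atomic_partition(g_old), atomic_partition(g_new)
    added = set().union(*(p_new - p_old))
    removed = set().union(*(p_old - p_new))
    return added, removed


def apply_change(graph, change):
    """Apply a change (C+, C-) to a graph on the level of atomic graphs."""
    added, removed = change
    parts = (atomic_partition(graph) - atomic_partition(removed)) | atomic_partition(added)
    return set().union(*parts)


g_old = {("A", "p", "B"), ("B", "p", "_:x"), ("_:x", "q", "D")}
g_new = {("A", "p", "B"), ("A", "p", "E")}
change = diff(g_old, g_new)
```

In this example the two triples that share the blank node `_:x` form one atomic graph, so they are removed together, and applying `change` to `g_old` yields `g_new` again.
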
  32. 32. Operations
  33. 33. Operations • Commit • Distributed Evolution • Merge of Two Evolved Graphs • Revert a Commit 24 / 45
  34. 34. Operations: Commit [diagram: commit A] $A(\{G_0\})$ 25 / 45
  35. 35. Operations: Commit [diagram: commit A] $A(\{G_0\})$, $\mathrm{Apl}(G_0, (C^+_{G_0}, C^-_{G_0})) = G$ 25 / 45
  36. 36. Operations: Commit [diagram: commits A, B] $A(\{G_0\})$, $\mathrm{Apl}(G_0, (C^+_{G_0}, C^-_{G_0})) = G$, $B_{\{A\}}(\{G\})$ 25 / 45
  37. 37. Operations: Commit [diagram: commits A, B, C] $A(\{G_0\})$, $\mathrm{Apl}(G_0, (C^+_{G_0}, C^-_{G_0})) = G$, $B_{\{A\}}(\{G\})$, $C_{\{B_{\{A\}}\}}(\{G'\})$ 25 / 45
  38. 38. Operations: Distributed Evolution [diagram: commits A, B, C, D] Figure 1: Two branches evolved from a common commit. $D_{\{B_{\{A\}}\}}(\{G''\})$ 26 / 45
  39. 39. Operations: Merge of Two Evolved Graphs $\mathrm{Merge}(C(\{G'\}), D(\{G''\})) = E_{\{C,D\}}(\{G'''\})$ [diagram: commits A, B, C, D, E] Figure 2: Merging commits from two branches into a common version of the graph 27 / 45
  40. 40. Operations: Revert a Commit $\Delta^{-1}(G_0, G) = \Delta(G, G_0)$ [diagram: commits A, B, $B^{-1}$] Figure 3: A commit reverting the previous commit 28 / 45
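
The commit and revert operations can be illustrated with a small data structure. This is a hedged sketch, not the authors' implementation: it assumes graphs without blank nodes, so a change is computed with plain set differences, and the `Commit` class and the functions `commit` and `revert` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    graph: frozenset      # the graph state referenced by this commit
    parents: tuple = ()   # ancestor commits; empty for the initial commit


def diff(g_old, g_new):
    # Change (C+, C-) from g_old to g_new; assumes no blank nodes.
    return frozenset(g_new - g_old), frozenset(g_old - g_new)


def apply_change(graph, change):
    added, removed = change
    return frozenset((graph - removed) | added)


def commit(name, parent, change):
    # A new commit references the graph obtained by applying the change.
    return Commit(name, apply_change(parent.graph, change), (parent,))


def revert(name, target):
    # The inverse change of Δ(G0, G) is Δ(G, G0).
    (parent,) = target.parents
    inverse = diff(target.graph, parent.graph)
    return Commit(name, apply_change(target.graph, inverse), (target,))


a = Commit("A", frozenset({("A", "p", "B")}))
b = commit("B", a, (frozenset({("A", "p", "C")}), frozenset()))
b_inv = revert("B-1", b)
```

Here `b_inv` is a new commit on top of `b` whose graph equals the graph of `a`, so the history is kept while the effect of `b` is undone.
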
  41. 41. Merge Strategies
  42. 42. Merge Strategies • Union Merge • All Ours/All Theirs • Three-Way Merge 29 / 45
  43. 43. Merge Strategies: Union Merge [diagram: the atomic partitions of two example graphs with nodes A, B, C, D, E are united (∪) and joined (⋃) into the merged graph] 30 / 45
  44. 44. Merge Strategies: All Ours/All Theirs [diagram: example graphs with nodes A, B, C, D, E, one marked X] 31 / 45
  45. 45. Merge Strategies: Three-Way Merge [diagram: three example graphs with nodes A, B, C, D, E] 32 / 45
  46. 46. Merge Strategies: Three-Way Merge [diagram: three example graphs with nodes A, B, C, D, E and the change sets C⁺ and C⁻ of the branches] 33 / 45
  47. 47. Merge Strategies: Three-Way Merge [diagram: three example graphs with nodes A, B, C, D, E, the change sets C⁺ and C⁻ of the branches, and the merged graph] 34 / 45
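
The three merge strategies can be compared side by side on sets of triples. The sketch below makes simplifying assumptions that are not taken from the slides: graphs contain no blank nodes, so every triple is its own atomic graph, and the three-way merge follows one common reading (keep what both branches have, plus everything either branch added relative to the common base). All function names are hypothetical.

```python
def union_merge(ours, theirs):
    # Keep every statement that exists in either branch.
    return set(ours) | set(theirs)


def all_ours(ours, theirs):
    # Discard the other branch completely.
    return set(ours)


def all_theirs(ours, theirs):
    # Discard our own branch completely.
    return set(theirs)


def three_way_merge(base, ours, theirs):
    # Statements present in both branches, plus the additions of each
    # branch relative to the common base. A statement removed in only
    # one branch is therefore removed in the result.
    return (ours & theirs) | (ours - base) | (theirs - base)


t1, t2, t3 = ("A", "p", "B"), ("A", "p", "C"), ("C", "p", "D")
t4, t5 = ("A", "p", "E"), ("D", "p", "E")
base = {t1, t2, t3}
ours = {t1, t2, t4}      # removed t3, added t4
theirs = {t1, t3, t5}    # removed t2, added t5
merged = three_way_merge(base, ours, theirs)
```

With these inputs the union merge brings back the removed statements `t2` and `t3`, whereas the three-way merge respects the removals of both branches and keeps both additions.
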
  48. 48. Evaluation
  49. 49. Evaluation • We have a prototypical implementation of our concepts: QuitStore (https://github.com/AKSW/QuitStore) • We use the Berlin SPARQL Benchmark (BSBM) to evaluate our system • We use the Explore and Update Use Case, since it provides SPARQL query and update operations • All scripts used are available at https://github.com/AKSW/QuitEval 35 / 45
  50. 50. Evaluation: Correctness of Version Tracking • We take the Git repository, the initial data set and the query execution log (run.log) produced by BSBM • We load the data into a store and execute all queries stored in the run.log • We could verify that the state of the store was always similar to the content of the respective Git commit 36 / 45
  51. 51. Evaluation: Correctness of Merge Method • We take the graph generated by BSBM • We generate two branches, each with a different, randomly generated set of added and deleted statements • We generate a graph containing the expected result of the merge operation • We execute the merge operation and compare the resulting graph to the expected result • This process was repeated 1000 times 37 / 45
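
The merge test procedure can be mimicked in a few lines. This is only an illustrative stand-in, not the scripts from the QuitEval repository: a small synthetic graph replaces the BSBM data, the merge under test is a simplified three-way merge on triples without blank nodes, and `random_branch` and `run_trials` are hypothetical helper names.

```python
import random


def three_way_merge(base, ours, theirs):
    # Simplified three-way merge on ground triples (assumption of this sketch).
    return (ours & theirs) | (ours - base) | (theirs - base)


def random_branch(base, pool, rng, n_add=3, n_del=3):
    # Pick random statements to add (from a pool disjoint from the base)
    # and random statements to delete (from the base).
    added = set(rng.sample(sorted(pool), n_add))
    removed = set(rng.sample(sorted(base), n_del))
    return added, removed


def run_trials(trials=1000, seed=0):
    rng = random.Random(seed)
    base = {("s%d" % i, "p", "o") for i in range(20)}
    pool = {("n%d" % i, "p", "o") for i in range(20)}
    for _ in range(trials):
        add_a, del_a = random_branch(base, pool, rng)
        add_b, del_b = random_branch(base, pool, rng)
        ours = (base - del_a) | add_a
        theirs = (base - del_b) | add_b
        # Expected result: all deletions and all additions of both branches.
        expected = (base - del_a - del_b) | add_a | add_b
        if three_way_merge(base, ours, theirs) != expected:
            return False
    return True


all_passed = run_trials()
```

The fixed seed keeps the run reproducible; the number of trials mirrors the 1000 repetitions mentioned on the slide.
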
  52. 52. Evaluation: Performance [plot: repository size (MiB) and memory consumption (MiB) over the number of commits; series: Quit repo size, Quit with gc repo size, Quit memory, Quit with gc memory] 38 / 45
  53. 53. Evaluation: Performance [plot: queries per second (qps, logarithmic scale) for INSERT DATA, DELETE WHERE, Explore 1 to 5 and Explore 7 to 12; series: quit versioning, quit versioning with gc, no versioning (baseline)] 39 / 45
  54. 54. Conclusion
  55. 55. Conclusion • Presented a formal framework for the distributed evolution of RDF knowledge bases • Atomic operations on RDF graphs • Formalized definitions of the versioning operations: commit, branch, merge and revert • Quad aware, handles blank nodes, supports branches, supports merging with conflict resolution, allows distributed collaboration with push and pull • Merge strategies were transferred to the application on atomic graphs to be used on RDF datasets 40 / 45
  56. 56. Future Work
  57. 57. Future Work • Improve our Quit Store implementation to support the complete framework • Explore provenance tracked by Git through an RDF interface ✓ [1] • Implement the Quit architecture for real world problems 41 / 45
  58. 58. Future Work experienced web user general web user content editor (Project Team) [protected zone] SPARQL Endpoint HTML GUI [stable] OntoWiki Persistency Layer query, search add, edit, maintain clone/fetch/push public + private Data Data Transformation Tasks (ETL) add new Data Legacy Data Sources [public zone] any RDF Editor Commenting Interface Browsing Interface query, search comment 42 / 45
  59. 59. References I [1] N. Arndt, P. Naumann, and E. Marx. Exploring the evolution and provenance of Git versioned RDF data. In J. D. Fernández, J. Debattista, and J. Umbrich, editors, 3rd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW) co-located with 14th European Semantic Web Conference (ESWC 2017), Portoroz, Slovenia, May 2017. [2] M. Frommhold, R. N. Piris, N. Arndt, S. Tramp, N. Petersen, and M. Martin. Towards Versioning of Arbitrary RDF Data. In 12th International Conference on Semantic Systems Proceedings (SEMANTiCS 2016), SEMANTiCS ’16, Leipzig, Germany, Sept. 2016. 43 / 45
  60. 60. References II [3] M. Graube, S. Hensel, and L. Urbas. Open semantic revision control with R43ples: Extending SPARQL to access revisions of named graphs. In Proceedings of the 12th International Conference on Semantic Systems, SEMANTiCS 2016, pages 49–56, New York, NY, USA, 2016. ACM. [4] P. Meinhardt, M. Knuth, and H. Sack. TailR: A platform for preserving history on the web of data. In Proceedings of the 11th International Conference on Semantic Systems, SEMANTICS ’15, pages 57–64, New York, NY, USA, 2015. ACM. 44 / 45
  61. 61. References III [5] M. V. Sande, P. Colpaert, R. Verborgh, S. Coppens, E. Mannens, and R. V. de Walle. R&Wbase: Git for triples. In C. Bizer, T. Heath, T. Berners-Lee, M. Hausenblas, and S. Auer, editors, LDOW, volume 996 of CEUR Workshop Proceedings. CEUR-WS.org, 2013. 45 / 45
