
Semantics 2017 - Trying Not to Die Benchmarking using LITMUS

SEMANTICS 2017 Talk slides (Nominated for best paper award)

Published in: Technology


  1. Trying Not to Die Benchmarking using LITMUS
     Harsh Thakkar (1), Yashwant Keswani (2), Mohnish Dubey (1), Jens Lehmann (1,3), Sören Auer (4)
     (1) University of Bonn, Bonn, Germany
     (2) DA-IICT, Gandhinagar, India
     (3) Fraunhofer IAIS, St. Augustin, Germany
     (4) TIB, Hannover, Germany
     Amsterdam - Nederland - September 13
  2. Semantics 2017 - Amsterdam - Nederland - September 13 - Trying Not to Die Benchmarking... - Harsh Thakkar - University of Bonn
     Outline
     ● Motivation
     ● Problem Statement
     ● State of the Art
     ● Approach - LITMUS Benchmark Suite
     ● Challenges
     ● Evaluation Plan
     ● Next Steps
  3. Motivation
     Ocean of Data: real and synthetic datasets (figure: LOD Cloud 2017, http://lod-cloud.net/versions/2017-02-20/lod.png)
     + Sea of Tools: K-V stores, graph stores, doc-oriented stores, RDF stores (e.g. RDF-3X), wide column stores
  4. Motivation
     • Domain-specific applications, i.e. perspectives
     • Choice overload!
     • Vendors
     • Researchers
     • Users
     Image: https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building
  5. Benchmarking
     ● Tedious!
     ● Needs domain-specific expertise
     ● Lack of standardization (single focus)
       ○ Open software, system configuration settings, etc.
     ● Near-zero reusability
     ● Guaranteeing a fair benchmark is difficult!
     ● Choosing the right performance metrics is cumbersome and subjective
     ● Visualising benchmark results
     Image: [6] http://2.bp.blogspot.com/-TkUb0TPN7IA/VewUHm_jVaI/AAAAAAAABgM/vZILnZNJv5A/s1600/2012-10-16-subjective-objective.jpg
  6. Problem Statement
     “How can diverse cross-domain DMSs be benchmarked in an automated, established*, standard# environment?”
  7. State of the Art: Benchmark Efforts
     Single-domain benchmarks
       Relational DMSs: TPC [H, C, E, DS] [13]
       Graph DMSs: XGDBench [6], HPC [7], Graph 500 [12]
       RDF DMSs: DBPSB [11], LUBM [9], IGUANA [19], WatDiv [1], SP2Bench [20], BSBM [4]
     Cross-domain benchmarks
       Pandora*, Graphium [8], LDBC [2], HOBBIT**
     * http://pandora.ldc.usb.ve/
     ** https://project-hobbit.eu/
  8. LITMUS Benchmark Suite
  9. The LITMUS architecture
     Data Facet (F1): Dataset 1, Dataset 2, Dataset 3, ..., Dataset N; data integration module
     Query Facet (F2): Queryset 1, Queryset 2, Queryset 3, ..., Queryset M; query conversion module
     System Facet (F3): RDF stores, graph stores, relational DBs, wide column stores, key value stores; system configuration & integration module
     User Interface (F4): User
     Benchmarking Core: Controller & Tester, Profiler, Analyzer
     Thakkar, Harsh. "Towards an Open Extensible Framework for Empirical Benchmarking of Data Management Solutions: LITMUS." ESWC, 2017.
  10. Challenges
      ● Core challenges in developing such an open, extensible, FAIR framework?
        ○ C1 - Data Conversion
        ○ C2 - Query Translation
        ○ C3 - Key Performance Indicators (KPIs)
      Image: http://media.thinkadvisor.com/lifehealthpro/article/2015/02/24/challenge.jpg
  11. C1 - Data Conversion
      ● Different data models
        ○ RDF graph
        ○ Property graph
      ● To conduct a fair benchmark, data conversion is needed
      ● Each DMS works best with its natively supported data model
      Lots of data: RDF graph or property graph, real or synthetic
      RQ1 - What are the methods to convert RDF into the Property Graph data model?
  12. RDF Data Model
      ● RDF is a triple-based graph model, where:
        ○ Subject: URI, blank node
        ○ Predicate: URI (property)
        ○ Object: URI, literal, blank node
      URI = Uniform Resource Identifier, analogous to an ISBN for books (see also IRIs*)
      Literals = data values
      Blank nodes = descriptions of entities that don't need to be named
      Representation:
        @prefix ex: <http://example.org>
        ex:Person ex:speaker ex:Event
        ex:Person ex:name “Harsh”
        ex:Person ex:place ex:Bonn
        ex:Person ex:age “27”
        ex:Event ex:name “Graph Day”
        ex:Event ex:Year “2017”
      Interpretation (graph figure): nodes ex:Person, ex:Event, ex:Bonn, ex:AMS; literals “Harsh”, “27”, “Semantics”, “2017”, “30”; edges ex:speaker, ex:name, ex:place, ex:age, ex:year, ex:stime
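The triple structure on this slide can be sketched in a few lines of Python. This is only an illustration: the tuple encoding and the helper names `term_kind` and `is_valid_triple` are assumptions made for this example and are not part of LITMUS or of any RDF library.

```python
# Minimal sketch: the slide's example triples as plain (subject, predicate, object) tuples.
# Encoding assumption: literals are wrapped in double quotes, blank nodes start with "_:",
# and everything else is treated as a (prefixed) URI.
TRIPLES = [
    ("ex:Person", "ex:speaker", "ex:Event"),
    ("ex:Person", "ex:name", '"Harsh"'),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", '"27"'),
    ("ex:Event", "ex:name", '"Graph Day"'),
    ("ex:Event", "ex:Year", '"2017"'),
]


def term_kind(term):
    """Classify a term as 'literal', 'blank' or 'uri' under the encoding above."""
    if term.startswith('"'):
        return "literal"
    if term.startswith("_:"):
        return "blank"
    return "uri"


def is_valid_triple(subject, predicate, obj):
    """Check the position rules from the slide: literals may only appear as objects."""
    return (
        term_kind(subject) in {"uri", "blank"}
        and term_kind(predicate) == "uri"
        and term_kind(obj) in {"uri", "literal", "blank"}
    )


# Objects that are literals (data values) rather than other nodes.
literal_objects = [o for _, _, o in TRIPLES if term_kind(o) == "literal"]
```

All six example triples pass `is_valid_triple`, while a triple with a literal in subject position would be rejected.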
  13. RDF Graphs (RDFGs)
      ● Edge-labelled, directed multi-graphs (with entity URIs, blank nodes, literals)
      ● Going from information to knowledge using OWL (DLs) and ontologies (RDFS, RDFa, etc.)
      ● Bulky
        ○ Everything is a node-edge-node triple (edges don't have properties)
        ○ More relationships per node → more triples in total!
  14. Property Graph Data Model
      ● Edge-labelled, directed, attributed multi-graph
      ● Vertices and edges both have properties
      ● Main components:
        ○ Vertices, edges (source, destination), properties (key-value pairs), labels (strings)
      ● Super neat (compact), super cute
      ● Easier to add weighted, reified edges
      ● Query languages - Cypher, Gremlin, PGQL, etc.
      Figure: Person vertex (Name: Harsh, Age: 27, Place: Bonn), Event vertex (Name: Semantics, Year: 2017, Place: AMS), edge properties (Role: speaker, Time: 30)
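The components listed on this slide can be sketched with two small Python dataclasses. This is a hedged illustration only: the class names, the numeric ids and the edge label `participates_in` are hypothetical, and placing Role and Time on the edge is an assumption read off the figure labels.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str                                       # labels are plain strings
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Edge:
    id: int
    label: str
    src: int                                         # id of the source vertex
    dst: int                                         # id of the destination vertex
    properties: dict = field(default_factory=dict)   # edges carry properties too


# The slide's example: one Person vertex, one Event vertex, one attributed edge.
person = Vertex(1, "Person", {"Name": "Harsh", "Age": 27, "Place": "Bonn"})
event = Vertex(2, "Event", {"Name": "Semantics", "Year": 2017, "Place": "AMS"})
# Hypothetical edge label; the slide only shows the edge properties.
talk = Edge(10, "participates_in", person.id, event.id, {"Role": "speaker", "Time": 30})


def out_edges(vertex, edges):
    """Return the edges that leave the given vertex."""
    return [e for e in edges if e.src == vertex.id]
```

Because the edge itself holds `Role` and `Time`, no extra nodes are needed to describe the relationship, which is the compactness the slide refers to.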
  15. Mapping RDF → PG
      ● Initial results:
        ○ Intra-conversion of graph data models (mapping problem)
        ○ PoC implementation ready (see GitHub)
      ● Work in progress:
        ○ Conversion of properties, blank nodes, etc.
        ○ Using e.g. reification, singleton property, hypergraphs, etc.
        ○ Use case: DBpedia 2016-10 (mapping from .owl & data)
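To make the mapping problem concrete, here is a deliberately naive RDF → PG conversion in Python. It is a sketch under stated assumptions, not the LITMUS PoC: triples with a literal object become vertex properties, all other triples become edges, and blank nodes, reification and multi-valued properties (the work-in-progress items above) are ignored. The function name `rdf_to_pg` and the quoted-literal encoding are hypothetical.

```python
# Example input: (subject, predicate, object) tuples; literals are wrapped in double quotes.
TRIPLES = [
    ("ex:Person", "ex:speaker", "ex:Event"),
    ("ex:Person", "ex:name", '"Harsh"'),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", '"27"'),
    ("ex:Event", "ex:name", '"Graph Day"'),
    ("ex:Event", "ex:Year", '"2017"'),
]


def rdf_to_pg(triples):
    """Naive mapping: literal objects -> vertex properties, other objects -> edges."""
    vertices = {}   # vertex id -> property dict
    edges = []      # (source id, edge label, destination id)
    for subject, predicate, obj in triples:
        vertices.setdefault(subject, {})
        if obj.startswith('"'):
            # Limitation: a second value for the same key overwrites the first.
            vertices[subject][predicate] = obj.strip('"')
        else:
            vertices.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return vertices, edges


vertices, edges = rdf_to_pg(TRIPLES)
```

For the six example triples this yields three vertices and two edges, which illustrates why a property graph is typically more compact than the equivalent set of triples.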
  16. C2 - Query Translation
      ● Yes, we are linguistically diverse, and so are DMSs!
      ● And they speak different dialects, too:
        ○ SPARQL, Cypher, Gremlin, etc.
      ● RDF - SPARQL (W3C ‘08)
      ● Graph - ??
      Image: http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg
  17. Gremlin Traversal Language
      Gremlin’s Multi-Graph Query Language (GQL) support
      Figures: http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png, http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
  18. Contd…
      Multi-DMS & platform support (figure: https://tinkerpop.apache.org/images/oltp-and-olap.png)
      RQ2 - What are the semantics-preserving methods/approaches for translating SPARQL queries to a graph query language such as Gremlin?
  19. Image with the labels “Gremlinator” and “Me”: https://opinionessoftheworld.files.wordpress.com/2013/04/game-of-thrones-daenerys-dragon.jpg
  20. SPARQL → Gremlin
      ● C2: Gremlinator - the SPARQL-Gremlin translator (addressing RQ2)
        ○ Formalizing Gremlin traversals in graph algebra [DEXA ‘17]
        ○ A novel translation mechanism that maps SPARQL queries to Gremlin pattern matching traversals [planned submission - EDBT ’18]
        ○ Nested queries are still a challenge (i.e. UNION)
      Talk @ Graph Day 2017
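The idea of mapping SPARQL triple patterns to a Gremlin pattern matching traversal can be illustrated with a toy Python function that emits a `match()` traversal as a string. This is not Gremlinator's algorithm: the function `bgp_to_gremlin`, the tuple input format and the `edge_labels` parameter (used to decide whether a predicate is an edge or a vertex property) are all assumptions for this sketch, and it only handles a basic graph pattern whose subjects are variables.

```python
def bgp_to_gremlin(patterns, edge_labels):
    """Translate (subject, predicate, object) patterns into a Gremlin match() string.

    Assumptions: subjects are variables such as "?p"; predicates listed in
    edge_labels are edges whose object is a variable; all other predicates
    are vertex properties whose object is a variable or a constant value.
    """
    steps = []
    for subject, predicate, obj in patterns:
        subj_var = subject.lstrip("?")
        if predicate in edge_labels:
            obj_var = obj.lstrip("?")
            steps.append(f"__.as('{subj_var}').out('{predicate}').as('{obj_var}')")
        elif obj.startswith("?"):
            obj_var = obj.lstrip("?")
            steps.append(f"__.as('{subj_var}').values('{predicate}').as('{obj_var}')")
        else:
            steps.append(f"__.as('{subj_var}').has('{predicate}', '{obj}')")
    return "g.V().match(" + ", ".join(steps) + ")"


# Roughly: SELECT * WHERE { ?p speaker ?e . ?p name "Harsh" . ?e name ?title }
query = bgp_to_gremlin(
    [("?p", "speaker", "?e"), ("?p", "name", "Harsh"), ("?e", "name", "?title")],
    edge_labels={"speaker"},
)
```

Each triple pattern becomes one sub-traversal inside `match()`, which is why nested constructs such as UNION need additional machinery beyond this one-to-one mapping.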
  21. C3 - Metrics/KPIs
      RQ3 - What are the strengths and the limitations of the existing KPIs, and to what extent do they reflect the performance of a DMS?
      Figure labels:
        Type of data: RDF graph, property graph; real, synthetic
        Type of query: linear, star-shaped, snowflake
        DMS: index size, configuration
        KPIs: query response time, precision, recall
      Images: [11] https://www.tutorialspoint.com/computer_fundamentals/images/primary_memory.jpg, [12] http://s.hswstatic.com/gif/microprocessor-250x150.jpg
  22. Selection of KPIs
      ● CPU- and memory-specific metrics:
        ○ Perf tool - LITMUS v0.1 (supported)
          ■ TLB, LLC, instructions, L1 cache, page faults, etc. (18 supported currently)
      ● Dataset-specific metrics:
        ○ |V|, |E|, eccentricity, clustering coefficient, centrality, etc. (in progress)
      ● Query-specific metrics:
        ○ Type, length, response time, precision, recall, F1, etc. (planned)
      ● DMS-specific metrics:
        ○ Load time, index time, index size (supported)
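As an example of the query-specific metrics, precision, recall and F1 for a single query can be computed from the returned and the expected result sets. The sketch below uses the standard definitions; the function name and the sample result sets are made up for illustration and do not come from LITMUS.

```python
def precision_recall_f1(returned, expected):
    """Compute precision, recall and F1 of a query's results against a gold standard."""
    returned, expected = set(returned), set(expected)
    true_positives = len(returned & expected)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the DMS returns three results, two of which are correct,
# and misses one expected result.
p, r, f1 = precision_recall_f1({"a", "b", "c"}, {"b", "c", "d"})
```

Here precision and recall are both 2/3, so F1 is 2/3 as well; empty result sets are mapped to 0.0 to avoid division by zero.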
  23. Back in the bigger picture
      The LITMUS architecture diagram (same components as on slide 9), annotated with the challenges C1, C2 and C3
  24. The LITMUS Test*
      Figure labels: PLOTS, FILES
      * Please visit our Poster & Demo for hands-on experience; more details are in the paper!
  25. Evaluation
      ● RQs: Publications
      ● Framework: Continuous integration (v0.1 released, v0.2 planned Dec ‘17)
        ○ Reproducing third-party benchmarks
        ○ Gathering user and expert feedback
        ○ Going live @Industry:
          ■ Gremlinator - Apache TinkerPop
          ■ Further collaboration…
      Adoption by other projects - LDBC, HOBBIT! :-)
  26. Next Steps
      ● Framework - LITMUS v0.2 launch (Dec ‘17 - planned)
      ● DMS module - Adding two more DMSs each
      ● Dataset module - RDF → PG (Dec ‘17)
      ● Query module - Integrating Gremlinator
      ● GUI - Aesthetic GUI (maybe?)
  27. Acknowledgements
      Funding: H2020 WDAqua ITN (GA: 642795)
      Supervisors & Mentors:
        Prof. Dr. Soeren Auer, TIB, DE
        Prof. Dr. Jens Lehmann, UBO, DE
        Prof. Dr. Maria-Esther Vidal, TIB, DE
        Dr. Marko Rodriguez, DataStax & Apache, USA
  28. Resources - LITMUS Benchmark Suite
      http://wdaqua.eu/
      https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin
      Code: https://github.com/LITMUS-Benchmark-Suite/
      Web: https://litmus-benchmark-suite.github.io
      Docker: https://hub.docker.com/r/litmusbenchmarksuite/litmus/
  29. THANK YOU!
      Harsh Thakkar, University of Bonn
      Twitter: @harsh9t
      LinkedIn: thakkarharsh
      E-mail: harsh9t@gmail.com
      Questions? Comments? Insults? Injuries?
  30. EXTRA STUFF
  31. Experiments*
      Northwind dataset
        ● PG - Vertices: 3209, Edges: 6177
        ● RDF - Triples: 33033
      BSBM 1M dataset
        ● PG - Vertices: 92737, Edges: 238309
        ● RDF - Triples: 1000313
      Hardware: CPU: Intel® Xeon® CPU E5-2660 v3 (20 cores @ 2.60 GHz), RAM: 128 GB DDR3, Storage: 512 GB SSD, OS: Linux 4.2-generic (x86_64)
      Systems: OpenLink Virtuoso v7.2.4, Apache TinkerGraph-Gremlin v3.2.3
