Large Scale Graph Analytics with RDF and LPG Parallel Processing

Analytics that traverse large portions of large graphs have been problematic for both RDF and LPG graph engines. In this webinar, Barry Zane, co-founder of Netezza, ParAccel and SPARQL City and current VP of Engineering at Cambridge Semantics, discusses the native parallel-computing approach taken in AnzoGraph to yield interactive, scalable performance for RDF and LPG graphs.

Published in: Technology

  1. Large Scale Graph Analytics with RDF and LPG Parallel Processing
     Presenter: Barry Zane, VP Engineering
  2. Agenda
     • Quick Review of RDF/SPARQL
       • International W3C standard
     • Making RDF/SPARQL a full Property Graph database
       • Now better than other Property Graph approaches
     • OLAP systems complementary to OLTP systems
       • One size does not fit all - you need OLTP for end-user transactions, OLAP for analytics
       • Features needed in OLAP systems not found in OLTP systems
       • Big data is about inferences and aggregate insights
     ©2018 Cambridge Semantics Inc. All rights reserved.
  3. Quick Review of W3C Standard RDF/SPARQL 1.1
     • RDF - Resource Description Framework
       • Atomic “triples” describe anything
       • Subject, Predicate, Object
       • <Jack> <wentUp> <TheHill>
     • SPARQL - SPARQL Protocol and RDF Query Language
       • Declarative, like SQL
       • As expressive as SQL
       • Can do things SQL can’t do
       • Multi-Graph
       • Could use more SQL-like Analytics
       • Could use more Graph Algorithms
     • Needs a better way to describe properties of edges!
  4. A Basic Graph: 3 Vertexes, 3 Edges
     [Diagram: vertexes Jack (isA: <Man>, birthday: 09/17/1975), Jill (isA: <Woman>) and TheHill (isA: <Place>, has: Water, has: Trees, partOf: <TheMountain>); edges Jack friendOf Jill, Jack wentUp TheHill, Jill wentUp TheHill]
     Subject       Predicate       Object
     <Jack>        <isA>           <Man>
     <Jack>        <birthday>      "09/17/1975"^^date
     <Jack>        <friendOf>      <Jill>
     <Jack>        <wentUp>        <TheHill>
     <Jill>        <isA>           <Woman>
     <Jill>        <wentUp>        <TheHill>
     <TheHill>     <isA>           <Place>
     <TheHill>     <has>           "Water"
     <TheHill>     <has>           "Trees"
     <TheHill>     <partOf>        <TheMountain>
     <Man>         <subClassOf>    <Human>
     <Man>         <gender>        "Male"
     <Woman>       <subClassOf>    <Human>
     <Woman>       <gender>        "Female"
     <friendOf>    <property>      "Reflexive"
     <Human>       <subClassOf>    <Animal>
     Everything is an atomic triple that specifies properties on entities.
  5. Let’s Add Edge Properties
     [Diagram: the basic graph from slide 4, now with properties on its edges: Jack friendOf Jill (metAt=<TheHill>, metDate=07/04/2018); Jack wentUp TheHill (date=‘today’); Jill wentUp TheHill (date=‘today’)]
     Since triples specify properties on vertexes, how do we specify properties on edges? In other words, how do we specify properties on triples?
  6. The “Reification” Approach to Properties - Don’t Do This!
     • Mathematically Pure
     • Hard to Write Queries
     • Wastes Storage
     • Runs Slowly
     Before reification:
     Subject            Predicate      Object
     <Jack>             <isA>          <Man>
     <Jack>             <birthday>     "09/17/1975"^^date
     <Jack>             <friendOf>     <Jill>
     <Jack>             <wentUp>       <TheHill>
     After reification:
     Subject            Predicate      Object
     <Jack>             <isA>          <Man>
     <Jack>             <birthday>     "09/17/1975"^^date
     <Relationship1>    <subject>      <Jack>
     <Relationship1>    <predicate>    <friendOf>
     <Relationship1>    <object>       <Jill>
     <Relationship1>    <metAt>        <TheHill>
     <Relationship1>    <metDate>      "07/04/2018"
     <Activity1>        <subject>      <Jack>
     <Activity1>        <predicate>    <wentUp>
     <Activity1>        <object>       <TheHill>
     <Activity1>        <date>         "today"
     We’ve turned the verbs “friendOf” and “wentUp” into the noun-sets “Relationship1” and “Activity1”. In other words, these edges become vertexes.
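The cost of reification shown on slide 6 can be sketched in a few lines of Python. This is an illustration only, not how any RDF engine stores data: triples are modelled as plain tuples, and the helper names `reify` and `edge_property` are hypothetical. Each annotated edge turns into three bookkeeping triples plus one per property, and reading a single edge property back means joining four patterns on the statement node.

```python
def reify(node, subject, predicate, obj, properties):
    """Return the triples classic reification needs for one annotated edge."""
    triples = [
        (node, "subject", subject),
        (node, "predicate", predicate),
        (node, "object", obj),
    ]
    # One extra triple per edge property, all hanging off the statement node.
    triples += [(node, key, value) for key, value in properties.items()]
    return triples


def edge_property(triples, subject, predicate, obj, key):
    """Look up an edge property in reified data (a four-way join on the node)."""
    by_node = {}
    for node, p, o in triples:
        by_node.setdefault(node, {})[p] = o
    wanted = {"subject": subject, "predicate": predicate, "object": obj}
    return [
        fields[key]
        for fields in by_node.values()
        if key in fields and all(fields.get(k) == v for k, v in wanted.items())
    ]


# The two annotated edges from the slide, using the slide's node names.
reified = (
    reify("Relationship1", "Jack", "friendOf", "Jill",
          {"metAt": "TheHill", "metDate": "07/04/2018"})
    + reify("Activity1", "Jack", "wentUp", "TheHill", {"date": "today"})
)

met_at = edge_property(reified, "Jack", "friendOf", "Jill", "metAt")
```

Two edges with three properties between them take nine triples here, which is the storage and query overhead the slide warns about.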
  7. [No text content on this slide]
  8. RDF* (aka Reification Done Right) and SPARQL*
     [Diagram: the graph from slide 5, with edge properties: Jack friendOf Jill (metAt=<TheHill>, metDate=07/04/2018); Jack wentUp TheHill (date=‘today’); Jill wentUp TheHill (date=‘today’)]
     Subject                             Predicate       Object
     <Jack>                              <isA>           <Man>
     <Jack>                              <birthday>      "09/17/1975"^^date
     <Jack>                              <friendOf>      <Jill>
     << <Jack> <friendOf> <Jill> >>      <metAt>         <TheHill>
     << <Jack> <friendOf> <Jill> >>      <metDate>       "07/04/2018"^^date
     <Jack>                              <wentUp>        <TheHill>
     << <Jack> <wentUp> <TheHill> >>     <date>          "today"^^date
     <Jill>                              <isA>           <Woman>
     <Jill>                              <wentUp>        <TheHill>
     << <Jill> <wentUp> <TheHill> >>     <date>          "today"^^date
     <TheHill>                           <isA>           <Place>
     <TheHill>                           <has>           "Water"
     <TheHill>                           <has>           "Trees"
     <TheHill>                           <partOf>        <TheMountain>
     <Man>                               <subClassOf>    <Human>
     <Man>                               <gender>        "Male"
     <Woman>                             <subClassOf>    <Human>
     <Woman>                             <gender>        "Female"
     <friendOf>                          <property>      "Reflexive"
     <Human>                             <subClassOf>    <Animal>
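The RDF* idea on slide 8, that a whole triple can stand in the subject position of another triple, can be mimicked with nested tuples. The sketch below is a toy model under that assumption; the name `annotations` is hypothetical and nothing here reflects AnzoGraph's internal storage.

```python
# An RDF*-style store: the subject of a triple may itself be a triple.
friendship = ("Jack", "friendOf", "Jill")
jack_climb = ("Jack", "wentUp", "TheHill")
jill_climb = ("Jill", "wentUp", "TheHill")

TRIPLES = [
    friendship,
    (friendship, "metAt", "TheHill"),
    (friendship, "metDate", "07/04/2018"),
    jack_climb,
    (jack_climb, "date", "today"),
    jill_climb,
    (jill_climb, "date", "today"),
]


def annotations(triples, edge):
    """Return the properties attached directly to one edge as a dict."""
    return {p: o for s, p, o in triples if s == edge}


friendship_facts = annotations(TRIPLES, friendship)
```

The edge stays an edge: the friendship and its two properties take three triples, and reading them back is a single lookup keyed on the edge itself rather than a join through an invented statement node.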
  9. Even More Powerful Than Other Property Graph Approaches
     [Diagram: vertex Jack with isA: <Man>; birthday: 09/17/1975 (AccordingTo: MotherGoose); birthday: 06/12/1975 (AccordingTo: Aesop; AccordingTo: Grimm)]
     Subject                                        Predicate        Object
     <Jack>                                         <isA>            <Man>
     <Jack>                                         <birthday>       "09/17/1975"^^date
     <Jack>                                         <birthday>       "06/12/1975"^^date
     << <Jack> <birthday> "09/17/1975"^^date >>     <accordingTo>    <MotherGoose>
     << <Jack> <birthday> "06/12/1975"^^date >>     <accordingTo>    <Aesop>
     << <Jack> <birthday> "06/12/1975"^^date >>     <accordingTo>    <Grimm>
     Properties, such as provenance, can be applied to the properties of vertexes, not just to edges!
  10. RDF*/SPARQL* Acknowledgements
      • Extends RDF and SPARQL - https://www.w3.org/TR/sparql11-query/
      • Specification and Model Created by:
        • Olaf Hartig, Linköping University, Sweden
        • Bryan Thompson
      • First implemented by Blazegraph
        • https://arxiv.org/pdf/1406.3399.pdf
        • https://arxiv.org/pdf/1409.3288.pdf
      • AnzoGraph is the first OLAP-oriented Graph Database to use this approach for handling rich data
  11. Inferences - RDFS, RDFS+, OWL
      [Diagram: the basic graph from slide 4 with inferred properties added: Jack gains gender: Male and isA: Human; Jill gains gender: Female and isA: Human]
      Everything is an atomic triple that specifies properties on entities.
      • Automatically adds new triples.
      • Triggered and controlled by other triples in the dataset
      • RDFS - Basic Inferences
      • RDFS+ - Popular Extensions
      • OWL - Most complete
      • Used for:
        • Simplifying queries
        • Harmonizing datasets
        • Formalizing relationships
        • Keeping load files normalized
        • Denormalization
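How inference "automatically adds new triples" that are "triggered and controlled by other triples" can be illustrated with a small forward-chaining loop. The sketch below is an assumption-laden toy, not an RDFS or OWL reasoner: it applies only two subclass rules, uses the slides' `isA` and `subClassOf` predicates, and the function name `infer_types` is hypothetical.

```python
def infer_types(triples):
    """Forward-chain two RDFS-style rules until no new triples appear.

    Rule 1: (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x isA A), (A subClassOf B)         =>  (x isA B)
    """
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "subClassOf"}
        new = {(a, "subClassOf", c) for a, b in sub for b2, c in sub if b == b2}
        new |= {
            (x, "isA", b)
            for x, p, a in facts if p == "isA"
            for a2, b in sub if a == a2
        }
        if new <= facts:  # fixed point: nothing left to add
            return facts
        facts |= new


LOADED = [
    ("Jack", "isA", "Man"),
    ("Jill", "isA", "Woman"),
    ("Man", "subClassOf", "Human"),
    ("Woman", "subClassOf", "Human"),
    ("Human", "subClassOf", "Animal"),
]

inferred = infer_types(LOADED)
```

The load file stays normalized at five triples, while queries can ask directly for every `Human` or `Animal` without spelling out the class hierarchy.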
  12. How OLAP Adds to the OLTP World
      • Key functionality
        • Windowed Aggregates
        • CUBE, ROLLUP, Grouping Sets
        • 80+ added scalar functions and aggregates
        • Named Queries for re-use of prior work
        • Views for re-use and providing alternate perspectives
        • Native, Parallel Processing for Query Performance
      • “Tell me about populations and trends”
        • Not just “Tell me about Jack”
  13. Where the rubber meets the road…
      • Let’s write some queries!!!
      • Anzo customers generally use the “point and click” interface of Anzo HiRes to automatically write their queries.
      • AnzoGraph customers write SPARQL* queries, so let’s do so…
      • There will not be a quiz.
  14. Example OLTP Queries (1)
      Who went up places? (generic search)
          SELECT $person $place
          WHERE { $person <wentUp> $place }
           person | place
          --------+---------
           Jack   | TheHill
           Jill   | TheHill
      How are Jack and Jill related? (OLTP)
          SELECT $relationship
          WHERE { <Jack> $relationship <Jill> }
           relationship
          --------------
           friendOf
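Both OLTP queries above are single triple patterns in which some positions are fixed and the rest are variables. A minimal Python sketch of that matching step follows, assuming triples are plain tuples and `None` plays the role of a variable; the `match` helper is hypothetical and does no indexing, unlike a real engine.

```python
# The vertex facts from the basic graph on slide 4, as plain tuples.
TRIPLES = [
    ("Jack", "isA", "Man"),
    ("Jack", "friendOf", "Jill"),
    ("Jack", "wentUp", "TheHill"),
    ("Jill", "isA", "Woman"),
    ("Jill", "wentUp", "TheHill"),
    ("TheHill", "isA", "Place"),
]


def match(triples, s=None, p=None, o=None):
    """Yield every triple that agrees with the given terms; None is a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Who went up places?  ->  { $person <wentUp> $place }
who_went_up = [(s, o) for s, _, o in match(TRIPLES, p="wentUp")]

# How are Jack and Jill related?  ->  { <Jack> $relationship <Jill> }
jack_and_jill = [p for _, p, _ in match(TRIPLES, s="Jack", o="Jill")]
```

Each lookup touches only the triples around one or two named vertexes, which is the "tell me about Jack" style of access that the deck contrasts with OLAP.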
  15. Example OLAP Queries (2)
      What are the most common relationships among people? (OLAP, Inference)
          SELECT $relationship (COUNT(*) AS $count)
          WHERE {
            $person1 <isA> <Human> .
            $person2 <isA> <Human> .
            $person1 $relationship $person2
          }
          GROUP BY $relationship
          ORDER BY DESC($count)
          LIMIT 10
           relationship | count
          --------------+-------
           friendOf     |     2
      What are the most popular dates to go places? (OLAP, RDF*)
          SELECT $date (COUNT(*) AS $count) $place
          WHERE { << $person <wentUp> $place >> <date> $date }
          GROUP BY $place $date
          ORDER BY DESC($count)
          LIMIT 10
              date    | count |  place
          ------------+-------+---------
           2018-09-15 |     2 | TheHill
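The second OLAP query groups annotated edges by place and date and counts them. The same aggregation can be sketched in Python over nested-tuple triples; this is a toy model assuming the two `wentUp` edges carry the date shown in the result table, and `popular_dates` is a hypothetical name.

```python
from collections import Counter

# RDF*-style annotated edges: the subject of each triple is itself a triple.
TRIPLES = [
    (("Jack", "wentUp", "TheHill"), "date", "2018-09-15"),
    (("Jill", "wentUp", "TheHill"), "date", "2018-09-15"),
    ("Jack", "friendOf", "Jill"),
]


def popular_dates(triples, limit=10):
    """Count wentUp edges per (date, place), most frequent first."""
    counts = Counter(
        (o, s[2])
        for s, p, o in triples
        if p == "date" and isinstance(s, tuple) and s[1] == "wentUp"
    )
    return [(date, n, place) for (date, place), n in counts.most_common(limit)]


rows = popular_dates(TRIPLES)
```

Unlike the single-vertex lookups on the previous slide, this scans every annotated edge in the dataset, which is why the deck argues such queries benefit from parallel execution.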
  16. Example Graph Algorithm Query (3)
      Who are the most popular people? (PageRank)
          SELECT $person $rank
          FROM <tickit>
          WHERE {
            SERVICE <csi:page_rank> {
              [] <csi:binding-vertex> $person;
                 <csi:binding-rank> $rank;
                 <csi:edge-label> <friend>;
                 <csi:max-iterations> 20;
            }
          }
          ORDER BY DESC($rank) $person
          LIMIT 5
             person    |   rank
          -------------+----------
           person23501 | 1.807055
           person16101 | 1.748871
           person3083  | 1.746725
           person19289 | 1.736759
           person36749 | 1.731948
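For readers unfamiliar with what a PageRank service computes, here is a textbook power-iteration sketch in Python. It is not AnzoGraph's implementation: the damping factor of 0.85, the normalization (ranks sum to 1, so values differ in scale from the table above), and the sample people are all assumptions; only the 20-iteration cap echoes the query.

```python
def page_rank(edges, damping=0.85, max_iterations=20):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iterations):
        # Rank held by vertexes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical "friend" edges: three people name Jill, Jill names Jack.
FRIEND_EDGES = [
    ("Jack", "Jill"),
    ("Ann", "Jill"),
    ("Bob", "Jill"),
    ("Jill", "Jack"),
]

ranks = page_rank(FRIEND_EDGES)
most_popular = max(ranks, key=ranks.get)
```

Every iteration visits every edge once, so the work grows with the size of the whole graph rather than with the neighbourhood of one vertex.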
  17. What Did We Learn?
      • RDF/SPARQL is an International W3C Standard that describes how Graph Databases should be managed and queried.
      • The RDF*/SPARQL* extensions to the standard make RDF/SPARQL databases even more capable.
      • One size does not fit all - you need OLTP for end-user transactions, OLAP for analytics
      • OLAP database systems require capabilities irrelevant to OLTP databases.
  18. Next Step: Download free 60-day trial at AnzoGraph.com
  19. Questions?
