3 Myths about Graph Query Languages, Busted by Pixy

Transcript

  • 1. 3 Myths about Graph Query Languages, Busted by Pixy
    Sridhar Ramachandran, Founder, LambdaZen LLC
  • 2. Background
    ● Graph databases are a category of NoSQL databases that model graphs consisting of vertices and edges.
      ○ The property-graph model from TinkerPop is a graph database standard.
      ○ It offers a common abstraction for over a dozen graph databases through the Blueprints API.
    ● There are two querying paradigms for graph databases: graph query languages (GQLs) and graph traversal languages (GTLs).
      ○ GQLs are declarative and constraint-driven.
      ○ GTLs are imperative and step-driven.
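To make the two paradigms concrete, here is a minimal sketch in plain Python rather than Pixy or Gremlin, so that it runs without a graph database. The toy graph, the `out` helper and both function names are hypothetical and appear nowhere in the slides. The first function spells out each traversal step in order (GTL style); the second only states the constraints that a result must satisfy (GQL style).

```python
# Toy graph as (source, label, target) triples. All names are hypothetical.
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "knows", "dave"),
    ("dave", "knows", "carol"),
    ("carol", "knows", "erin"),
]


def out(vertex, label):
    """One traversal step: follow outgoing edges with the given label."""
    return [t for (s, l, t) in EDGES if s == vertex and l == label]


def friends_of_friends_steps(start):
    """Step-driven (GTL style): the caller fixes the order of the steps."""
    result = []
    for friend in out(start, "knows"):
        for friend_of_friend in out(friend, "knows"):
            result.append(friend_of_friend)
    return result


def friends_of_friends_constraints(start):
    """Constraint-driven (GQL style): state what must hold for a result."""
    return [
        z
        for (x, l1, y) in EDGES
        for (y2, l2, z) in EDGES
        if x == start and l1 == "knows" and l2 == "knows" and y == y2
    ]
```

Both functions return the same vertices for the same start vertex; only the way the question is expressed differs.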
  • 3. Background
    ● The TinkerPop software stack includes Gremlin, a graph traversal language (GTL) implemented as a monadic Groovy DSL.
    ● All other GQLs to date are proprietary and cannot be ported across graph databases.
    ● Pixy is a new declarative graph query language (GQL) that works on any Blueprints-compatible graph database.
      ○ Project page: https://github.com/lambdazen/pixy/
      ○ Available under the Apache 2.0 license
  • 4. Myth #1: GQLs and GTLs can’t mix
    ● Myth #1: Graph query languages and graph traversal languages are totally different ways of looking at the graph query problem.
    ● Conventional wisdom holds that:
      ○ A graph “access” language must be either a GTL or a GQL.
      ○ The programmer must choose one paradigm or the other for a specific query.
  • 5. Pixy co-exists with Gremlin
    ● Pixy queries are run from Gremlin expressions using the ‘pixy’ step.
    ● Both the input to the query and its output can be operated on by Gremlin.
    ● The programmer can use both paradigms in the same query using Pixy + Gremlin.
    [Diagram labels: Gremlin (GTL), Pixy (GQL), Gremlin]
  • 6. Myth #2: GQLs are slower
    ● Myth #2: Graph query languages are much slower than graph traversal languages because of their declarative nature.
    ● Conventional wisdom holds that:
      ○ The performance penalty is the price paid for declarative expressiveness.
      ○ You can’t be sure about the execution plan of a query written in a GQL, as is the case with SQL.
  • 7. Pixy compiles to Gremlin
    ● Pixy compiles Prolog-style rules to Gremlin expressions.
    ● The execution plan is a Gremlin pipeline and can be tweaked by reordering the clauses.
    ● Performance should match that of the equivalent Gremlin expression in most cases.
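The point about reordering clauses can be illustrated with a toy model in plain Python. This is not Pixy's compiler and makes no claim about the Gremlin it emits; it only assumes that the clauses of a rule body run left to right, like steps in a pipeline. The `run` function, the edge data and the `work` counter (the number of intermediate bindings produced) are all hypothetical.

```python
# Toy graph as (source, label, target) triples. All names are hypothetical.
EDGES = [
    ("a", "father", "b"),
    ("d", "father", "b"),
    ("b", "father", "c"),
    ("e", "father", "f"),
]


def run(clauses, bindings):
    """Evaluate clauses left to right, like steps in a pipeline.

    Each clause is (source_var, label, target_var). Returns the final
    variable bindings and the count of intermediate bindings produced.
    """
    rows = [dict(bindings)]
    work = 0
    for src, label, dst in clauses:
        next_rows = []
        for row in rows:
            for s, l, t in EDGES:
                if l != label:
                    continue
                # A bound variable must match the edge; an unbound one matches anything.
                if row.get(src, s) != s or row.get(dst, t) != t:
                    continue
                next_rows.append({**row, src: s, dst: t})
        work += len(next_rows)
        rows = next_rows
    return rows, work


# The same two clauses in two different orders, with X bound up front.
bound_first = [("X", "father", "Y"), ("Y", "father", "Z")]
unbound_first = [("Y", "father", "Z"), ("X", "father", "Y")]

rows_a, work_a = run(bound_first, {"X": "a"})
rows_b, work_b = run(unbound_first, {"X": "a"})
```

In this toy model both orders yield the same answer, but starting from the clause that mentions the bound variable produces fewer intermediate bindings. This is the kind of control over the plan that the slide attributes to clause order.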
  • 8. Myth #3: GQLs can’t be relational
    ● Myth #3: A graph query language cannot be based on N-ary predicate calculus or relational algebra, since graphs can only express binary relations/predicates.
    ● Conventional wisdom holds that:
      ○ Graph-based models, unlike relational models, can only capture binary relationships in edges.
      ○ Therefore, GQLs can only operate on vertices and edges, not N-ary relations.
      ○ HyperGraphDB is designed to support “hyper” edges across N vertices to address this perceived weakness of graph-based associative models.
  • 9. Pixy derives N-ary relations
    ● The property graph model can only capture binary relations between vertices in a graph, i.e., edges.
    ● But Pixy can derive N-ary relations across vertices, edges and properties.
    ● These relations can be used to derive other N-ary relations.
      ○ These relations form what is called the “domain model” for the graph.
      ○ When any relation is queried, Pixy compiles the query into a sequence of Gremlin steps.
  • 10. An example
        gremlin> pt = new PixyTheory('''father(Child, Father) :- out(Child, 'father', Father).''')
    For a graph with 'father' edges from A to B, from B to C and from D to B, the above rule means that:
      - father(A, B), father(B, C), father(D, B) are all true.
      - father(A, C), father(D, E), etc. are false.
        gremlin> pt = pt.extend('''grandfather(X, Y, Z) :- father(X, Y), father(Y, Z).''')
    The above rule means that:
      - grandfather(A, B, C) and grandfather(D, B, C) are true.
      - All other combinations are false.
    Sample query from the Pixy Tutorial
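The truth values listed on this slide can be checked with a small plain-Python sketch that mirrors the two rules as set comprehensions. It is an illustration only and does not use Pixy or Gremlin. The edge set is an assumption reconstructed from the facts the slide lists as true ('father' edges from A to B, B to C and D to B), and the function names simply echo the rule names.

```python
# Assumed graph: 'father' edges implied by the facts the slide lists as true.
EDGES = {("A", "father", "B"), ("B", "father", "C"), ("D", "father", "B")}


def father():
    """father(Child, Father) :- out(Child, 'father', Father)."""
    return {(child, dad) for (child, label, dad) in EDGES if label == "father"}


def grandfather():
    """grandfather(X, Y, Z) :- father(X, Y), father(Y, Z)."""
    f = father()
    # Join the binary relation with itself on the shared variable Y.
    return {(x, y, z) for (x, y) in f for (y2, z) in f if y == y2}
```

The ternary `grandfather` relation is derived entirely from binary edges, and it could in turn serve as input to further derived relations.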
  • 11. Wrap-up
    ● Pixy is a declarative graph query language that dispels 3 myths about GQLs.
    ● Myth #1: GQLs and GTLs can’t mix.
      ○ Pixy’s querying capability is integrated into Gremlin, combining the capabilities of both querying paradigms in one language.
    ● Myth #2: GQLs are slower than GTLs.
      ○ Pixy compiles Prolog-style queries and rules to Gremlin expressions.
    ● Myth #3: GQLs can’t be relational.
      ○ Pixy can derive N-ary relations from graphs.
      ○ New relations can be derived from existing ones.