GraHPEr: Graph queries on relational data
Luis Vaquero, Marco Lotz, James Brook, Joan Varvenne,
Suksant Sae Lor, David Subiros, Herry Herry, Brian Monahan
March 2016
“Ma’ Look! Graph Analytics without Graphs!!!!”
June 2016
CC: https://www.youtube.com/watch?v=CxKOSAtMC1g
CC by adeevee
CC by Ole Rinnan.
http://www.vg.no/forbruker/bil-baat-og-motor/bil-og-trafikk/post-it-feberen-brer-seg/a/165769/
Outline
1. Problem
2. Our solution
3. Underlying magic/technology
4. Competition
5. Status and timeline
6. Summary and call to action
The Case for Graph Analytics on Relational Databases
• Lots of data sitting in relational databases (accumulated over the last few decades)
• Some data are simply too bulky to move around
• Keeping duplicated copies consistent (cascading updates) slows down write throughput, which is key in big data apps
• Simple graph syntax and semantics make queries easier to build
Relational Data as Graphs: Problems
1. Raw SQL or stored procedures on relational DBs (“monster SQL queries”)
2. Copy data from its original source to construct a new graph (duplication)
from Pixabay under CC
by Chris Downer under CC
from Pixabay under CC
Outline: 2. Our solution
Graph syntax/semantics
on relational DBs
without duplication
Outline: 3. Underlying magic/technology
Relational GraHPEr
Two related (but independent) main functionalities:
1. Graph Schema Extractor: Database Schema -> Set of Graph Topologies
2. Query Processor: Graph Topology + Cypher Query -> Equivalent SQL Query
Relational Tables
Movie
Id | Title | Released | Tagline
01 | Matrix | 1999 | Enter the Matrix
Person
Id | Name | Born
01 | Keanu Reeves | 1964
Directed
person_id | movie_id
01 | 02
Acted In
person_id | movie_id | role
01 | 01 | Neo
Produced
person_id | movie_id
01 | 03
The Equivalent Graph Topology (Gtop)
Nodes
• Movie
  Properties: ID, Title, Released, Tagline
• Person
  Properties: ID, Name, Born
Edges (Person -> Movie)
• Acted In
  Attributes: Role
• Produced
  Attributes: None
• Directed
  Attributes: None
By default, the Gtop mirrors the database's entity-relationship diagram.
Advanced ML techniques make it possible to find other graphs in the same data.
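The default entity-relationship mapping can be sketched in Python. This is a hypothetical illustration, not GraHPEr's code. It assumes a simplified schema description: the `Table` and `ForeignKey` classes and the `extract_gtop` function are invented here. It also assumes one simple rule. A table with exactly two foreign keys is treated as a join table and becomes an edge, with its non-key columns as edge attributes. Every other table becomes a node whose columns are properties.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ForeignKey:
    column: str      # column in the referencing table
    ref_table: str   # table it points to


@dataclass
class Table:
    name: str
    columns: list[str]
    foreign_keys: list[ForeignKey] = field(default_factory=list)


def extract_gtop(tables: list[Table]) -> dict:
    """Hypothetical default mapping: join tables -> edges, other tables -> nodes."""
    nodes, edges = {}, {}
    for t in tables:
        fk_columns = {fk.column for fk in t.foreign_keys}
        if len(t.foreign_keys) == 2:
            # Join table: the two foreign keys give the edge endpoints and
            # the remaining columns become edge attributes.
            src, dst = (fk.ref_table for fk in t.foreign_keys)
            attrs = [c for c in t.columns if c not in fk_columns]
            edges[t.name] = {"from": src, "to": dst, "attributes": attrs}
        else:
            nodes[t.name] = {"properties": list(t.columns)}
    return {"nodes": nodes, "edges": edges}


def person_movie_fks() -> list[ForeignKey]:
    return [ForeignKey("person_id", "person"), ForeignKey("movie_id", "movie")]


schema = [
    Table("movie", ["id", "title", "released", "tagline"]),
    Table("person", ["id", "name", "born"]),
    Table("acted_in", ["person_id", "movie_id", "role"], person_movie_fks()),
    Table("directed", ["person_id", "movie_id"], person_movie_fks()),
    Table("produced", ["person_id", "movie_id"], person_movie_fks()),
]

gtop = extract_gtop(schema)
print(gtop["edges"]["acted_in"])
# {'from': 'person', 'to': 'movie', 'attributes': ['role']}
```

A real extractor would also have to handle self-referencing foreign keys and tables with more than two keys. The ML-based discovery mentioned above goes beyond this rule.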
Query Processor
Parser -> Visitor -> Query Builder
Input (Cypher): MATCH (m:Movie) RETURN m.title
Output (SQL): SELECT m.title FROM Movie AS m
Query Processor: Parser
MATCH (m:Movie) RETURN m.title
is parsed into the following AST:
Query(None,
  SingleQuery(
    List(
      Match(false,
        Pattern(
          List(
            EveryPath(
              NodePattern(
                Some(Variable(m))
                ,List(LabelName(movie))
                ,None)
            )
          )
        )
        ,List()
        ,None)
      ,Return(false,
        ReturnItems(false
          ,List(
            UnaliasedReturnItem(
              Property(
                Variable(m)
                ,PropertyKeyName(title)
              )
              ,m.title
            )
          ))
        ,None,None,None)
    )
  )
)
Query Processor: Visitor
Walking the AST above, the visitor collects:
Pattern
  Match - false (not an OPTIONAL MATCH)
  NodePattern - m - movie
Return
  ReturnItem - m - title
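The visitor step can be sketched in Python. This is a hypothetical illustration. The dataclasses are simplified stand-ins for the AST nodes printed by the parser, not the openCypher frontend's actual Scala classes, and `ClauseCollector` is an invented name. The visitor dispatches on node type and emits the same flat clause summary listed above.

```python
from __future__ import annotations

from dataclasses import dataclass


# Simplified, hypothetical stand-ins for the parser's AST node types.
@dataclass
class NodePattern:
    variable: str
    labels: list[str]


@dataclass
class Match:
    optional: bool
    pattern: list[NodePattern]


@dataclass
class Property:
    variable: str
    key: str


@dataclass
class Return:
    distinct: bool
    items: list[Property]


@dataclass
class SingleQuery:
    clauses: list


class ClauseCollector:
    """Visitor that flattens the AST into a list of clause descriptions."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def visit(self, node) -> None:
        # Dispatch on the node's class name, e.g. Match -> visit_Match.
        getattr(self, f"visit_{type(node).__name__}")(node)

    def visit_SingleQuery(self, node: SingleQuery) -> None:
        for clause in node.clauses:
            self.visit(clause)

    def visit_Match(self, node: Match) -> None:
        self.lines.append("Pattern")
        self.lines.append(f"Match - {str(node.optional).lower()}")
        for node_pattern in node.pattern:
            self.visit(node_pattern)

    def visit_NodePattern(self, node: NodePattern) -> None:
        self.lines.append(f"NodePattern - {node.variable} - {' '.join(node.labels)}")

    def visit_Return(self, node: Return) -> None:
        self.lines.append("Return")
        for item in node.items:
            self.lines.append(f"ReturnItem - {item.variable} - {item.key}")


# MATCH (m:Movie) RETURN m.title, with the label lower-cased as in the parser output.
ast = SingleQuery([
    Match(False, [NodePattern("m", ["movie"])]),
    Return(False, [Property("m", "title")]),
])
collector = ClauseCollector()
collector.visit(ast)
print("\n".join(collector.lines))
```

Dispatching on the class name keeps each clause's handling in its own method. Supporting a new clause then means adding one `visit_...` method.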
Query Processor: SQL Builder
Input from the Visitor:
Pattern
  Match - false
  NodePattern - m - movie
Return
  ReturnItem - m - title
Template Matcher -> SQL Templates
@*
This is a template for the Return clause of the Cypher language.
*@
@args List returnItems
@args boolean distinct
@for(Map properties: returnItems) {@properties.get("property")
@if(!properties_isLast){,}}
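What the Return template does can be approximated in Python. This is a hedged sketch. It assumes each return item is a map with a "property" key, as the template reads it. The template shown does not emit the SELECT or DISTINCT keywords itself, so adding them around the loop output is an assumption about how the SQL Builder uses this fragment. `render_return` is a hypothetical name.

```python
from __future__ import annotations


def render_return(return_items: list[dict[str, str]], distinct: bool = False) -> str:
    """Hypothetical Python counterpart of the Return-clause template."""
    # Like the template's @for loop: a comma between items, none after the last.
    columns = ", ".join(item["property"] for item in return_items)
    # Assumption: the builder wraps the loop output with SELECT (and DISTINCT).
    keyword = "SELECT DISTINCT" if distinct else "SELECT"
    return f"{keyword} {columns}"


print(render_return([{"property": "m.title"}]))
# SELECT m.title
```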
Query Processor: SQL Builder
Template Matcher -> SQL Templates
The Return clause renders as: SELECT m.title
Remaining input:
Pattern
  Match - false
  NodePattern - m - movie
Query Processor: SQL Builder
The Template Matcher resolves the node label "movie" through the implementation level of the Gtop:
Pattern
  Match - false
  NodePattern - m - movie
Gtop:
"implementationLevel": {
  "implementationNodes": [
    {
      "synonyms": ["movie"],
      "tableName": "Movie",
      "id": [
        {
          "columnName": "id",
          "dataType": "INTEGER"
        }
      ]
    }
  ]
}
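The label-to-table lookup can be sketched against the Gtop fragment above. This is a hypothetical sketch. `resolve_table` and `render_from` are invented names, and the case-insensitive comparison is an assumption, suggested by the parser reporting the label as "movie" for `(m:Movie)`. Aliasing the table with the Cypher variable is what lets `m.title` resolve in the generated SQL.

```python
from __future__ import annotations

# The implementationLevel fragment of the Gtop shown above, as a Python dict.
GTOP = {
    "implementationLevel": {
        "implementationNodes": [
            {
                "synonyms": ["movie"],
                "tableName": "Movie",
                "id": [{"columnName": "id", "dataType": "INTEGER"}],
            }
        ]
    }
}


def resolve_table(gtop: dict, label: str) -> str:
    """Hypothetical helper: map a Cypher node label to its table via synonyms."""
    for node in gtop["implementationLevel"]["implementationNodes"]:
        # Assumption: labels are matched case-insensitively.
        if label.lower() in (s.lower() for s in node["synonyms"]):
            return node["tableName"]
    raise KeyError(f"no table for label {label!r}")


def render_from(gtop: dict, variable: str, label: str) -> str:
    # Alias the table with the Cypher variable so that m.title is valid SQL.
    return f"FROM {resolve_table(gtop, label)} AS {variable}"


select_part = "SELECT m.title"  # output of the Return template
print(f"{select_part} {render_from(GTOP, 'm', 'Movie')}")
# SELECT m.title FROM Movie AS m
```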
Query Processor: SQL Builder
Template Matcher -> SQL Templates
SELECT m.title
FROM Movie AS m
I/O
Input Cypher query:
MATCH (p: Person)-[:person_id_acted_in_person_id]->(m: Movie) RETURN p.name, m.title
Expected output SQL:
SELECT p.name, m.title
FROM person AS p
JOIN acted_in ON (acted_in.person_id = p.id)
JOIN movie AS m ON (m.id = acted_in.movie_id)
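The one-hop translation above can be sketched in Python. The sketch makes three assumptions. Each Gtop edge is backed by a join table. Node tables use an `id` primary key. The hypothetical `EdgeTable` record carries the two foreign-key columns. The sketch is not GraHPEr's builder, but it reproduces the expected SQL for this query.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeTable:
    """Hypothetical Gtop edge entry: a join table linking two node tables."""
    table: str
    src_table: str
    src_fk: str     # join-table column referencing src_table.id
    dst_table: str
    dst_fk: str     # join-table column referencing dst_table.id


def one_hop_sql(src_var: str, edge: EdgeTable, dst_var: str, columns: list[str]) -> str:
    """Translate (src)-[:edge]->(dst) RETURN columns into a two-JOIN query."""
    return "\n".join([
        f"SELECT {', '.join(columns)}",
        f"FROM {edge.src_table} AS {src_var}",
        # First JOIN: from the source node into the join table.
        f"JOIN {edge.table} ON ({edge.table}.{edge.src_fk} = {src_var}.id)",
        # Second JOIN: from the join table to the destination node.
        f"JOIN {edge.dst_table} AS {dst_var} ON ({dst_var}.id = {edge.table}.{edge.dst_fk})",
    ])


ACTED_IN = EdgeTable("acted_in", "person", "person_id", "movie", "movie_id")
print(one_hop_sql("p", ACTED_IN, "m", ["p.name", "m.title"]))
```

Longer paths would chain one pair of JOINs per hop.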
It doesn’t stop there!
Input Cypher query:
MATCH (p: Person) --> (m) return m
Expected output SQL (a hidden SQL monster):
SELECT 'movie['||m.id||']' AS m
FROM person AS p
JOIN directed ON (directed.person_id = p.id)
JOIN movie AS m ON (m.id = directed.movie_id)
UNION ALL
SELECT 'movie['||m.id||']' AS m
FROM person AS p
JOIN acted_in ON (acted_in.person_id = p.id)
JOIN movie AS m ON (m.id = acted_in.movie_id)
UNION ALL
SELECT 'movie['||m.id||']' AS m
FROM person AS p
JOIN produced ON (produced.person_id = p.id)
JOIN movie AS m ON (m.id = produced.movie_id)
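The untyped relationship `-->` expands into one SELECT per edge table that leaves the source label, glued together with UNION ALL. A hedged Python sketch of that expansion follows. It assumes join-table edges, `id` primary keys, and the node returned as a `'label[id]'` string, as in the expected output. `EdgeTable`, `EDGES` and `untyped_hop_sql` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeTable:
    """Hypothetical Gtop edge entry backed by a join table."""
    table: str
    src_table: str
    src_fk: str
    dst_table: str
    dst_fk: str


# Hypothetical Gtop edges for the movie example, in the order used above.
EDGES = [
    EdgeTable("directed", "person", "person_id", "movie", "movie_id"),
    EdgeTable("acted_in", "person", "person_id", "movie", "movie_id"),
    EdgeTable("produced", "person", "person_id", "movie", "movie_id"),
]


def untyped_hop_sql(src_var: str, src_table: str, dst_var: str, edges: list[EdgeTable]) -> str:
    """(src)-->(dst) RETURN dst: one branch per outgoing edge table, joined by UNION ALL."""
    branches = []
    for e in edges:
        if e.src_table != src_table:
            continue  # only edges leaving the source label can match
        branches.append("\n".join([
            # Return the node as a 'label[id]' string, as in the expected output.
            f"SELECT '{e.dst_table}['||{dst_var}.id||']' AS {dst_var}",
            f"FROM {src_table} AS {src_var}",
            f"JOIN {e.table} ON ({e.table}.{e.src_fk} = {src_var}.id)",
            f"JOIN {e.dst_table} AS {dst_var} ON ({dst_var}.id = {e.table}.{e.dst_fk})",
        ]))
    return "\nUNION ALL\n".join(branches)


print(untyped_hop_sql("p", "person", "m", EDGES))
```

UNION ALL (rather than UNION) keeps one row per matching path, which mirrors Cypher returning a node once for each relationship that reaches it.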
It doesn’t stop there!
Input Cypher query:
MATCH (keanu: Person { name: 'Keanu Reeves' }) --> (m: Movie {released: '1999'}) return m
Expected output SQL (a hidden SQL monster):
SELECT 'movie['||m.id||']' AS m
FROM person AS keanu
JOIN directed ON (directed.person_id = keanu.id)
JOIN movie AS m ON (m.id = directed.movie_id)
WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
UNION ALL
SELECT 'movie['||m.id||']' AS m
FROM person AS keanu
JOIN acted_in ON (acted_in.person_id = keanu.id)
JOIN movie AS m ON (m.id = acted_in.movie_id)
WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
UNION ALL
SELECT 'movie['||m.id||']' AS m
FROM person AS keanu
JOIN produced ON (produced.person_id = keanu.id)
JOIN movie AS m ON (m.id = produced.movie_id)
WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
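The inline property maps become a WHERE clause that is repeated in every UNION ALL branch. A minimal Python sketch of that step follows. It assumes equality-only filters. `sql_literal` and `where_clause` are hypothetical helpers. Strings are emitted as single-quoted ANSI SQL literals with embedded quotes doubled, because double quotes denote identifiers in ANSI SQL.

```python
from __future__ import annotations


def sql_literal(value) -> str:
    """Render a Python value as an ANSI SQL literal (strings in single quotes)."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Embedded single quotes are escaped by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def where_clause(filters: dict[str, dict]) -> str:
    """Turn Cypher inline property maps, e.g. {name: 'Keanu Reeves'}, into a WHERE clause."""
    predicates = [
        f"{var}.{key} = {sql_literal(val)}"
        for var, props in filters.items()
        for key, val in props.items()
    ]
    return "WHERE " + " AND ".join(predicates) if predicates else ""


print(where_clause({"keanu": {"name": "Keanu Reeves"}, "m": {"released": "1999"}}))
# WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
```

In production, bind parameters would be safer than inlining literals.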
Outline: 4. Competition
Vendor Landscape
Market largely dominated by Neo4J [3] -> this is why we chose Cypher as our query language
Me too: JSON document databases are entering the space by enabling links, with associated properties, between documents (e.g. ArangoDB)
Trend towards multimodal DBs (e.g. OrientDB, DataStax)
Consolidation: Experian acquired 4Store (now for internal use only), and DataStax acquired Aurelius (Titan graph database).
[1] https://en.wikipedia.org/wiki/Oracle_Spatial_and_Graph
[2] http://www.teradata.com/SQL-GR-Engine
[3] http://zion-city.blogspot.co.uk/2012/05/graphdb-market-share.html
* A more detailed comparison is in the last two backup slides
Vendor/Research Landscape
Figure: products and research systems placed on two axes, "Simple Queries" and "No Data Duplication": Teradata SQL-GR, RapidGrapher, Oracle Spatial&Graph, IBM Graph, Neo4J, GraHPEr, IBM’s SQLGraph, GraphGen, Stanford’s Ringo, Spark’s GraphFrames (products in bold blue, research in black font).
Outline: 5. Status and timeline
Cypher Language Coverage
General Clauses
Cypher clause | Supported | Details
Return | Y |
Order by | Y |
Limit | Y |
With | Y | http://wes.skeweredrook.com/the-mythical-with-neo4js-cypher-query-language/
Skip | N | Can be implemented as a post-processing stage
Union | N | Can reuse the current GraHPEr syntactic parser to split the query in two
Unwind | N | No support for in-query collection/function handling yet
Using | N | Hints Neo4j to use the “right” index (Neo4j-specific)
50% of general clauses implemented
25% are easy to implement with minimum effort based on our current code base
12.5% require us to invest time in in-query collection/function processing
12.5% are Neo4j-specific
Reading Clauses
68% of read clauses implemented
20% are easy to implement with minimum effort based on our current code base
8% require us to invest time in in-query collection/function processing or to build a REPL
4% are for use with legacy indices in Neo4j
Cypher clause | Supported | Details
Match by id | Y |
Match by type | Y |
Match by rel pattern | Y |
Match by multiple types | Y |
Match multiple relationships | Y |
Match variable length relationships | Y |
Match anonymous edges and nodes | Y |
Match zero-length paths | Y |
Where | Y |
Where on property | Y |
Where on label | Y |
Where patterns | Y | MATCH (n) WHERE (n)-[:KNOWS]-({ name: 'Tobias' }) RETURN n
Where range | Y |
Count | Y |
Distinct | Y |
Sum, avg, max, min | Y |
Case | Y |
Optional match | N | The Cypher equivalent of an SQL outer join
Match rels with uncommon chars | N |
Where with string matching | N |
Where with regexes | N |
Percentile, std | N | Can be implemented as a post-processing stage
Where on dynamic property | N | Requires a REPL-like utility
Where collection patterns | N (partial) | MATCH (tobias { name: 'Tobias' }), (others) WHERE others.name IN ['Andres', 'Peter'] AND (tobias)<--(others) RETURN others
Start | N | Deprecated/legacy usage. No plans to support.
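For the clauses marked as implementable in a post-processing stage (Skip; Percentile, std), here is a minimal sketch. It assumes the SQL result arrives as a list of row tuples, and the function names are hypothetical. It uses Python's `statistics.stdev`, which, like Cypher's `stDev()`, computes the sample standard deviation (N - 1 denominator).

```python
import statistics


def apply_skip(rows, n):
    """Cypher SKIP n as a post-processing step: drop the first n result rows."""
    return rows[n:]


def column_stdev(rows, index):
    """Cypher stDev() computed client-side: sample standard deviation of one column."""
    return statistics.stdev([row[index] for row in rows])


rows = [("a", 1.0), ("b", 2.0), ("c", 3.0)]  # stand-in for a SQL result set
print(apply_skip(rows, 1))    # [('b', 2.0), ('c', 3.0)]
print(column_stdev(rows, 1))  # 1.0
```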
Cypher Language Coverage
Functions
ALL/ANY/NONE/SINGLE/EXISTS
SIZE on collection
SIZE on pattern
LENGTH on collection
LENGTH on pattern
TYPE
Id
COALESCE
HEAD/LAST
Timestamp
Startnode / Endnode
Toint / Tofloat
Nodes
Relationships
Labels
Keys
Extract (map)
Filter
Tail
Range
Reduce
Math functions
String functions
Cypher Language Coverage
No Writing Support
Outline: 6. Summary and call to action
Summary
For: customers with large RDBMS deployments
Who: would like to do some (multi-modal) graph analytics
without migrating massive amounts of data to another platform
GraHPEr
Provides: an easy-to-install, read-only library for
discovery of graphs in relational data
querying relational data in a graph-like way
without data duplication or cascade effects
with single-system administration
Unlike: solutions that need data to be copied and adapted to a graph format
or that expose complex/verbose graph functions as stored procedures
GraHPEr: Unique Selling Points
• Storage and transfer time savings
o No data duplication
o No separate system to manage
• Easy to install / minimally intrusive -> multimodal DBs made easy
o Just a read-only library on top of existing DB deployments
• Declarative graph query language (compatibility with the market leader, Neo4J [1])
o Tap into large existing communities / reuse current code
[1] http://neo4j.com/top-ten-reasons/
Thank you
Luis M. Vaquero
Hewlett Packard Enterprise
Contact: luis.vaquero@hpe.com
Graph Analytics
Not just Silicon Valley startups: the Lufthansas, the Walmarts, the UBSs and the AT&Ts too
ScriptHop: a motion picture is a graph of interconnected stakeholders, including producers, directors, casting agents, cinematographers, actors, and so on.
Identify scripts with characters whose particular attributes (such as minorities) make them likely to require lots of screen time, which might be excessively costly and time-consuming to produce
ORiGAMI – Oak Ridge Graph Analytics for Medical Innovation
http://www.forbes.com/sites/danwoods/2015/12/29/why-graph-technology-is-ready-for-its-close-up-in-2016
Quick Figures
• 1% market penetration today
• Forrester Research: graph databases will reach over 25% of all enterprises by 2017
• Popular tools:
o GraphConnect (Neo4J, SF’15):
  - more than 1000 developers
  - more than 350 organisations
o 1000000+ downloads
o 124 contributors
o 36500 commits
Performance, Really?
“Relational DBs have 40 years of success behind them”
http://istc-bigdata.org/index.php/benchmarking-graph-databases/
Vendor Landscape
Vendor | Date | License | Model | Query Language
Complexible | 2012 | Commercial | RDF | SPARQL
DataStax | 2011 | Open Source | Property | Gremlin
FlockDB | 2010 | Open Source | Property | Java
Franz (AllegroGraph) | 2005 | Dual | RDF | SPARQL, RDFS++, OWL2-RL, Prolog
Neo4J | 2007 | Open Source | Property | Cypher, native API, TinkerPop
Objectivity | 2011 | Commercial | Objects | Java
Oracle | 2015 | Commercial | Property | Java, Gremlin, Groovy, Python
Orient Tech | 2011 | Open Source | Property | REST, Gremlin, SPARQL, SQL
Informatica | 2015 | Commercial | RDF | SPARQL
Ontotext/GraphDB | 2000 | Commercial | RDF | SPARQL
Teradata SQL-GR | 2015 | Commercial | Relational | SQL
IBM Graph | 2015 | Dual | Property | Gremlin
Actian | 2014 | Commercial | RDF | SPARQL
MarkLogic | 2015 | Commercial | RDF | SPARQL
ArangoDB |  | Commercial | Property | AQL, Blueprints
Graph to SQL
• Plenty of tools convert from SQL to graph languages.
• We want the opposite: Graph to SQL
Feature | SQLGraph (IBM) | GraphiQL (MIT) | GraHPEr (HPE)
Language | Non-side-effecting Gremlin to SQL compilation | Pig-Latin-inspired new declarative language compiled into SQL | OpenCypher (with time extensions) compiled to SQL
SQL | Exploits recursive/iterative queries | Exploits recursive/iterative queries | ANSI92 with Vertica-friendly optimisations
Additional tables | Created relational tables (to represent edges and nodes) | Separate GraphTables | Maximise reuse of existing tables
Integration with pre-existing installations | Requires migration | Requires migration | No migration
Type of analysis | Bulk | Bulk | Time-based
Benchmarks | Large-scale | Mid-scale (SNAP data) | TBD (goal is large-scale, but time constraints are key)

Cypher to SQL online mapper