Cypher to SQL online mapper

Mapping graph queries in Cypher to plain old SQL on the fly. Automated graph topology discovery.

Published in: Software


  1. GraHPEr: Graph queries on relational data. Luis Vaquero, Marco Lotz, James Brook, Joan Varvenne, Suksant Sae Lor, David Subiros, Herry Herry, Brian Monahan. March 2016
  2. “Ma’ Look! Graph Analytics without Graphs!!!!” June 2016
  3. CC: https://www.youtube.com/watch?v=CxKOSAtMC1g
  5. CC by adeevee
  6. CC by Ole Rinnan. http://www.vg.no/forbruker/bil-baat-og-motor/bil-og-trafikk/post-it-feberen-brer-seg/a/165769/
  7. Outline: 1. Problem; 2. Our solution; 3. Underlying magic/technology; 4. Competition; 5. Status and timeline; 6. Summary and call to action
  8. The Case for Graph Analytics on Relational Databases
     • Lots of data sitting in relational databases (accumulated over the last few decades)
     • Some data are simply too bulky to move around
     • Consistency / cascading issues slow down write throughput (key in big data apps)
     • Simple graph syntax and semantics to build our queries
  9. Relational Data as Graphs: Problems
     1. Raw SQL or stored procedures on relational DBs (“monster SQL queries”)
     2. Copy data from its original source to construct a new graph (duplication)
     Image credits: from Pixabay under CC; by Chris Downer under CC; from Pixabay under CC
  11. Graph syntax/semantics on relational DBs without duplication
  13. Relational GraHPEr: 2 related (but independent) main functionalities
     • Graph Schema Extractor: Database Schema -> Set of Graph Topologies
     • Query Processor: Graph Topology + Cypher Query -> Equivalent SQL Query
  15. Relational Tables
     • Movie (Id, Title, Released, Tagline): 01, Matrix, 1999, Enter the Matrix
     • Person (Id, Name, Born): 01, Keanu Reeves, 1964
     • Acted In (Person_id, Movie_id, Role): 01, 01, Neo
     • Directed and Produced (person_id, Movie_id each): rows 01, 02 and 01, 03
  16. The Equivalent Graph Topology (Gtop)
     • Node Movie. Properties: ID, Title, Released, Tagline
     • Node Person. Properties: ID, Name, Born
     • Edge Acted in. Attributes: Role
     • Edge Produced. Attributes: None
     • Edge Directed. Attributes: None
     • By default, an entity-relationship diagram
     • Advanced ML enables finding different graphs in the data
  18. Query Processor: MATCH (m:Movie) RETURN m.title -> Parser -> Visitor -> Query Builder -> SELECT m.title FROM Movie
  19. Query Processor, Parser. Input: MATCH (m:Movie) RETURN m.title. Parser output: SingleQuery( List( Match(false, Pattern( List( EveryPath( NodePattern( Some(Variable(m)) ,List(LabelName(movie)) ,None) ) ) ) ,List() ,None) ,Return(false, ReturnItems(false ,List( UnaliasedReturnItem( Property( Variable(m) ,PropertyKeyName(title) ) ,m.title )
  20. Query Processor: MATCH (m:Movie) RETURN m.title -> Parser -> Visitor -> Query Builder -> SELECT m.title FROM Movie. AST passed from Parser to Visitor: Query(None, SingleQuery( List( Match(false, Pattern( List( EveryPath( NodePattern( Some(Variable(m)) ,List(LabelName(movie)) ,None) ) ) ) ,List() ,None) ,Return(false, ReturnItems(false ,List( UnaliasedReturnItem( Property( Variable(m) ,PropertyKeyName(title) ) ,m.title
  21. Query Processor, Visitor. AST: Query(None, SingleQuery( List( Match(false, Pattern( List( EveryPath( NodePattern( Some(Variable(m)) ,List(LabelName(movie)) ,None) ) ) ) ,List() ,None) ,Return(false, ReturnItems(false ,List( UnaliasedReturnItem( Property( Variable(m) ,PropertyKeyName(title) ) ,m.title ) )) ,None,None,None) ) ) )
  25. Query Processor, Visitor (same AST as slide 21). Visited so far: Match - false
  26. Query Processor, Visitor (same AST as slide 21). Visited so far: Pattern; Match - false
  27. Query Processor, Visitor (same AST as slide 21). Visited so far: Pattern; Match - false; NodePattern
  28. Query Processor, Visitor (same AST as slide 21). Visited so far: Pattern; Match - false; NodePattern – m
  29. Query Processor, Visitor (same AST as slide 21). Visited so far: Pattern; Match - false; NodePattern – m - movie
  30. Query Processor, Visitor (same AST as slide 21). Visited so far: Pattern; Match - false; NodePattern – m - movie; Return
  31. Query Processor, Visitor (same AST as slide 21). Visited so far: Pattern; Match - false; NodePattern – m - movie; Return; ReturnItem
  32. Query Processor, Visitor (same AST as slide 21). Visited so far: Pattern; Match - false; NodePattern – m - movie; Return; ReturnItem – m
  33. Query Processor, Visitor (same AST as slide 21). Visited so far: Pattern; Match - false; NodePattern – m - movie; Return; ReturnItem – m - title
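The visitor walk on slides 21 to 33 can be sketched roughly as follows. This is a minimal illustration, not GraHPEr's code: the dataclasses are simplified, hypothetical stand-ins for the parser's AST nodes, and the boolean on `Match` is assumed to be the OPTIONAL flag and the one on `Return` the DISTINCT flag.

```python
from dataclasses import dataclass

# Simplified, hypothetical stand-ins for the AST nodes on the slides; the
# real tree has more wrappers (Query, SingleQuery, EveryPath, ...).
@dataclass
class NodePattern:
    variable: str
    label: str

@dataclass
class Match:
    optional: bool   # assumed meaning of Match(false, ...)
    pattern: list    # list of NodePattern

@dataclass
class ReturnItem:
    variable: str
    prop: str

@dataclass
class Return:
    distinct: bool   # assumed meaning of Return(false, ...)
    items: list      # list of ReturnItem

def visit(clauses):
    """Walk the clauses in order and collect the flat summary from the slides."""
    summary = []
    for clause in clauses:
        if isinstance(clause, Match):
            summary.append("Pattern")
            summary.append(f"Match - {str(clause.optional).lower()}")
            for node in clause.pattern:
                summary.append(f"NodePattern - {node.variable} - {node.label}")
        elif isinstance(clause, Return):
            summary.append("Return")
            for item in clause.items:
                summary.append(f"ReturnItem - {item.variable} - {item.prop}")
    return summary

# MATCH (m:Movie) RETURN m.title
ast = [
    Match(False, [NodePattern("m", "movie")]),
    Return(False, [ReturnItem("m", "title")]),
]
for line in visit(ast):
    print(line)
```

The flat list is what the later SQL Builder slides consume item by item.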
  34. Query Processor: MATCH (m:Movie) RETURN m.title -> Parser -> Visitor -> Query Builder -> SELECT m.title FROM Movie. Visitor output: Pattern; Match - false; NodePattern – m - movie; Return; ReturnItem – m - title
  35. Query Processor, SQL Builder. Items: Pattern; Match - false; NodePattern – m - movie; Return; ReturnItem – m - title. Components: Template Matcher, SQL Templates
  36. Query Processor, SQL Builder. Items: Pattern; Match - false; NodePattern – m - movie; Return. Template Matcher, SQL Templates: ReturnItem – m - title
  39. Query Processor, SQL Builder. Items: Pattern; Match - false; NodePattern – m - movie; Return. Template Matcher, SQL Templates: ReturnItem – m - title. Template: @* This is a template for the Return clause of Cypher language. *@ @args List returnItems @args boolean distinct @for(Map properties: returnItems) {@properties.get("property") @if(!properties_isLast){,}} HPE Confidential
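What the Return template does can be approximated in a few lines. This is a sketch under assumptions: `render_return` is a hypothetical name, and the `SELECT` / `SELECT DISTINCT` prefix is inferred from the next slide's output, since the template excerpt only shows the comma-separated item loop.

```python
def render_return(return_items, distinct=False):
    """Mimic the template: emit each item's "property", comma-separated."""
    columns = ", ".join(item["property"] for item in return_items)
    # The keyword handling is an assumption; the excerpt declares `distinct`
    # but does not show how it is used.
    keyword = "SELECT DISTINCT" if distinct else "SELECT"
    return f"{keyword} {columns}"

print(render_return([{"property": "m.title"}]))
print(render_return([{"property": "p.name"}, {"property": "m.title"}]))
```

Keeping the rendering in templates means a new SQL dialect would only need new templates, not a new visitor.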
  40. Query Processor, SQL Builder. Items: Pattern; Match - false; NodePattern – m - movie. Template Matcher, SQL Templates output: SELECT m.title
  48. Query Processor, SQL Builder. Items: Pattern; Match - false; NodePattern – m - movie. Template Matcher reads the Gtop: "implementationLevel" : { "implementationNodes": [ { "synonyms": ["movie"], "tableName": "Movie", "id" : [ { "columnName": "id", "dataType": "INTEGER" } ] } ] }
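One plausible use of this Gtop fragment is resolving a Cypher label to its backing table. The sketch below is an assumption about that lookup, not GraHPEr's implementation: `resolve_table` is a hypothetical helper, the fragment is wrapped in braces to make it valid JSON, and the case-insensitive comparison is a guess based on the parser lower-casing `Movie`.

```python
import json

# Gtop fragment copied from the slide; only the fields shown there are used.
GTOP_JSON = """
{
  "implementationLevel": {
    "implementationNodes": [
      {
        "synonyms": ["movie"],
        "tableName": "Movie",
        "id": [{"columnName": "id", "dataType": "INTEGER"}]
      }
    ]
  }
}
"""

def resolve_table(gtop: dict, label: str) -> str:
    """Return the table backing a Cypher label (hypothetical helper)."""
    for node in gtop["implementationLevel"]["implementationNodes"]:
        if label.lower() in (s.lower() for s in node["synonyms"]):
            return node["tableName"]
    raise KeyError(f"no table for label {label!r}")

gtop = json.loads(GTOP_JSON)
from_clause = f"FROM {resolve_table(gtop, 'Movie')}"
print(from_clause)
```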
  49. Query Processor, SQL Builder. Template Matcher, SQL Templates output: FROM Movie; SELECT m.title
  51. I/O
     Input Cypher Query: MATCH (p: Person)-[:person_id_acted_in_person_id]->(m: Movie) RETURN p.name, m.title
     Expected output SQL: SELECT p.name, m.title FROM person AS p JOIN acted_in ON (acted_in.person_id = p.id) JOIN movie AS m ON (m.id = acted_in.movie_id)
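To check that the expected SQL behaves as intended, it can be run against the example tables. This sketch uses Python's built-in `sqlite3` as a stand-in database (the slides do not say which engine was used for this example); the schema and the single row per table follow the Relational Tables slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, born INTEGER);
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, released INTEGER, tagline TEXT);
    CREATE TABLE acted_in (person_id INTEGER, movie_id INTEGER, role TEXT);
    INSERT INTO person VALUES (1, 'Keanu Reeves', 1964);
    INSERT INTO movie VALUES (1, 'Matrix', 1999, 'Enter the Matrix');
    INSERT INTO acted_in VALUES (1, 1, 'Neo');
""")

# The expected output SQL from the slide, unchanged.
sql = """
    SELECT p.name, m.title
    FROM person AS p
    JOIN acted_in ON (acted_in.person_id = p.id)
    JOIN movie AS m ON (m.id = acted_in.movie_id)
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('Keanu Reeves', 'Matrix')]
```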
  52. It doesn’t stop there!
     Input Cypher Query: MATCH (p: Person) --> (m) RETURN m
     Expected output SQL: see the next slide
  53. Hidden SQL Monsters
     SELECT 'movie['||m.id||']' AS m FROM person AS p JOIN directed ON (directed.person_id = p.id) JOIN movie AS m ON (m.id = directed.movie_id)
     UNION ALL
     SELECT 'movie['||m.id||']' AS m FROM person AS p JOIN acted_in ON (acted_in.person_id = p.id) JOIN movie AS m ON (m.id = acted_in.movie_id)
     UNION ALL
     SELECT 'movie['||m.id||']' AS m FROM person AS p JOIN produced ON (produced.person_id = p.id) JOIN movie AS m ON (m.id = produced.movie_id)
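The reason an untyped `-->` turns into a monster is that every edge table leaving Person has to be tried. A minimal sketch of that expansion, assuming the three edge tables of the example schema (in GraHPEr they would presumably come from the Gtop); `expand_untyped_edge` is a hypothetical name:

```python
# Edge tables from the example schema (assumed to be known up front here).
EDGE_TABLES = ["directed", "acted_in", "produced"]

def expand_untyped_edge(src_alias, dst_alias, edge_tables=EDGE_TABLES):
    """Build one SELECT per edge table and glue them with UNION ALL."""
    branches = []
    for edge in edge_tables:
        branches.append(
            f"SELECT 'movie['||{dst_alias}.id||']' AS {dst_alias} "
            f"FROM person AS {src_alias} "
            f"JOIN {edge} ON ({edge}.person_id = {src_alias}.id) "
            f"JOIN movie AS {dst_alias} ON ({dst_alias}.id = {edge}.movie_id)"
        )
    return " UNION ALL ".join(branches)

sql = expand_untyped_edge("p", "m")
print(sql)
```

The number of branches grows with the number of relationship tables, which is why a one-line Cypher query can hide a long SQL statement.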
  54. It doesn’t stop there!
     Input Cypher Query: MATCH (keanu: Person { name: 'Keanu Reeves' }) --> (m: Movie {released: '1999'}) RETURN m
     Expected output SQL: see the next slide
  55. Hidden SQL Monster
     SELECT 'movie['||m.id||']' AS m FROM person AS keanu JOIN directed ON (directed.person_id = keanu.id) JOIN movie AS m ON (m.id = directed.movie_id) WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
     UNION ALL
     SELECT 'movie['||m.id||']' AS m FROM person AS keanu JOIN acted_in ON (acted_in.person_id = keanu.id) JOIN movie AS m ON (m.id = acted_in.movie_id) WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
     UNION ALL
     SELECT 'movie['||m.id||']' AS m FROM person AS keanu JOIN produced ON (produced.person_id = keanu.id) JOIN movie AS m ON (m.id = produced.movie_id) WHERE keanu.name = 'Keanu Reeves' AND m.released = '1999'
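The inline property maps (`{ name: 'Keanu Reeves' }`) become one equality predicate each. The sketch below shows one way to build that WHERE clause; it swaps the slide's inlined literals for `?` bind parameters, which is a substitution made here for safety rather than something the slides describe, and `where_clause` is a hypothetical helper.

```python
def where_clause(filters):
    """Turn {alias: {property: value}} into a WHERE clause plus bind values."""
    conditions, params = [], []
    for alias, props in filters.items():
        for prop, value in props.items():
            conditions.append(f"{alias}.{prop} = ?")
            params.append(value)
    return "WHERE " + " AND ".join(conditions), params

clause, params = where_clause({
    "keanu": {"name": "Keanu Reeves"},
    "m": {"released": "1999"},
})
print(clause)
print(params)
```

The same clause is appended to every UNION ALL branch, as on the slide.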
  57. Vendor Landscape
     • Market largely dominated by Neo4J [3] -> This is why we chose Cypher as our query language
     • Me too: JSON databases are jumping into the space by enabling links between docs with properties associated (e.g. ArangoDB)
     • Trend towards multimodal DBs (OrientDB, DataStax)
     • Consolidation: Experian acquired 4Store (now for internal use only), and DataStax has acquired Aurelius (Titan graph database)
     References: 1. https://en.wikipedia.org/wiki/Oracle_Spatial_and_Graph 2. http://www.teradata.com/SQL-GR-Engine 3. http://zion-city.blogspot.co.uk/2012/05/graphdb-market-share.html
     * Find a more detailed comparison in the last two backup slides below
  58. Vendor/Research Landscape. Chart with axes “Simple Queries” and “No Data Duplication”, placing: Teradata SQL-GR, RapidGrapher, Oracle Spatial&Graph, IBM Graph, Neo4J, GraHPEr, IBM’s SQLGraph, GraphGen, Stanford’s Ringo, Spark’s GraphFrames. Legend: names in bold blue indicate products; names in black font indicate research
  60. Cypher Language Coverage: General Clauses (Cypher clause: Supported. Details)
     • Return: Y
     • Order by: Y
     • Limit: Y
     • With: Y. http://wes.skeweredrook.com/the-mythical-with-neo4js-cypher-query-language/
     • Skip: N. Can be implemented as a post-processing stage
     • Union: N. Current GraHPEr syntactic parser to split query in two
     • Unwind: N. No support for in-query collection/function handling yet
     • Using: N. Hints Neo4j to use the “right” index
     Summary: 50% of general clauses implemented; 25% are easy to implement with minimum effort based on our current code base; 12.5% require us to invest time in in-query collection/function processing; 12.5% are Neo4j-specific
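The note that Skip "can be implemented as a post-processing stage" could look like this: run the generated SQL as usual, then drop the first rows on the client. A minimal sketch with placeholder rows; `apply_skip` is a hypothetical name and the slides do not show how GraHPEr planned to do it.

```python
from itertools import islice

def apply_skip(rows, skip):
    """Drop the first `skip` rows of an already-ordered result set."""
    return list(islice(rows, skip, None))

# Placeholder rows standing in for the result of the generated SQL.
rows = [("a",), ("b",), ("c",)]
print(apply_skip(rows, 1))  # [('b',), ('c',)]
```

This only gives stable results when the query also has an Order by, just as with SKIP in Cypher itself.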
  61. Cypher Language Coverage: Reading Clauses (Cypher clause: Supported. Details)
     Supported (Y):
     • Match by id; Match by type; Match by rel pattern; Match by multiple types; Match multiple relationships; Match variable length relationships; Match anonymous edges and nodes; Match zero-path length
     • Where; Where on property; Where on label; Where range
     • Where patterns. MATCH (n) WHERE (n)-[:KNOWS]-({ name:'Tobias' }) RETURN n
     • Count; Distinct; Sum, avg, max, min; Case
     Not supported (N):
     • Optional match. The Cypher equivalent of the outer join in SQL
     • Match rels with uncommon chars
     • Where with string matching
     • Where with regexes
     • Percentile, std. Can be implemented as a post-processing stage
     • Where on dynamic property. Requires a REPL-like utility
     • Where collection patterns (partial). MATCH (tobias { name: 'Tobias' }),(others) WHERE others.name IN ['Andres', 'Peter'] AND (tobias)<--(others) RETURN others
     • Start. Deprecated/legacy usage. No plans to support.
     Summary: 68% of read clauses implemented; 20% are easy to implement with minimum effort based on our current code base; 8% require us to invest time in in-query collection/function processing or build a REPL; 4% are for use with legacy indices in Neo4j
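Since Optional match is described as the Cypher equivalent of the SQL outer join, a translation would presumably swap `JOIN` for `LEFT JOIN`. The sketch below is not GraHPEr output; it hand-writes the SQL one might expect for `MATCH (p:Person) OPTIONAL MATCH (p)-[:ACTED_IN]->(m:Movie) RETURN p.name, m.title` and runs it on `sqlite3`. The second person is a made-up placeholder row with no movies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE acted_in (person_id INTEGER, movie_id INTEGER);
    INSERT INTO person VALUES (1, 'Keanu Reeves');
    INSERT INTO person VALUES (2, 'Placeholder Person');
    INSERT INTO movie VALUES (1, 'Matrix');
    INSERT INTO acted_in VALUES (1, 1);
""")

# LEFT JOIN keeps people without any acted_in row; their title is NULL.
sql = """
    SELECT p.name, m.title
    FROM person AS p
    LEFT JOIN acted_in ON (acted_in.person_id = p.id)
    LEFT JOIN movie AS m ON (m.id = acted_in.movie_id)
    ORDER BY p.id
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('Keanu Reeves', 'Matrix'), ('Placeholder Person', None)]
```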
  62. Cypher Language Coverage: Functions (Cypher clause, Supported, Details). Functions listed: ALL/ANY/NONE/SINGLE/EXISTS; SIZE on collection; SIZE on pattern; LENGTH on collection; LENGTH on pattern; TYPE; Id; COALESCE; HEAD/LAST; Timestamp; Startnode / Endnode; Toint / Tofloat; Nodes; Relationships; Labels; Keys; Extract (map); Filter; Tail; Range; Reduce; Math functions; String functions
  63. Cypher Language Coverage: No Writing Support
  65. Summary
     • For: customers with large RDBMS deployments
     • Who: would like to do some graph analytics (multimodal) without migrating massive amounts of data to another platform
     • GraHPEr provides: a read-only, easy-to-install library; discovery of graphs in relational data; a way to query relational data in a graphy way without data duplication or cascade effects; single-system administration
     • Unlike: solutions that need data to be copied and adapted to a graph format, or that expose complex / verbose graph functions as stored procedures
  66. GraHPEr: Unique Selling Points
     • Storage and transfer time savings
       o No data duplication
       o No separate system to manage
     • Easy to install / minimally intrusive -> multimodal DBs made easy
       o Just a read-only library on top of existing DB deployments
     • Declarative graph query language (compatibility with the market leader, Neo4J [1])
       o Tap into large existing communities / reuse current code
     References: 1. http://neo4j.com/top-ten-reasons/
  67. Thank you. Luis M. Vaquero, Hewlett Packard Enterprise. Contact: luis.vaquero@hpe.com
  68. Graph Analytics
     • Not just startups in Silicon Valley: the Lufthansas, the Walmarts, the UBSs, and the AT&Ts too
     • ScriptHop: a motion picture is a graph of interconnected stakeholders, including producers, directors, casting agents, cinematographers, actors, and so on. Determine scripts with characters whose particular attributes (such as minorities) make them likely to require lots of screen time, which might be excessively costly and time-consuming to produce
     • ORiGAMI: Oak Ridge Graph Analytics for Medical Innovation
     Source: http://www.forbes.com/sites/danwoods/2015/12/29/why-graph-technology-is-ready-for-its-close-up-in-2016
  69. Quick Figures
     • 1% market penetration today
     • Forrester Research: it will reach over 25 percent of all enterprises by 2017
     • Popular tools:
       o GraphConnect (Neo4J, SF’15): more than 1000 developers; more than 350 organisations
       o 1000000+ downloads
       o 124 contributors
       o 36500 commits
  70. Performance, Really? “Relational DBs have 40 years of success behind them” http://istc-bigdata.org/index.php/benchmarking-graph-databases/
  71. Vendor Landscape (Vendor: Date; License; Model; Query Language)
     • Complexible: 2012; Commercial; RDF; SPARQL
     • DataStax: 2011; Open Source; Property; Gremlin
     • FlockDB: 2010; Open Source; Property; Java
     • Franz (AllegroDB): 2005; Dual; RDF; SPARQL, RDFS++, OWL2-RL, Prolog
     • Neo4J: 2007; Open Source; Property; Cypher, native API, TinkerPop
     • Objectivity: 2011; Commercial; Objects; Java
     • Oracle: 2015; Commercial; Property; Java, Gremlin, Groovy, Python
     • Orient Tech: 2011; Open Source; Property; REST, Gremlin, SPARQL, SQL
     • Informatica: 2015; Commercial; RDF; SPARQL
     • Ontotext/GraphDB: 2000; Commercial; RDF; SPARQL
     • Teradata SQL-GR: 2015; Commercial; Relational; SQL
     • IBM Graph: 2015; Dual; Property; Gremlin
     • Actian: 2014; Commercial; RDF; SPARQL
     • MarkLogic: 2015; Commercial; RDF; SPARQL
     • ArangoDB: (no date given); Commercial; Property; AQL, Blueprints
  72. Graph to SQL
     • Plenty of tools converting from SQL to graph languages.
     • We want the opposite: Graph to SQL
     Feature comparison (SQLGraph (IBM) / GraphiQL (MIT) / GraHPEr (HPE)):
     • Language: non-side-effecting Gremlin to SQL compilation / Pig-Latin inspired new declarative language compiled into SQL / OpenCypher (with time extensions) compiled to SQL
     • SQL: exploits recursive/iterative queries / exploits recursive/iterative queries / ANSI92 with Vertica-friendly optimisations
     • Additional tables: created relational tables (to represent edges and nodes) / separate GraphTables / maximise reuse of existing tables
     • Integration with pre-existing installations: requires migration / requires migration / no migration
     • Type of analysis: bulk / bulk / time-based
     • Benchmarks: large-scale / mid-scale (SNAP data) / TBD (goal is large-scale, but time constraints are key)
