
Neo4j Morpheus: Interweaving Documents, Tables and Graph Data in Spark with Alastair Green and Mats Rydberg


Fuse graph, document and relational data from transactional and analytic data sources into a property graph “bird’s eye view”. The property graph data model is Chen’s “entity relationship” model, without clutter. Use “ASCII Art” visual property graph schemas to define “graph data lifts”, mapping from data lake, RDBMS, RDF or graph data cloud services into Spark. Graphs in Spark draw on multiple data sources. Leverage the Cypher query language to combine, split, and project graphs in Spark memory. Graph data is “woven” in Spark without altering or copying the original source. The results of graph workloads can be written back into HDFS or other file systems. Graphs can be read from, stored in, and merged into a Neo4j transactional database. Tabular datasets can also be extracted from graphs.

Data scientists and engineers load, wrangle and analyze mixed model data through Morpheus transformations. Enterprises use graphs to catalogue their disparate data assets and processes. They store graph datasets in the data lake. In a world of concern about data protection, see how graph data lifts allow tailored, canonical data views to be realized, in Spark, without remodeling and moving data.

Morpheus combines SparkSQL and Cypher queries, and table/graph functions. Choose the right language for the job: eliminate cumbersome multi-joins for connected-data traversals by using super-concise Cypher patterns for sub-graph detection and graph projection; use the power of table projection, grouping and aggregation in SparkSQL, all in one application. Feel free to “dismantle your graph”: expose your graph nodes or relationships as DataFrames, or as Hive tables.

Key Takeaways
● Graph technology meets Big Data and Spark Analytics
● Property graphs: the superset data model
● Graph, relational and document data, interwoven
● Lift, split, combine, and create new graphs, from any data source
● Get your data fit to exploit graph compute, without losing any of your existing tools


  1. #SAISDD9 Neo4j Morpheus: Interweaving Documents, Tables and Graph Data in Spark
     Alastair Green, Mats Rydberg (Neo4j)
  2. Introduction
     ● Mats Rydberg: Engineering Lead for Cypher for Apache Spark and Neo4j Morpheus, Cypher Language Group
     ● Alastair Green: Lead, Neo4j Query Languages team, PM for Cypher for Apache Spark/Neo4j Morpheus and Cypher for Gremlin
     Neo4j Morpheus: a product in gestation, based on Cypher for Apache Spark
     ● Enriching Spark’s graph capability
     ● Combining Spark SQL with Spark graph querying
     ● Interweaving graph, table and nested/document data
     ● Integrating Spark analytics and Neo4j operational data
     ● Advancing graph query language (GQL) features
  3. Property Graphs meet Big Data
     The Property Graph data model is becoming increasingly mainstream
     ● Cloud graph data services like Azure Cosmos DB or Amazon Neptune
     ● Simple graph features in SQL Server 2017, multiple new graph DB products
     ● Read-only graph queries coming in the SQL:2020 standard
     ● Neo4j becoming a common operational store in retail, finance, telcos … and more
     ● Increasing interest in graph algorithms over graph data as a basis for AI
     Apache® Spark is the leading scale-out clustered memory solution for Big Data
     ● Spark 2: Data viewed as tables (DataFrames), processed by SQL, in function chains, using queries and user functions, transforming immutable tabular data sets
  4. Property Graphs: Nodes (Entities)
     Node
     ● Represents an entity within the graph
     ● Can have labels
  5. Property Graphs: Relationships
     Relationship
     ● Connects a start node with an end node
     ● Has one type
  6. Property Graphs: Properties
     Property
     ● Describes a node/relationship: e.g. name, age, weight etc.
     ● Key-value pair: String key; typed value (string, number, bool, list, ...)
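The node, relationship and property definitions on slides 4 to 6 can be sketched as plain Scala data types. This is only an illustrative sketch, not the Cypher for Apache Spark API: the names `Node`, `Relationship`, `PropertyGraphSketch` and `neighbours`, and the sample data, are all hypothetical.

```scala
// Hypothetical, minimal model of a property graph (not the CAPS/Morpheus API).
// A node carries a set of labels; a relationship has exactly one type;
// both carry properties as key-value pairs with String keys.
final case class Node(id: Long, labels: Set[String], properties: Map[String, Any])

final case class Relationship(
  id: Long,
  startId: Long,   // id of the start node
  endId: Long,     // id of the end node
  relType: String, // the single relationship type
  properties: Map[String, Any]
)

object PropertyGraphSketch {
  // Illustrative sample data (assumed, not from the talk).
  val alice: Node = Node(1L, Set("Person"), Map("name" -> "Alice", "age" -> 42))
  val bob: Node = Node(2L, Set("Person", "Customer"), Map("name" -> "Bob"))
  val knows: Relationship =
    Relationship(10L, alice.id, bob.id, "KNOWS", Map("since" -> 2015))

  // Nodes reached from `start` by following one outgoing relationship of the given type.
  def neighbours(
    start: Node,
    relType: String,
    nodes: Seq[Node],
    rels: Seq[Relationship]
  ): Seq[Node] =
    rels
      .filter(r => r.startId == start.id && r.relType == relType)
      .flatMap(r => nodes.find(_.id == r.endId))
}
```

Following one `KNOWS` relationship from `alice` in this sample yields `bob`, which corresponds to the Cypher pattern `(a)-[:KNOWS]->(b)`.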
  7. Graph patterns
  8. Searching For Graph Patterns
  9. Operational Tables and Graphs → Analytics
  10. Spark: SQL and Cypher
      Apache® Spark is the leading scale-out clustered memory solution for Big Data
      Spark 2: Data viewed as tables (DataFrames)
      ● With StructType schema to describe the type
      ● Processed by SQL and custom functions in function chains, transforming input immutable DataFrames into output tables
      Cypher for Apache Spark (CAPS) mirrors the capabilities of Spark SQL: Data viewed as graphs (made up of DataFrames)
      ● With PropertyGraph.schema to describe the graph type
      ● Processed by Cypher and user functions in function chains, transforming input immutable PropertyGraphs into output graphs
  11. Cypher for Apache Spark: Neo4j Graph DB + Cypher for Apache Spark
      [Architecture diagram. Labels: Neo4j; Neo4j Database; Cypher Queries for CRUD; Graph Algorithms; Spark Cypher Engine; CypherSession; PropertyGraph; PropertyGraphCatalog; Cypher Queries for CR (no UD); Property Graph Data Sources as Plugins; Graphs of DataFrames]
  12. Cypher for Apache Spark: Cypher Property Graph Data Sources
      [Architecture diagram as on slide 11. Adds: Neo4j Morpheus; Cypher Property Graph Data Sources]
  13. Cypher for Apache Spark: SQL Property Graph Data Source
      [Architecture diagram as on slide 12. Adds: SQL Property Graph Data Sources (Hive, Oracle, MySQL, SQL Server); Cypher Schema; SQL Graph DDL]
  14. Cypher for Apache Spark: FS Property Graph Data Source
      [Architecture diagram as on slide 13. Adds: Filesystem Property Graph Data Sources (LFS, HDFS, S3, GFS, DBFS; Text, ORC, Parquet)]
  15. Cypher for Apache Spark: Future Property Graph Data Sources
      [Architecture diagram as on slide 14. Adds: Gremlin Databases]
  16. Graphs and Views in the Catalog
        // named graph
        session.people
        // view function
        session.peopleByCountry()
  17. Querying (multiple) named graphs
        // Session
        implicit val session: CAPSSession = CAPSSession.local()
        ...
        // Query
        val result = session.cypher(
          """|FROM GRAPH socialNetwork
             |MATCH (p:Person)
             |FROM GRAPH products
             |MATCH (c:Customer)-[:ORDERED]->(i:Item)
             |WHERE p.name = c.name
             |RETURN p.name, c.id, count(i.price) AS amount
          """.stripMargin)
        result.show()
  18. Turning Tables into Graphs
      [Architecture diagram. Labels: Cypher DB ODS; SQL DB ODS; File System; Hive; Spark Analytics Platform; Graph DDL for schema and table mappings; Cypher Graph Queries; Bolt; JDBC; SparkSQL/Hive]
  19. “Tables for Labels”
      In Cypher for Apache Spark graphs have a schema (graph type)
      The schema defines
      ● The properties that belong to a label
      ● The node types and relationship types that occur in the graph
      ● Node type is a label set (one or more labels → node type)
      ● Relationship triplets that permit node and relationship type combinations
      A graph instance is made up of tables, one per node type/relationship type
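The “one table per node type” idea on slide 19 can be illustrated without Spark. This is a rough sketch under stated assumptions: plain Scala collections stand in for DataFrames, and `NodeRow`, `TablesForLabels`, `nodeTables` and `columns` are hypothetical names, not part of Cypher for Apache Spark.

```scala
// Hypothetical sketch: plain collections stand in for DataFrames.
final case class NodeRow(id: Long, labels: Set[String], properties: Map[String, Any])

object TablesForLabels {
  // A node type is a label set, so grouping by the full label set
  // yields one "table" (a sequence of rows) per node type.
  def nodeTables(nodes: Seq[NodeRow]): Map[Set[String], Seq[NodeRow]] =
    nodes.groupBy(_.labels)

  // Columns of a node-type table: the union of property keys seen on its rows.
  def columns(rows: Seq[NodeRow]): Set[String] =
    rows.flatMap(_.properties.keys).toSet
}
```

Grouping on the whole label set, rather than on each label separately, puts a node labelled `(Message, Post)` in a different table from a node labelled only `(Message)`, which matches the slide's statement that a node type is a label set.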
  20. Mapping SQL tables into a Property Graph
  21. Synchronizing SQL data source with Neo4j
      [Architecture diagram. Labels: Cypher DB ODS; SQL DB ODS; File System; Hive; Spark Analytics Platform; Delta views; SQL PGDS map delta graph into Spark; Neo4j Graph Merge delta graph; Bolt; JDBC; SparkSQL/Hive]
  22. Demo: Customer360 incremental merge
      [Diagram. Labels: Seed 2017-01-01 Delta 2017-01-02 Delta 2017-01-03 Delta ...; SQL (Hive); Neo4j Operational Store; SQL PGDS + Neo4j Merge]
  23. Customer360 graph data model
  24. Query composition
  25. Query composition: graphs in, graphs out
        val graph = session.cypher("""
          // Let's query two graphs...
          FROM GRAPH socialNetwork
          MATCH (p:Person)
          FROM GRAPH products
          MATCH (c:Customer)
          WHERE p.email = c.email
          // ...and construct a new graph from the result
          CONSTRUCT ON socialNetwork, products
            CREATE (p)-[:IS]->(c)
          RETURN GRAPH
        """).graph
  26. Graph construction
  27. Query composition
  28. Cypher DDL to define schema objects
        CATALOG CREATE GRAPH session.peopleUS {
          FROM session.people
          MATCH (us:Person)-[R*]-(n)
          WHERE us.nationality = 'USA'
          RETURN GRAPH OF *
        }
        CATALOG CREATE VIEW session.peopleByCountry($countryCode, $peopleGraph) {
          FROM $peopleGraph
          MATCH (us:Person)-[:R*]-(n)
          WHERE us.nationality = $countryCode
          RETURN GRAPH OF *
        }
  29. Using named graphs and view functions
        // named graph
        FROM session.peopleUS
        MATCH (people:Person)
        RETURN people.lastName
        // view over graph
        FROM session.peopleByCountry('USA', teradata.europe.people)
        MATCH (people:Person)
        RETURN people.lastName
  30. Query composition: get data from new graph
        // Let's query the graph we just constructed
        val result = graph.cypher("""
          MATCH (person:Person)-[:FRIEND_OF]-(friend:Person),
                (friend)-[:IS]->(customer:Customer),
                (customer)-[:BOUGHT]->(product:Product)
          RETURN DISTINCT product.title AS recommendation, person.name
          ORDER BY recommendation
        """)
        result.show()
  31. SQL and Cypher working together
        // Query with Cypher
        val results = socialNetwork.cypher("""
          MATCH (a:Person)-[r:FRIEND_OF]->(b)
          RETURN a.name AS friend1, b.name AS friend2, r.since AS since
        """)
        // Extract the DataFrame representing the query result and register it as a temporary view
        results.records.asDataFrame.createOrReplaceTempView("friends")
        // Query with SQL
        spark.sql("""
          SELECT friend1, friend2, since
          FROM friends
          ORDER BY since
        """).show()
  32. Demo: Database snapshots for analytics
  33. Cypher Graph Schema: Labels and Properties
        CREATE GRAPH snb WITH GRAPH SCHEMA (
          LABEL (Company),
          LABEL (Message {
            creationDate : TIMESTAMP?,
            locationIP : STRING?,
            content : STRING?,
            length : INTEGER?
          }),
          LABEL (LIKES { creationDate : TIMESTAMP? }),
          ...
  34. Cypher Graph Schema: Node/Edge Types
        // allowed node label sets (node types)
        (Message, Post), (Company), (Country), (Person), ...
        // allowed relationship (edge) types
        [LIKES], [KNOWS], [IS_LOCATED_IN], ...
  35. Cypher Graph Schema: Relationship Triplets
        // relationship triplets
        (Message) <0 .. *> -[IS_LOCATED_IN]-> <1> (Country),
        (Person) <1> -[LIKES]-> <0 .. *> (Message),
        (Company) <0 .. *> -[IS_LOCATED_IN]-> <1> (Country),
        (Person) <0 .. *> -[KNOWS]-> <0 .. *> (Person),
        // end of pure graph schema
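To show how the relationship triplets on slide 35 constrain a graph, here is a small sketch that checks a candidate relationship against the four triplets listed. It makes two assumptions that the slides do not state: cardinalities such as `<0 .. *>` are ignored, and a triplet is taken to match when its label set is a subset of the node's labels. `Triplet`, `TripletCheck` and `isAllowed` are hypothetical names.

```scala
// Hypothetical sketch of schema checking against relationship triplets.
// Assumptions: cardinalities are ignored; a triplet's label set matches a node
// if it is a subset of that node's labels.
final case class Triplet(startLabels: Set[String], relType: String, endLabels: Set[String])

object TripletCheck {
  // The four triplets shown on slide 35.
  val allowed: Set[Triplet] = Set(
    Triplet(Set("Message"), "IS_LOCATED_IN", Set("Country")),
    Triplet(Set("Person"), "LIKES", Set("Message")),
    Triplet(Set("Company"), "IS_LOCATED_IN", Set("Country")),
    Triplet(Set("Person"), "KNOWS", Set("Person"))
  )

  // True if some triplet permits a relationship of `relType`
  // from a node with labels `start` to a node with labels `end`.
  def isAllowed(start: Set[String], relType: String, end: Set[String]): Boolean =
    allowed.exists { t =>
      t.relType == relType &&
        t.startLabels.subsetOf(start) &&
        t.endLabels.subsetOf(end)
    }
}
```

Under these assumptions a `(Message, Post)` node may be `IS_LOCATED_IN` a `(Country)` node, while `(Company)-[KNOWS]->(Person)` is rejected because no triplet permits it.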
  36. SQL Graph DDL: Mapping “Tables to Labels”
        RELATIONSHIP LABEL SETS (
          (HAS_CREATOR)
            FROM "postHasCreator" edge
              START NODES LABEL SET (Message, Post)
                FROM "Post" start_nodes JOIN ON start_nodes.ID = edge."post"
              END NODES LABEL SET (Person)
                FROM "Person" end_nodes JOIN ON end_nodes.ID = edge."creator"
          ...
        ))
      [Slide annotations: Relationship Label Set → Relationship Type; SQL Relationship Source Table; Node Label Set → Node Type; SQL Node Source Table]
  37. SQL PGQ and GQL
      Morpheus SQL Property Graph Data Source
      SQL Graph DDL
      ● A prototype of SQL Property Graph Query (SQL/PGQ)
      ● Will be part of ISO SQL:202x
      Morpheus SQL PGDS Graph Schema and Composable Graph Queries
      ● “Prequel to GQL”
      ● GQL is an initiative to create an ISO international GQL standard alongside SQL
      ● SQL PGQ planned as a (major) subset of a wider “native” full CRUD GQL
  38. From Cypher to GQL
  39. Spark SQL and Cypher → Spark SQL and GQL
      Cypher for Apache Spark developed by Neo4j
      ● Source code under Apache 2.0 license
      ● github.com/openCypher/cypher-for-apache-spark
      Considering SPIP to bring enhanced graph support to Spark
      ● Ongoing discussion with Databricks Apache Spark/GraphFrames developers
      ● Concept: DataFrame-based graphs and Cypher graph queries
      Directions
      ● Cypher → ISO GQL
      ● Cypher for Spark → Spark GQL + Spark SQL (spark.sql, spark.gql)
  40. Neo4j Morpheus planned timeline
      Neo4j Morpheus: a new product, complementing the Neo4j transactional DB
      ● Commercially supported Cypher for Apache Spark
      ● Certified for Spark distributions and for SQL data sources
      Limited access Early Adopter release scheduled for end October
      ● Sign up to join EA programme: https://neo4j.com/morpheus/
      ● Documentation: https://neo4j.com/docs/morpheus-user-guide/preview/
      To come
      ● Cypher language feature expansion
      ● Performance and scale testing in Q4/Q1
      ● H1 2019: 1.0 release as an extension to Neo4j Enterprise
  41. Questions?
