
Extending Spark Graph for the Enterprise with Morpheus and Neo4j



Spark 3.0 introduces a new module: Spark Graph. Spark Graph adds the popular query language Cypher, its accompanying Property Graph Model and graph algorithms to the data science toolbox. Graphs have a plethora of useful applications in recommendation, fraud detection and research.

Morpheus is an open-source library that is API compatible with Spark Graph and extends its functionality with:

A Property Graph catalog to manage multiple Property Graphs and Views
Property Graph Data Sources that connect Spark Graph to Neo4j and SQL databases
Extended Cypher capabilities including multiple graph support and graph construction
Built-in support for the Neo4j Graph Algorithms library

In this talk, we will walk you through the new Spark Graph module and demonstrate how we extend it with Morpheus to help enterprise users integrate Spark Graph into their existing Spark and Neo4j installations.
We will demonstrate how to explore data in Spark, use Morpheus to transform data into a Property Graph, and then build a Graph Solution in Neo4j.

Extending Spark Graph for the Enterprise with Morpheus and Neo4j

  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Martin Junghanns & Sören Reichardt, Neo4j. Extending Spark Graph for the Enterprise with Morpheus and Neo4j #UnifiedDataAnalytics #SparkAISummit
  3. Motivation
  4. Graphs are everywhere
  5. … and growing
  6. The Property Graph Model
     Node
     ● Represents an entity within the graph
     ● Can have labels
     Relationship
     ● Connects a start node with an end node
     ● Has one type
     Property
     ● Describes a node/relationship: e.g. name, age, weight etc.
     ● Key-value pair: String key; typed value (string, number, list, ...)
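
The model on slide 6 can be sketched as two small record types. This is a minimal plain-Python illustration, not the Spark Graph or Morpheus API; the class names, field names and sample values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """An entity in the graph; it can carry zero or more labels."""
    id: int
    labels: Set[str] = field(default_factory=set)
    # Properties are key-value pairs: string key, typed value.
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """Connects a start node with an end node and has exactly one type."""
    id: int
    start: int          # id of the start node
    end: int            # id of the end node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data.
alice = Node(1, {"Person"}, {"name": "Alice", "age": 42})
bob = Node(2, {"Person"}, {"name": "Bob"})
knows = Relationship(10, start=alice.id, end=bob.id,
                     rel_type="KNOWS", properties={"since": 2014})
```
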
  7. Graph Patterns with Cypher
  8. Graphs are coming to Spark
  9. Spark Project Improvement Proposal
     ● Defines a Cypher-compatible Property Graph type based on DataFrames
     ● Replaces GraphFrames querying with Cypher
     ● Reimplements GraphFrames/GraphX algos on the Property Graph type
     ● Running PoC: [SPARK-27299][GRAPH][WIP] Spark Graph API design proposal https://git.io/fjqp6
  10. SPIP: What are we trying to do?
      ● “Spark Cypher”
        ○ Run a Cypher query on a Property Graph returning a tabular result
      ● Implementation is based on Spark SQL
        ○ Property Graphs are composed of one or more DFs
      ● Provide Scala, Python and Java APIs
      ● Deep dive: Graph Features in Spark 3.0: Thursday 11AM, Room G104
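
To make "Property Graphs are composed of one or more DFs" concrete, here is a rough sketch in plain Python, with lists of dicts standing in for DataFrames. The column names (`id`, `source`, `target`) and the sample rows are assumptions for illustration and are not taken from the SPIP.

```python
# Hypothetical node table: one row per :Person node.
person_rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]

# Hypothetical relationship table: one row per :KNOWS relationship.
knows_rows = [
    {"id": 10, "source": 1, "target": 2},
    {"id": 11, "source": 2, "target": 3},
]


def match_knows(nodes, rels):
    """Rough equivalent of
    MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name
    expressed as a join between the relationship table and the node table.
    """
    by_id = {row["id"]: row for row in nodes}
    return [
        {"a.name": by_id[r["source"]]["name"],
         "b.name": by_id[r["target"]]["name"]}
        for r in rels
        if r["source"] in by_id and r["target"] in by_id
    ]


result = match_knows(person_rows, knows_rows)
```

The query result is again a table, which matches the "tabular result" wording on the slide.
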
  11. SPIP: What does it look like? [Diagram labels: SPIP; spark-graph-api; spark-cypher; spark-sql]
  12. Spark Graph Demo
  13. SPIP: What are we not solving?
      ● Addresses the Cypher Property Graph Model
        ○ Does not deal with variants of that model (e.g. RDF)
      ● No multiple graph features
        ○ API is flexible to support this in future iterations
      ● No Property Graph Catalog
        ○ Also no Property Graph specific Data Sources
  14. ... but ...
  15. Morpheus: Spark Graph for the enterprise
  16. The OLTP / OLAP landscape
      | | Tables | Graphs |
      |---|---|---|
      | Transactional | PostgreSQL, Oracle, SQLServer | Neo4j |
      | Data Integration & Analytics | Spark SQL | Morpheus |
  17. Morpheus creates Property Graphs ... [Diagram labels: Morpheus sources; tables (Hive, DF, JDBC); subgraph (FS snapshot); Property Graph composing DataFrames]
  18. … wrangles Property Graphs ... [Diagram labels: Cypher query; Property Graph; table result (DataFrame); Property Graph result; DataFrame driving table; SPIP]
  19. … analyses graphs in Spark and Neo4j ... [Diagram labels: graph algos; analysis toolsets; DataFrame; Property Graph]
  20. … and stores Property Graphs [Diagram labels: Morpheus store; subgraph (FS snapshot); Property Graph]
  21. Spark and Neo4j
      Spark is an immutable data processing engine
        ○ Spark SQL organizes data in tables (DataFrames)
        ○ DataFrames can be queried via SQL
        ○ Spark SQL programs are optimized by Catalyst
      Neo4j is a native transactional CRUD database
        ○ Neo4j graphs use a native graph data representation
        ○ Neo4j graphs can be queried using Cypher
        ○ Neo4j has optimized in-process MT graph algos
  22. Morpheus: SQL + Cypher in one session
      Graphs and tables are both useful data models
        ○ Finding paths and subgraphs, and transforming graphs
        ○ Viewing, aggregating and ordering values
      The Morpheus project parallels Spark SQL
        ○ PropertyGraph type (composed of DataFrames)
        ○ Catalog of graph data sources, named graphs, views
        ○ Cypher query language
      A CypherSession adds graphs to a SparkSession
  23. What is Morpheus used for?
      Data integration
        ○ Integrate (non-)graphy data from multiple, heterogeneous data sources into one or more property graphs
      Distributed Cypher execution
        ○ OLAP-style graph analytics
      Data science
        ○ Integration with other Spark libraries
        ○ Feature extraction using Neo4j Graph Algorithms
  24. Neo4j Graph Algorithms https://bit.ly/2oUfnA5
      Pathfinding & Search
        • Parallel Breadth First Search
        • Parallel Depth First Search
        • Shortest Path
        • Single-Source Shortest Path
        • All Pairs Shortest Path
        • Minimum Spanning Tree
        • A* Shortest Path
        • Yen’s K Shortest Path
        • K-Spanning Tree (MST)
        • Random Walk
      Centrality / Importance
        • Degree Centrality
        • Closeness Centrality
        • CC Variations: Harmonic, Dangalchev, Wasserman & Faust
        • Betweenness Centrality
        • Approximate Betweenness Centrality
        • PageRank
        • Personalized PageRank
        • ArticleRank
        • Eigenvector Centrality
      Community Detection
        • Triangle Count
        • Clustering Coefficients
        • Connected Components (Union Find)
        • Strongly Connected Components
        • Label Propagation
        • Louvain Modularity – 1 Step & Multi-Step
        • Balanced Triad (identification)
      Similarity
        • Euclidean Distance
        • Cosine Similarity
        • Jaccard Similarity
        • Overlap Similarity
        • Pearson Similarity
      Link Prediction
        • Adamic Adar
        • Common Neighbors
        • Preferential Attachment
        • Resource Allocations
        • Same Community
        • Total Neighbors
      * Available in GraphFrames
      neo4j.com/docs/graph-algorithms/current/
  25. Free O’Reilly Book: neo4j.com/graph-algorithms-book
      • Spark & Neo4j Examples
      • Machine Learning Chapter
  26. Cypher: An open language for graph querying
  27. Cypher query language
      Cypher 9 is the latest full version of openCypher
        ○ Implemented in Neo4j 3.5
        ○ Implemented in whole/part by six other vendors
        ○ Several other partial and research implementations
        ○ Cypher for Gremlin is another openCypher project
  28. Cypher 9 in Morpheus and Spark Graph (SPIP)
      Cypher is a full CRUD language
        ○ RETURNs only tabular results: not composable
        ○ Results can include graph elements (paths, relationships, nodes) or property values
      Morpheus and SPIP implement most of read-only Cypher
        ○ No MERGE or DELETE
        ○ Spark immutable data + transformations
  29. Cypher 10 in Morpheus - Multiple graphs
      Cypher 10 proposes support for Multiple Graphs
        ○ Multiple Graph CIP: https://git.io/fjmrx
      Allows for Cypher Query composition
        ○ Similar to chaining transformations on DataFrames
      Support Graph Catalog for managing Graphs
        ○ Analogous to Spark SQL catalog
      Query support for Graph Construction
  30. Returning tabular data
      Input: a property graph
      Output: a table
          FROM GRAPH socialNetwork
          MATCH ({name: 'Dan'})-[:FRIEND*2]->(foaf)
          RETURN toUpper(foaf.name) AS name
          ORDER BY name DESC
      Language features available in Morpheus
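
What the query on slide 30 computes can be traced in plain Python. This is only a sketch of the query semantics on invented data, not how Morpheus executes Cypher; the adjacency dictionary and the names in it are assumptions.

```python
# Hypothetical directed FRIEND relationships: person -> list of friends.
friends = {
    "Dan": ["Ann", "Bob"],
    "Ann": ["Cy"],
    "Bob": ["Eve", "Cy"],
}


def friends_of_friends(start):
    """Follow exactly two FRIEND hops from `start` (the [:FRIEND*2] part),
    upper-case the end node's name, and sort descending by name.
    Like Cypher, this yields one row per matched path, so a person
    reachable via two different paths appears twice.
    """
    rows = []
    for middle in friends.get(start, []):
        for foaf in friends.get(middle, []):
            rows.append({"name": foaf.upper()})
    return sorted(rows, key=lambda row: row["name"], reverse=True)


table = friends_of_friends("Dan")
```

With this sample data, Cy is reached through both Ann and Bob, so the table contains two CY rows after the single EVE row.
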
  31. Constructing graphs
      Input: a property graph
      Output: a property graph
          FROM GRAPH socialNetwork
          MATCH (p:Person)-[:FRIEND*2]->(foaf)
          WHERE NOT (p)-[:FRIEND]->(foaf)
          CONSTRUCT
            CREATE (p)-[:POSSIBLE_FRIEND]->(foaf)
          RETURN GRAPH
      Language features available in Morpheus
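
The construction on slide 31 takes a graph in and returns a new graph. A minimal plain-Python sketch of the same idea follows; the edge set is invented, and unlike the Cypher query, which creates one relationship per matched row, this version collapses duplicates into a set.

```python
# Hypothetical FRIEND relationships as (start, end) pairs.
friend_edges = {("Ann", "Bob"), ("Bob", "Cy"), ("Ann", "Cy"), ("Cy", "Dee")}


def possible_friends(edges):
    """Return POSSIBLE_FRIEND pairs (p, foaf) where foaf is two FRIEND hops
    away from p and there is no direct FRIEND relationship from p to foaf.
    """
    outgoing = {}
    for start, end in edges:
        outgoing.setdefault(start, set()).add(end)

    suggestions = set()
    for person, direct in outgoing.items():
        for middle in direct:
            for foaf in outgoing.get(middle, ()):
                # Mirrors: WHERE NOT (p)-[:FRIEND]->(foaf)
                if foaf not in direct:
                    suggestions.add((person, foaf))
    return suggestions


new_graph_edges = possible_friends(friend_edges)
```

Ann already has Cy as a direct friend, so only Dee is suggested for her.
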
  32. Querying multiple graphs
      Input: property graphs
      Output: a property graph
          FROM GRAPH socialNetwork
          MATCH (p:Person)
          FROM GRAPH products
          MATCH (c:Customer)
          WHERE p.email = c.email
          CONSTRUCT ON socialNetwork, products
            CREATE (p)-[:IS]->(c)
          RETURN GRAPH
      Language features available in Morpheus
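
The multiple-graph query on slide 32 is, at its core, an equi-join on `email` between nodes from two graphs. A small plain-Python sketch with invented rows (the ids and e-mail addresses are assumptions):

```python
# Hypothetical :Person nodes from the socialNetwork graph.
people = [
    {"id": 1, "email": "ann@example.org"},
    {"id": 2, "email": "bob@example.org"},
]

# Hypothetical :Customer nodes from the products graph.
customers = [
    {"id": 100, "email": "bob@example.org"},
    {"id": 101, "email": "cy@example.org"},
]


def link_by_email(people, customers):
    """Return one (person id, "IS", customer id) triple per pair of nodes
    that share an e-mail address, mirroring
    WHERE p.email = c.email ... CREATE (p)-[:IS]->(c).
    """
    customers_by_email = {}
    for customer in customers:
        customers_by_email.setdefault(customer["email"], []).append(customer)

    return [
        (person["id"], "IS", customer["id"])
        for person in people
        for customer in customers_by_email.get(person["email"], [])
    ]


is_edges = link_by_email(people, customers)
```
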
  33. Creating graph views
      Input: property graphs
      Output: a property graph
          CATALOG CREATE VIEW youngFriends($inGraph) {
            FROM GRAPH $inGraph
            MATCH (p1:Person)-[r]->(p2:Person)
            WHERE p1.age < 25 AND p2.age < 25
            CONSTRUCT
              CREATE (p1)-[COPY OF r]->(p2)
            RETURN GRAPH
          }
      Language features available in Morpheus
  34. Using graph views
      Input: property graphs
      Output: table or graph
          FROM youngFriends(socialNetwork)
          MATCH (p:Person)-[r]->(o)
          RETURN p, r, o

          // and views over views
          FROM youngFriends(europe(socialNetwork))
          MATCH ...
      Language features available in Morpheus
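
A graph view behaves like a function from graph to graph, which is why views can be nested as on slide 34. The sketch below models this in plain Python. The dictionary-based graph layout, the `region` property and the body of `europe` are assumptions; the slides only name the `europe` view without defining it.

```python
# Hypothetical social network: people with properties, plus typed edges.
social_network = {
    "people": {
        "Ann": {"age": 22, "region": "EU"},
        "Bob": {"age": 23, "region": "EU"},
        "Cy": {"age": 40, "region": "EU"},
        "Dee": {"age": 21, "region": "US"},
    },
    "edges": [("Ann", "FRIEND", "Bob"),
              ("Bob", "FRIEND", "Cy"),
              ("Ann", "FRIEND", "Dee")],
}


def europe(graph):
    """Assumed view: keep people in region EU and the edges between them."""
    people = {n: p for n, p in graph["people"].items() if p["region"] == "EU"}
    edges = [(a, t, b) for a, t, b in graph["edges"]
             if a in people and b in people]
    return {"people": people, "edges": edges}


def young_friends(graph, max_age=25):
    """Mirror of the youngFriends view: copy relationships whose start and
    end person are both younger than max_age, along with those two nodes."""
    people = graph["people"]
    edges = [(a, t, b) for a, t, b in graph["edges"]
             if people[a]["age"] < max_age and people[b]["age"] < max_age]
    kept = {name for a, _, b in edges for name in (a, b)}
    return {"people": {n: people[n] for n in kept}, "edges": edges}


# A view over a view, as in FROM youngFriends(europe(socialNetwork)).
young_europeans = young_friends(europe(social_network))
```
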
  35. Managing multiple graphs
  36. Managing multiple graphs
      Property Graphs are managed within a catalog
      [Diagram: Cypher Session > Property Graph Catalog > Property Graph Data Source <namespace> > Property Graph <name>]
      QualifiedGraphName = <namespace>.<name>
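
The naming scheme on slide 36 can be illustrated with a toy catalog that resolves `<namespace>.<name>`. This is a plain-Python sketch; the class and method names are hypothetical and are not the Morpheus catalog API.

```python
class GraphCatalog:
    """Toy catalog: each namespace stands for one Property Graph Data Source."""

    def __init__(self):
        self._sources = {}  # namespace -> {graph name -> graph object}

    def register_source(self, namespace, graphs=None):
        self._sources[namespace] = dict(graphs or {})

    @staticmethod
    def split(qualified_name):
        # QualifiedGraphName = <namespace>.<name>; split at the first dot.
        namespace, sep, name = qualified_name.partition(".")
        if not sep or not namespace or not name:
            raise ValueError(f"expected <namespace>.<name>, got {qualified_name!r}")
        return namespace, name

    def graph(self, qualified_name):
        namespace, name = self.split(qualified_name)
        return self._sources[namespace][name]

    def store(self, qualified_name, graph):
        namespace, name = self.split(qualified_name)
        self._sources[namespace][name] = graph

    def graph_names(self):
        return sorted(f"{ns}.{name}"
                      for ns, graphs in self._sources.items()
                      for name in graphs)


# Graph objects are plain strings here, as placeholders.
catalog = GraphCatalog()
catalog.register_source("social-net", {"US": "us-graph", "EU": "eu-graph"})
catalog.register_source("products", {"2018": "p18", "2017": "p17"})
catalog.store("social-net.US_new", "us-new-graph")
```
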
  37. PGDS implementations in Morpheus
      | PGDS | Multiple graphs | Read graphs | Write graphs |
      |---|---|---|---|
      | File-based (Parquet, ORC, CSV; HDFS, local, S3) | Yes | Yes | Yes |
      | SQL (Hive, JDBC) | Yes | Yes | No |
      | Neo4j Bolt | Yes | Yes | Yes |
      | Neo4j Bulk Import | No | No | Yes |
  38. Catalog operations via Cypher
      [Diagram: Cypher Session > Property Graph Catalog > “social-net” (Neo4j PGDS) > “US” (Property Graph)]
  39. Read from single Property Graph
      [Diagram: Cypher Session > Property Graph Catalog > “social-net” (Neo4j PGDS) > “US” (Property Graph)]
          FROM social-net.US
          MATCH (p:Person)
          RETURN p
  40. Read from multiple Property Graphs
      [Diagram: Cypher Session > Property Graph Catalog > “social-net” (Neo4j PGDS): “US”, “EU”; “products” (SQL PGDS): “2018”, “2017”]
          FROM social-net.US
          MATCH (p:Person)
          FROM products.2018
          MATCH (c:Customer)
          WHERE p.email = c.email
          RETURN p, c
  41. Construct new Property Graphs
      [Diagram: Cypher Session > Property Graph Catalog > “social-net” (Neo4j PGDS): “US”, “EU”; “products” (SQL PGDS): “2018”, “2017”]
          CATALOG CREATE GRAPH social-net.US_new {
            FROM social-net.US
            MATCH (p:Person)
            FROM products.2018
            MATCH (c:Customer)
            WHERE p.email = c.email
            CONSTRUCT ON social-net.US
              CREATE (p)-[:SAME_AS]->(c)
            RETURN GRAPH
          }
  42. Construct new Property Graphs
      Same query as on slide 41.
      [Diagram: Cypher Session > Property Graph Catalog > “social-net” (Neo4j PGDS): “US”, “EU”; “products” (SQL PGDS): “2018”, “2017”; new graph “US_new”]
  43. Create and query Graph Views
      [Diagram: Cypher Session > Property Graph Catalog > “social-net” (Neo4j PGDS): “US”, “EU”, ...; Views: “youngPeople”]
          CATALOG CREATE VIEW youngPeople($sn) {
            FROM $sn
            MATCH (p:Person)-[r]->(n)
            WHERE p.age < 21
            CONSTRUCT
              CREATE (p)-[COPY OF r]->(n)
            RETURN GRAPH
          }

          FROM youngPeople(social-net.US)
          MATCH (p:Person)
          RETURN p
  44. Morpheus integrates with Neo4j Graph Algorithms
  45. The Yelp Open Dataset
      :Business (name: ACME, address: 123 ACME Rd., city: San Jose, state: CA)
      :User (name: Alice, since: 2013, elite: [2014, 2016])
      :User (name: Bob, since: 2014, elite: null)
      :REVIEWS (stars: 5, date: 2014-02-03)
      :REVIEWS (stars: 4, date: 2014-08-03)
      https://www.yelp.com
      https://www.yelp.com/dataset
      https://www.yelp.com/dataset/challenge
  46. Yelp Demo Overview https://git.io/fjZ2b
      Part 1: From JSON to Graph. Create persistent Property Graph from raw Yelp dataset
        ○ Read Yelp Data from JSON into DataFrames
        ○ Create Property Graph from DataFrames
        ○ Store Property Graph using Parquet
      Part 2: A library of Graphs. Create a library of graph projections
        ○ Read Property Graph from Parquet
        ○ Create subgraph for a specific city
        ○ Project and persist city subgraph
      Part 3: Federated queries. Integrate reviews with social network data
        ○ Define Graph Type and Mapping with Graph DDL
        ○ Load data from Hive and H2
        ○ Run analytical query on the integrated graph
      Part 4: Neo4j Integration I. Find trending businesses
        ○ Load graph projections from library
        ○ Write graphs to Neo4j and run PageRank
        ○ Combine graphs in Morpheus and select trending businesses
      Part 5: Neo4j Integration II. Recommend businesses to users
        ○ Load graph projections from library
        ○ Write graphs to Neo4j, run Louvain + Jaccard
        ○ Run analytical query in Morpheus to find recommendations
  47. Starting point: A Library of Graphs https://git.io/fjZ25
      Construct graphs for each year, 2015 - 2018
      [Diagram patterns: (:User)-[:REVIEWS]->(:Business); (:User)-[:CO_REVIEWS]->(:User); (:Business)-[:CO_REVIEWED]->(:Business)]
  48. Computing “TrendRank” for Yelp businesses, 2017 to 2018 https://git.io/fjZ2j
      Graph pattern: (:Business)-[:CO_REVIEWED]->(:Business)
      call algo.pagerank on the 2017 and 2018 graphs, then join (⋈) the results
      trendRank = pageRank_2018 - pageRank_2017
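
The TrendRank formula on slide 48 amounts to a join of two per-year PageRank results followed by a subtraction. A minimal plain-Python sketch follows; the scores are invented, and treating the join as an inner join on the business name is an assumption about what the ⋈ on the slide denotes.

```python
# Hypothetical PageRank scores per business, one result set per year.
pagerank_2017 = {"ACME": 0.30, "Bistro": 0.50}
pagerank_2018 = {"ACME": 0.45, "Bistro": 0.40, "Cafe": 0.15}


def trend_rank(old_scores, new_scores):
    """trendRank = pageRank_new - pageRank_old for every business that has
    a score in both years (inner join on the business key)."""
    shared = old_scores.keys() & new_scores.keys()
    return {name: new_scores[name] - old_scores[name] for name in shared}


trend = trend_rank(pagerank_2017, pagerank_2018)

# Businesses ordered from most to least trending.
ranking = sorted(trend, key=trend.get, reverse=True)
```

Cafe has no 2017 score and therefore drops out of the inner join; an outer join that treats a missing score as 0 would be an equally plausible design.
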
  49. Neo4j Integration Demo
  50. Morpheus is binary compatible with Spark Graph
  51. Morpheus and Spark Graph: API compatibility
      [Diagram labels: SPIP: spark-graph-api, spark-cypher, spark-sql; openCypher: okapi*, morpheus, spark-sql; Cypher to relational operators compiler]
      * Graph Features in Spark 3.0: Thursday 11AM, Room G104
  52. Binary Switch Demo
  53. Outlook: next talk, Graph Features in Spark 3.0: Thursday 11AM, Room G104 https://theoatmeal.com/comics/sneak_peek
  54. Don’t forget to rate and review the sessions (search: Spark + AI Summit)
