SlideShare a Scribd company logo
1 of 54
Download to read offline
WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
Martin Junghanns & Sören Reichardt
Neo4j
Extending Spark Graph for the
Enterprise with Morpheus and
Neo4j
#UnifiedDataAnalytics #SparkAISummit
#UnifiedDataAnalytics #SparkAISummit
Motivation
#UnifiedDataAnalytics #SparkAISummit
Graphs are everywhere
4
#UnifiedDataAnalytics #SparkAISummit
… and growing
5
#UnifiedDataAnalytics #SparkAISummit
The Property Graph Model
Node
● Represents an entity within the graph
● Can have labels
Relationship
● Connects a start node with an end node
● Has one type
Property
● Describes a node/relationship: e.g. name, age, weight etc
● Key-value pair: String key; typed value (string, number, list, ...)
6
#UnifiedDataAnalytics #SparkAISummit
Graph Patterns with Cypher
#UnifiedDataAnalytics #SparkAISummit
Graphs are coming to Spark
8
#UnifiedDataAnalytics #SparkAISummit
Spark Project Improvement Proposal
● Defines a Cypher-compatible Property Graph type
based on DataFrames
● Replaces GraphFrames querying with Cypher
● Reimplements GraphFrames/GraphX algos on the
Property Graph type
● Running PoC: [SPARK-27299][GRAPH][WIP] Spark
Graph API design proposal
https://git.io/fjqp6
#UnifiedDataAnalytics #SparkAISummit
SPIP: What are we trying to do?
● “Spark Cypher”
○ Run a Cypher query on a Property Graph returning a
tabular result
● Implementation is based on Spark SQL
○ Property Graphs are composed of one or more DFs
● Provide Scala, Python and Java APIs
● Deep dive: Graph Features in Spark 3.0: Thursday 11AM,
Room G104
#UnifiedDataAnalytics #SparkAISummit
SPIP: How does it look like?
11
spark-graph-api
spark-cypher
spark-sql
SPIP
#UnifiedDataAnalytics #SparkAISummit
Spark Graph Demo
12
#UnifiedDataAnalytics #SparkAISummit
SPIP: What are we not solving?
● Addresses the Cypher Property Graph Model
○ Does not deal with variants of that model (e.g. RDF)
● No multiple graph features
○ API is flexible to support this in future iterations
● No Property Graph Catalog
○ Also no Property Graph specific Data Sources
#UnifiedDataAnalytics #SparkAISummit
... but ...
14
#UnifiedDataAnalytics #SparkAISummit
Morpheus:
Spark Graph for the enterprise
#UnifiedDataAnalytics #SparkAISummit
The OLTP / OLAP landscape
Tables Graphs
Transactional
PostgreSQL,
Oracle,
SQLServer
Neo4j
Data
Integration
& Analytics Spark SQL Morpheus
#UnifiedDataAnalytics #SparkAISummit
Morpheus creates Property Graphs ...
PROPERTY
GRAPH
composing
DataFrames
Hive, DF, JDBC
TABLES
SUB-
GRAPH
FS snapshot
Morpheus
SOURCES
#UnifiedDataAnalytics #SparkAISummit
… wrangles Property Graphs ...
DataFrame
Table Result
Cypher
QUERY
Property
Graph Result
Property
Graph Cypher
QUERY
Cypher
QUERY
Property
Graph Result
DataFrame
Driving Table
SPIP
#UnifiedDataAnalytics #SparkAISummit
… analyses graphs in Spark and Neo4j ...
GRAPH
ALGOS
ANALYSIS
toolsets
DataFrame DataFrame
Property
Graph
Property
Graph
#UnifiedDataAnalytics #SparkAISummit
… and stores Property Graphs
Morpheus
STORE
SUBGRAPH
FS snapshot
Property
Graph
#UnifiedDataAnalytics #SparkAISummit
Spark and Neo4j
Spark is an immutable data processing engine
○ Spark SQL organizes data in tables (DataFrames)
○ DataFrames can be queried via SQL
○ Spark SQL programs are optimized by Catalyst
Neo4j is a native transactional CRUD database
○ Neo4j graphs use a native graph data representation
○ Neo4j graphs can be queried using Cypher
○ Neo4j has optimized in-process MT graph algos
#UnifiedDataAnalytics #SparkAISummit
Morpheus: SQL + Cypher in one session
Graphs and tables are both useful data models
○ Finding paths and subgraphs, and transforming graphs
○ Viewing, aggregating and ordering values
The Morpheus project parallels Spark SQL
○ PropertyGraph type (composed of DataFrames)
○ Catalog of graph data sources, named graphs, views,
○ Cypher query language
A CypherSession adds graphs to a SparkSession
#UnifiedDataAnalytics #SparkAISummit
What is Morpheus used for?
Data integration
○ Integrate (non-)graphy data from multiple, heterogeneous
data sources into one or more property graphs
Distributed Cypher execution
○ OLAP-style graph analytics
Data science
○ Integration with other Spark libraries
○ Feature extraction using Neo4j Graph Algorithms
#UnifiedDataAnalytics #SparkAISummit
Neo4j Graph Algorithms
https://bit.ly/2oUfnA5
• Parallel Breadth First Search
• Parallel Depth First Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev,
Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Triangle Count
• Clustering Coefficients
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Pathfinding
& Search
Centrality /
Importance
Community
Detection
Similarity
neo4j.com/docs/
graph-algorithms/current/
Link
Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors* Available in GraphFrames
#UnifiedDataAnalytics #SparkAISummit 25
Free O’Reilly Book
neo4j.com/
graph-algorithms-book
• Spark & Neo4j Examples
• Machine Learning Chapter
#UnifiedDataAnalytics #SparkAISummit
Cypher
An open language for graph querying
#UnifiedDataAnalytics #SparkAISummit
Cypher query language
Cypher 9 is the latest full version of openCypher
○ Implemented in Neo4j 3.5
○ Implemented in whole/part by six other vendors
○ Several other partial and research implementations
○ Cypher for Gremlin is another openCypher project
#UnifiedDataAnalytics #SparkAISummit
Cypher 9 in Morpheus and Spark Graph (SPIP)
Cypher is a full CRUD language
○ RETURNs only tabular results: not composable
○ Results can include graph elements (paths,
relationships, nodes) or property values
Morpheus and SPIP implement most of read-only Cypher
○ No MERGE or DELETE
○ Spark immutable data + transformations
#UnifiedDataAnalytics #SparkAISummit
Cypher 10 in Morpheus - Multiple graphs
Cypher 10 proposes support for Multiple Graphs
○ Multiple Graph CIP: https://git.io/fjmrx
Allows for Cypher Query composition
○ Similar to chaining transformations on DataFrames
Support Graph Catalog for managing Graphs
○ Analogous to Spark SQL catalog
Query support for Graph Construction
#UnifiedDataAnalytics #SparkAISummit
Returning tabular data Input: a property graph
Output: a table
FROM GRAPH socialNetwork
MATCH ({name: 'Dan'})-[:FRIEND*2]->(foaf)
RETURN toUpper(foaf.name) AS name
ORDER BY name DESC
Language features available in Morpheus
#UnifiedDataAnalytics #SparkAISummit
Constructing graphs Input: a property graph
Output: a property graph
FROM GRAPH socialNetwork
MATCH (p:Person)-[:FRIEND*2]->(foaf)
WHERE NOT (p)-[:FRIEND]->(foaf)
CONSTRUCT
CREATE (p)-[:POSSIBLE_FRIEND]->(foaf)
RETURN GRAPH
Language features available in Morpheus
#UnifiedDataAnalytics #SparkAISummit
Querying multiple graphs Input: property graphs
Output: a property graph
FROM GRAPH socialNetwork
MATCH (p:Person)
FROM GRAPH products
MATCH (c:Customer)
WHERE p.email = c.email
CONSTRUCT ON socialNetwork, products
CREATE (p)-[:IS]->(c)
RETURN GRAPH
Language features available in Morpheus
#UnifiedDataAnalytics #SparkAISummit
Creating graph views Input: property graphs
Output: a property graph
CATALOG CREATE VIEW youngFriends($inGraph){
FROM GRAPH $inGraph
MATCH (p1:Person)-[r]->(p2:Person)
WHERE p1.age < 25 AND p2.age < 25
CONSTRUCT
CREATE (p1)-[COPY OF r]->(p2)
RETURN GRAPH
}
Language features available in Morpheus
#UnifiedDataAnalytics #SparkAISummit
Using graph views Input: property graphs
Output: table or graph
FROM youngFriends(socialNetwork)
MATCH (p:Person)-[r]->(o)
RETURN p, r, o
// and views over views
FROM youngFriends(europe(socialNetwork))
MATCH ...
Language features available in Morpheus
#UnifiedDataAnalytics #SparkAISummit
Managing multiple
graphs
#UnifiedDataAnalytics #SparkAISummit
Managing multiple graphs
Property Graphs are managed within a catalog
Cypher Session
Property Graph Catalog
Property Graph Data Source <namespace>
Property Graph <name>
QualifiedGraphName = <namespace>.<name>
#UnifiedDataAnalytics #SparkAISummit
PGDS implementations in Morpheus
PGDS Multiple graphs Read graphs Write graphs
File-based
Parquet, ORC, CSV
HDFS, local, S3
Yes Yes Yes
SQL
Hive, Jdbc
Yes Yes No
Neo4j Bolt Yes Yes Yes
Neo4j Bulk Import No No Yes
#UnifiedDataAnalytics #SparkAISummit
Catalog operations via Cypher
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US” (Property Graph)
#UnifiedDataAnalytics #SparkAISummit
Read from single Property Graph
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US” (Property Graph)
FROM social-net.US
MATCH (p:Person)
RETURN p
#UnifiedDataAnalytics #SparkAISummit
Read from multiple Property Graphs
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US”
“EU”
“products” (SQL PGDS)
“2018”
“2017”
FROM social-net.US
MATCH (p:Person)
FROM products.2018
MATCH (c:Customer)
WHERE p.email = c.email
RETURN p, c
#UnifiedDataAnalytics #SparkAISummit
Construct new Property Graphs
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US”
“EU”
“products” (SQL PGDS)
“2018”
“2017”
CATALOG CREATE GRAPH social-net.US_new {
FROM social-net.US
MATCH (p:Person)
FROM products.2018
MATCH (c:Customer)
WHERE p.email = c.email
CONSTRUCT ON social-net.US
CREATE (p)-[:SAME_AS]->(c)
RETURN GRAPH
}
#UnifiedDataAnalytics #SparkAISummit
Construct new Property Graphs
CATALOG CREATE GRAPH social-net.US_new {
FROM social-net.US
MATCH (p:Person)
FROM products.2018
MATCH (c:Customer)
WHERE p.email = c.email
CONSTRUCT ON social-net.US
CREATE (p)-[:SAME_AS]->(c)
RETURN GRAPH
}
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US”
“EU”
“products” (SQL PGDS)
“2018”
“2017”
“US_new”
#UnifiedDataAnalytics #SparkAISummit
Create and query Graph Views
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US”
“EU”
...
CATALOG CREATE VIEW youngPeople($sn) {
FROM $sn
MATCH (p:Person)-[r]->(n)
WHERE p.age < 21
CONSTRUCT
CREATE (p)-[COPY OF r]->(n)
RETURN GRAPH
}
FROM youngPeople(social-net.US)
MATCH (p:Person)
RETURN p
“youngPeople”
Views
#UnifiedDataAnalytics #SparkAISummit
Morpheus integrates with
Neo4j Graph Algorithms
#UnifiedDataAnalytics #SparkAISummit
The Yelp Open Dataset
45
:Business
name : ACME
address : 123 ACME Rd.
city : San Jose
state : CA
:User
name : Alice
since : 2013
elite : [2014, 2016]
:User
name : Bob
since : 2014
elite : null
:REVIEWS
stars : 5
date : 2014-02-03
:REVIEWS
stars : 4
date : 2014-08-03
https://www.yelp.com
https://www.yelp.com/dataset
https://www.yelp.com/dataset/challenge
#UnifiedDataAnalytics #SparkAISummit
Yelp Demo Overview
46
Part 1
From JSON to Graph
Create persistent
Property Graph from
raw Yelp dataset
Read Yelp Data from
JSON into DataFrames
Create Property Graph
from DataFrames
Store Property Graph
using Parquet
Part 2
A library of Graphs
Create a library of
graph projections
Read Property Graph
from Parquet
Create subgraph for a
specifc city
Project and persist city
subgraph
Part 3
Federated queries
Integrate reviews with
social network data
Define Graph Type and
Mapping with Graph
DDL
Load data from Hive
and H2
Run analytical query on
the integrated graph
Part 5
Neo4j Integration II
Recommend
businesses to users
Load graph projections
from library
Write graphs to Neo4j,
run Louvain + Jaccard
Run analytical query in
Morpheus to find
recommendations
Part 4
Neo4j Integration I
Find trending
businesses
Load graph projections
from library
Write graphs to Neo4j
and run PageRank
Combine graphs in
Morpheus and select
trending businesses
https://git.io/fjZ2b
#UnifiedDataAnalytics #SparkAISummit
Starting point: A Library of Graphs
47
2015 - 2018
(:User)-[:CO_REVIEWS]->(:User)
(:User)-[:REVIEWS]->(:Business)
(:User)-[:CO_REVIEWS]->(:User)
Constuct graphs for each year
(:Business)-[:CO_REVIEWED]->(:Business)
https://git.io/fjZ25
#UnifiedDataAnalytics #SparkAISummit
Computing “TrendRank” for Yelp businesses
48
2017
to
2018
call algo.pagerank
2017
2018
trendRank =
pageRank_2018 -
pageRank_2017
⋈
(:Business)
-[:CO_REVIEWED]->
(:Business)
https://git.io/fjZ2j
#UnifiedDataAnalytics #SparkAISummit
Neo4j Integration Demo
49
#UnifiedDataAnalytics #SparkAISummit
Morpheus is binary
compatible with
Spark Graph
#UnifiedDataAnalytics #SparkAISummit
Morpheus and Spark Graph: API compatibility
spark-graph-api
spark-cypher
spark-sql
okapi* morpheus
spark-sql
openCypherSPIP
Cypher to relational
operators compiler
openCypher
* Graph Features in Spark 3.0: Thursday 11AM, Room G104
#UnifiedDataAnalytics #SparkAISummit
Binary Switch Demo
52
#UnifiedDataAnalytics #SparkAISummit
Outlook next talk
Graph Features in Spark 3.0:
Thursday 11AM, Room G104
https://theoatmeal.com/comics/sneak_peek
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

More Related Content

What's hot

Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Databricks
 
Powering Custom Apps at Facebook using Spark Script Transformation
Powering Custom Apps at Facebook using Spark Script TransformationPowering Custom Apps at Facebook using Spark Script Transformation
Powering Custom Apps at Facebook using Spark Script TransformationDatabricks
 
Vectorized R Execution in Apache Spark
Vectorized R Execution in Apache SparkVectorized R Execution in Apache Spark
Vectorized R Execution in Apache SparkDatabricks
 
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Databricks
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Databricks
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastDatabricks
 
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...Databricks
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXAndrea Iacono
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphXAndy Petrella
 
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFsAutomating Predictive Modeling at Zynga with PySpark and Pandas UDFs
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFsDatabricks
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big dataSigmoid
 
Apache Spark Data Validation
Apache Spark Data ValidationApache Spark Data Validation
Apache Spark Data ValidationDatabricks
 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks DataWorks Summit/Hadoop Summit
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesDatabricks
 
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur DaveGraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur DaveSpark Summit
 
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementAn AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementDatabricks
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIDatabricks
 
Scalable Machine Learning Pipeline For Meta Data Discovery From eBay Listings
Scalable Machine Learning Pipeline For Meta Data Discovery From eBay ListingsScalable Machine Learning Pipeline For Meta Data Discovery From eBay Listings
Scalable Machine Learning Pipeline For Meta Data Discovery From eBay ListingsSpark Summit
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustSpark Summit
 

What's hot (20)

Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
 
Powering Custom Apps at Facebook using Spark Script Transformation
Powering Custom Apps at Facebook using Spark Script TransformationPowering Custom Apps at Facebook using Spark Script Transformation
Powering Custom Apps at Facebook using Spark Script Transformation
 
Vectorized R Execution in Apache Spark
Vectorized R Execution in Apache SparkVectorized R Execution in Apache Spark
Vectorized R Execution in Apache Spark
 
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...Continuous Evaluation of Deployed Models in Production Many high-tech industr...
Continuous Evaluation of Deployed Models in Production Many high-tech industr...
 
Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!Make your PySpark Data Fly with Arrow!
Make your PySpark Data Fly with Arrow!
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit East
 
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
Performance Optimization of Recommendation Training Pipeline at Netflix DB Ts...
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphX
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphX
 
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFsAutomating Predictive Modeling at Zynga with PySpark and Pandas UDFs
Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
Apache Spark Data Validation
Apache Spark Data ValidationApache Spark Data Validation
Apache Spark Data Validation
 
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
Exploring Titan and Spark GraphX for Analyzing Time-Varying Electrical Networks
 
Spark DataFrames and ML Pipelines
Spark DataFrames and ML PipelinesSpark DataFrames and ML Pipelines
Spark DataFrames and ML Pipelines
 
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur DaveGraphFrames: Graph Queries in Spark SQL by Ankur Dave
GraphFrames: Graph Queries in Spark SQL by Ankur Dave
 
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance ManagementAn AI-Powered Chatbot to Simplify Apache Spark Performance Management
An AI-Powered Chatbot to Simplify Apache Spark Performance Management
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
 
Scalable Machine Learning Pipeline For Meta Data Discovery From eBay Listings
Scalable Machine Learning Pipeline For Meta Data Discovery From eBay ListingsScalable Machine Learning Pipeline For Meta Data Discovery From eBay Listings
Scalable Machine Learning Pipeline For Meta Data Discovery From eBay Listings
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
 

Similar to Extending Spark Graph for the Enterprise with Morpheus and Neo4j

Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup MunichMorpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup MunichMartin Junghanns
 
Morpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache SparkMorpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache SparkHenning Kropp
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and RDatabricks
 
Big data analysis using spark r published
Big data analysis using spark r publishedBig data analysis using spark r published
Big data analysis using spark r publishedDipendra Kusi
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Holden Karau
 
Briefing on the Modern ML Stack with R
 Briefing on the Modern ML Stack with R Briefing on the Modern ML Stack with R
Briefing on the Modern ML Stack with RDatabricks
 
20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patengeKarin Patenge
 
Internals of Speeding up PySpark with Arrow
 Internals of Speeding up PySpark with Arrow Internals of Speeding up PySpark with Arrow
Internals of Speeding up PySpark with ArrowDatabricks
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetupJoshua Bae
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Spark Summit
 
Predicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsPredicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsDatabricks
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...DataWorks Summit/Hadoop Summit
 
Graph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comGraph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comKarin Patenge
 
20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patenge20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patengeKarin Patenge
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversityAlex Zeltov
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesStratio
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdfMaheshPandit16
 
Applying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesApplying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesData Ninja API
 

Similar to Extending Spark Graph for the Enterprise with Morpheus and Neo4j (20)

Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup MunichMorpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
 
Morpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache SparkMorpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache Spark
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
 
Big data analysis using spark r published
Big data analysis using spark r publishedBig data analysis using spark r published
Big data analysis using spark r published
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016
 
Briefing on the Modern ML Stack with R
 Briefing on the Modern ML Stack with R Briefing on the Modern ML Stack with R
Briefing on the Modern ML Stack with R
 
20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge
 
Internals of Speeding up PySpark with Arrow
 Internals of Speeding up PySpark with Arrow Internals of Speeding up PySpark with Arrow
Internals of Speeding up PySpark with Arrow
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
 
Predicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsPredicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph Algorithms
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...Using SparkR to Scale Data Science Applications in Production. Lessons from t...
Using SparkR to Scale Data Science Applications in Production. Lessons from t...
 
Graph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.comGraph Analytics on Data from Meetup.com
Graph Analytics on Data from Meetup.com
 
20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patenge20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patenge
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph Datasources
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
 
Applying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesApplying large scale text analytics with graph databases
Applying large scale text analytics with graph databases
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 

Extending Spark Graph for the Enterprise with Morpheus and Neo4j

  • 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  • 2. Martin Junghanns & Sören Reichardt Neo4j Extending Spark Graph for the Enterprise with Morpheus and Neo4j #UnifiedDataAnalytics #SparkAISummit
  • 6. #UnifiedDataAnalytics #SparkAISummit The Property Graph Model Node ● Represents an entity within the graph ● Can have labels Relationship ● Connects a start node with an end node ● Has one type Property ● Describes a node/relationship: e.g. name, age, weight etc ● Key-value pair: String key; typed value (string, number, list, ...) 6
  • 9. #UnifiedDataAnalytics #SparkAISummit Spark Project Improvement Proposal ● Defines a Cypher-compatible Property Graph type based on DataFrames ● Replaces GraphFrames querying with Cypher ● Reimplements GraphFrames/GraphX algos on the Property Graph type ● Running PoC: [SPARK-27299][GRAPH][WIP] Spark Graph API design proposal https://git.io/fjqp6
  • 10. #UnifiedDataAnalytics #SparkAISummit SPIP: What are we trying to do? ● “Spark Cypher” ○ Run a Cypher query on a Property Graph returning a tabular result ● Implementation is based on Spark SQL ○ Property Graphs are composed of one or more DFs ● Provide Scala, Python and Java APIs ● Deep dive: Graph Features in Spark 3.0: Thursday 11AM, Room G104
  • 11. #UnifiedDataAnalytics #SparkAISummit SPIP: How does it look like? 11 spark-graph-api spark-cypher spark-sql SPIP
  • 13. #UnifiedDataAnalytics #SparkAISummit SPIP: What are we not solving? ● Addresses the Cypher Property Graph Model ○ Does not deal with variants of that model (e.g. RDF) ● No multiple graph features ○ API is flexible to support this in future iterations ● No Property Graph Catalog ○ Also no Property Graph specific Data Sources
  • 16. #UnifiedDataAnalytics #SparkAISummit The OLTP / OLAP landscape Tables Graphs Transactional PostgreSQL, Oracle, SQLServer Neo4j Data Integration & Analytics Spark SQL Morpheus
  • 17. #UnifiedDataAnalytics #SparkAISummit Morpheus creates Property Graphs ... PROPERTY GRAPH composing DataFrames Hive, DF, JDBC TABLES SUB- GRAPH FS snapshot Morpheus SOURCES
  • 18. #UnifiedDataAnalytics #SparkAISummit … wrangles Property Graphs ... DataFrame Table Result Cypher QUERY Property Graph Result Property Graph Cypher QUERY Cypher QUERY Property Graph Result DataFrame Driving Table SPIP
  • 19. #UnifiedDataAnalytics #SparkAISummit … analyses graphs in Spark and Neo4j ... GRAPH ALGOS ANALYSIS toolsets DataFrame DataFrame Property Graph Property Graph
  • 20. #UnifiedDataAnalytics #SparkAISummit … and stores Property Graphs Morpheus STORE SUBGRAPH FS snapshot Property Graph
  • 21. #UnifiedDataAnalytics #SparkAISummit Spark and Neo4j Spark is an immutable data processing engine ○ Spark SQL organizes data in tables (DataFrames) ○ DataFrames can be queried via SQL ○ Spark SQL programs are optimized by Catalyst Neo4j is a native transactional CRUD database ○ Neo4j graphs use a native graph data representation ○ Neo4j graphs can be queried using Cypher ○ Neo4j has optimized in-process MT graph algos
  • 22. #UnifiedDataAnalytics #SparkAISummit Morpheus: SQL + Cypher in one session Graphs and tables are both useful data models ○ Finding paths and subgraphs, and transforming graphs ○ Viewing, aggregating and ordering values The Morpheus project parallels Spark SQL ○ PropertyGraph type (composed of DataFrames) ○ Catalog of graph data sources, named graphs, views, ○ Cypher query language A CypherSession adds graphs to a SparkSession
  • 23. #UnifiedDataAnalytics #SparkAISummit What is Morpheus used for? Data integration ○ Integrate (non-)graphy data from multiple, heterogeneous data sources into one or more property graphs Distributed Cypher execution ○ OLAP-style graph analytics Data science ○ Integration with other Spark libraries ○ Feature extraction using Neo4j Graph Algorithms
  • 24. #UnifiedDataAnalytics #SparkAISummit Neo4j Graph Algorithms https://bit.ly/2oUfnA5 • Parallel Breadth First Search • Parallel Depth First Search • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi-Step • Balanced Triad (identification) • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity Pathfinding & Search Centrality / Importance Community Detection Similarity neo4j.com/docs/ graph-algorithms/current/ Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors* Available in GraphFrames
  • 25. #UnifiedDataAnalytics #SparkAISummit 25 Free O’Reilly Book neo4j.com/ graph-algorithms-book • Spark & Neo4j Examples • Machine Learning Chapter
  • 27. #UnifiedDataAnalytics #SparkAISummit Cypher query language Cypher 9 is the latest full version of openCypher ○ Implemented in Neo4j 3.5 ○ Implemented in whole/part by six other vendors ○ Several other partial and research implementations ○ Cypher for Gremlin is another openCypher project
  • 28. #UnifiedDataAnalytics #SparkAISummit Cypher 9 in Morpheus and Spark Graph (SPIP) Cypher is a full CRUD language ○ RETURNs only tabular results: not composable ○ Results can include graph elements (paths, relationships, nodes) or property values Morpheus and SPIP implement most of read-only Cypher ○ No MERGE or DELETE ○ Spark immutable data + transformations
  • 29. #UnifiedDataAnalytics #SparkAISummit Cypher 10 in Morpheus - Multiple graphs Cypher 10 proposes support for Multiple Graphs ○ Multiple Graph CIP: https://git.io/fjmrx Allows for Cypher Query composition ○ Similar to chaining transformations on DataFrames Support Graph Catalog for managing Graphs ○ Analogous to Spark SQL catalog Query support for Graph Construction
  • 30. #UnifiedDataAnalytics #SparkAISummit Returning tabular data Input: a property graph Output: a table FROM GRAPH socialNetwork MATCH ({name: 'Dan'})-[:FRIEND*2]->(foaf) RETURN toUpper(foaf.name) AS name ORDER BY name DESC Language features available in Morpheus
  • 31. #UnifiedDataAnalytics #SparkAISummit Constructing graphs Input: a property graph Output: a property graph FROM GRAPH socialNetwork MATCH (p:Person)-[:FRIEND*2]->(foaf) WHERE NOT (p)-[:FRIEND]->(foaf) CONSTRUCT CREATE (p)-[:POSSIBLE_FRIEND]->(foaf) RETURN GRAPH Language features available in Morpheus
  • 32. #UnifiedDataAnalytics #SparkAISummit Querying multiple graphs Input: property graphs Output: a property graph FROM GRAPH socialNetwork MATCH (p:Person) FROM GRAPH products MATCH (c:Customer) WHERE p.email = c.email CONSTRUCT ON socialNetwork, products CREATE (p)-[:IS]->(c) RETURN GRAPH Language features available in Morpheus
  • 33. #UnifiedDataAnalytics #SparkAISummit Creating graph views Input: property graphs Output: a property graph CATALOG CREATE VIEW youngFriends($inGraph){ FROM GRAPH $inGraph MATCH (p1:Person)-[r]->(p2:Person) WHERE p1.age < 25 AND p2.age < 25 CONSTRUCT CREATE (p1)-[COPY OF r]->(p2) RETURN GRAPH } Language features available in Morpheus
  • 34. #UnifiedDataAnalytics #SparkAISummit Using graph views Input: property graphs Output: table or graph FROM youngFriends(socialNetwork) MATCH (p:Person)-[r]->(o) RETURN p, r, o // and views over views FROM youngFriends(europe(socialNetwork)) MATCH ... Language features available in Morpheus
  • 36. #UnifiedDataAnalytics #SparkAISummit Managing multiple graphs Property Graphs are managed within a catalog Cypher Session Property Graph Catalog Property Graph Data Source <namespace> Property Graph <name> QualifiedGraphName = <namespace>.<name>
  • 37. #UnifiedDataAnalytics #SparkAISummit PGDS implementations in Morpheus PGDS Multiple graphs Read graphs Write graphs File-based Parquet, ORC, CSV HDFS, local, S3 Yes Yes Yes SQL Hive, Jdbc Yes Yes No Neo4j Bolt Yes Yes Yes Neo4j Bulk Import No No Yes
  • 38. #UnifiedDataAnalytics #SparkAISummit Catalog operations via Cypher Cypher Session Property Graph Catalog “social-net” (Neo4j PGDS) “US” (Property Graph)
  • 39. #UnifiedDataAnalytics #SparkAISummit Read from single Property Graph Cypher Session Property Graph Catalog “social-net” (Neo4j PGDS) “US” (Property Graph) FROM social-net.US MATCH (p:Person) RETURN p
  • 40. #UnifiedDataAnalytics #SparkAISummit Read from multiple Property Graphs Cypher Session Property Graph Catalog “social-net” (Neo4j PGDS) “US” “EU” “products” (SQL PGDS) “2018” “2017” FROM social-net.US MATCH (p:Person) FROM products.2018 MATCH (c:Customer) WHERE p.email = c.email RETURN p, c
  • 41. #UnifiedDataAnalytics #SparkAISummit Construct new Property Graphs Cypher Session Property Graph Catalog “social-net” (Neo4j PGDS) “US” “EU” “products” (SQL PGDS) “2018” “2017” CATALOG CREATE GRAPH social-net.US_new { FROM social-net.US MATCH (p:Person) FROM products.2018 MATCH (c:Customer) WHERE p.email = c.email CONSTRUCT ON social-net.US CREATE (p)-[:SAME_AS]->(c) RETURN GRAPH }
  • 42. #UnifiedDataAnalytics #SparkAISummit Construct new Property Graphs CATALOG CREATE GRAPH social-net.US_new { FROM social-net.US MATCH (p:Person) FROM products.2018 MATCH (c:Customer) WHERE p.email = c.email CONSTRUCT ON social-net.US CREATE (p)-[:SAME_AS]->(c) RETURN GRAPH } Cypher Session Property Graph Catalog “social-net” (Neo4j PGDS) “US” “EU” “products” (SQL PGDS) “2018” “2017” “US_new”
  • 43. #UnifiedDataAnalytics #SparkAISummit Create and query Graph Views Cypher Session Property Graph Catalog “social-net” (Neo4j PGDS) “US” “EU” ... CATALOG CREATE VIEW youngPeople($sn) { FROM $sn MATCH (p:Person)-[r]->(n) WHERE p.age < 21 CONSTRUCT CREATE (p)-[COPY OF r]->(n) RETURN GRAPH } FROM youngPeople(social-net.US) MATCH (p:Person) RETURN p “youngPeople” Views
  • 45. #UnifiedDataAnalytics #SparkAISummit The Yelp Open Dataset 45 :Business name : ACME address : 123 ACME Rd. city : San Jose state : CA :User name : Alice since : 2013 elite : [2014, 2016] :User name : Bob since : 2014 elite : null :REVIEWS stars : 5 date : 2014-02-03 :REVIEWS stars : 4 date : 2014-08-03 https://www.yelp.com https://www.yelp.com/dataset https://www.yelp.com/dataset/challenge
  • 46. #UnifiedDataAnalytics #SparkAISummit Yelp Demo Overview 46 Part 1 From JSON to Graph Create persistent Property Graph from raw Yelp dataset Read Yelp Data from JSON into DataFrames Create Property Graph from DataFrames Store Property Graph using Parquet Part 2 A library of Graphs Create a library of graph projections Read Property Graph from Parquet Create subgraph for a specifc city Project and persist city subgraph Part 3 Federated queries Integrate reviews with social network data Define Graph Type and Mapping with Graph DDL Load data from Hive and H2 Run analytical query on the integrated graph Part 5 Neo4j Integration II Recommend businesses to users Load graph projections from library Write graphs to Neo4j, run Louvain + Jaccard Run analytical query in Morpheus to find recommendations Part 4 Neo4j Integration I Find trending businesses Load graph projections from library Write graphs to Neo4j and run PageRank Combine graphs in Morpheus and select trending businesses https://git.io/fjZ2b
  • 47. #UnifiedDataAnalytics #SparkAISummit Starting point: A Library of Graphs 47 2015 - 2018 (:User)-[:CO_REVIEWS]->(:User) (:User)-[:REVIEWS]->(:Business) (:User)-[:CO_REVIEWS]->(:User) Constuct graphs for each year (:Business)-[:CO_REVIEWED]->(:Business) https://git.io/fjZ25
  • 48. #UnifiedDataAnalytics #SparkAISummit Computing “TrendRank” for Yelp businesses 48 2017 to 2018 call algo.pagerank 2017 2018 trendRank = pageRank_2018 - pageRank_2017 ⋈ (:Business) -[:CO_REVIEWED]-> (:Business) https://git.io/fjZ2j
  • 50. #UnifiedDataAnalytics #SparkAISummit Morpheus is binary compatible with Spark Graph
  • 51. #UnifiedDataAnalytics #SparkAISummit Morpheus and Spark Graph: API compatibility spark-graph-api spark-cypher spark-sql okapi* morpheus spark-sql openCypherSPIP Cypher to relational operators compiler openCypher * Graph Features in Spark 3.0: Thursday 11AM, Room G104
  • 53. #UnifiedDataAnalytics #SparkAISummit Outlook next talk Graph Features in Spark 3.0: Thursday 11AM, Room G104 https://theoatmeal.com/comics/sneak_peek
  • 54. DON’T FORGET TO RATE AND REVIEW THE SESSIONS SEARCH SPARK + AI SUMMIT