GraphFrames Access Methods
Jim Hatcher
Solution Architect, DataStax
Twitter: @thejimhatcher
Graph Day - San Francisco
September 2018
© DataStax, All Rights Reserved.
Agenda
● Building Blocks
● OSS Spark GraphFrames
● DSEGraphFrames
● Demo
● Resources
Building Blocks
DSE Graph Frames - Mental Model of Concepts
[Diagram. Column headings: Concepts; Open Source; DataStax Enterprise (DSE). Concept boxes: Graph Theory; Database; Graph Database; Distributed Database; Execution Framework; Distributed Execution Framework; Machine Learning; Graph Algorithms; OLTP / Realtime Database; Resilient Distributed Dataset (RDD); Spark Query Plan & Memory Optimization. Product boxes: Apache Spark; Spark DataFrames; Spark GraphX; Spark GraphFrames; Apache Cassandra; Apache TinkerPop & Gremlin; DSE Core; DSE Search; DSE Analytics; DSE Graph; DSE GraphFrames.]
Typical Cluster Topology in DSE Graph
[Diagram: one cluster with two data centers. Data Center 1: OLTP / Realtime, serving real-time clients. Data Center 2: OLAP / Batch, serving batch clients.]
OSS Spark GraphFrames
Capabilities
● Parallelization / Resilience / Distributed (from Spark)
● Query Plan Optimization (from Spark’s Catalyst engine)
● Memory Optimization (from Spark’s Tungsten engine)
● Spark SQL (from Spark DataFrames)
Motif Finding
● Motif Finding
○ g.find()
○ motif (subset of Cypher)
Graph Algorithms
● Graph Algorithms (from GraphX)
○ Breadth-First Search (BFS)
○ Connected Components / Strongly Connected Components
○ Label Propagation Algorithm (LPA)
○ PageRank
○ Shortest Paths
○ SVD++
○ Triangle Count
● Building blocks to write your own algorithms
○ aggregateMessages()
○ pregel() - GraphX
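To make the aggregateMessages() building block concrete, here is a conceptual sketch in plain Scala collections. It is not the GraphFrames API: `Edge`, `aggregateToDst`, and `sampleEdges` are hypothetical names invented for this illustration. It shows only the pattern: every edge sends one message to its destination vertex, and the messages that arrive at a vertex are combined with a reduce function.

```scala
// Conceptual sketch only: plain Scala collections, not the GraphFrames API.
// All names here (Edge, aggregateToDst, sampleEdges) are hypothetical.
object AggregateMessagesSketch {
  final case class Edge(src: String, dst: String)

  // Each edge sends `message(edge)` to its destination vertex;
  // messages arriving at the same vertex are merged with `combine`.
  def aggregateToDst[M](edges: Seq[Edge])(message: Edge => M)(combine: (M, M) => M): Map[String, M] =
    edges
      .groupBy(_.dst)
      .map { case (dst, incoming) => dst -> incoming.map(message).reduce(combine) }

  val sampleEdges: Seq[Edge] = Seq(Edge("a", "b"), Edge("c", "b"), Edge("a", "c"))

  def main(args: Array[String]): Unit = {
    // Sending the constant 1 along every edge and summing gives the in-degree.
    val inDegree = aggregateToDst(sampleEdges)(_ => 1)(_ + _)
    println(inDegree)
  }
}
```

Vertices that receive no messages are simply absent from the result, which is why "a" does not appear in the in-degree map for the sample edges.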
Data Source
● Load your vertices / edges from any Spark source
DSEGraphFrames
Data Source
● Point to your DSE Graph
val g = spark.dseGraph("my_graph_name")
● Or, point to any other data source
Apache TinkerPop support
● The same Gremlin that you write for your OLTP traversals can be used for analytical requirements
● However, only a limited subset of the Gremlin steps is currently implemented
○ Inclusions:
■ DSE 5.1: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/graphAnalytics/tinkerpopDseGraphFrame.html
■ DSE 6.0: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/graphAnalytics/tinkerpopDseGraphFrame.html
○ Notable Exclusions:
■ repeat()
■ union()
■ as() / select() -- added in DSE 6.0
Good for Scan Operations
● Very good for operations that require table scans
○ Examples:
■ g.V().count()
■ g.E().count()
■ g.V().groupCount().by(__.label())
■ g.E().groupCount().by(__.label())
Mutations
● Effective way of mutating the graph (not available in OSS GraphFrames)
○ Mutations cannot be done using Gremlin OLAP
○ Takes advantage of Spark’s innate ability to parallelize processes
● Potential Use Cases
○ Migration from current graph schema to new graph schema
○ Adding shortcut edges
○ Initial load of the graph
■ Requires a distributed file system such as DSEFS or HDFS
○ Drop all instances of Vertex Label X
Demo
Dataset
KillrVideo - reference application
https://github.com/datastax/graph-examples/
Summary Traversals - TinkerPop/Gremlin
val g = spark.dseGraph("killrvideo")
g.V().count()
g.E().count()
g.V().groupCount().by(__.label())
g.E().groupCount().by(__.label())
//get count of actors by movie
g.V()
  .hasLabel("movie")
  //.has("title", "I Am Legend")
  .as("m")
  .out("actor")
  .groupCount().by(__.select("m").values("title"))
  .order(local).by(values, decr)
Summary Traversals - Spark SQL
//register our vertex and edge tables so we can reference them in Spark SQL
spark.read.format("com.datastax.bdp.graph.spark.sql.vertex")
  .option("graph", "killrvideo")
  .load
  .createOrReplaceTempView("vertices")
spark.read.format("com.datastax.bdp.graph.spark.sql.edge")
  .option("graph", "killrvideo")
  .load
  .createOrReplaceTempView("edges")
//get count of actors by movie
val moviesAndActorCounts = spark.sql("""
  SELECT vMovie.title, COUNT(*) AS NumberOfActors
  FROM vertices vMovie
  INNER JOIN edges eActor ON vMovie.id = eActor.src AND eActor.`~label` = 'actor'
  WHERE vMovie.`~label` = 'movie'
  GROUP BY vMovie.id, vMovie.title
  ORDER BY COUNT(*) DESC
""")
moviesAndActorCounts.show(false)
//moviesAndActorCounts.explain
Summary Traversals - Spark SQL (cont'd)
//count the distinct genres each actor has appeared in (excluding Animation)
//and keep only the actors who appear in more than one genre
val actorsInMultipleGenres = spark.sql("""
  SELECT ActorGenreGrouping.ActorName, ActorGenreGrouping.NumberOfGenres
  FROM
  (
    SELECT vPerson.name AS ActorName, COUNT(DISTINCT vGenre.name) AS NumberOfGenres
    FROM vertices vPerson
    INNER JOIN edges eActor ON vPerson.id = eActor.dst AND eActor.`~label` = 'actor'
    INNER JOIN vertices vMovie ON vMovie.id = eActor.src AND vMovie.`~label` = 'movie'
    INNER JOIN edges eGenre ON vMovie.id = eGenre.src AND eGenre.`~label` = 'belongsTo'
    INNER JOIN vertices vGenre ON vGenre.id = eGenre.dst AND vGenre.`~label` = 'genre'
    WHERE vPerson.`~label` = 'person'
    AND vGenre.name <> 'Animation'
    GROUP BY vPerson.name
  ) AS ActorGenreGrouping
  WHERE ActorGenreGrouping.NumberOfGenres > 1
  ORDER BY ActorGenreGrouping.NumberOfGenres DESC
""")
actorsInMultipleGenres.show(false)
Motif finding
val g = spark.dseGraph("killrvideo")
//get a list of actors who have worked in comedy movies
val comedyActors = g.find("(movie)-[e1]->(person); (movie)-[e2]->(genre)")
  .filter("""
    person.`~label` = 'person'
    and e1.`~label` = 'actor'
    and movie.`~label` = 'movie'
    and e2.`~label` = 'belongsTo'
    and genre.`~label` = 'genre'
    and genre.name = 'Comedy'
  """)
  .select("person.name", "movie.title", "genre.name")
comedyActors.show(false)
//comedyActors.explain
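To illustrate what the motif "(movie)-[e1]->(person); (movie)-[e2]->(genre)" matches, here is a conceptual sketch in plain Scala collections rather than the GraphFrames API. The `Vertex` and `Edge` case classes and the sample data are hypothetical, made up for this illustration, and the single `name` field stands in for both a movie's title and a person's or genre's name. The point is that the two edge patterns are joined on the shared `movie` vertex, and the label and property conditions are applied as filters afterwards.

```scala
// Conceptual sketch only: plain Scala collections, not the GraphFrames API.
// Vertex, Edge, and the sample data are hypothetical.
object MotifSketch {
  final case class Vertex(id: Int, label: String, name: String)
  final case class Edge(src: Int, dst: Int, label: String)

  // Match "(movie)-[e1]->(person); (movie)-[e2]->(genre)":
  // two edges that leave the same source vertex, then filter by label and genre name.
  def comedyActors(vertices: Seq[Vertex], edges: Seq[Edge]): Seq[(String, String, String)] = {
    val byId = vertices.map(v => v.id -> v).toMap
    for {
      e1 <- edges if e1.label == "actor"
      e2 <- edges if e2.label == "belongsTo" && e2.src == e1.src
      movie = byId(e1.src)
      person = byId(e1.dst)
      genre = byId(e2.dst)
      if movie.label == "movie" && person.label == "person"
      if genre.label == "genre" && genre.name == "Comedy"
    } yield (person.name, movie.name, genre.name)
  }

  val sampleVertices: Seq[Vertex] = Seq(
    Vertex(1, "movie", "Movie A"),
    Vertex(2, "movie", "Movie B"),
    Vertex(3, "person", "Actor X"),
    Vertex(4, "person", "Actor Y"),
    Vertex(5, "genre", "Comedy"),
    Vertex(6, "genre", "Drama"))

  val sampleEdges: Seq[Edge] = Seq(
    Edge(1, 3, "actor"), Edge(1, 5, "belongsTo"),
    Edge(2, 4, "actor"), Edge(2, 6, "belongsTo"))

  def main(args: Array[String]): Unit =
    comedyActors(sampleVertices, sampleEdges).foreach(println)
}
```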
Adding Shortcut Edges - DataFrames
//the Spark shell normally has these in scope already; a standalone application needs them
import spark.implicits._
import org.apache.spark.sql.functions.lit
val g = spark.dseGraph("killrvideo")
//person and movie vertices, plus two copies of the actor edges (movie -> person)
val vPerson1 = g.vertices.filter($"~label" === "person")
val eActor1 = g.edges.filter($"~label" === "actor")
val vMovie1 = g.vertices.filter($"~label" === "movie")
val eActor2 = g.edges.filter($"~label" === "actor")
//person -> actor edge: find the edges that point at each person
val tempResults1 = vPerson1
  .join(eActor1, vPerson1.col("id") === eActor1.col("dst"))
  .select(vPerson1.col("id").as("vPerson1_id"), vPerson1.col("name").as("vPerson1_name"), eActor1.col("src").as("eActor1_src"))
//actor edge -> movie: attach the movie each edge starts from
val tempResults2 = tempResults1
  .join(vMovie1, tempResults1.col("eActor1_src") === vMovie1.col("id"))
  .select(tempResults1.col("vPerson1_id"), tempResults1.col("vPerson1_name"), vMovie1.col("id").as("vMovie1_id"), vMovie1.col("title"))
//movie -> other people: follow the movie's actor edges back out
val tempResults3 = tempResults2
  .join(eActor2, tempResults2.col("vMovie1_id") === eActor2.col("src"))
  .select(tempResults2.col("vPerson1_id"), tempResults2.col("vPerson1_name"), tempResults2.col("title"), eActor2.col("dst").as("eActor2_dst"))
//drop self-pairs and shape the rows as edges: src, dst, ~label
val shortcutEdges = tempResults3
  .filter($"vPerson1_id" =!= $"eActor2_dst")
  .select(tempResults3.col("vPerson1_id").as("src"), tempResults3.col("eActor2_dst").as("dst"), lit("workedTogether").as("~label"))
//write the new edges into the graph
g.updateEdges(shortcutEdges)
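The join chain above can be hard to follow, so here is a conceptual sketch of the same person-movie-person pairing in plain Scala collections. It is not DSE or Spark code: `Edge`, `workedTogether`, and the sample data are hypothetical. One deliberate difference is that the sketch collects results into a Set, so two actors who share several movies yield a single edge per direction; the DataFrame version on the slide does no such deduplication.

```scala
// Conceptual sketch only: plain Scala collections, not the DSE GraphFrames API.
// Edge, workedTogether, and sampleEdges are hypothetical.
object ShortcutEdgeSketch {
  final case class Edge(src: String, dst: String, label: String)

  // For every movie, pair up the people it points to with "actor" edges,
  // skip self-pairs, and emit a person -> person "workedTogether" edge.
  def workedTogether(edges: Seq[Edge]): Set[Edge] = {
    val actorEdges = edges.filter(_.label == "actor")
    val pairs = for {
      e1 <- actorEdges
      e2 <- actorEdges
      if e1.src == e2.src && e1.dst != e2.dst
    } yield Edge(e1.dst, e2.dst, "workedTogether")
    pairs.toSet // a Set drops the duplicates produced by actors who share several movies
  }

  val sampleEdges: Seq[Edge] = Seq(
    Edge("movie1", "alice", "actor"), Edge("movie1", "bob", "actor"),
    Edge("movie2", "alice", "actor"), Edge("movie2", "bob", "actor"),
    Edge("movie2", "carol", "actor"))

  def main(args: Array[String]): Unit =
    workedTogether(sampleEdges).foreach(println)
}
```

Because the only filter is "not the same person", each pair appears in both directions (alice to bob and bob to alice), which matches what the join on the slide produces.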
Shortest Path
//set a checkpoint directory on DSEFS
spark.sparkContext.setCheckpointDir("dsefs://127.0.0.1:5598/checkpoints")
val g = spark.dseGraph("killrvideo")
//look up the vertex ids of the two landmark actors (first column of the first row)
val johnWayneId = g.V.has("person", "name", "John Wayne").df.collect()(0)(0)
val jamesStewartId = g.V.has("person", "name", "James Stewart").df.collect()(0)(0)
//for every vertex, compute the distance to each landmark
val shortestPaths = g.shortestPaths.landmarks(Seq(johnWayneId, jamesStewartId)).run
//make a C* table that matches the schema of my dataframe
shortestPaths.createCassandraTable(
  "test", //keyspace
  "shortest_paths", //table_name
  partitionKeyColumns = Some(Seq("id")),
  clusteringKeyColumns = Some(Seq("~label")))
Shortest Path (cont'd)
//write to the table
shortestPaths.write.format("org.apache.spark.sql.cassandra")
  .options(
    Map(
      "table" -> "shortest_paths",
      "keyspace" -> "test",
      "spark.cassandra.output.ignoreNulls" -> "true"
    )
  ).save
//read it back in later (cassandraFormat comes from org.apache.spark.sql.cassandra._)
//val shortestPaths = spark.read.cassandraFormat("shortest_paths", "test").load
//show each person's hop count to the two landmarks
shortestPaths
  .filter($"~label" === "person")
  .select('name, 'distances(johnWayneId).as("hopsFromDuke"), 'distances(jamesStewartId).as("hopsFromJimmy"))
  .orderBy('hopsFromDuke.desc)
  .show(500, false)
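As a rough picture of what the `distances` map holds, here is a conceptual sketch in plain Scala: a breadth-first search that computes, for every vertex that can reach a landmark, the number of hops to it. It is not the GraphFrames implementation, and `hopsTo` and the sample edges are hypothetical. It assumes, as the GraphFrames user guide describes for shortest paths, that edge direction is taken into account.

```scala
import scala.annotation.tailrec

// Conceptual sketch only: plain Scala, not the GraphFrames implementation.
// hopsTo and sampleEdges are hypothetical names.
object HopCountSketch {
  // Hop count from each vertex to `landmark`, following edges in their direction.
  // Vertices that cannot reach the landmark are absent from the result.
  def hopsTo(landmark: String, edges: Seq[(String, String)]): Map[String, Int] = {
    // Reverse adjacency: for each vertex, the vertices that have an edge pointing at it.
    val incoming: Map[String, Seq[String]] =
      edges.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._1) }

    // Breadth-first search backwards from the landmark, one hop per round.
    @tailrec
    def bfs(frontier: Seq[String], dist: Map[String, Int], depth: Int): Map[String, Int] =
      if (frontier.isEmpty) dist
      else {
        val next = frontier
          .flatMap(v => incoming.getOrElse(v, Seq.empty))
          .distinct
          .filterNot(dist.contains)
        bfs(next, dist ++ next.map(v => v -> (depth + 1)), depth + 1)
      }

    bfs(Seq(landmark), Map(landmark -> 0), 0)
  }

  val sampleEdges: Seq[(String, String)] =
    Seq("a" -> "b", "b" -> "c", "a" -> "c", "d" -> "a", "c" -> "e")

  def main(args: Array[String]): Unit =
    println(hopsTo("c", sampleEdges))
}
```

In the sample data, "e" is only reachable from "c" and has no path back to it, so it gets no entry for that landmark.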
Resources
https://graphframes.github.io/user-guide.html
https://github.com/apache/spark/tree/master/graphx/src/main/scala/org/apache/spark/graphx
https://github.com/graphframes/graphframes
https://www.youtube.com/watch?v=DW09q18OHfc - Russell Spitzer / Artem Aliev - Spark Summit talk
https://www.datastax.com/dev/blog/dse-graph-frame
https://github.com/datastax/graph-examples/blob/master/dse-graph-frame/Spark-shell-notes.scala
https://www.manning.com/books/spark-graphx-in-action
https://academy.datastax.com/resources/ds332

More Related Content

What's hot

Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
Julian Hyde
 
AfterGlow
AfterGlowAfterGlow
AfterGlow
Raffael Marty
 
Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB
graphdevroom
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
Amund Tveit
 
Ft10 de smet
Ft10 de smetFt10 de smet
Ft10 de smetnkaluva
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
Julian Hyde
 
A Divine Data Comedy
A Divine Data ComedyA Divine Data Comedy
A Divine Data Comedy
Mike Harris
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Spark Summit
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
DataWorks Summit/Hadoop Summit
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis
 
ACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and LanguageACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and Language
Marko Rodriguez
 
Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel...
Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel...Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel...
Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel...
Trivadis
 
Hive Functions Cheat Sheet
Hive Functions Cheat SheetHive Functions Cheat Sheet
Hive Functions Cheat Sheet
Hortonworks
 
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBBuilding a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Cody Ray
 
RHadoop, R meets Hadoop
RHadoop, R meets HadoopRHadoop, R meets Hadoop
RHadoop, R meets Hadoop
Revolution Analytics
 
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj TalkSpark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Zalando Technology
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascading
johnynek
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
Kohei KaiGai
 
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupBeyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Holden Karau
 
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Robert Stupp
 

What's hot (20)

Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
AfterGlow
AfterGlowAfterGlow
AfterGlow
 
Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
Ft10 de smet
Ft10 de smetFt10 de smet
Ft10 de smet
 
Spatial query on vanilla databases
Spatial query on vanilla databasesSpatial query on vanilla databases
Spatial query on vanilla databases
 
A Divine Data Comedy
A Divine Data ComedyA Divine Data Comedy
A Divine Data Comedy
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan OttTrivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
Trivadis TechEvent 2016 Big Data Cassandra, wieso brauche ich das? by Jan Ott
 
ACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and LanguageACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and Language
 
Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel...
Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel...Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel...
Trivadis TechEvent 2016 Polybase challenges Hive relational access to non-rel...
 
Hive Functions Cheat Sheet
Hive Functions Cheat SheetHive Functions Cheat Sheet
Hive Functions Cheat Sheet
 
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBBuilding a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
 
RHadoop, R meets Hadoop
RHadoop, R meets HadoopRHadoop, R meets Hadoop
RHadoop, R meets Hadoop
 
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj TalkSpark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
Spark + Clojure for Topic Discovery - Zalando Tech Clojure/Conj Talk
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascading
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupBeyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
 
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)Cassandra + Spark (You’ve got the lighter, let’s start a fire)
Cassandra + Spark (You’ve got the lighter, let’s start a fire)
 

Similar to GraphFrames Access Methods in DSE Graph

Trivadis TechEvent 2016 Introduction to DataStax Enterprise (DSE) Graph by Gu...
Trivadis TechEvent 2016 Introduction to DataStax Enterprise (DSE) Graph by Gu...Trivadis TechEvent 2016 Introduction to DataStax Enterprise (DSE) Graph by Gu...
Trivadis TechEvent 2016 Introduction to DataStax Enterprise (DSE) Graph by Gu...
Trivadis
 
Evolution of Spark APIs
Evolution of Spark APIsEvolution of Spark APIs
Evolution of Spark APIs
Máté Szalay-Bekő
 
Groovy in the Enterprise - Case Studies - TSSJS Prague 2008 - Guillaume Laforge
Groovy in the Enterprise - Case Studies - TSSJS Prague 2008 - Guillaume LaforgeGroovy in the Enterprise - Case Studies - TSSJS Prague 2008 - Guillaume Laforge
Groovy in the Enterprise - Case Studies - TSSJS Prague 2008 - Guillaume Laforge
Guillaume Laforge
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
Databricks
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
Jean Ihm
 
Bridging the gap between designers and developers at the Guardian
Bridging the gap between designers and developers at the GuardianBridging the gap between designers and developers at the Guardian
Bridging the gap between designers and developers at the Guardian
Kaelig Deloumeau-Prigent
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ ING
Duyhai Doan
 
GraphQL IndyJS April 2016
GraphQL IndyJS April 2016GraphQL IndyJS April 2016
GraphQL IndyJS April 2016
Brad Pillow
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
Mammoth Data
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
jeykottalam
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Big Data Spain
 
Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015
dhiguero
 
Dax Declarative Api For Xml
Dax   Declarative Api For XmlDax   Declarative Api For Xml
Dax Declarative Api For Xml
Lars Trieloff
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Duyhai Doan
 
Code-first GraphQL Server Development with Prisma
Code-first  GraphQL Server Development with PrismaCode-first  GraphQL Server Development with Prisma
Code-first GraphQL Server Development with Prisma
Nikolas Burk
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)
Daniel Nüst
 
Graphs made easy with SAS ODS Graphics Designer (PAPER)
Graphs made easy with SAS ODS Graphics Designer (PAPER)Graphs made easy with SAS ODS Graphics Designer (PAPER)
Graphs made easy with SAS ODS Graphics Designer (PAPER)
Kevin Lee
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Data Con LA
 

Similar to GraphFrames Access Methods in DSE Graph (20)

Trivadis TechEvent 2016 Introduction to DataStax Enterprise (DSE) Graph by Gu...
Trivadis TechEvent 2016 Introduction to DataStax Enterprise (DSE) Graph by Gu...Trivadis TechEvent 2016 Introduction to DataStax Enterprise (DSE) Graph by Gu...
Trivadis TechEvent 2016 Introduction to DataStax Enterprise (DSE) Graph by Gu...
 
Evolution of Spark APIs
Evolution of Spark APIsEvolution of Spark APIs
Evolution of Spark APIs
 
Groovy in the Enterprise - Case Studies - TSSJS Prague 2008 - Guillaume Laforge
Groovy in the Enterprise - Case Studies - TSSJS Prague 2008 - Guillaume LaforgeGroovy in the Enterprise - Case Studies - TSSJS Prague 2008 - Guillaume Laforge
Groovy in the Enterprise - Case Studies - TSSJS Prague 2008 - Guillaume Laforge
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
 
Bridging the gap between designers and developers at the Guardian
Bridging the gap between designers and developers at the GuardianBridging the gap between designers and developers at the Guardian
Bridging the gap between designers and developers at the Guardian
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ ING
 
GraphQL IndyJS April 2016
GraphQL IndyJS April 2016GraphQL IndyJS April 2016
GraphQL IndyJS April 2016
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015
 
Dax Declarative Api For Xml
Dax   Declarative Api For XmlDax   Declarative Api For Xml
Dax Declarative Api For Xml
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
COLLADA & WebGL
COLLADA & WebGLCOLLADA & WebGL
COLLADA & WebGL
 
Dancing with the Elephant
Dancing with the ElephantDancing with the Elephant
Dancing with the Elephant
 
Code-first GraphQL Server Development with Prisma
Code-first  GraphQL Server Development with PrismaCode-first  GraphQL Server Development with Prisma
Code-first GraphQL Server Development with Prisma
 
RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)RR & Docker @ MuensteR Meetup (Sep 2017)
RR & Docker @ MuensteR Meetup (Sep 2017)
 
Graphs made easy with SAS ODS Graphics Designer (PAPER)
Graphs made easy with SAS ODS Graphics Designer (PAPER)Graphs made easy with SAS ODS Graphics Designer (PAPER)
Graphs made easy with SAS ODS Graphics Designer (PAPER)
 
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
Big Data Day LA 2015 - Compiling DSLs for Diverse Execution Environments by Z...
 

Recently uploaded

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 

Recently uploaded (20)

哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 

GraphFrames Access Methods in DSE Graph

  • 1. GraphFrames Access Methods. Jim Hatcher, Solution Architect, DataStax. Twitter: @thejimhatcher. Graph Day - San Francisco, September 2018. © DataStax, All Rights Reserved.
  • 2. Agenda
      ● Building Blocks
      ● OSS Spark GraphFrames
      ● DSEGraphFrames
      ● Demo
      ● Resources
  • 4. DSE Graph Frames - Mental Model of Concepts (diagram). Column headings: Concepts; Open Source; DataStax Enterprise (DSE). Labels shown: Graph Theory; Database; Graph Database; Distributed Database; Execution Framework; Distributed Execution Framework; Machine Learning; Graph Algorithms; OLTP / Realtime Database; Resilient Distributed Dataset (RDD); Spark Query Plan & Memory Optimization; Apache Spark; Apache Cassandra; Apache TinkerPop & Gremlin; Spark GraphX; Spark GraphFrames; Spark DataFrames; DSE Graph; DSE GraphFrames; DSE Search; DSE Analytics; DSE Core.
  • 5. Typical Cluster Topology in DSE Graph (diagram). One cluster with two data centers: Data Center 1 (OLTP / Realtime) serving real-time clients, and Data Center 2 (OLAP / Batch) serving batch clients.
  • 7. Capabilities
      ● Parallelization / Resilience / Distributed (from Spark)
      ● Query Plan Optimization (from Spark’s Catalyst engine)
      ● Memory Optimization (from Spark’s Tungsten engine)
      ● Spark SQL (from Spark DataFrames)
  • 8. Motif Finding
      ● Motif Finding
          ○ g.find()
          ○ motif (subset of Cypher)
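
To make the idea of a motif concrete, here is a minimal sketch in plain Scala of what a two-edge pattern such as "(a)-[e1]->(b); (b)-[e2]->(c)" matches. It does not call GraphFrames: an in-memory edge list stands in for the edge DataFrame, and the names `MotifSketch`, `Edge`, and `twoHop`, as well as the toy data, are hypothetical and chosen only for illustration.

```scala
// Plain-Scala stand-in for motif finding; no Spark or GraphFrames required.
object MotifSketch {
  final case class Edge(src: String, dst: String, label: String)

  // Matches the two-edge motif "(a)-[e1]->(b); (b)-[e2]->(c)":
  // every pair of edges where the first edge ends where the second begins.
  def twoHop(edges: Seq[Edge]): Seq[(Edge, Edge)] =
    for {
      e1 <- edges
      e2 <- edges
      if e1.dst == e2.src
    } yield (e1, e2)

  // Hypothetical toy data (not taken from KillrVideo).
  val edges: Seq[Edge] = Seq(
    Edge("alice", "bob", "follows"),
    Edge("bob", "carol", "follows"),
    Edge("carol", "alice", "follows")
  )
}
```

Each element of a motif is effectively a join condition, which is why the result of a motif search can be filtered and projected like any other table of rows.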
  • 9. Graph Algorithms
      ● Graph Algorithms (from GraphX)
          ○ Breadth-First Search (BFS)
          ○ Connected Components / Strongly Connected Components
          ○ Label Propagation Algorithm (LPA)
          ○ PageRank
          ○ Shortest Paths
          ○ SVD++
          ○ Triangle Count
      ● Building blocks to write your own algorithms
          ○ aggregateMessages()
          ○ pregel() - GraphX
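
The message-passing building block can be illustrated without Spark. The sketch below is plain Scala, not the GraphFrames `aggregateMessages()` API itself: it performs one round in which every edge sends a value to both of its endpoints and every vertex sums what it received. The names `AggregateMessagesSketch`, `Vertex`, `Edge`, and `sumNeighborAges`, and the toy data, are assumptions made for this example.

```scala
// One round of message passing over an in-memory graph; no Spark required.
object AggregateMessagesSketch {
  final case class Vertex(id: String, age: Int)
  final case class Edge(src: String, dst: String)

  // Each edge sends the source's age to the destination and the
  // destination's age to the source; each vertex sums its messages.
  def sumNeighborAges(vertices: Seq[Vertex], edges: Seq[Edge]): Map[String, Int] = {
    val ageOf: Map[String, Int] = vertices.map(v => v.id -> v.age).toMap
    val messages: Seq[(String, Int)] = edges.flatMap { e =>
      Seq(e.dst -> ageOf(e.src), e.src -> ageOf(e.dst))
    }
    messages.groupBy(_._1).map { case (id, msgs) => id -> msgs.map(_._2).sum }
  }

  // Hypothetical toy data.
  val vertices: Seq[Vertex] = Seq(Vertex("a", 34), Vertex("b", 36), Vertex("c", 30))
  val edges: Seq[Edge] = Seq(Edge("a", "b"), Edge("b", "c"))
}
```

Iterating a step like this until the values stop changing is the basic shape of Pregel-style algorithms such as label propagation.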
  • 10. Data Source
      ● Load your vertices / edges from any Spark source
  • 12. Data Source
      ● Point to your DSE Graph
          val g = spark.dseGraph("my_graph_name")
      ● Or, point to any other data source
  • 13. Apache TinkerPop support
      ● The same Gremlin that you write for your OLTP-based traversals can be used for analytical requirements
      ● However, only a limited subset of the Gremlin steps is currently implemented
          ○ Inclusions:
              ■ DSE 5.1: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/graphAnalytics/tinkerpopDseGraphFrame.html
              ■ DSE 6.0: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/graphAnalytics/tinkerpopDseGraphFrame.html
          ○ Notable Exclusions:
              ■ repeat()
              ■ union()
              ■ as() / select() (added in DSE 6.0)
  • 14. Good for Scan Operations
      ● Very good for operations that require table scans
          ○ Examples:
              ■ g.V().count()
              ■ g.E().count()
              ■ g.V().groupCount().by(__.label())
              ■ g.E().groupCount().by(__.label())
  • 15. Mutations
      ● Effective way of mutating the graph (not available in OSS GraphFrames)
          ○ Mutations cannot be done using Gremlin OLAP
          ○ Takes advantage of Spark’s innate ability to parallelize processes
      ● Potential Use Cases
          ○ Migration from current graph schema to new graph schema
          ○ Adding shortcut edges
          ○ Initial load of the graph
              ■ Requires a distributed file system such as DSEFS or HDFS
          ○ Drop all instances of Vertex Label X
  • 16. Demo
  • 17. Dataset
      KillrVideo - reference application
      https://github.com/datastax/graph-examples/
  • 18. Summary Traversals - TinkerPop/Gremlin
        val g = spark.dseGraph("killrvideo")

        g.V().count()
        g.E().count()
        g.V().groupCount().by(__.label())
        g.E().groupCount().by(__.label())

        //get count of actors by movie
        g.V()
          .hasLabel("movie")
          //.has("title", "I Am Legend")
          .as("m")
          .out("actor")
          .groupCount().by(__.select("m").values("title"))
          .order(local).by(values, decr)
  • 19. Summary Traversals - Spark SQL
        //register our vertex and edge tables so we can reference them in Spark SQL
        spark.read.format("com.datastax.bdp.graph.spark.sql.vertex").option("graph", "killrvideo").load.createOrReplaceTempView("vertices")
        spark.read.format("com.datastax.bdp.graph.spark.sql.edge").option("graph", "killrvideo").load.createOrReplaceTempView("edges")

        //get count of actors by movie
        val moviesAndActorCounts = spark.sql("""
          SELECT vMovie.title, COUNT(*) AS NumberOfActors
          FROM vertices vMovie
          INNER JOIN edges eActor ON vMovie.id = eActor.src AND eActor.`~label` = 'actor'
          WHERE vMovie.`~label` = 'movie'
          GROUP BY vMovie.id, vMovie.title
          ORDER BY COUNT(*) DESC
        """)
        moviesAndActorCounts.show(false)
        //moviesAndActorCounts.explain
  • 20. Summary Traversals - Spark SQL (cont'd)
        //get actors who have appeared in more than one genre
        val actorsInMultipleGenres = spark.sql("""
          SELECT ActorGenreGrouping.ActorName, ActorGenreGrouping.NumberOfGenres
          FROM (
            -- one row per actor, counting each genre once
            SELECT vPerson.name AS ActorName, COUNT(DISTINCT vGenre.name) AS NumberOfGenres
            FROM vertices vPerson
            INNER JOIN edges eActor ON vPerson.id = eActor.dst AND eActor.`~label` = 'actor'
            INNER JOIN vertices vMovie ON vMovie.id = eActor.src AND vMovie.`~label` = 'movie'
            INNER JOIN edges eGenre ON vMovie.id = eGenre.src AND eGenre.`~label` = 'belongsTo'
            INNER JOIN vertices vGenre ON vGenre.id = eGenre.dst AND vGenre.`~label` = 'genre'
            WHERE vPerson.`~label` = 'person'
              AND vGenre.name <> 'Animation'
            GROUP BY vPerson.name
          ) AS ActorGenreGrouping
          WHERE ActorGenreGrouping.NumberOfGenres > 1
          ORDER BY ActorGenreGrouping.NumberOfGenres DESC
        """)
        actorsInMultipleGenres.show(false)
  • 21. Motif finding
        val g = spark.dseGraph("killrvideo")

        //get a list of actors who have worked in comedy movies
        val comedyActors = g.find("(movie)-[e1]->(person); (movie)-[e2]->(genre)")
          .filter("""
            person.`~label` = 'person' and
            e1.`~label` = 'actor' and
            movie.`~label` = 'movie' and
            e2.`~label` = 'belongsTo' and
            genre.`~label` = 'genre' and
            genre.name = 'Comedy'
          """)
          .select("person.name", "movie.title", "genre.name")
        comedyActors.show(false)
        //comedyActors.explain
  • 22. Adding Shortcut Edges - DataFrames
        val g = spark.dseGraph("killrvideo")

        val vPerson1 = g.vertices.filter($"~label" === "person")
        val eActor1 = g.edges.filter($"~label" === "actor")
        val vMovie1 = g.vertices.filter($"~label" === "movie")
        val eActor2 = g.edges.filter($"~label" === "actor")

        val tempResults1 = vPerson1
          .join(eActor1, vPerson1.col("id") === eActor1.col("dst"))
          .select(vPerson1.col("id").as("vPerson1_id"), vPerson1.col("name").as("vPerson1_name"), eActor1.col("src").as("eActor1_src"))

        val tempResults2 = tempResults1
          .join(vMovie1, tempResults1.col("eActor1_src") === vMovie1.col("id"))
          .select(tempResults1.col("vPerson1_id"), tempResults1.col("vPerson1_name"), vMovie1.col("id").as("vMovie1_id"), vMovie1.col("title"))

        val tempResults3 = tempResults2
          .join(eActor2, tempResults2.col("vMovie1_id") === eActor2.col("src"))
          .select(tempResults2.col("vPerson1_id"), tempResults2.col("vPerson1_name"), tempResults2.col("title"), eActor2.col("dst").as("eActor2_dst"))

        val shortcutEdges = tempResults3
          .filter($"vPerson1_id" =!= $"eActor2_dst")
          .select(tempResults3.col("vPerson1_id").as("src"), tempResults3.col("eActor2_dst").as("dst"), lit("workedTogether").as("~label"))

        g.updateEdges(shortcutEdges)
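
The join chain on this slide amounts to a self-join of the "actor" edges on the movie they share. As a rough illustration only, the same logic can be sketched in plain Scala over an in-memory edge list. Nothing here touches DSE or Spark; the names `ShortcutEdgesSketch`, `Edge`, and `workedTogether` and the toy data are hypothetical, and the `distinct` call is an addition of this sketch that the slide's code does not perform.

```scala
// Plain-Scala stand-in for the shortcut-edge derivation; no Spark required.
object ShortcutEdgesSketch {
  final case class Edge(src: String, dst: String, label: String)

  // Self-join of the "actor" edges on the movie (src): two different people
  // attached to the same movie get a "workedTogether" edge between them.
  def workedTogether(edges: Seq[Edge]): Seq[Edge] = {
    val actorEdges = edges.filter(_.label == "actor")
    val pairs = for {
      e1 <- actorEdges
      e2 <- actorEdges
      if e1.src == e2.src && e1.dst != e2.dst
    } yield Edge(e1.dst, e2.dst, "workedTogether")
    // De-duplicate pairs that share more than one movie (sketch-only choice).
    pairs.distinct
  }

  // Hypothetical toy data: movie -> person edges labeled "actor".
  val edges: Seq[Edge] = Seq(
    Edge("movie1", "personA", "actor"),
    Edge("movie1", "personB", "actor"),
    Edge("movie2", "personA", "actor"),
    Edge("movie2", "personB", "actor"),
    Edge("movie2", "personC", "actor"),
    Edge("movie1", "genreX", "belongsTo")
  )
}
```

As in the slide, the result contains an edge in each direction for every pair, and the inequality check keeps a person from being linked to themselves.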
  • 23. Shortest Path
        spark.sparkContext.setCheckpointDir("dsefs://127.0.0.1:5598/checkpoints")

        val g = spark.dseGraph("killrvideo")

        val johnWayneId = g.V.has("person", "name", "John Wayne").df.collect()(0)(0)
        val jamesStewartId = g.V.has("person", "name", "James Stewart").df.collect()(0)(0)

        val shortestPaths = g.shortestPaths.landmarks(Seq(johnWayneId, jamesStewartId)).run

        //make a C* table that matches the schema of my dataframe
        shortestPaths.createCassandraTable(
          "test", //keyspace
          "shortest_paths", //table_name
          partitionKeyColumns = Some(Seq("id")),
          clusteringKeyColumns = Some(Seq("~label")))
  • 24. Shortest Path (cont'd)
        //write to the table
        shortestPaths.write.format("org.apache.spark.sql.cassandra")
          .options(
            Map(
              "table" -> "shortest_paths",
              "keyspace" -> "test",
              "spark.cassandra.output.ignoreNulls" -> "true"
            )
          ).save

        //read it back in later
        //val shortestPaths = spark.read.cassandraFormat("shortest_paths", "test").load

        //order by one of the aliased distance columns
        shortestPaths
          .filter($"~label" === "person")
          .select('name, 'distances(johnWayneId).as("hopsFromDuke"), 'distances(jamesStewartId).as("hopsFromJimmy"))
          .orderBy($"hopsFromDuke".desc)
          .show(500, false)
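
For readers unfamiliar with the shape of the `distances` column used above, the following plain-Scala sketch computes, for every vertex, a map from landmark to hop count using breadth-first search. It is an illustration under stated assumptions rather than the GraphFrames implementation: edges are treated as directed, hops are measured from each vertex to the landmark, and unreachable landmarks are simply absent from a vertex's map. The names `ShortestPathsSketch`, `hopsTo`, and `distances` and the toy data are hypothetical.

```scala
// Landmark-based hop counts over an in-memory edge list; no Spark required.
object ShortestPathsSketch {
  // Hops from every vertex that can reach `landmark`, following edge direction.
  def hopsTo(edges: Seq[(String, String)], landmark: String): Map[String, Int] = {
    // For each vertex, the vertices that have an edge pointing at it.
    val incoming: Map[String, Seq[String]] =
      edges.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._1) }

    @annotation.tailrec
    def bfs(frontier: Set[String], hops: Int, seen: Map[String, Int]): Map[String, Int] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier
          .flatMap(v => incoming.getOrElse(v, Seq.empty))
          .filterNot(seen.contains)
        bfs(next, hops + 1, seen ++ next.map(v => v -> (hops + 1)))
      }

    bfs(Set(landmark), 0, Map(landmark -> 0))
  }

  // vertex -> (landmark -> hops), mirroring the idea of a per-vertex distances map.
  def distances(edges: Seq[(String, String)], landmarks: Seq[String]): Map[String, Map[String, Int]] = {
    val triples = for {
      l <- landmarks
      (v, h) <- hopsTo(edges, l).toSeq
    } yield (v, l, h)
    triples.groupBy(_._1).map { case (v, ts) => v -> ts.map(t => t._2 -> t._3).toMap }
  }

  // Hypothetical toy data: a -> b -> c and d -> c.
  val edges: Seq[(String, String)] = Seq("a" -> "b", "b" -> "c", "d" -> "c")
}
```

Looking up a landmark in a vertex's map, as the slide does with `'distances(johnWayneId)`, then yields the hop count for that vertex, or nothing when the landmark cannot be reached.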
  • 25. Resources
      https://graphframes.github.io/user-guide.html
      https://github.com/apache/spark/tree/master/graphx/src/main/scala/org/apache/spark/graphx
      https://github.com/graphframes/graphframes
      https://www.youtube.com/watch?v=DW09q18OHfc - Russell Spitzer / Artem Aliev - Spark Summit talk
      https://www.datastax.com/dev/blog/dse-graph-frame
      https://github.com/datastax/graph-examples/blob/master/dse-graph-frame/Spark-shell-notes.scala
      https://www.manning.com/books/spark-graphx-in-action
      https://academy.datastax.com/resources/ds332