Best Practices for Getting to Production with DataStax Enterprise Graph
2. DataStax Enterprise (DSE) Graph
A robust, scale-out graph database that focuses on storing, processing, and acting on highly connected and complex data relationships in real time.
© DataStax, All Rights Reserved. Confidential
3. Common DSE Graph Use Cases
• Customer 360
• Personalization
• Recommendations
• Fraud Detection
• Internet of Things
• Asset Management
• Data Integration
4. What is Customer 360 (C360)?
Integrating data silos and exploring neighborhoods to provide a personalized user experience in real time.
[Figure: a 360° view of the customer, surrounded by Location, Social, Orders, Account, Contact, Feedback, Devices, and Channels]
5. 1. Know Your Data Distributions
9. Ask Yourself...
1) What relationships exist currently or could possibly exist in the data?
2) Which of the identified relationships are important?
3) What is the distribution of those relationships?
[Figure: Email to Customer Distribution; x-axis: Number of Edges (Degree), that is, the number of customers per email; y-axis: Count of Emails]
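The third question can be answered with a simple degree count before any data is loaded into a graph. The following is a minimal sketch in plain Python, not DSE Graph code; the `edges` list and the `degree_distribution` helper are hypothetical names invented for illustration.

```python
from collections import Counter

# Hypothetical (email, customer_id) pairs; each pair would become one edge.
edges = [
    ("a@example.com", "c1"),
    ("b@example.com", "c2"),
    ("b@example.com", "c3"),
    ("shared@example.com", "c4"),
    ("shared@example.com", "c5"),
    ("shared@example.com", "c6"),
    ("d@example.com", "c7"),
]

def degree_distribution(edges):
    """Map each degree to the number of emails that have that degree."""
    degree_per_email = Counter(email for email, _ in edges)
    return Counter(degree_per_email.values())

distribution = degree_distribution(edges)
for degree in sorted(distribution):
    print(f"degree {degree}: {distribution[degree]} email(s)")
```

Plotting the degree against the count of emails gives the kind of distribution chart described on the slide.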
10. 2. Know Your Access Patterns… As Much as Possible
11. Data Modeling
“The paradigm shift is that we write our data according to how we are going to read it.”
Nate McCall on the journey of Apache Cassandra during DataStax Accelerate
12. Relational vs. Cassandra Data Modeling
[Figure: the Data, Models, and Application stages shown in opposite order for Relational and Cassandra data modeling]
13. Relational vs. Cassandra vs. Graph Data Modeling
[Figure: the ordering of the Data, Models, and Application stages for Relational, Cassandra, and Graph data modeling]
14. Common C360 Questions
• Who is this customer?
• What is their name, location, gender, and age?
• What has this customer recently purchased online or in stores?
• What feedback have they left about those purchases?
• Who is this customer related to?
• How influential is this customer?
[Figure: the customer surrounded by Location, Social, Orders, Account, Contact, Feedback, Devices, and Channels]
15. Common C360 Queries
• Who is this customer?
• What is their name, location, gender, and age?
• What has this customer recently purchased online or in stores?
• What feedback have they left about those purchases?
• Who is this customer related to?
• How influential is this customer?
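To make the step from questions to queries concrete, here is a minimal sketch that answers several of them against a tiny in-memory graph. It is plain Python, not Gremlin or the DSE Graph API; the vertex ids, edge labels, and the `out` helper are all hypothetical.

```python
# Hypothetical in-memory C360 graph: vertices keyed by id, edges as (source, label, target).
vertices = {
    "cust1": {"label": "customer", "name": "Ana", "location": "Austin"},
    "cust2": {"label": "customer", "name": "Ben", "location": "Boston"},
    "order1": {"label": "order", "channel": "online"},
    "fb1": {"label": "feedback", "rating": 4},
}
edges = [
    ("cust1", "placed", "order1"),
    ("order1", "received", "fb1"),
    ("cust1", "related_to", "cust2"),
]

def out(vertex_ids, label):
    """Follow outgoing edges with the given label from each vertex id."""
    return [dst for src, lbl, dst in edges if lbl == label and src in vertex_ids]

who = vertices["cust1"]                   # Who is this customer?
purchases = out(["cust1"], "placed")      # What has this customer purchased?
feedback = out(purchases, "received")     # What feedback did they leave?
related = out(["cust1"], "related_to")    # Who is this customer related to?
```

Each question becomes a lookup or a short walk from the customer vertex. The influence question needs a whole-graph computation and is left out of this sketch.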
16. Conceptual Data Model
• Who is this customer?
• What is their name, location, gender, and age?
• What has this customer recently purchased online or in stores?
• What feedback have they left about those purchases?
• Who is this customer related to?
• How influential is this customer?
17. Logical Data Model
• An entity with a single property and an average branching factor of one is a good indication that the entity should be a property rather than a vertex.
• An entity with a high median branching factor should also be considered for modeling as a property rather than a vertex.
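The first rule of thumb above can be checked mechanically once edge counts per entity instance are known. This is a minimal sketch in plain Python; the function names and the `tolerance` cutoff for "an average branching factor of one" are assumptions, not values from the talk.

```python
from statistics import mean

def average_branching_factor(edge_counts):
    """Average number of edges per instance of an entity."""
    return mean(edge_counts)

def suggest_property(property_count, edge_counts, tolerance=0.1):
    """Return True if the entity looks like a property rather than a vertex.

    tolerance is an assumed cutoff for how close to one the average must be.
    """
    close_to_one = abs(average_branching_factor(edge_counts) - 1.0) <= tolerance
    return property_count == 1 and close_to_one

# Hypothetical entities: an email has one property and one edge per instance,
# while a device has several properties and a varying number of edges.
email_is_property = suggest_property(1, [1, 1, 1, 1])
device_is_property = suggest_property(3, [1, 2, 5])
```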
19. Understand Branching Factor
Traversal time is roughly proportional to the number of edges and vertices visited.
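A quick back-of-the-envelope calculation shows why the branching factor matters. The sketch below assumes an idealized graph in which every vertex has the same branching factor and no vertex is revisited; `edges_walked` is a hypothetical helper, not a DSE Graph function.

```python
def edges_walked(branching_factor, depth):
    """Edges walked when every vertex is expanded for `depth` hops.

    Assumes a uniform branching factor and no revisited vertices.
    """
    return sum(branching_factor ** hop for hop in range(1, depth + 1))

small = edges_walked(2, 3)    # 2 + 4 + 8
large = edges_walked(10, 3)   # 10 + 100 + 1000
```

Under these assumptions, raising the branching factor from 2 to 10 turns a three-hop traversal from 14 edges into 1,110.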
21. Filter Vertices out Along the Way
If you know which vertices you are not looking for, avoid walking to them.
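The effect of filtering early can be seen in a small breadth-first walk. This is a minimal sketch over a hypothetical, acyclic adjacency map; in a real graph the filter would be expressed in the traversal language rather than in a Python predicate.

```python
from collections import deque

# Hypothetical acyclic graph: each key lists the vertices reachable in one hop.
graph = {
    "customer": ["order1", "order2", "device1"],
    "order1": ["item1", "item2"],
    "order2": ["item3"],
    "device1": ["session1", "session2", "session3"],
}

def traverse(start, keep=lambda vertex: True):
    """Breadth-first walk that only steps to vertices accepted by `keep`."""
    visited = []
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        visited.append(vertex)
        for neighbor in graph.get(vertex, []):
            if keep(neighbor):
                queue.append(neighbor)
    return visited

everything = traverse("customer")
no_devices = traverse("customer", keep=lambda vertex: not vertex.startswith("device"))
```

Rejecting the device vertex at the first hop also avoids every session behind it, so the filtered walk visits 6 vertices instead of 10.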
22. Pick the Best Starting Point
Consider where your traversal starts: do you walk along fewer edges when you start at the black vertex or the red vertex?
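The question can be answered by counting edge steps from each end. The sketch below builds a hypothetical undirected graph in which the black vertex has many neighbors and the red vertex has few; the vertex names and the `edges_walked` helper are illustrative only.

```python
from collections import defaultdict

adjacency = defaultdict(set)

def connect(a, b):
    """Add an undirected edge between a and b."""
    adjacency[a].add(b)
    adjacency[b].add(a)

# Hypothetical shape: black is highly connected, red is not.
for i in range(100):
    connect("black", f"leaf{i}")
connect("black", "mid")
connect("mid", "red")
connect("red", "other")

def edges_walked(start, hops):
    """Count edge steps taken when every frontier vertex is expanded for `hops` hops."""
    frontier, walked = {start}, 0
    for _ in range(hops):
        next_frontier = set()
        for vertex in frontier:
            walked += len(adjacency[vertex])
            next_frontier |= adjacency[vertex]
        frontier = next_frontier
    return walked

from_black = edges_walked("black", 2)
from_red = edges_walked("red", 2)
```

Both walks cover the same two-hop path between black and red, but starting at the red vertex takes 5 edge steps instead of 203.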
23. Go Back to the Data Model
Can you optimize the path from black to red by adding a short-cut edge?
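A short-cut edge trades extra writes for fewer hops at read time. The following minimal sketch measures the hop count before and after adding one; the graph and the `hops` helper are hypothetical.

```python
from collections import deque

def hops(graph, start, goal):
    """Fewest edges from start to goal via breadth-first search, or None if unreachable."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        vertex, distance = queue.popleft()
        if vertex == goal:
            return distance
        for neighbor in graph.get(vertex, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None

# Hypothetical model: black reaches red only through two intermediate vertices.
graph = {"black": ["a"], "a": ["b"], "b": ["red"]}
before = hops(graph, "black", "red")

# Add the short-cut edge directly from black to red.
graph["black"].append("red")
after = hops(graph, "black", "red")
```

The short-cut must be maintained whenever the underlying path changes, so it fits best when the access pattern is known and frequent.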
25. 4. Design a Supernode Strategy
26. What is a supernode?
A vertex with a disproportionately high number of connected edges.
Causes problems such as:
• performance issues
• stability issues
• issues with visualization
• partial or incorrect results
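One way to look for supernode candidates is to compare each vertex's degree with the typical degree in the data. This is a minimal sketch; the rule "more than `factor` times the median degree" is an assumed heuristic chosen for illustration, not a definition from the talk.

```python
from collections import Counter
from statistics import median

def find_supernodes(edges, factor=10):
    """Return vertices whose degree exceeds `factor` times the median degree.

    The median-based cutoff is an assumption for this sketch.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    typical = median(degree.values())
    return sorted(vertex for vertex, count in degree.items() if count > factor * typical)

# Hypothetical data: one store connected to 50 customers, plus one customer-to-customer edge.
edges = [("store", f"customer{i}") for i in range(50)] + [("customer0", "customer1")]
candidates = find_supernodes(edges)
```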
27. What should you do if you find a supernode?
[Figure: decision flowchart. Questions: What does the distribution look like? (right-skewed or left-skewed); Is your data the sole source of truth? (yes or no); Is the data valid? (yes or no). Outcomes: Try to model this vertex as a property; Consider a supernode optimization strategy; Validate and clean data on ingestion.]
33. Supernode Strategy: Add an Edge Index
Edge indices, also called vertex-centric indices, are local to a vertex and make it possible to find and traverse only the edges we need without scanning all edges.
To leverage the index, filter on the edge during the traversal.
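The idea behind a vertex-centric index can be illustrated with a per-vertex lookup table keyed by edge label and property value. This is a conceptual sketch in plain Python, not how DSE Graph stores or declares its indices; the `Vertex` class and the `year` edge property are hypothetical.

```python
from collections import defaultdict

class Vertex:
    def __init__(self, name):
        self.name = name
        self.edges = []                 # every outgoing edge as (label, properties, target)
        self.index = defaultdict(list)  # (label, year) -> targets, local to this vertex

    def add_edge(self, label, target, year):
        self.edges.append((label, {"year": year}, target))
        self.index[(label, year)].append(target)

    def scan(self, label, year):
        """Unindexed lookup: inspects every edge on the vertex."""
        return [target for edge_label, props, target in self.edges
                if edge_label == label and props["year"] == year]

    def lookup(self, label, year):
        """Indexed lookup: touches only the matching edges."""
        return list(self.index.get((label, year), []))

store = Vertex("store-1")
for i in range(1000):
    store.add_edge("sold", f"order{i}", 2010 + i % 10)

recent = store.lookup("sold", 2019)
```

Both methods return the same 100 orders, but `scan` inspects all 1,000 edges while `lookup` goes straight to the matching ones, which is why the traversal has to filter on the edge property to benefit.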
34. Supernode Strategy: Get More Specific
Make your vertices more granular by including another field in the ID of the vertex.
[Figure: two vertex models compared side by side]
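The effect of a more granular ID can be estimated by counting edges per vertex under each scheme. In this minimal sketch the purchase events, and the choice of month as the extra ID field, are hypothetical.

```python
from collections import Counter

# Hypothetical purchase events: (customer, store, month).
events = [(f"customer{i}", "store-1", f"2019-{(i % 12) + 1:02d}") for i in range(1200)]

# Coarse model: one vertex per store.
coarse = Counter(store for _, store, _ in events)

# Granular model: one vertex per (store, month), so the month is part of the vertex ID.
granular = Counter((store, month) for _, store, month in events)

largest_coarse = max(coarse.values())
largest_granular = max(granular.values())
```

With the month in the ID, the single 1,200-edge store vertex becomes 12 vertices of 100 edges each. Queries then need to know which month they are asking about.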
35. Supernode Strategy: Traverse In, but not Out
If you have a known supernode, but the vertex is too complex to be a property, you can avoid performance issues by only traversing into the vertex to gather information.
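A traversal can enforce this rule by treating known supernodes as dead ends: it reads them but never expands them. The following minimal sketch uses a hypothetical graph and a hypothetical `reachable` helper.

```python
def reachable(graph, start, max_hops, supernodes=frozenset()):
    """Vertices reached within max_hops; known supernodes are entered but never expanded."""
    frontier, seen = [start], {start}
    for _ in range(max_hops):
        next_frontier = []
        for vertex in frontier:
            if vertex in supernodes:
                continue  # the supernode has been gathered; do not walk out of it
            for neighbor in graph.get(vertex, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen

# Hypothetical data: the country vertex links to a thousand other customers.
graph = {
    "customer1": ["country_us", "order1"],
    "country_us": [f"customer{i}" for i in range(2, 1002)],
    "order1": ["item1"],
}
guarded = reachable(graph, "customer1", 2, supernodes={"country_us"})
unguarded = reachable(graph, "customer1", 2)
```

The guarded walk still reaches the country vertex, so its information is available, but it ends with 4 vertices rather than 1,004.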
36. 5. Embrace a Multi-Model Approach
37. Using the Right Tool for the Problem
[Figure: DSE Core, DSE Search, DSE Graph, and DSE Analytics positioned by Query Complexity (Simple to Complex) and Query Latency (p99) (Fast, Human Fast, Offline)]
42. Final Multi-Model Approach
[Figure: DSE Core, DSE Search, DSE Graph, and DSE Analytics positioned by Query Complexity (Simple to Complex) and Query Latency (p99) (Fast, Human Fast, Offline), with four queries placed on the chart: "Who is this customer?", "What has this customer recently purchased?", "Who is this customer related to?", and "How influential is this customer?"]
44. DataStax Graph for Labs
• “Model Once” Support: Because solving complex graph problems requires more than just a graph database.
• Inherits DSE Core Benefits: Fast, scalable, and highly available for mission-critical applications on-prem and in the cloud.
• Built by the Experts: Designed and tested by the core contributors to Apache Cassandra and TinkerPop.