-
1.
Everything is not a graph problem.
Lessons from life in the trenches.
Denise Koessler Gosnell, Ph.D.
Senior Graph Consultant, DataStax
@denisekgosnell
-
2.
Theory vs. Reality?
© DataStax, All Rights Reserved.
-
3.
Agenda
-
4.
The Graph
-
5.
The Graph
The vision
[Diagram: one graph containing person, service, device, product, address, store, email, and credit card vertices]
-
6.
The Graph
Reality
-
7.
What does your problem need?
-
8.
• Across all of my data, how many [___] are there?
• How many of [___] happened in the past x amount of time?
• What most closely matches [___]?
The starting point
The Graph
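These starting-point questions are aggregations and lookups. A minimal sketch of the first two, assuming a hypothetical list of `(event_type, timestamp)` records, shows that neither one needs a traversal; a plain table answers both.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event records: (event_type, timestamp). Flat rows, no edges.
events = [
    ("purchase", datetime(2018, 1, 3)),
    ("login", datetime(2018, 1, 5)),
    ("purchase", datetime(2018, 1, 20)),
    ("purchase", datetime(2018, 1, 28)),
]

# "Across all of my data, how many [___] are there?" is a group-by count.
counts = Counter(event_type for event_type, _ in events)


def count_in_window(events, event_type, now, window):
    # "How many of [___] happened in the past x amount of time?" is a filtered count.
    cutoff = now - window
    return sum(1 for kind, ts in events if kind == event_type and cutoff <= ts <= now)


print(counts["purchase"])  # 3
print(count_in_window(events, "purchase", datetime(2018, 1, 31), timedelta(days=14)))  # 2
```

If most of the workload looks like this, a graph adds cost without adding answers.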
-
9.
Goals?
-
10.
Goals?
-
11.
Goals?
-
12.
Not a graph problem.
-
13.
Status
-
14.
You have graph data.
What do you plan on doing with it?
-
15.
How many sessions are needed?
-
16.
How many sessions are needed?
Brynn Lender is organizing a conference with 4 speakers: Arya, Bran, Rob, and Sansa.
Logistical issues:
1. Arya’s and Sansa’s talks require the same room, as do Bran’s and Rob’s.
2. Mr. Lender does not want Bran’s and Sansa’s talks at the same time.
3. Arya and Bran are helping each other with their presentations and need to attend each other’s talks.
-
17.
We have graph data?
-
18.
How many sessions are needed?
-
19.
How many sessions are needed?
[Diagram: Arya, Bran, Rob, and Sansa as vertices]
-
20.
How many sessions are needed?
[Diagram: Arya, Bran, Rob, and Sansa as vertices]
-
21.
How many sessions are needed?
[Diagram: Arya, Bran, Rob, and Sansa as vertices]
-
22.
How many sessions are needed?
[Diagram: Arya, Bran, Rob, and Sansa as vertices]
-
23.
How many sessions are needed?
[Diagram: Arya, Bran, Rob, and Sansa as vertices]
-
24.
How many sessions are needed?
[Diagram: Arya, Bran, Rob, and Sansa as vertices]
-
25.
How many sessions are needed?
[Diagram: Arya, Bran, Rob, and Sansa as vertices]
-
26.
How many sessions are needed?
[Diagram: Arya, Bran, Rob, and Sansa as vertices, grouped into Session 1, Session 2, and Session 3]
-
27.
Graph Analytics Problems.
-
28.
• For all options, what is the best route from city a to city b?
• What is a way to schedule [____]?
• How do I place alarms in a building to uniquely identify a room according to alerts?
Common Qs
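The route question is a weighted shortest-path problem. A minimal sketch using Dijkstra's algorithm (our choice; the slide does not name an algorithm), assuming non-negative distances and a hypothetical road map stored as an adjacency dict:

```python
import heapq


def shortest_route(graph, source, target):
    """Dijkstra's algorithm; graph maps city -> list of (neighbor, distance).

    Returns (path, total_distance), or (None, inf) if target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, city = heapq.heappop(heap)
        if city in visited:
            continue  # stale heap entry; a shorter distance was already settled
        visited.add(city)
        if city == target:
            break
        for nbr, w in graph.get(city, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = city
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, float("inf")
    # Walk predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical one-way roads with distances.
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 7)],
    "B": [("D", 1)],
}
print(shortest_route(roads, "A", "D"))  # (['A', 'C', 'B', 'D'], 4)
```

Note that the answer depends on the whole network, not on a single record lookup; that is what makes this an analytics question.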
-
29.
• All-pairs shortest path
• Maximum clique
• Chromatic number
• Vertex degree distribution
• PageRank
• …
Common Algorithms
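To make one entry on this list concrete, here is a plain power-iteration sketch of PageRank on a hypothetical four-node link graph. It assumes the commonly used damping factor of 0.85 and that nodes with no out-links spread their rank evenly; a graph analytics engine would run the same iteration at scale.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank.

    graph: dict mapping node -> list of nodes it links to.
    Nodes with no out-links (dangling) spread their rank evenly over all nodes.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank mass held by dangling nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Every other node splits its rank evenly across its out-links.
        for v, targets in graph.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


links = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["b"]}
ranks = pagerank(links)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Every node's score depends on every other node's score, which is why these algorithms sweep the whole graph rather than answer point queries.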
-
30.
Status
-
31.
Graph Database Problems.
-
32.
First: Graph-Based Entity Resolution
-
33.
Telecom Social Networks
-
34.
Telecom Social Networks
-
35.
Telecom Social Fingerprinting
-
36.
Telecom Social Fingerprinting
August 2017
-
37.
Telecom Social Fingerprinting
August 2017 September 2017
-
38.
Telecom Social Fingerprinting
August 2017 September 2017
-
39.
Telecom Social Fingerprinting
August 2017 September 2017
-
40.
Telecom Social Fingerprinting
August 2017 September 2017
-
41.
Telecom Social Fingerprinting
August 2017 September 2017
-
42.
Telecom Social Fingerprinting
-
43.
Second: Graph DB as a Master Identity Store
-
44.
The Graph
We’re back.
[Diagram: graph containing person, service, device, product, address, store, email, and credit card vertices]
-
45.
Good?
[Diagram: three person vertices]
-
46.
What are you trying to read from your database?
-
47.
Better.
[Diagram: two person vertices and two master_uuid vertices, with address, device, payment, and flight vertices]
-
48.
Best?
[Diagram: two person vertices and two master_uuid vertices, with address, device, payment, and flight vertices, plus four provenance labels]
-
49.
Status:
-
50.
Status:
-
51.
Everything is not a graph problem.
(but there are plenty)
Denise Koessler Gosnell, Ph.D.
Senior Graph Consultant, DataStax
-
52.
• GitHub:
  • github.com/datastax/graph-examples
  • @DeniseKGosnell
• Twitter: @DeniseKGosnell
• Email: Denise.Gosnell@datastax.com
Contact
What I hope you get out of this talk: Life in the trenches
Target audience: architects, PMs, CTOs.
2017 was the year of the graph; what have we learned?
This talk:
Life in the trenches
What problems have gone wrong
How I classify “graph problems”
About me…
Describe:
non-graph problems,
graph analytics problems,
graph database problems
Include:
-- overlooked architectural issue
-- recommendation
Has anyone in here been tasked with building this? The new singular system that will have it all?
Story #1, my story of failure: I was tasked to build the one graph that ruled them all.
The world makes so much sense like this. Obviously a graph.
Grand vision: “the one graph that rules it all”
-- overlooked architectural issue: these are the types of questions I was being asked.
-- recommendation
Entity resolution:
---> the workload doesn’t match the data model
-- a huge Swiss Army knife used only to open cans of beans (a graph for bar charts): we had a graph, but only needed bar charts
- we needed the simple, and we had the powerful
---> I lost sight of what my manager really needed
Is this really what your problem needs?
Remember the one graph that I had to build? At the end of the day, my C-suite wanted this.
Essentially, we were letting the process of “how we think about the data” drive the actual implementation details.
My ask: use the right tool for the right problem.
Ok. You have graph data. But do you need to analyze it or query it?
Let’s do a “for instance.”
Essentially --
Query the audience:
4 sessions?
3?
2?
Graph problem?
Build a conflict graph
Arya’s and Sansa’s talks require the same room, as do Bran’s and Rob’s.
Mr. Lender does not want Bran’s and Sansa’s talks at the same time.
Arya and Bran are helping each other with their presentations and need to attend each other’s talks.
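The conflict graph has one vertex per speaker and an edge for every pair of talks that cannot share a time slot; the minimum number of sessions is then the graph’s chromatic number. A brute-force sketch (fine for four speakers, exponential in general), with the edges read off the three logistical constraints:

```python
from itertools import product


def chromatic_number(vertices, edges):
    """Smallest k with a proper k-coloring, plus one such coloring.

    Brute force: tries every assignment of k colors, so only for tiny graphs.
    """
    for k in range(1, len(vertices) + 1):
        for colors in product(range(k), repeat=len(vertices)):
            coloring = dict(zip(vertices, colors))
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k, coloring
    return 0, {}


speakers = ["Arya", "Bran", "Rob", "Sansa"]
conflicts = [
    ("Arya", "Sansa"),  # 1. same room
    ("Bran", "Rob"),    # 1. same room
    ("Bran", "Sansa"),  # 2. organizer wants them at different times
    ("Arya", "Bran"),   # 3. must attend each other's talks
]

sessions, assignment = chromatic_number(speakers, conflicts)
print(sessions)  # 3
for s in range(sessions):
    print(f"Session {s + 1}:", [p for p in speakers if assignment[p] == s])
```

Arya, Bran, and Sansa form a triangle of conflicts, so no two of them can share a session; Rob conflicts only with Bran and can join Arya or Sansa. That is why the answer is 3 rather than 2 or 4.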
… WTF, where are you going with this, Denise??
This duality exists with any problem, not just graph problems: algorithmic work vs. storage and retrieval.
What are the warning signs that show you are going down the wrong path?
--- aka --- what to look for to know that you are about to get fired, before you get fired
Trade-offs:
--- everything in one system, with bad SLAs
Or
--- data copied in multiple places in the system, with the complexity of (1) maintaining the duplicated data and (2) properly routing queries to the clever data model that answers them performantly
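The second trade-off can be sketched in a few lines. Assuming two hypothetical in-memory stores (a counter table for aggregate questions and an adjacency map for traversals), every write must land in both copies, and every read must be routed to the copy whose model fits it. `DualStore` and the query kinds are illustrative names, not a real API.

```python
from collections import Counter, defaultdict


class DualStore:
    """Hypothetical sketch: one write path, two read-optimized copies."""

    def __init__(self):
        self.edge_counts = Counter()       # aggregate model: count per edge label
        self.adjacency = defaultdict(set)  # graph model: neighbors per vertex

    def add_edge(self, src, label, dst):
        # (1) Maintaining duplication: every write must update both copies.
        self.edge_counts[label] += 1
        self.adjacency[src].add(dst)
        self.adjacency[dst].add(src)

    def query(self, kind, arg):
        # (2) Routing: send each question to the model that answers it cheaply.
        if kind == "count":
            return self.edge_counts[arg]
        if kind == "neighbors":
            return sorted(self.adjacency.get(arg, ()))
        raise ValueError(f"no store can answer query kind {kind!r}")


store = DualStore()
store.add_edge("person:1", "uses", "device:9")
store.add_edge("person:2", "uses", "device:9")
print(store.query("count", "uses"))          # 2
print(store.query("neighbors", "device:9"))  # ['person:1', 'person:2']
```

In a real deployment the two updates live in separate systems, so a failure between them leaves the copies out of sync; that consistency burden is the cost this option pays for good SLAs.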
Life in the trenches
Entity resolution with a MySQL database:
- I answered today’s question, but didn’t look far enough down the road to answer the next question
I’ve got a hammer, and I am hammering in nails.
OK, now remove the nails.
Now open the can of beans with the hammer.
Is there a situation in which graph databases have been useful for resolving entities?
2 examples
First story: 2012
The churn problem: a special version of the entity resolution (ER) problem.
After trying SVMs, ANNs, linear models, you name it, this is what we arrived at.
Subgraph from time t − 1 (aka last month).
Induced subgraph for time t
This is the identity that, in production, we would label with confidence as being associated with the identity from August.
How good was this? See red, you are dead.
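The notes do not spell out how subgraphs were compared, so the following is only a sketch under a stated assumption: an identity’s fingerprint is the set of numbers it contacted in a month (its neighborhood in the induced subgraph), and candidates in month t are ranked by Jaccard similarity to the month t − 1 fingerprint. Jaccard is our stand-in, not necessarily what ran in production; the phone numbers and the 0.5 threshold are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two contact sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(previous_contacts, candidates, threshold=0.5):
    """Return (candidate_id, score) for the month-t identity whose contacts best
    match the month t-1 fingerprint, or (None, score) if nothing clears threshold."""
    best_id, best_score = None, 0.0
    for cand_id, contacts in candidates.items():
        score = jaccard(previous_contacts, contacts)
        if score > best_score:
            best_id, best_score = cand_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Hypothetical data: who a churned number called in August ...
august = {"555-0101", "555-0102", "555-0103", "555-0104"}
# ... and the contact sets of new numbers seen in September.
september = {
    "new-A": {"555-0101", "555-0102", "555-0103", "555-0199"},
    "new-B": {"555-0104", "555-0150"},
}
print(best_match(august, september))  # ('new-A', 0.6)
```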
How do we use a graph database to make something like this happen?
We don’t get to skip that step anymore just because we are in a graph DB.
WHY the blue vertices?
WHAT becomes a property? ---> WHAT DO YOU NEED TO READ FROM YOUR DB?
We don’t get to skip that step anymore.
ONE QUESTION: WHAT DO YOU WANT OUT?
What is best? It all depends. Vertices typically become the things that require edges between master identities.
Before you hammer away at the keyboard, do some planning. What do you need back out? That is what will define a solid graph architecture for your application.
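One way to read the “Better.” and “Best?” slides in code: source records hang off a master_uuid, and each link carries provenance (which system asserted it). The sketch below is a hypothetical in-memory model with made-up field names, not DataStax Graph’s schema; its point is that the structure is indexed by the read we decided we need, here “every device for this master identity, and who said so.”

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical edge from a master identity to an entity, with provenance."""
    master_uuid: str
    entity_type: str  # e.g. "person", "device", "address"
    entity_id: str
    source: str       # provenance: which system asserted this link


class IdentityStore:
    def __init__(self):
        # Indexed by the read we care about: master_uuid -> links.
        self._by_master = defaultdict(list)

    def add(self, link):
        self._by_master[link.master_uuid].append(link)

    def entities(self, master_uuid, entity_type):
        """The designed-for read: entities of one type plus their sources."""
        return sorted(
            (link.entity_id, link.source)
            for link in self._by_master.get(master_uuid, [])
            if link.entity_type == entity_type
        )


store = IdentityStore()
store.add(Link("m-1", "person", "crm:42", "crm"))
store.add(Link("m-1", "person", "web:7", "web_signup"))
store.add(Link("m-1", "device", "dev:abc", "mobile_app"))
print(store.entities("m-1", "device"))  # [('dev:abc', 'mobile_app')]
```

If the question you need answered changed (say, “which master identities share this device?”), the index, and in a graph the choice of what becomes a vertex, would have to change with it.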
If you take anything away, it is that the graph community as a whole is at a critical point.
If we continue to approach graphs as something that “CAN SOLVE EVERYTHING,” we need to reconsider. This could have a wave of ramifications for the whole community.
Use the right tool for the right problem.