# Everything is not a graph problem. But, there are plenty.

This talk was given on Jan. 27th in Austin, TX at the Data Day 2018 conference.


### Everything is not a graph problem. But, there are plenty.

1. Everything is not a graph problem. Lessons from life in the trenches. Denise Koessler Gosnell, Ph.D., Senior Graph Consultant, DataStax. @denisekgosnell
7. What does your problem need?
8. The starting point / The Graph. © DataStax, All Rights Reserved.
   • Across all of my data, how many [___] are there?
   • How many of [___] happened in the past x amount of time?
   • What most closely matches [___]?
12. Not a graph problem.
14. You have graph data. What do you plan on doing with it?
16. How many sessions are needed? Brynn Lender is organizing a conference with 4 speakers: Arya, Bran, Rob, and Sansa. Logistic issues:
   1. Arya’s and Sansa’s conference talks require the same room, as do Bran’s and Rob’s.
   2. Mr. Lender does not want to have Bran’s and Sansa’s talks at the same time.
   3. Arya and Bran are helping each other with their presentations and need to attend each other’s talks.
17. We have graph data?
27. Graph Analytics Problems.
28. Common Qs:
   • For all options, what is the best route from city a to city b?
   • What is a way to schedule [____]?
   • How do I place alarms in a building to uniquely identify a room according to alerts?
29. Common Algorithms:
   • All-pairs shortest path
   • Maximum clique
   • Chromatic number
   • Vertex degree distribution
   • PageRank
   • …
31. Graph Database Problems.
32. First: Graph Based Entity Resolution
43. Second: Graph DB as a Master Identity Store
46. What are you trying to read from your database?
51. Everything is not a graph problem (but there are plenty). Denise Koessler Gosnell, Ph.D., Senior Graph Consultant, DataStax.

### Editor's Notes

• What I hope you get out of this talk: Life in the trenches

Target audience: architects, PMs, CTOs.

2017 was the year of the graph. What have we learned?

• This talk:
Life in the trenches
What problems have gone wrong
How I classify “graph problems”

• Describe:
non-graph problems,
graph analytics problems,
graph database problems

Include:
-- overlooked architectural issue
-- recommendation
• Has anyone in here been tasked with building this? The new singular system that will have it all?

Story #1: This is my story of failure: tasked to build the one graph that ruled them all.
• The world makes so much sense like this. Obviously a graph.

Grand vision: “the one graph that rules it all”

• -- overlooked architectural issue – these are the types of questions I was being asked.
-- recommendation

Entity resolution:
---> workload doesn’t match the data model
-- a huge Swiss Army knife, and we were only opening cans of beans (graph for bar charts): we got a graph but only needed bar charts
- we needed the simple and we had the powerful
---> I lost sight of what my manager really needed
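
The questions on the "starting point" slide ("how many [___] are there?", "how many of [___] happened in the past x amount of time?") are plain aggregations. A minimal sketch, assuming a hypothetical list of `(event_type, timestamp)` records that is not from the talk, shows that they need nothing more than a counter:

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event records (event_type, timestamp); invented for illustration.
events = [
    ("login", datetime(2018, 1, 25)),
    ("purchase", datetime(2018, 1, 26)),
    ("login", datetime(2018, 1, 26)),
    ("login", datetime(2017, 12, 1)),
]


def count_by_type(records):
    """Across all of my data, how many [___] are there?"""
    return Counter(event_type for event_type, _ in records)


def count_recent(records, now, window):
    """How many of [___] happened in the past x amount of time?"""
    cutoff = now - window
    return Counter(event_type for event_type, ts in records if ts >= cutoff)


totals = count_by_type(events)
recent = count_recent(events, datetime(2018, 1, 27), timedelta(days=7))
```

Both results map directly onto a bar chart; no vertices, edges, or traversals are involved.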
• Is this really what your problem needs?

Remember the one graph that I had to build? At the end of the day, my c-suite wanted this.
• Essentially –

We were letting the process of “how we think about the data” drive the actual implementation details.

Use the right tool for the right problem.
• Ok. You have graph data. But do you need to analyze it or query it?
• Let’s do a “for instance”
• Essentially --
• Query the audience:
4 sessions?
3?
2?

Graph problem?
• Build a conflict graph
• Arya’s and Sansa’s conference talks require the same room, as do Bran’s and Rob’s.
• Mr. Lender does not want to have Bran’s and Sansa’s talks at the same time.
• Arya and Bran are helping each other with their presentations and need to attend each other’s talks.

… WTF where are you going with this Denise??
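
The conflict graph for the conference example can be sketched in a few lines. This is an illustration under one assumption: each of the three logistic issues is read as "these two talks cannot run in the same session", which gives four conflict edges. The function name `min_sessions` is hypothetical, and the brute-force search is only sensible for a graph this small. The minimum number of sessions is then the chromatic number of the conflict graph.

```python
from itertools import product

speakers = ["Arya", "Bran", "Rob", "Sansa"]

# Assumed reading of the three logistic issues as pairwise conflicts.
conflicts = [
    ("Arya", "Sansa"),  # same room
    ("Bran", "Rob"),    # same room
    ("Bran", "Sansa"),  # organizer does not want them at the same time
    ("Arya", "Bran"),   # need to attend each other's talks
]


def min_sessions(vertices, edges):
    """Brute-force chromatic number: try k = 1, 2, ... sessions in turn."""
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            slot = dict(zip(vertices, assignment))
            # A schedule is valid if no conflicting pair shares a session.
            if all(slot[a] != slot[b] for a, b in edges):
                return k, slot
    # Fallback: one session per talk is always valid.
    return len(vertices), {v: i for i, v in enumerate(vertices)}


sessions, schedule = min_sessions(speakers, conflicts)
```

Under this reading, Arya, Bran, and Sansa conflict pairwise, so two sessions cannot work, and the search returns three sessions, with Rob sharing a session with Arya.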
• This duality exists with any problem, not just graph problems: algorithm-y vs. storage/retrieval-y

What are the warning signs that show you are going down the wrong path?
--- aka --- what to look for to know that you are about to get fired before you get fired

--- everything in 1 system, with bad SLAs
or
--- data copied to multiple places in the system, plus the complexity of (1) maintaining the duplicated data and (2) properly routing each query to the right data model for a performant answer

Life in the trenches

Entity resolution with a MySQL database:
- I answered today’s question, but didn’t look far enough down the road to answer the next question
- I’ve got a hammer, and I am hammering in nails
- OK, now remove the nails
- Now open the can of beans with the hammer
• Is there a situation in which graph databases have been useful for resolving entities?

2 examples
• First story: 2012
• The churn problem: a special version of the entity resolution (ER) problem.

After trying SVMs, ANNs, linear models, you name it, this is what we arrived at.
• Subgraph from time t – 1 (aka last month)
• Induced subgraph for time t
• This is the identity which in production we would label with confidence as being associated to the identity from August
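
The notes do not say how the match between months was scored. One common approach, shown here purely as an assumption, is to compare the neighbor set of a new identity in the subgraph at time t with the neighbor sets of identities from time t – 1, using Jaccard similarity. All identifiers, the threshold, and the data below are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(new_neighbors, previous, threshold=0.5):
    """Return (identity, score) for the most similar earlier identity,
    or (None, score) if the best score is below the threshold."""
    best_id, best_score = None, 0.0
    for identity, neighbors in previous.items():
        score = jaccard(new_neighbors, neighbors)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


# Hypothetical neighbor sets: identities seen in August (time t - 1) ...
august = {
    "id_A": {"p1", "p2", "p3", "p4"},
    "id_B": {"p5", "p6"},
}
# ... and the neighbors of a new, unlabeled identity the following month (time t).
new_identity_neighbors = {"p1", "p2", "p3", "p9"}

match, score = best_match(new_identity_neighbors, august)
```

Here the new identity shares three of five distinct neighbors with `id_A`, so it would be labeled as the same entity under the assumed 0.5 threshold. A production system would need a threshold tuned on labeled data.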
• How do we use a graph database to make something like this happen?
• We don’t get to skip that step anymore just because we are in a graph DB
• WHY the blue vertices?

WHAT becomes a property? ---> WHAT DO YOU NEED TO READ FROM YOUR DB?
We don’t get to skip that step anymore.

1 QUESTION: WHAT DO YOU WANT OUT?

• What is best? It all depends. Vertices typically become things that require edges between master identities.

Before you hammer away at the keyboard, do some planning. What do you need back out? That is what will define a solid graph architecture for your application.