COPYRIGHT © 2015. THIS INFORMATION IS THE PROPERTY OF APERVITA, INC. APERVITA IS A REGISTERED TRADEMARK OF APERVITA, INC.
Democratizing Health Analytics & Data
Building a Graph Database
in Mongo
Agenda
• What is a Graph Database?
• Let’s Build One!
• Example MongoDB Implementation
• Pitfalls
• How Apervita Uses Graphs
Graph Databases
Elements:
Edges
Nodes
Nodes and edges can also have properties
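As a rough illustration of these elements, nodes and edges can each be modelled as a document that carries its own properties. The sketch below uses plain Python dictionaries in place of MongoDB documents; the field names (`_id`, `from`, `to`, `type`, `properties`) and all sample values are assumptions made for the example, not taken from the talk.

```python
# Nodes and edges as plain dictionaries standing in for MongoDB documents.
# All field names and values here are illustrative placeholders.
nodes = [
    {"_id": "city_a", "properties": {"population": 1000}},
    {"_id": "city_b", "properties": {"population": 2500}},
]

edges = [
    {"from": "city_a", "to": "city_b", "type": "road",
     "properties": {"distance_km": 42}},
]


def neighbours(node_id, edge_docs):
    """Return the ids reachable from node_id over one outgoing edge."""
    return [e["to"] for e in edge_docs if e["from"] == node_id]


print(neighbours("city_a", edges))
```

Keeping properties in a nested field is only one option; they could equally sit at the top level of each document.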
More on Graphs
• Graphs can have multiple types of edges
• Multiple types of graphs
• Directed Acyclic Graphs are our focus
• Directed: relationships are one-way
• Acyclic: relationships never loop back to a higher node
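The "directed" and "acyclic" properties can be checked mechanically. The sketch below is one common approach (Kahn's algorithm), not something shown in the talk; the function name `is_dag` and the `(parent, child)` tuple format are assumptions for the example.

```python
from collections import defaultdict, deque


def is_dag(edges):
    """Return True if the directed edges (parent, child) contain no cycle.

    Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    If every node can be removed this way, the graph is acyclic.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


print(is_dag([("a", "b"), ("b", "c")]))
print(is_dag([("a", "b"), ("b", "a")]))
```

A check like this could run before inserting a new edge, so that a relationship looping back to a higher node is rejected early.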
Let’s build a Graf Database!
This is Steffi Graf
• Edges:
• Tournament Type
• Sponsorships
• Tournaments Won
[Diagram. Nodes: Grand Slam, French Open, US Open, Aus. Open, Wimbledon, Canon, Dunlop, Head, Adidas. Edge types: Tourney Won, Tourney Type, Sponsorship.]
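The three edge types on this slide can be sketched as typed edge documents. The node names and edge types come from the slide; which nodes each edge connects, the edge directions, and the field names are assumptions made for this example, since the original diagram is not recoverable.

```python
# Typed edges for the Graf example, as plain dictionaries.
# Node names and edge types are from the slide; the wiring is assumed.
edges = [
    {"from": "Steffi Graf", "to": "French Open", "type": "tourney_won"},
    {"from": "Steffi Graf", "to": "US Open", "type": "tourney_won"},
    {"from": "Steffi Graf", "to": "Aus. Open", "type": "tourney_won"},
    {"from": "Steffi Graf", "to": "Wimbledon", "type": "tourney_won"},
    {"from": "French Open", "to": "Grand Slam", "type": "tourney_type"},
    {"from": "US Open", "to": "Grand Slam", "type": "tourney_type"},
    {"from": "Aus. Open", "to": "Grand Slam", "type": "tourney_type"},
    {"from": "Wimbledon", "to": "Grand Slam", "type": "tourney_type"},
    {"from": "Steffi Graf", "to": "Canon", "type": "sponsorship"},
    {"from": "Steffi Graf", "to": "Dunlop", "type": "sponsorship"},
    {"from": "Steffi Graf", "to": "Head", "type": "sponsorship"},
    {"from": "Steffi Graf", "to": "Adidas", "type": "sponsorship"},
]


def targets(source, edge_type, edge_docs):
    """Return the sorted targets of one edge type leaving a source node."""
    return sorted(
        e["to"] for e in edge_docs
        if e["from"] == source and e["type"] == edge_type
    )


print(targets("Steffi Graf", "sponsorship", edges))
```

Filtering on the `type` field is what lets a single node take part in several kinds of relationship at once.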
Graf is a Subgraph
• The implementation depends on the overall purpose of the graph
• Tracking tournaments and results?
Graf is a Subgraph
• Tracking tennis players and their results?
[Diagram labels: Tournaments, Edges, Players]
Graf is a Subgraph
• Just tracking Steffi Graf?
• You’ll need a new relationship:
[Diagram labels: You, Stalkers]
Implementation of paths
• Multiple implementations of the graph exist:
• Parents only
• Children only
• Parents and Children
• Edges
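To make the four options concrete, the sketch below stores one small graph (A -> B and A -> C) in each form, using plain dictionaries in place of MongoDB documents. The field names are hypothetical; only the four strategies come from the slide. The `descendants` helper walks the children-only form and counts one lookup per visited node, which hints at why traversal gets expensive when each lookup is a database query.

```python
# One small graph, A -> B and A -> C, stored four ways.
parents_only = [
    {"_id": "A", "parents": []},
    {"_id": "B", "parents": ["A"]},
    {"_id": "C", "parents": ["A"]},
]

children_only = [
    {"_id": "A", "children": ["B", "C"]},
    {"_id": "B", "children": []},
    {"_id": "C", "children": []},
]

parents_and_children = [
    {"_id": "A", "parents": [], "children": ["B", "C"]},
    {"_id": "B", "parents": ["A"], "children": []},
    {"_id": "C", "parents": ["A"], "children": []},
]

edge_docs = [
    {"from": "A", "to": "B"},
    {"from": "A", "to": "C"},
]


def descendants(node_id, docs):
    """Walk the children-only form; each visited node models one lookup."""
    by_id = {d["_id"]: d for d in docs}
    found, seen, stack = [], set(), [node_id]
    lookups = 0
    while stack:
        current = stack.pop()
        lookups += 1
        for child in by_id[current]["children"]:
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found, lookups


print(descendants("A", children_only))
```

Storing both parents and children makes walks in either direction cheap, at the cost of updating two documents for every relationship change.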
Transitive Closure Mathematics
• Finding transitive closure with math
• Picard solved this in 1976. (not that Picard)
• But it’s polynomial time and the math is kind of hard
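For a sense of what a polynomial-time transitive closure looks like, here is a minimal sketch of Warshall's algorithm, a standard textbook method. It is not necessarily the approach the slide has in mind; the function name and the `(source, target)` tuple format are assumptions for the example.

```python
def transitive_closure(nodes, edges):
    """Return the set of (a, b) pairs where b is reachable from a.

    Warshall's algorithm: for each intermediate node k, connect every
    i -> k with every k -> j. Three nested loops give O(n^3) time.
    """
    reach = set(edges)
    for k in nodes:
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


closure = transitive_closure(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(sorted(closure))
```

The cubic cost is manageable for small graphs but grows quickly, which is part of the motivation for precomputing paths instead.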
Implementing Paths
• Solution:
• Implement paths in their own collection.
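A minimal sketch of the idea, assuming each path document holds one root-to-node path as an array. The document shape (`{"path": [...]}`) and the helper names are hypothetical, and plain Python lists stand in for a collection. In MongoDB, a query of the form `{"path": node_id}` matches documents whose array contains that value and can use an index on the array field; `find_paths_containing` imitates that lookup here.

```python
def build_paths(children, roots):
    """Enumerate every root-to-node path in a DAG as a path document.

    The input must be acyclic, otherwise this loop never terminates.
    """
    paths = []
    stack = [[root] for root in roots]
    while stack:
        path = stack.pop()
        paths.append({"path": path})
        for child in children.get(path[-1], []):
            stack.append(path + [child])
    return paths


def find_paths_containing(path_docs, node_id):
    """Stand-in for a query such as {"path": node_id} on an indexed array."""
    return [d for d in path_docs if node_id in d["path"]]


def descendants(path_docs, node_id):
    """Post-process matching paths: keep everything after node_id."""
    result = set()
    for doc in find_paths_containing(path_docs, node_id):
        index = doc["path"].index(node_id)
        result.update(doc["path"][index + 1:])
    return result


def is_reachable(path_docs, source, target):
    """True if some stored path visits source and later visits target."""
    return target in descendants(path_docs, source)


# Diamond-shaped DAG: A -> B -> D and A -> C -> D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
path_docs = build_paths(graph, ["A"])
print(is_reachable(path_docs, "A", "D"))
```

The trade-off is storage and write cost: the number of path documents grows with the number of distinct paths, so every structural change means rebuilding the affected paths.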
Considerations
• As graphs get larger, performance becomes an issue
• Extremely wide child models can exceed MongoDB’s 16 MB document size limit
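One way to relieve the document-size pressure from a very wide node is to spread its children across several smaller documents. The sketch below is one possible reading of that idea, not the talk's implementation; the `_id` scheme, field names, and `max_per_doc` parameter are all hypothetical.

```python
def split_children(node_id, children, max_per_doc):
    """Spread a wide child list across several smaller bucket documents."""
    if max_per_doc < 1:
        raise ValueError("max_per_doc must be at least 1")
    docs = []
    for start in range(0, len(children), max_per_doc):
        docs.append({
            "_id": f"{node_id}#{start // max_per_doc}",
            "parent": node_id,
            "children": children[start:start + max_per_doc],
        })
    return docs


def all_children(node_id, docs):
    """Reassemble the full child list from the bucket documents."""
    return [c for d in docs if d["parent"] == node_id for c in d["children"]]


buckets = split_children("root", list(range(10)), 4)
print([d["_id"] for d in buckets])
```

Reading a wide node then takes several documents instead of one, so the bucket size is a trade-off between document size and the number of reads.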
How Apervita Uses Graphs
• Hospital Datasets are dynamic and varied
• Medical Vocabularies are by their nature graphs
• SNOMED, ICD-9, ICD-10, RxNorm
• The ability to code an algorithm to the “generic” and subsume for the specific is extremely powerful
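Coding to the "generic" and subsuming the specific can be sketched as expanding a concept to everything beneath it in the vocabulary graph. In the example below, only the Diabetes Mellitus SNOMED ID (73211009) and the names of its subtypes come from the speaker notes; the child keys are readable placeholders rather than real SNOMED identifiers, and the function names are hypothetical.

```python
# Hierarchy: parent concept -> more specific child concepts.
# The child keys are placeholders, not real SNOMED identifiers.
CHILDREN = {
    "73211009": [
        "gestational diabetes",
        "diabetes type I",
        "diabetes type II",
    ],
}


def subsumed_codes(code, children):
    """Return the code plus every more specific code beneath it."""
    result, stack = set(), [code]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(children.get(current, []))
    return result


def matches(record_code, generic_code, children):
    """True if a record's code is the generic code or one of its subtypes."""
    return record_code in subsumed_codes(generic_code, children)


print(matches("diabetes type II", "73211009", CHILDREN))
```

An algorithm written against the generic concept then matches records coded with any of its subtypes, without listing them by hand.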
How Apervita Uses Graphs
Example of browsing RxNorm
In Summary
• Think about your data model
• Use MongoDB’s indexing power to have quick access to your paths for graph calculations
• Think about your overall expected graph size
Democratizing Health Analytics & Data

MongoDB Days Silicon Valley: Implementing Graph Databases with MongoDB


Editor's Notes

  • #5 e.g. the distance between two cities (an edge property) or the population of a city (a node property)
  • #7 Steffi held the #1 rank for 377 weeks total, longer than any other tennis player, male or female. She’s won 22 Grand Slam singles titles, the most held by any player, male or female, since the Open Era started in 1968. She’s won a Golden Slam, winning all 4 Grand Slam tournaments AND the Olympic gold medal in a single year, and is the only player in the history of tennis to do so. And her name is a homophone for graph, which is how I ended up learning all the previous facts.
  • #9 Every tournament, per year, should get its own node. Participants, winners, scores.
  • #10 Tracking tennis players and their results? Tournaments may be independent nodes, one per tournament. The Years Won property can reside on the edge (as this is per player).
  • #13 Tournaments can actually have the property of years won
  • #14 All of these have one major issue: walking the graph is computationally intensive and takes many queries.
  • #16 One path list per relationship/unique path. Indexing the path array leads to speedy results. This will allow (with some post-processing): transitive closure; reachability becomes a trivial query. (Query example.)
  • #18 As graphs get larger, performance becomes an issue: RxNorm 204,000; SNOMED 311,000; ICD-10-PCS 76,000. Options: sharding on edge id + graph id; a collection per graph id. Extremely wide child models can break the document size; implementing pseudo subnodes can alleviate this pressure.
  • #19 Apervita allows medical authors to code to common datasets. Hospitals aren’t forced to conform their data, but can take advantage of our mapping tools to execute algorithms on their own data. Medical vocabularies are by their nature graphs: SNOMED, ICD-9, ICD-10, RxNorm. The ability to code an algorithm to the “generic” and subsume for the specific is extremely powerful, e.g. coding to Diabetes Mellitus (SNOMED ID 73211009) can capture all subnodes such as gestational diabetes, diabetes type II, and diabetes type I.
  • #21 Think about your data model: What are your nodes? What are your edges? What should be a property vs a node? Use MongoDB’s indexing power to have quick access to your paths for graph calculations. Think about your overall expected graph size: Does the width imply faux nodes? Should you shard? If there are multiple subgraphs, should you separate them into individual collections?