1) Graph databases can represent data and relationships in ways that are more natural than relational databases.
2) However, different graph database implementations use different internal representations and APIs, making it hard to optimize queries across systems or build applications that work with multiple databases.
3) The property graph model has emerged as a common standard that represents graphs with nodes, relationships, and key-value attributes on both nodes and relationships.
1. Connections to the Real World
Graph Databases and Applications
Achim Friedland <achim@graph-database.org>, Aperis GmbH 1st University-Industrial Meeting on Graph Databases - 7.-8. Feb.. 2011, Barcelona , Spain
3. Welcome on the customer side... ;)
www.graph-database.org
3
4. The Graph Representation Problem
Adjacency matrix vs. Incidence matrix vs.
Adjacency list vs. Edge list vs. Classes,
Index-based vs. Index-free Adjacency, Dense
vs. Sparse graphs, On-disc vs. In-memory
graphs, All-Indexed vs. Specific-Index-
Creation, directed vs. undirected edges,
hypergraphs?, hierarchical graphs?, dynamic
graphs?
• Different levels of expressivity
• Sometimes very application specific
• Hard to optimize a single one for every use-case
4
5. The GraphDB Vendor Problem
• Multiple APIs from different vendors
• Unknown internal graph representation
• Unclear design goals
• Community involvement?
5
7. The Property-Graph Model
The most common graph model within
the NoSQL GraphDB space
edge label
Id: 1 Id: 2
Friends
name: Alice name: Bob
since: 2009/09/21
age: 21 age: 23
edge
vertex
properties
properties
• directed: Each edge has a source and destination vertex
• attributed: Vertices and edges carry key/value pairs
• edge-labeled: The label denotes the type of relationship
• multi-graph: Multiple edges between any two vertices allowed
7
9. A Property Graph Model Interface for Java and .NET
// Use a class-based in-memory graph
var graph = new InMemoryGraph();
var v1 = graph.AddVertex(new VertexId(1));
var v2 = graph.AddVertex(new VertexId(2));
v1.SetProperty("name", "Alice");
v1.SetProperty("age" , 21);
v2.SetProperty("name", "Bob");
v2.SetProperty("age" , 23);
var e1 = graph.AddEdge(v1, v2, new EdgeId(1), "Friends");
e1.SetProperty(“since”, ”2009/09/21”);
structured data (XML, JSON)
9
12. Querying a Graph Database
• Programmatic / API
• From any programming language, Pipes, ...
• Synchronous or Asynchronous
• Allow bypassing all optimizations
• Do not try to be smarter than the application
developer
• Ad hoc / Explorative
• Gremlin aka. “high-level pipes”?
• sones GQL, OrientDB QL aka. “SQL style”?
• Pattern matching aka. “SPARQL style”?
• Easy embedding of domain specific query languages?
12
13. A data flow framework for property graph models
: IEnumerator<E>, IEnumerable<E>
S ISideEffectPipe<in S, out E, out T> E
Source Emitted
Elements T Elements
Side Effect
13
14. Create complex pipes by combining pipes to pipelines
S Pipeline<S, E> E
pipe1<S,A> pipe2<B,C> pipe3<C,E>
Source Emitted
Elements Elements
14
15. A “perl”-style Ad Hoc query language for graphs
// Friends-of-a-friend
var pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
var pipe2 = new LabelFilterPipe("Friends", ComparisonFilter.EQUALS);
var pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
var pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
var pipe5 = new LabelFilterPipe("Friends", ComparisonFilter.EQUALS);
var pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
var pipe7 = new PropertyPipe("name");
var pipeline = new Pipeline(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6,pipe7);
pipeline.SetSource(new SingleEnumerator(
graph.GetVertex(new VertexId(1))));
g:id-v(1)/outE[@label='Friends']/inV/outE
[@label='Friends']/inV/@name
15
16. sones GQL
A “SQL”-style Ad Hoc query language for graphs
// Friends-of-a-friend
var pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
var pipe2 = new LabelFilterPipe("Friends", ComparisonFilter.EQUALS);
var pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
var pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
var pipe5 = new LabelFilterPipe("Friends", ComparisonFilter.EQUALS);
var pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
var pipe7 = new PropertyPipe("name");
var pipeline = new Pipeline(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6,pipe7);
pipeline.SetSource(new SingleEnumerator(
graph.GetVertex(new VertexId(1))));
From User u SELECT u.Friends.Friends.name
WHERE u.Id = 1
16
18. Query Result Formats
• Graphs
• QR may be queried over and over again
• QR may be stored/cached as a graph
• But again: (Too) may graph representations available
• Other data structures
• If result is just a list, why converting it to a graph?
• Simple for programming languages
• Much more complicated for Query Languages
18
19. Query Result Formats
• Reduced 2-tier architecture (GraphDB -> Client)
• Higher performance
• Avoids relational architecture anti-patterns
• Link-aware, self-describing hypermedia (see Neo4J)
• e.g. ATOM, XML + XLINK, RDFa
• User-defined/application specific protocols
• E.g. serve HTML/GEXF directly (see CouchDB)
• Allows to create powerful embedded applications
19
21. A HTTP/REST interface for property graphs
• rexster server
• Exposes a graph via HTTP/REST
• Vertices and edges are REST resources
• Neo4J, OrientDB are available,
InfiniteGraph announced
• rexster client
• Accessing remote graphs
21
27. The GraphDB Graph...
OrientDB for Documents ThinkerGraph &
Gremlin for Ad Hoc
InfiniteGraph for Neo4J for GIS
Clustering
InfoGrid for WebApps In-Memory for
Caching
OrientDB for Ad Hoc
Neo4J for HA
27