The document discusses graph databases and provides examples of real-world graphs from Facebook, the New York Times, and a phone bill. It compares the data structure of graph databases to relational databases and shows how graph databases can store connections between entities as nodes with relationships as edges. Properties like tags can also be added to nodes and edges in a graph database to provide additional information.
Graph databases are well-suited for storing and querying multi-relational data. They provide better performance, flexibility, and agility than relational databases for such data. Tests showed graph databases like Neo4j outperforming relational databases, returning results faster and for more records as the depth and complexity of queries increased. Cypher, Neo4j's query language, starts queries, matches patterns, and returns and filters results through clauses such as START, MATCH, RETURN, and WHERE. Graph databases are used successfully by many large companies that need to handle complex relationships in data.
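As a hedged illustration of those clauses, a query in the START-based Cypher dialect shown later in this deck might look as follows (the index name, property, and relationship type here are invented for the example):

START user = node:people(name = 'Alice')
MATCH user-[:KNOWS]->friend
WHERE friend.age > 21
RETURN friend.name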
This document provides an overview of graph databases. It discusses how graph data is naturally represented as nodes connected by edges, unlike relational databases which require joins. Graph databases allow for fast traversal of connected data and enable querying connected subgraphs. Popular graph database models include property graphs and RDF triple stores. Neo4j is introduced as a widely used graph database management system that uses labels, properties, relationships, and Cypher query language.
A presentation of Apache TinkerPop's Gremlin language with running examples over the MovieLens dataset. Presented August 19, 2015 at NoSQL NOW in San Jose, California.
Relational databases were conceived to digitize paper forms and automate well-structured business processes, and still have their uses. But RDBMS cannot model or store data and its relationships without complexity, which means performance degrades with the increasing number and levels of data relationships and data size. Additionally, new types of data and data relationships require schema redesign that increases time to market.
A native graph database like Neo4j naturally stores, manages, analyzes, and uses data within the context of connections, which means Neo4j provides faster query performance and far greater flexibility in handling complex hierarchies than SQL.
Graph databases use graph structures to represent and store data, with nodes connected by edges. They are well-suited for interconnected data. Unlike relational databases, graph databases allow for flexible schemas and querying of relationships. Common uses of graph databases include social networks, knowledge graphs, and recommender systems.
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive (Sachin Aggarwal)
We will give a detailed introduction to Apache Spark and why and how Spark can change the analytics world. Apache Spark's memory abstraction is the RDD (Resilient Distributed Dataset). One of the key reasons why Apache Spark is so different is the introduction of RDDs. You cannot do anything in Apache Spark without knowing about RDDs. We will give a high-level introduction to RDDs, and in the second half we will have a deep dive into RDDs.
Jesse Anderson (Smoking Hand)
This early-morning session offers an overview of what HBase is, how it works, its API, and considerations for using HBase as part of a Big Data solution. It will be helpful for people who are new to HBase, and also serve as a refresher for those who may need one.
This document discusses troubleshooting Redis. Some key points:
- Redis is single-threaded, so commands like KEYS, FlushAll, and deleting large collections can be slow. It's better to use SCAN instead of KEYS.
- Creating Redis database snapshots (RDB files) and rewriting the append-only file (AOF) can cause high disk I/O and CPU usage. It's best to disable automatic rewrites.
- Monitoring memory usage and fragmentation is important to avoid performance issues. The maxmemory setting also needs monitoring to prevent out-of-memory errors.
- Network and replication failures need solutions like DNS failover or using Zookeeper for coordination to maintain high availability of Redis.
Max De Marzi gave an introduction to graph databases using Neo4j as an example. He discussed trends in big, connected data and how NoSQL databases like key-value stores, column families, and document databases address these trends. However, graph databases are optimized for interconnected data by modeling it as nodes and relationships. Neo4j is a graph database that uses a property graph data model and allows querying and traversal through its Cypher query language and Gremlin scripting language. It is well-suited for domains involving highly connected data like social networks.
This document provides an overview of MongoDB for Java developers. It discusses what MongoDB is, how it compares to relational databases, common use cases, data modeling approaches, CRUD operations, indexing, aggregation, replication, sharding, and tools for integrating MongoDB with Java applications. The document contains multiple code examples and concludes with a demonstration of building a sample app with MongoDB.
Data Privacy with Apache Spark: Defensive and Offensive Approaches (Databricks)
In this talk, we’ll compare different data privacy techniques & protection of personally identifiable information and their effects on statistical usefulness, re-identification risks, data schema, format preservation, read & write performance.
We’ll cover different offense and defense techniques. You’ll learn what k-anonymity and quasi-identifiers are. Discover the world of suppression, perturbation, obfuscation, encryption, tokenization, and watermarking with elementary code examples, in case no third-party products can be used. We’ll see what approaches might be adopted to minimize the risks of data exfiltration.
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...) (Beat Signer)
The document discusses Structured Query Language (SQL) and its history and components. It notes that SQL is a declarative query language used to define database schemas, manipulate data through queries, and control transactions. The document outlines SQL's data definition language for defining schemas and data manipulation language for querying and modifying data. It also provides examples of SQL statements for creating tables and defining constraints.
This document provides an overview of different database types including relational, NoSQL, document, key-value, graph, and column family databases. It discusses the history and drivers behind the development of NoSQL databases, as well as concepts like horizontal scaling, the CAP theorem, and eventual consistency. Specific databases are also summarized, including MongoDB, Redis, Neo4j, and HBase.
Slides for Data Syndrome one hour course on PySpark. Introduces basic operations, Spark SQL, Spark MLlib and exploratory data analysis with PySpark. Shows how to use pylab with Spark to create histograms.
MySQL is a popular open source database system. It can be downloaded from the MySQL website for free. MySQL allows users to create, manipulate and store data in databases. A database contains tables which store data in a structured format. Structured Query Language (SQL) is used to perform operations like querying and manipulating data within MySQL databases. Some common SQL queries include SELECT, INSERT, UPDATE and DELETE.
MongoDB is an open-source, document-oriented database that provides high performance and horizontal scalability. It uses a document-model where data is organized in flexible, JSON-like documents rather than rigidly defined rows and tables. Documents can contain multiple types of nested objects and arrays. MongoDB is best suited for applications that need to store large amounts of unstructured or semi-structured data and benefit from horizontal scalability and high performance.
This document provides an overview of NoSQL databases. It begins with a brief history of relational databases and Edgar Codd's 1970 paper introducing the relational model. It then discusses modern trends driving the emergence of NoSQL databases, including increased data complexity, the need for nested data structures and graphs, evolving schemas, high query volumes, and cheap storage. The core characteristics of NoSQL databases are outlined, including flexible schemas, non-relational structures, horizontal scaling, and distribution. The major categories of NoSQL databases are explained - key-value, document, graph, and column-oriented stores - along with examples like Redis, MongoDB, Neo4j, and Cassandra. The document concludes by discussing use cases and
This document provides an overview of Spark SQL and its architecture. Spark SQL allows users to run SQL queries over SchemaRDDs, which are RDDs with a schema and column names. It introduces a SQL-like query abstraction over RDDs and allows querying data in a declarative manner. The Spark SQL component consists of Catalyst, a logical query optimizer, and execution engines for different data sources. It can integrate with data sources like Parquet, JSON, and Cassandra.
This presentation about Hadoop architecture will help you understand the architecture of Apache Hadoop in detail. In this video, you will learn what is Hadoop, components of Hadoop, what is HDFS, HDFS architecture, Hadoop MapReduce, Hadoop MapReduce example, Hadoop YARN and finally, a demo on MapReduce. Apache Hadoop offers a versatile, adaptable and reliable distributed computing big data framework for a group of systems with capacity limit and local computing power. After watching this video, you will also understand the Hadoop Distributed File System and its features along with the practical implementation.
Below are the topics covered in this Hadoop Architecture presentation:
1. What is Hadoop?
2. Components of Hadoop
3. What is HDFS?
4. HDFS Architecture
5. Hadoop MapReduce
6. Hadoop MapReduce Example
7. Hadoop YARN
8. Demo on MapReduce
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distribution datasets (RDD) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying DataFrames
Who should take up this Big Data and Hadoop Certification Training Course?
Big Data career opportunities are on the rise, and Hadoop is quickly becoming a must-know technology for the following professionals:
1. Software Developers and Architects
2. Analytics Professionals
3. Senior IT professionals
4. Testing and Mainframe professionals
5. Data Management Professionals
6. Business Intelligence Professionals
7. Project Managers
8. Aspiring Data Scientists
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
MySQL Group Replication is a new 'synchronous', multi-master, auto-everything replication plugin for MySQL introduced with MySQL 5.7. It is the perfect tool for small 3-20 machine MySQL clusters to gain high availability and high performance. It stands for high availability because the failure of a replica doesn't stop the cluster. Failed nodes can rejoin the cluster and new nodes can be added in a fully automatic way - no DBA intervention required. It is high performance because multiple masters process writes, not just one as with MySQL Replication. Running applications on it is simple: no read-write splitting, no fiddling with eventual consistency and stale data. The cluster offers strong consistency (generalized snapshot isolation).
It is based on Group Communication principles, hence the name.
The document summarizes Spark SQL, which is a Spark module for structured data processing. It introduces key concepts like RDDs, DataFrames, and interacting with data sources. The architecture of Spark SQL is explained, including how it works with different languages and data sources through its schema RDD abstraction. Features of Spark SQL are covered such as its integration with Spark programs, unified data access, compatibility with Hive, and standard connectivity.
Combine Spring Data Neo4j and Spring Boot to quickl... (Neo4j)
Speakers: Michael Hunger (Neo Technology) and Josh Long (Pivotal)
Spring Data Neo4j 3.0 is here and it supports Neo4j 2.0. Neo4j is a tiny graph database with a big punch. Graph databases are eminently suited to asking interesting questions and doing analysis. Want to load the Facebook friend graph? Build a recommendation engine? Neo4j's just the ticket. Join Spring Data Neo4j lead Michael Hunger (@mesirii) and Spring Developer Advocate Josh Long (@starbuxman) for a look at how to build smart, graph-driven applications with Spring Data Neo4j and Spring Boot.
This document discusses Spark RDD operations and running Spark applications in Scala. It covers Spark transformations like map, filter, and reduce. It also covers Spark actions and different types of RDDs. It provides examples of running Spark in local, standalone, and YARN modes and submitting Spark jobs via spark-submit in those modes. It includes questions and answers about Spark concepts.
The document discusses various topics related to data modeling and query optimization in Hive including:
- File formats like text, Parquet, and ORC that can be used in Hive
- Different types of Hive tables like external, managed, and views
- Data layout techniques in Hive like partitioning, bucketing, and handling skewed data to optimize query performance
- Best practices for using partitioning, bucketing, and skews depending on the type of data and query patterns
This document discusses different types of distributed databases. It covers data models like relational, aggregate-oriented, key-value, and document models. It also discusses different distribution models like sharding and replication. Consistency models for distributed databases are explained including eventual consistency and the CAP theorem. Key-value stores are described in more detail as a simple but widely used data model with features like consistency, scaling, and suitable use cases. Specific key-value databases like Redis, Riak, and DynamoDB are mentioned.
This document provides an overview of a Neo4j basic training session. The training will cover querying graph patterns with Cypher, designing and implementing a graph database model, and evolving existing graphs to support new requirements. Attendees will learn about graph modeling concepts like nodes, relationships, properties and labels. They will go through a modeling workflow example of developing a graph model to represent airport connectivity data from a CSV file and querying the resulting graph.
Solving the Disconnected Data Problem in Healthcare Using MongoDB (MongoDB)
1) The document discusses how Zephyr Health is solving the problem of disconnected healthcare data by building a platform that ingests and integrates data from various sources using algorithms and MongoDB.
2) It organizes data into entity-centric profiles and uses a graph-based index to allow complex queries across the integrated data.
3) The platform powers various analytical applications that help address real business problems by leveraging the integrated data in a standardized way.
Oracle Database Administration Training - Part 3 (faradars)
The Oracle database is without doubt one of the most powerful software products for managing very large volumes of data. The goal of this training is to learn the complex concepts of database architecture and the administrative challenges of the database, which will help you pick up the material quickly and get closer to your goals.
Topics covered in this training:
Oracle database architecture
Preparing the database environment
Creating an Oracle database
Managing Oracle's memory structures
Configuring the network environment in Oracle
...
For more details and to obtain this training, please visit the link below:
http://faradars.org/courses/fvorc9408
Sam Weaver, a MongoDB Product Manager, introduces MongoDB Compass. He discusses the need for Compass due to customer requests for quicker prototyping, less friction on handovers, and easier learning of MongoDB Query Language (MQL). He demos Compass' features like viewing schemas and sampling data from MongoDB databases. Finally, he outlines future plans like supporting more database operations and statistics, and sharing queries.
Data Modeling for Integration of NoSQL with a Data Warehouse (Daniel Upton)
Learn to model data to be visible and accessible between NoSQL Big Data repositories and your RDBMS Data Warehouse. Learn how specific RDBMS Data Warehouse data modeling approaches establish flexible integration with NoSQL data sets that do not play by E.F. Codd’s rules.
- Data modeling for NoSQL databases is different than relational databases and requires designing the data model around access patterns rather than object structure. Key differences include not having joins so data needs to be duplicated and modeling the data in a way that works for querying, indexing, and retrieval speed.
- The data model should focus on making the most of features like atomic updates, inner indexes, and unique identifiers. It's also important to consider how data will be added, modified, and retrieved factoring in object complexity, marshalling/unmarshalling costs, and index maintenance.
- The _id field can be tailored to the access patterns, such as using dates for time-series data to keep recent
This presentation covers several aspects of modeling data and domains with a graph database like Neo4j. The graph data model allows high-fidelity modeling. Using the first-class relationships of the graph model allows much higher forms of normalization than you would use in a relational database.
Video here: https://vimeo.com/67371996
The document discusses choosing between SQL and NoSQL databases. It covers the evolution of data architectures from traditional client-server models to newer distributed NoSQL solutions. It provides an overview of different data store types like SQL, NoSQL, key-value, document, column family, and graph databases. The document advises picking the right data model based on business needs, use cases, data storage requirements, and growth patterns then evaluating solutions based on pros and cons. It concludes that for large, growing data, both SQL and NoSQL solutions may be needed.
Webinar: Back to Basics: Thinking in Documents (MongoDB)
New applications, users and inputs demand new types of data, like unstructured, semi-structured and polymorphic data. Adopting MongoDB means adopting a new, document-based data model.
While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't apply to MongoDB. Documents can represent rich data structures, providing lots of viable alternatives to the standard, normalized, relational model. In addition, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.
In this session, Buzz Moschetti explores how you can take advantage of MongoDB's document model to build modern applications.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance and, you’ll hear from a specific customer and their use case to take advantage of fast performance on enormous datasets leveraging economies of scale on the AWS platform.
Speakers:
Ian Meyers, AWS Solutions Architect
Toby Moore, Chief Technology Officer, Space Ape
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Introduction of Cybersecurity with OSS at Code Europe 2024 (Hiroshi SHIBATA)
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Trusted Execution Environment for Decentralized Process Mining (LucaBarbaro3)
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf (Chart Kalyan)
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
This presentation provides valuable insights into effective cost-saving techniques on AWS. Learn how to optimize your AWS resources by rightsizing, increasing elasticity, picking the right storage class, and choosing the best pricing model. Additionally, discover essential governance mechanisms to ensure continuous cost efficiency. Whether you are new to AWS or an experienced user, this presentation provides clear and practical tips to help you reduce your cloud costs and get the most out of your budget.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
3. OVERVIEW
Why it makes sense to know about graph databases
„Graph databases will come into vogue. One key gap in the Hadoop ecosystem is for graph databases, which support rich mining and visualization of relationships, influence, and behavioral propensities. The market for graph databases will boom in 2012 as companies everywhere adopt them for social media analytics, marketing campaign optimization, and customer experience fine-tuning. We will see VCs put big money behind graph database and analytics startups. Many big data platform and tool vendors will acquire the startups to supplement their expanding Hadoop, NoSQL, and enterprise data warehousing (EDW) portfolios. Social graph analysis, although not a brand-new field, will become one of the most prestigious specialties in the data science arena, focusing on high-powered drilldown into polystructured behavioral data sets.“
Source: http://blogs.forrester.com/james_kobielus/11-12-19-the_year_ahead_in_big_data_big_cool_new_stuff_looms_large
4. OVERVIEW
Example of a real-world graph - facebook
Source: http://www.facebook.com/press/info.php?statistics
5. OVERVIEW
Example of a real-world graph - NYT „Cascade“
Source: http://nytlabs.com/projects/cascade.html
8.-15. OVERVIEW
Delimitation to RDBMS - property graph
(These slides build up the same data set side by side: relational tables on the RDBMS side, and on the GraphDB side a diagram with person vertices 0-4 connected by "knows" edges plus tag vertices (.NET, Java, PKI, NoSQL) attached with significance weights.)

RDBMS side:

Person               Knows_rel
Id  Name             Id_1  Id_2
0   Henning Rauch    1     0
1   René Peinl       1     2
2   Foo Bar          1     3
3   Bruce Schneier   1     4
4   Linus Torvalds   0     1
                     0     2
                     0     3
                     0     4
                     3     4
                     4     3

Tag           Tags_rel
Id  Name      Tag_Id  Person_Id  Significance
0   .NET      0       0          5
1   Java      1       1          5
2   PKI       2       1          6
3   NoSQL     2       3          10
              3       0          7
              3       1          7
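To make the mapping concrete, the same data can be expressed directly in Cypher, the Neo4j query language shown later in this deck. This is a minimal, hedged sketch: the labels, property keys, and the TAGGED_WITH relationship type are invented for the example, and the CREATE syntax follows later Cypher versions rather than the START-based dialect of these slides.

CREATE (henning:Person {name: 'Henning Rauch'})
CREATE (rene:Person {name: 'René Peinl'})
CREATE (nosql:Tag {name: 'NoSQL'})
CREATE (henning)-[:KNOWS]->(rene)
CREATE (rene)-[:KNOWS]->(henning)
CREATE (henning)-[:TAGGED_WITH {significance: 7}]->(nosql)
CREATE (rene)-[:TAGGED_WITH {significance: 7}]->(nosql)

The significance values live on the relationships themselves, which is exactly the property-graph feature the Tags_rel table has to emulate with an extra column.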
17. OVERVIEW
Delimitation to RDBMS - Scalability
• Relation tables act as a global index over linked data
• The bigger the relation table, the longer it takes to get the interesting information (e.g. local neighbourhood of data)
• Solution of graph databases: information on relationships (aka edges) is stored locally on the vertex
(For illustration, the slide repeats the Knows_rel and Tags_rel tables from the previous slides.)
18. OVERVIEW
Delimitation to RDBMS - example of complexity
• Task: Find the persons that are known to Id 0.
• Linear table scan: O(n)
• Index scan: O(log n)
• Because of this dependency on n, RDBMS do not perform well on recursive search algorithms
• Graph databases solve this task in O(1)
(The slide shows the Knows_rel table again for reference.)
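As a hedged sketch in the START-based Cypher dialect used later in this deck (assuming the internal node id 0 and a lowercase "knows" relationship type, both taken from the example):

START me = node(0)
MATCH me-[:knows]->person
RETURN person

Since each vertex holds its own adjacency, the cost of this lookup depends only on the size of node 0's neighbourhood, not on the total size of a Knows_rel table.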
19.-24. OVERVIEW
Delimitation to other NoSQL products
(These slides build up a chart plotting data size against data complexity. In order of increasing complexity the product families are: Key/Value stores, Bigtable clones, Document databases, Graph databases, and In-memory graph databases. A marker notes that > 90% of use cases fall within this range.)
Source: http://www.slideshare.net/jexp/neo4j-graph-database-presentation-german
33. NEO4J
Overview
• Graph database + Lucene index
• ACID (isolation level read committed)
• High availability in enterprise edition
• 32 billion vertices, 32 billion edges, 64 billion properties
• Embedded or via REST API
• Support for the Blueprints project
34. NEO4J
Architecture
(Layer diagram, top to bottom:)
Cypher/Gremlin | Java/Ruby/.../C# API
REST API
Core API (Java)
Caches (files and objects) | HA
Record files | Transaction log
Disk(s)
Source: http://www.slideshare.net/rheehot/eo4j-12713065
35.-38. NEO4J
Example of the on-disk layout
(These slides build up a diagram of three vertices - Alice (Age: 23), Bob (Age: 42) and Carol (Age: 22) - connected by "knows" relationships. Property records hang off each vertex, and the relationship records of a vertex are chained as doubly linked lists. Each relationship record carries four pointers, with the diagram distinguishing existent from nonexistent pointers:)
SP = Source Previous
SN = Source Next
EP = End Previous
EN = End Next
Source: https://github.com/thobe/presentations
39. NEO4J
In-memory layout (cache)
(Diagram: a vertex object holds its ID, its relationship ID references grouped by relationship type - for each type an "in" list and an "out" list of relationships R1 ... Rn - and a map of property keys and values. An edge object holds its ID, start vertex, end vertex, type, and its own property map.)
• Transformation of the doubly linked list (on-disk) into objects
• Increases the traversal speed
Source: https://github.com/thobe/presentations
40. NEO4J
Traversal
• Relationship expander (delivers the edges of a vertex)
• Evaluators (evaluate whether a vertex is going to be traversed and whether it should be taken into the result set)
• Projection of the result set (e.g. "take the last vertex of the path")
• Uniqueness level (sets, in steps, whether a node may be visited several times)
Source: https://github.com/thobe/presentations
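Cypher exposes a declarative slice of this traversal machinery. As a rough, hedged sketch over the example graph (the node id, relationship type, and depth bound are illustrative), a depth-bounded traversal could be written as:

START me = node(0)
MATCH p = me-[:knows*1..3]->other
RETURN other, length(p)

The *1..3 bound and the RETURN clause loosely correspond to the evaluator and projection steps described above.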
41. NEO4J
Cypher & Gremlin

Feature      Gremlin                                      Cypher
Paradigm     Imperative programming                       Declarative programming
Description  • Developed by Marko Rodriguez (Tinkerpop)   • In-house development
             • Based on XPath to describe the traversal   • Cypher provides greater opportunities for optimization
             • Developed using Groovy                     • Good for traversals that need back tracking
             • 30-50% faster on „simple“ traversals       • Output is a table

Example (Gremlin):
outE[label=HAS_CART].inV
  .outE[label=CONTAINS_ITEM].inV
  .inE[label=PURCHASED].outV
  .outE[label=PURCHASED].inV

Example (Cypher):
START me=node:people(name={myname})
MATCH me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item,
      item<-[:PURCHASED]-user-[:PURCHASED]->recommendation
RETURN recommendation

START d=node(1), e=node(2)
MATCH p = shortestPath( d-[*..15]->e )
RETURN p

Source: https://github.com/thobe/presentations
43. NEO4J
Pricing

Edition       License               Description                                      Price (annual)
„Community“   Open Source (GPLv3)   Complete database including a basic              0 €
                                    management frontend
„Advanced“    Commercial and AGPL   + Monitoring, better management frontend         6,000 €
                                    and support
„Enterprise“  Commercial and AGPL   + Enterprise frontend, HA and premium support    24,000 €
45. INFINITEGRAPH
Overview
• Distributed graph database
• Implemented in C++ (APIs in Java, C#, Python, etc.)
• Based on Objectivity/DB (distributed object database)
• Established 1988 in Sunnyvale, California
• Enterprise-customers + US-government
• Support for Blueprints
63. INFINITEGRAPH
Pricing

Edition                    License     Description                        Price (annual)
„InfiniteGraph FREE“       Free        Complete database but limited      0 €
                                       to 1 million vertices or edges
„Pay as you go“            Commercial  No limitation                      starts at approx. 5,000 $
                                                                          (depends on count of vertices and edges)
„Unit or site licensing“   Commercial  Focus on „bigger“ environments     no price available
Source: http://objectivity.com/products/infinitegraph/overview
65. FALLEN-8
Overview
• In-memory graph database
• Implemented in C# (platform independent thanks to Mono)
• 4 billion vertices or edges; each element can have approx. 65,000 properties
• Indexes on vertices and/or edges
• Core is open source (MIT license), plugins can have any license
66. FALLEN-8
Persistence
• Persistence in the form of „save-points“ (all vertices and edges are serialized en bloc)
• Commodity hardware can (de)serialize approx. 2 million vertices or edges per second; at that rate, for example, a graph of 100 million elements saves or loads in under a minute
• Saving blocks only write operations
• Performance + reliability
67. FALLEN-8
Architecture
(Layer diagram, top to bottom:)
Services
Traversal-framework | Index-framework
Core API
Vertices and edges
RAM
68. FALLEN-8
Architecture and some plugins
(Layer diagram, top to bottom:)
HA + ACID transactions
REST API (via JSON) + management/query frontend
Traversal-framework (incl. path analysis) | Index-framework (incl. R* tree index)
Core API
Vertices and edges
RAM