Graph databases and Python
Maksym Klymyshyn
CTO @ GVMachines Inc. (zakaz.ua)
What’s inside?
‣ PostgreSQL
‣ Neo4j
‣ ArangoDB
Python Frameworks
‣ Bulbflow
‣ py2neo
‣ NetworkX
‣ Arango-python
Relational to Graph model
crash course
“Switching from relational to the graph model”
by Luca Garulli

http://goo.gl/z08qwk
http://www.slideshare.net/lvca/switching-from-relational-to-the-graph-model
My motivation is quite
simple:
“The best material model of a cat is another, or
preferably the same, cat.”

–Norbert Wiener
Good old Postgres
create table nodes (
    node integer primary key,
    name varchar(10) not null,
    feat1 char(1), feat2 char(1));

create table edges (
    a integer not null references nodes(node) on update cascade on delete cascade,
    b integer not null references nodes(node) on update cascade on delete cascade,
    primary key (a, b));

create index a_idx on edges(a);
create index b_idx on edges(b);

-- at most one edge per unordered pair of nodes
create unique index pair_unique_idx on edges (LEAST(a, b), GREATEST(a, b));

-- and no self-loops
alter table edges add constraint no_self_loops_chk check (a <> b);
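The two constraints above (one edge per unordered pair, no self-loops) can be mirrored in plain Python. This is a minimal sketch, not part of the original slides; the class name `UndirectedEdges` and its methods are hypothetical.

```python
class UndirectedEdges:
    """Stores each undirected edge once, as a (smaller, larger) pair."""

    def __init__(self):
        self._pairs = set()

    def add(self, a, b):
        # Mirrors the CHECK (a <> b) constraint: no self-loops.
        if a == b:
            raise ValueError("self-loops are not allowed")
        # Mirrors the unique index on (LEAST(a, b), GREATEST(a, b)).
        pair = (min(a, b), max(a, b))
        if pair in self._pairs:
            raise ValueError("edge {} already exists".format(pair))
        self._pairs.add(pair)

    def neighbours(self, node):
        # Either endpoint may match, since the edge has no direction.
        return sorted(b if a == node else a
                      for a, b in self._pairs if node in (a, b))


edges = UndirectedEdges()
edges.add(1, 3)
edges.add(2, 1)
print(edges.neighbours(1))
```

Adding (3, 1) after (1, 3) raises an error, just as the unique index would reject the second row.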

insert into nodes values (1, 'node1', 'x', 'y');
insert into nodes values (2, 'node2', 'x', 'w');
insert into nodes values (3, 'node3', 'x', 'w');
insert into nodes values (4, 'node4', 'z', 'w');
insert into nodes values (5, 'node5', 'x', 'y');
insert into nodes values (6, 'node6', 'x', 'z');
insert into nodes values (7, 'node7', 'x', 'y');

insert into edges values (1, 3), (2, 1),
    (2, 4), (3, 4), (3, 5), (3, 6), (4, 7), (5, 1), (5, 6), (6, 1);

-- directed graph
select * from nodes n left join edges e on n.node = e.b where e.a = 2;

-- undirected graph
select * from nodes where node in (
    select case when a = 1 then b else a end from edges where 1 in (a, b));
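The two queries above can be tried from Python without a Postgres server. The sketch below is an assumption-laden stand-in: it uses the standard-library `sqlite3` module instead of PostgreSQL, drops the `feat1`/`feat2` columns and the extra constraints, and wraps the queries in two hypothetical helpers, `successors` and `neighbours`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table nodes (node integer primary key, name text not null);
    create table edges (
        a integer not null references nodes(node),
        b integer not null references nodes(node),
        primary key (a, b));
""")
conn.executemany(
    "insert into nodes values (?, ?)",
    [(i, "node{}".format(i)) for i in range(1, 8)])
conn.executemany(
    "insert into edges values (?, ?)",
    [(1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
     (3, 6), (4, 7), (5, 1), (5, 6), (6, 1)])


def successors(conn, node):
    """Directed: nodes at the end of an edge that starts at `node`."""
    rows = conn.execute(
        "select n.name from nodes n join edges e on n.node = e.b "
        "where e.a = ? order by n.node", (node,))
    return [name for (name,) in rows]


def neighbours(conn, node):
    """Undirected: nodes on the other end of any edge touching `node`."""
    rows = conn.execute(
        "select name from nodes where node in ("
        "  select case when a = ? then b else a end"
        "  from edges where ? in (a, b)) order by node", (node, node))
    return [name for (name,) in rows]


print(successors(conn, 2))
print(neighbours(conn, 1))
```

The directed helper only follows edges out of the node, while the undirected one treats `a` and `b` as interchangeable, which is why node 1 gets four neighbours although only one edge starts there.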
I’m from Odessa,
I just drink.
Neo4j
Most famous graph database.
• 1,333 mentions within repositories on GitHub
• 1,140,000 results in Google
• 26,868 tweets
• Really nice admin interface
• Awesome help tips
A lot of Python libraries

Py2Neo, Neomodel, neo4django, bulbflow
// Create node1, node2 and
// a RELATED relationship between the two nodes
CREATE (node1 {name: "node1"}),
       (node2 {name: "node2"}),
       (node1)-[:RELATED]->(node2);
Neo4j is friendly and powerful.
The only drawback is its somewhat complex
query language, Cypher.
py2neo nodes
from py2neo import neo4j, node, rel

graph_db = neo4j.GraphDatabaseService(
    "http://localhost:7474/db/data/")

# Nodes are referenced by their position in the argument list:
# rel(0, "PLAYS", 1) links the first node to the second.
die_hard = graph_db.create(
    node(name="Bruce Willis"),
    node(name="John McClane"),
    node(name="Alan Rickman"),
    node(name="Hans Gruber"),
    node(name="Nakatomi Plaza"),
    rel(0, "PLAYS", 1),
    rel(2, "PLAYS", 3),
    rel(1, "VISITS", 4),
    rel(3, "STEALS_FROM", 4),
    rel(1, "KILLS", 3))
py2neo paths
from py2neo import neo4j, node

graph_db = neo4j.GraphDatabaseService(
    "http://localhost:7474/db/data/")
alice, bob, carol = (node(name="Alice"),
                     node(name="Bob"),
                     node(name="Carol"))
abc = neo4j.Path(
    alice, "KNOWS", bob, "KNOWS", carol)
abc.create(graph_db)
abc.nodes
# [node(**{'name': 'Alice'}),
#  node(**{'name': 'Bob'}),
#  node(**{'name': 'Carol'})]
Alice KNOWS Bob KNOWS Carol
bulbflow framework

from bulbs.neo4jserver import Graph
g = Graph()
james = g.vertices.create(name="James")
julie = g.vertices.create(name="Julie")
g.edges.create(james, "knows", julie)
FlockDB
OrientDB
InfoGrid
HyperGraphDB

WAT?
ArangoDB
“In any investment, you expect to have fun and
make profit.”

–Michael Jordan
I’m the developer of the Python driver for ArangoDB
• NoSQL database storage
• Graph of documents
• AQL (ArangoDB Query Language) to execute graph queries
• Edge data type to create edges between nodes (with properties)
• Multiple edge collections to keep different kinds of edges
• Support for the Gremlin graph query language
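To make "graph of documents" and "multiple edge collections" concrete, here is a rough sketch using plain Python dictionaries in place of a running ArangoDB. ArangoDB edge documents carry `_from` and `_to` attributes that hold document handles; everything else here (the collection contents, the `since` and `muted` properties, the `outbound` helper) is made up for illustration.

```python
# Plain dictionaries standing in for an ArangoDB document collection.
users = {
    "users/1": {"name": "alice"},
    "users/2": {"name": "bob"},
    "users/3": {"name": "carol"},
}

# Two separate edge collections, one per kind of relation.
# Each edge is itself a document, so it can carry properties.
knows = [
    {"_from": "users/1", "_to": "users/2", "since": 2010},
]
follows = [
    {"_from": "users/1", "_to": "users/3", "muted": False},
    {"_from": "users/2", "_to": "users/3", "muted": True},
]


def outbound(edge_collection, handle):
    """Names of the documents `handle` points to in one edge collection."""
    return [users[edge["_to"]]["name"]
            for edge in edge_collection if edge["_from"] == handle]


print(outbound(knows, "users/1"))
print(outbound(follows, "users/1"))
```

Keeping each relation type in its own collection means a query only scans the edges it cares about.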
A small experiment with graphs and Twitter:
I looked at my tweets and the people who added them to their favorites.
Then I looked at those people’s tweets and did the same thing with the people who favorited their tweets.
1-level depth
2-level depth
3-level depth
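The level-by-level collection described above is essentially a breadth-first walk with a depth limit. The sketch below is not the scraper used for the experiment; `crawl`, `favorites_of` and the fake data are hypothetical names standing in for real Twitter API calls.

```python
def crawl(start_user, favorites_of, max_depth):
    """Breadth-first walk over "who favorited whose tweets", depth-limited."""
    edges = []                  # (author, fan) pairs found so far
    seen = {start_user}
    frontier = [start_user]
    for _ in range(max_depth):
        next_frontier = []
        for author in frontier:
            for fan in favorites_of(author):
                edges.append((author, fan))
                if fan not in seen:      # expand every user only once
                    seen.add(fan)
                    next_frontier.append(fan)
        frontier = next_frontier
    return edges


# Hypothetical data in place of real Twitter API calls.
FAKE_FAVORITES = {
    "me": ["ann", "bob"],
    "ann": ["cat"],
    "bob": ["me", "cat"],
    "cat": ["dan"],
}


def fake_favorites_of(user):
    return FAKE_FAVORITES.get(user, [])


for depth in (1, 2, 3):
    print(depth, len(crawl("me", fake_favorites_of, depth)))
```

Each extra level only expands users discovered in the previous one, so the edge list grows level by level as in the three pictures.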
Code behind
from arango import create

arango = create(db="tweets_maxmaxmaxmax")
arango.database.create()
arango.tweets.create()
arango.tweets_edges.create(
    type=arango.COLLECTION_EDGES)
Here we create an edge from from_doc to to_doc:

from_doc = arango.tweets.documents.create({})
to_doc = arango.tweets.documents.create({})
arango.tweets_edges.edges.create(from_doc, to_doc)

Getting edges for tweet 196297127:
from arango.aql import F, V

query = arango.tweets_edges.query.over(
    F.EDGES(
        "tweets_edges",
        ~V("tweets/196297127"), ~V("outbound")))
Full example

• Sample dataset with 10 users
• Relations between users
• Visualise within admin interface
Sample dataset
from arango import create


def dataset(a):
    a.database.create()
    a.users.create()
    a.knows.create(type=a.COLLECTION_EDGES)

    for u in range(10):
        a.users.documents.create({
            "name": "user_{}".format(u),
            "age": u + 20,
            "gender": u % 2 == 0})


a = create(db="experiments")
dataset(a)
Relations between users
def relations(a):
    rels = (
        (0, 1), (0, 2), (2, 3), (4, 3), (3, 5),
        (5, 1), (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))

    # Look up a user document by its generated name.
    get_user = lambda id: a.users.query.filter(
        "obj.name == 'user_{}'".format(id)).execute().first

    for f, t in rels:
        what = "user_{} knows user_{}".format(f, t)
        from_doc, to_doc = get_user(f), get_user(t)
        a.knows.edges.create(from_doc, to_doc, {"what": what})
        print("{}->{}: {}".format(from_doc.id, to_doc.id, what))


a = create(db="experiments")
relations(a)
Relations between users
users/2744664487->users/2744926631: user_0 knows user_1
users/2744664487->users/2745123239: user_0 knows user_2
users/2745123239->users/2745319847: user_2 knows user_3
users/2745516455->users/2745319847: user_4 knows user_3
users/2745319847->users/2745713063: user_3 knows user_5
users/2745713063->users/2744926631: user_5 knows user_1
users/2744664487->users/2745713063: user_0 knows user_5
users/2745713063->users/2745909671: user_5 knows user_6
users/2745909671->users/2746106279: user_6 knows user_7
users/2746106279->users/2746302887: user_7 knows user_8
users/2746499495->users/2746302887: user_9 knows user_8
AQL, getting paths
FOR p IN PATHS(users, knows, 'outbound')
FILTER p.source.name == 'user_5'
RETURN p.vertices[*].name

from arango import create
from arango.aql import F, V


def querying(a):
    # Parentheses keep the chained calls in a single expression.
    paths = (
        a.knows.query.over(
            F.PATHS("users", "knows", ~V("outbound")))
        .filter("obj.source.name == '{}'".format("user_5"))
        .result("obj.vertices[*].name")
        .execute(wrapper=lambda c, i: i))
    for data in paths:
        print(data)


a = create(db="experiments")
querying(a)
Paths output
['user_5']
['user_5', 'user_1']
['user_5', 'user_6']
['user_5', 'user_6', 'user_7']
['user_5', 'user_6', 'user_7', 'user_8']
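To see where these five paths come from, the same relations can be walked in plain Python. This is only a depth-first sketch over the `rels` pairs from the earlier slide, not how ArangoDB evaluates `PATHS`; the function name `outbound_paths` is hypothetical.

```python
RELS = (
    (0, 1), (0, 2), (2, 3), (4, 3), (3, 5),
    (5, 1), (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))


def outbound_paths(rels, source):
    """Yield every outbound path that starts at `source`."""
    adjacency = {}
    for f, t in rels:
        adjacency.setdefault(f, []).append(t)

    def walk(path):
        yield path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:      # never revisit a vertex on this path
                yield from walk(path + [nxt])

    return walk([source])


for path in outbound_paths(RELS, 5):
    print(["user_{}".format(u) for u in path])
```

The zero-length path containing only user_5 is included, which matches the first line of the output above.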
Links
• Arango paths: http://goo.gl/n2L3SK
• Neo4j: http://goo.gl/au5y9I
• Scraper: http://goo.gl/nvMFGk
• Visualiser: http://goo.gl/Rzdwci
Thanks. Q’s?

@maxmaxmaxmax

Odessapy2013 - Graph databases and Python