Odessapy2013 - Graph databases and Python

Page 10: "Я из Одессы я просто бухаю." Translation: "I'm from Odessa, I just drink," meaning he drinks a lot of vodka ^_^ (@tuc @hackernews)
This is a local meme: it is what you say when someone asks a question and you would look stupid for not having an answer.

Transcript

  • 1. Graph databases and Python. Maksym Klymyshyn, CTO @ GVMachines Inc. (zakaz.ua)
  • 2. What’s inside? ‣ PostgreSQL ‣ Neo4j ‣ ArangoDB
  • 3. Python Frameworks ‣ Bulbflow ‣ py2neo ‣ NetworkX ‣ Arango-python
  • 4. Relational to Graph model crash course: “Switching from relational to the graph model” by Luca Garulli http://goo.gl/z08qwk http://www.slideshare.net/lvca/switching-from-relational-to-the-graph-model
  • 5. My motivation is quite simple:
  • 6. “The best material model of a cat is another, or preferably the same, cat.” –Norbert Wiener
  • 7. Good old Postgres
  • 8.
      create table nodes (
        node integer primary key,
        name varchar(10) not null,
        feat1 char(1),
        feat2 char(1));

      create table edges (
        a integer not null references nodes(node) on update cascade on delete cascade,
        b integer not null references nodes(node) on update cascade on delete cascade,
        primary key (a, b));

      create index a_idx ON edges(a);
      create index b_idx ON edges(b);

      -- each pair of nodes may be connected only once, in either direction
      create unique index pair_unique_idx on edges (LEAST(a, b), GREATEST(a, b));

      -- and no self-loops
      alter table edges add constraint no_self_loops_chk check (a <> b);

      insert into nodes values (1, 'node1', 'x', 'y');
      insert into nodes values (2, 'node2', 'x', 'w');
      insert into nodes values (3, 'node3', 'x', 'w');
      insert into nodes values (4, 'node4', 'z', 'w');
      insert into nodes values (5, 'node5', 'x', 'y');
      insert into nodes values (6, 'node6', 'x', 'z');
      insert into nodes values (7, 'node7', 'x', 'y');

      insert into edges values (1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
                               (3, 6), (4, 7), (5, 1), (5, 6), (6, 1);

      -- directed graph
      select * from nodes n left join edges e on n.node = e.b where e.a = 2;

      -- undirected graph
      select * from nodes where node in
        (select case when a=1 then b else a end from edges where 1 in (a,b));
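The two queries on this slide can be tried without a Postgres server. Below is a minimal sketch that loads the same nodes and edges into an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is a stand-in chosen here only for convenience, and the helper names (`build_graph`, `outbound_neighbors`, `undirected_neighbors`) are hypothetical. The Postgres-specific `LEAST`/`GREATEST` index and the cascade rules are left out.

```python
import sqlite3

# Same rows as on the slide.
NODES = [(1, "node1", "x", "y"), (2, "node2", "x", "w"), (3, "node3", "x", "w"),
         (4, "node4", "z", "w"), (5, "node5", "x", "y"), (6, "node6", "x", "z"),
         (7, "node7", "x", "y")]
EDGES = [(1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
         (3, 6), (4, 7), (5, 1), (5, 6), (6, 1)]


def build_graph():
    """Create an in-memory database holding the slide's nodes and edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table nodes (node integer primary key, "
                 "name varchar(10) not null, feat1 char(1), feat2 char(1))")
    conn.execute("create table edges ("
                 "a integer not null references nodes(node), "
                 "b integer not null references nodes(node), "
                 "primary key (a, b), check (a <> b))")
    conn.executemany("insert into nodes values (?, ?, ?, ?)", NODES)
    conn.executemany("insert into edges values (?, ?)", EDGES)
    return conn


def outbound_neighbors(conn, node_id):
    """Directed graph: nodes at the end of an edge that starts at node_id."""
    rows = conn.execute(
        "select n.node from nodes n join edges e on n.node = e.b "
        "where e.a = ? order by n.node", (node_id,))
    return [row[0] for row in rows]


def undirected_neighbors(conn, node_id):
    """Undirected graph: nodes sharing an edge with node_id in either direction."""
    rows = conn.execute(
        "select node from nodes where node in ("
        "select case when a = ? then b else a end from edges "
        "where ? in (a, b)) order by node", (node_id, node_id))
    return [row[0] for row in rows]


conn = build_graph()
print(outbound_neighbors(conn, 2))    # [1, 4]
print(undirected_neighbors(conn, 1))  # [2, 3, 5, 6]
```

The `case when a = ? then b else a end` expression picks "the other end" of each edge, which is what makes the same table usable as an undirected graph.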
  • 9. “I'm from Odessa, I just drink.” (original: “Я из Одессы, я просто бухаю.”)
  • 10. Neo4j
  • 11. Most famous graph database.
      • 1,333 mentions within repositories on GitHub
      • 1,140,000 results in Google
      • 26,868 tweets
      • Really nice admin interface
      • Awesome help tips
  • 12. A lot of Python libraries: Py2Neo, Neomodel, neo4django, bulbflow
  • 13.
      // Create node1, node2 and a
      // RELATED relation between the two nodes
      CREATE (node1 {name:"node1"}),
             (node2 {name: "node2"}),
             (node1)-[:RELATED]->(node2);
  • 14. Neo4j is friendly and powerful. The only drawback is its somewhat complex query language, Cypher.
  • 15. py2neo nodes
      from py2neo import neo4j, node, rel

      graph_db = neo4j.GraphDatabaseService(
          "http://localhost:7474/db/data/")

      die_hard = graph_db.create(
          node(name="Bruce Willis"),
          node(name="John McClane"),
          node(name="Alan Rickman"),
          node(name="Hans Gruber"),
          node(name="Nakatomi Plaza"),
          rel(0, "PLAYS", 1),
          rel(2, "PLAYS", 3),
          rel(1, "VISITS", 4),
          rel(3, "STEALS_FROM", 4),
          rel(1, "KILLS", 3))
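In the slide above, the integers passed to `rel(...)` appear to be positions of other entities within the same `create(...)` call. The sketch below illustrates that reading in plain Python only. It does not call py2neo, and the `describe_relationships` helper and the dict/tuple representation are hypothetical stand-ins for `node(...)` and `rel(...)`.

```python
# Plain-Python illustration; dicts stand in for node(...), tuples for rel(...).
ENTITIES = [
    {"name": "Bruce Willis"},      # position 0
    {"name": "John McClane"},      # position 1
    {"name": "Alan Rickman"},      # position 2
    {"name": "Hans Gruber"},       # position 3
    {"name": "Nakatomi Plaza"},    # position 4
    (0, "PLAYS", 1),
    (2, "PLAYS", 3),
    (1, "VISITS", 4),
    (3, "STEALS_FROM", 4),
    (1, "KILLS", 3),
]


def describe_relationships(entities):
    """Resolve (start_index, type, end_index) tuples into named triples."""
    triples = []
    for entity in entities:
        if isinstance(entity, tuple):
            start, rel_type, end = entity
            triples.append(
                (entities[start]["name"], rel_type, entities[end]["name"]))
    return triples


for start, rel_type, end in describe_relationships(ENTITIES):
    print("{} -{}-> {}".format(start, rel_type, end))
```

Read this way, `rel(1, "KILLS", 3)` says that John McClane kills Hans Gruber.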
  • 16. py2neo paths
      from py2neo import neo4j, node

      graph_db = neo4j.GraphDatabaseService(
          "http://localhost:7474/db/data/")
      alice, bob, carol = node(name="Alice"), node(name="Bob"), node(name="Carol")
      abc = neo4j.Path(
          alice, "KNOWS", bob, "KNOWS", carol)
      abc.create(graph_db)
      abc.nodes
      # [node(**{'name': 'Alice'}),
      #  node(**{'name': 'Bob'}),
      #  node(**{'name': 'Carol'})]
  • 17. Alice KNOWS Bob KNOWS Carol
  • 18. bulbflow framework
      from bulbs.neo4jserver import Graph

      g = Graph()
      james = g.vertices.create(name="James")
      julie = g.vertices.create(name="Julie")
      g.edges.create(james, "knows", julie)
  • 19. FlockDB OrientDB InfoGrid HyperGraphDB WAT?
  • 20. ArangoDB
  • 21. “In any investment, you expect to have fun and make profit.” –Michael Jordan
  • 22. I’m the developer of a Python driver for ArangoDB
  • 23.
      • NoSQL database storage
      • Graph of documents
      • AQL (Arango Query Language) to execute graph queries
      • Edge data type to create edges between nodes (with properties)
      • Multiple edge collections to keep different kinds of edges
      • Support for the Gremlin graph query language
  • 24. Small experiment with graphs and Twitter: I looked at my tweets and the people who added them to favorites. Then I looked at those people's tweets and did the same thing with the people who favorited their tweets.
  • 25. 1-level depth
  • 26. 2-level depth
  • 27. 3-level depth
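The 1-, 2- and 3-level views above correspond to expanding the "who favorited whose tweets" graph one hop at a time. A minimal sketch of that depth-limited expansion, as a breadth-first search, follows. The `FAVORITED_BY` data and the `expand` helper are hypothetical. The slides do not show the actual scraper logic or dataset.

```python
from collections import deque

# Hypothetical data: user -> users who favorited that user's tweets.
FAVORITED_BY = {
    "me": ["ann", "bob"],
    "ann": ["cat", "bob"],
    "bob": ["dan"],
    "cat": ["eve"],
    "dan": [],
    "eve": ["me"],
}


def expand(start, max_depth, favorited_by=FAVORITED_BY):
    """Return {user: depth} for every user within max_depth hops of start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if depths[user] == max_depth:
            continue  # do not look further than the requested depth
        for fan in favorited_by.get(user, []):
            if fan not in depths:  # keep the shortest distance only
                depths[fan] = depths[user] + 1
                queue.append(fan)
    return depths


for depth in (1, 2, 3):
    print(depth, sorted(expand("me", depth)))
```

Each extra level can only add users, so the graph grows quickly as the depth increases.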
  • 28. Code behind
      from arango import create

      arango = create(db="tweets_maxmaxmaxmax")
      arango.database.create()
      arango.tweets.create()
      arango.tweets_edges.create(
          type=arango.COLLECTION_EDGES)
  • 29. Here we create an edge from from_doc to to_doc:
      from_doc = arango.tweets.documents.create({})
      to_doc = arango.tweets.documents.create({})
      arango.tweets_edges.edges.create(from_doc, to_doc)

    Getting edges for tweet 196297127:
      from arango.aql import F, V

      # uses the tweets_edges collection created on the previous slide
      query = arango.tweets_edges.query.over(
          F.EDGES(
              "tweets_edges",
              ~V("tweets/196297127"),
              ~V("outbound")))
  • 30. Full example
      • Sample dataset with 10 users
      • Relations between users
      • Visualise within admin interface
  • 31. Sample dataset
      from arango import create

      def dataset(a):
          a.database.create()
          a.users.create()
          a.knows.create(type=a.COLLECTION_EDGES)

          for u in range(10):
              a.users.documents.create({
                  "name": "user_{}".format(u),
                  "age": u + 20,
                  "gender": u % 2 == 0})

      a = create(db="experiments")
      dataset(a)
  • 32. Relations between users
      from arango import create

      def relations(a):
          rels = (
              (0, 1), (0, 2), (2, 3), (4, 3), (3, 5), (5, 1),
              (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))

          # look up a user document by its generated name
          get_user = lambda id: a.users.query.filter(
              "obj.name == 'user_{}'".format(id)).execute().first

          for f, t in rels:
              what = "user_{} knows user_{}".format(f, t)
              from_doc, to_doc = get_user(f), get_user(t)
              a.knows.edges.create(from_doc, to_doc, {"what": what})
              print ("{}->{}: {}".format(from_doc.id, to_doc.id, what))

      a = create(db="experiments")
      relations(a)
  • 33. Relations between users
      users/2744664487->users/2744926631: user_0 knows user_1
      users/2744664487->users/2745123239: user_0 knows user_2
      users/2745123239->users/2745319847: user_2 knows user_3
      users/2745516455->users/2745319847: user_4 knows user_3
      users/2745319847->users/2745713063: user_3 knows user_5
      users/2745713063->users/2744926631: user_5 knows user_1
      users/2744664487->users/2745713063: user_0 knows user_5
      users/2745713063->users/2745909671: user_5 knows user_6
      users/2745909671->users/2746106279: user_6 knows user_7
      users/2746106279->users/2746302887: user_7 knows user_8
      users/2746499495->users/2746302887: user_9 knows user_8
  • 34. AQL, getting paths
      FOR p IN PATHS(users, knows, 'outbound')
        FILTER p.source.name == 'user_5'
        RETURN p.vertices[*].name

      from arango import create
      from arango.aql import F, V

      def querying(a):
          # parentheses let the chained calls span several lines
          query = (a.knows.query.over(
                  F.PATHS("users", "knows", ~V("outbound")))
              .filter("obj.source.name == '{}'".format("user_5"))
              .result("obj.vertices[*].name"))
          for data in query.execute(wrapper=lambda c, i: i):
              print (data)

      a = create(db="experiments")
      querying(a)
  • 35. Paths output
      ['user_5']
      ['user_5', 'user_1']
      ['user_5', 'user_6']
      ['user_5', 'user_6', 'user_7']
      ['user_5', 'user_6', 'user_7', 'user_8']
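The paths output above can be checked by hand against the relations from slide 32. Below is a minimal pure-Python sketch that enumerates every outbound path starting at one user. It is not how ArangoDB evaluates `PATHS`. The `outbound_paths` helper is hypothetical, and skipping already-visited users is an assumption made here to keep the walk finite.

```python
# Relations copied from slide 32: (f, t) means "user_f knows user_t".
RELS = ((0, 1), (0, 2), (2, 3), (4, 3), (3, 5), (5, 1),
        (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))


def outbound_paths(rels, start):
    """List all outbound paths from start, including the zero-length path."""
    adjacency = {}
    for source, target in rels:
        adjacency.setdefault(source, []).append(target)

    def walk(path):
        yield path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # assumption: never revisit a user
                yield from walk(path + [nxt])

    return [["user_{}".format(n) for n in path] for path in walk([start])]


for path in outbound_paths(RELS, 5):
    print(path)
```

For `user_5` this yields the same five paths as the slide: the single-vertex path, one path to `user_1`, and the chain through `user_6`, `user_7` and `user_8`.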
  • 36. Links
      • Arango paths: http://goo.gl/n2L3SK
      • Neo4j: http://goo.gl/au5y9I
      • Scraper: http://goo.gl/nvMFGk
      • Visualiser: http://goo.gl/Rzdwci
  • 37. Thanks. Q’s? @maxmaxmaxmax