Persistent graphs in Python with Neo4j
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Persistent graphs in Python with Neo4j

on

  • 15,332 views

Presentation from PyCon 2010.

Presentation from PyCon 2010.

Statistics

Views

Total Views
15,332
Views on SlideShare
14,231
Embed Views
1,101

Actions

Likes
27
Downloads
345
Comments
1

17 Embeds 1,101

http://nosql.mypopescu.com 700
http://join5works.com 253
http://www.slideshare.net 47
http://fphaskell.blogspot.com 36
http://jujo00obo2o234ungd3t8qjfcjrs3o6k-a-sites-opensocial.googleusercontent.com 20
http://5works.co 11
http://fphaskell.blogspot.ru 10
http://www.techgig.com 5
http://irr.posterous.com 4
https://twitter.com 3
http://bender 3
http://a0.twimg.com 2
http://us-w1.rockmelt.com 2
http://bender.radada.gruik.org 2
http://fphaskell.blogspot.de 1
http://zootool.com 1
http://fphaskell.blogspot.ch 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • Small typo there neo4j.BREADTH_FIRST <-- I think...

    Cheers for this, are there more (and more) python examples?

    Thanks

    Tom
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Persistent graphs in Python with Neo4j Presentation Transcript

  • 1. Persistent Graphs in Python with Neo4j twitter: @thobe / #neo4j Tobias Ivarsson email: tobias@neotechnology.com web: http://www.neo4j.org/ Hacker @ Neo Technology web: http://www.thobe.org/ Sunday, February 21, 2010
  • 2. We all know the relational model. Attendees It has been predominant for a long time. username fullname registration tutorials payment guido Guido van Rossum null yes 0 thobe Tobias Ivarsson 2009-12-12 no 300 joe John Doe 2010-02-05 yes 700 ... ... ... ... ... 2 Sunday, February 21, 2010
  • 3. Attendees The relational model has username fullname registration tutorials payment a few problems, such as: •poor support for sparse data •modifying the data guido Guido van Rossum null yes 0 model is almost exclusively done through adding tables thobe Tobias Ivarsson 2009-12-12 no 300 joe John Doe 2010-02-05 yes 700 ... ... ... ... ... Location username latitude longitude title publish thobe 55°36'47.70"N 12°58'34.50"E Malmö yes San joe 37°49'36.00"N 122°25'22.00"W no Francisco ... ... ... ... ... 3 Sunday, February 21, 2010
  • 4. Attendees Sessions username fullname registration tutorials payment id title time room ... ... ... ... ... ... guido Guido van Rossum null yes 0 ... ... ... ... ... thobe Tobias Ivarsson 2009-12-12 no 300 Session attendance joe John Doe 2010-02-05 yes 700 session user ... ... ... ... ... ... ... Location ... ... username latitude longitude title publish More complication... thobe 55°36'47.70"N 12°58'34.50"E Malmö yes ... ... ... ... After a while, modeling ... ... complex relationships ... ... leads to complicated ...... ...... San ...... schemasjoe 37°49'36.00"N 122°25'22.00"W no ...... Francisco ...... ...... ...... ...... ... ... ... ... ... 4 Sunday, February 21, 2010
  • 5. A number of companies have realized that the relational model is insufficient and are working on alternative database solutions. 5 Sunday, February 21, 2010
  • 6. Most focus on scaling to large numbers 192.168.0.15 192.168.0.16 192.168.0.21 192.168.0.14 6 Sunday, February 21, 2010
  • 7. Graph Databases focuses on structure of data 7 Sunday, February 21, 2010
  • 8. Positioning w.r.t. other NOSQL DBs Size Key/Value stores Bigtable clones Document databases Graph databases Complexity 8 Sunday, February 21, 2010
  • 9. Positioning w.r.t. other NOSQL DBs Size Key/Value stores Bigtable clones Document databases Graph databases Billions of nodes and relationships > 90% of use cases Complexity 8 Sunday, February 21, 2010
  • 10. What is Neo4j? ๏ Neo4j is a Graph Database • Non-relational (“#nosql”), transactional (ACID), embedded • Data is stored as a Graph / Network ‣Nodes and relationships with properties ‣“Property Graph” or “edge-labeled multidigraph” ๏ Neo4j is Open Source / Free (as in speech) Software • AGPLv3 Prices are available at http://neotechnology.com/ • Commercial (“dual license”) license available Contact us if you have questions and/or special license needs (e.g. if you want an evaluation license) ‣Free (as in beer) for “small” installations ‣Inexpensive (as in startup-friendly) when you grow 9 Sunday, February 21, 2010
  • 11. More about Neo4j ๏ Neo4j is stable • In 24/7 operation since 2003 ๏ Neo4j is in active development • Neo Technology got VC funding October 2009 ๏ Neo4j delivers high performance graph operations • traverses 1’000’000+ relationships / second on commodity hardware 10 Sunday, February 21, 2010
  • 12. The Neo4j Graph data model ๏ Nodes are connected to one another through relationships ๏ A Relationship is a connection between two nodes • Relationships have types • Relationships have a direction • Relationships are traversed equally fast in either direction ๏ Properties are mappings from a string key to a primitive value • Both Nodes and Relationships have properties • Primitive values are any of these (or an array of these): ‣String ‣Numbers: float, double, integers (1-8 byte) 11 Sunday, February 21, 2010
  • 13. The Neo4j Graph data model name: “Mary” LOVES name: “James” age: 35 age: 32 LIVES WITH twitter: “@spam” LOVES OWNS property type: “car” DRIVES brand: “Volvo” model: “V70” 12 Sunday, February 21, 2010
  • 14. Graphs are all around us A B C D ... 1 17 3.14 3 17.79333333333 2 42 10.11 14 30.33 3 316 6.66 1 2104.56 4 32 9.11 592 0.492432432432 5 Even if this spread sheet looks like it could be a fit for a RDBMS 2153.175765766 it isn’t: •RDBMSes have problems with ... extending indefinitely on both rows and collumns •Formulas and data dependencies would quickly lead to heavy join operations 13 Sunday, February 21, 2010
  • 15. Graphs are all around us A B C D ... 1 17 3.14 3 = A1 * B1 / C1 2 42 10.11 14 = A2 * B2 / C2 3 316 6.66 1 = A3 * B3 / C3 4 32 9.11 592 = A4 * B4 / C4 5 = SUM(D2:D5) ... 14 Sunday, February 21, 2010
  • 16. Graphs are all around us A B C D ... 1 17 3.14 3 = A1 * B1 / C1 2 42 10.11 14 = A2 * B2 / C2 3 316 6.66 1 = A3 * B3 / C3 4 32 9.11 592 = A4 * B4 / C4 5 = SUM(D2:D5) ... 14 Sunday, February 21, 2010
  • 17. Graphs are all around us If we add external data sources the problem becomes even more interesting... 17 3.14 3 = A1 * B1 / C1 42 10.11 14 = A2 * B2 / C2 316 6.66 1 = A3 * B3 / C3 32 9.11 592 = A4 * B4 / C4 = SUM(D2:D5) 15 Sunday, February 21, 2010
  • 18. Graphs are all around us If we add external data sources the problem becomes even more interesting... 17 3.14 3 = A1 * B1 / C1 42 10.11 14 = A2 * B2 / C2 316 6.66 1 = A3 * B3 / C3 32 9.11 592 = A4 * B4 / C4 = SUM(D2:D5) 15 Sunday, February 21, 2010
  • 19. Graphs are whiteboard friendly 16 Sunday, February 21, 2010
  • 20. Graphs are whiteboard friendly * 1 * * 1 * 1 * 1 * 16 Sunday, February 21, 2010
  • 21. Graphs are whiteboard friendly thobe Joe project blog Wardrobe Strength Hello Joe Modularizing Jython Neo4j performance analysis 16 Sunday, February 21, 2010
  • 22. Query Languages ๏ Traversal API ๏ Sparql - “SQL for linked data” SELECT ?person WHERE { ?person neo4j:KNOWS ?friend . ?friend neo4j:KNOWS ?foe . ?foe neo4j:name “Larry Ellison” . } ๏ Gremlin - “perl for graphs” ./outE[@label='KNOWS']/inV[@age > 30]/@name 17 Sunday, February 21, 2010
  • 23. Python integration for Neo4j ๏ Mapping of the core Neo4j API for Python • Making it feel “Pythonic” ๏ Available from the Neo4j repository (and soon from PyPI) • http://components.neo4j.org/neo4j.py ‣svn co http://svn.neo4j.org/components/neo4j.py/trunk neo4j-python ๏ Works with both Jython and CPython • The threading of Jython is a plus with an embedded db... ๏ Comes with Django empowering batteries included • Could have support for other frameworks in the future 18 Sunday, February 21, 2010
  • 24. Simple interaction import neo4j graphdb = neo4j.GraphDatabase(“var/neo”) with graphdb.transaction: james = graphdb.node(name=“James”, age=32, twitter=“@spam”) mary = graphdb.node(name=“Mary”, age=35) the_car = graphdb.node(brand=“Volvo”, model=“V70”) james.LOVES( mary ) mary.LOVES( james ) james.LIVES_WITH( mary ) james.OWNS( the_car, property_type=“car” ) Creates the graph we saw in the first example. mary.DRIVES( the_car ) 19 Sunday, February 21, 2010
  • 25. Graph traversals name: “The Architect” disclosure: “public” name: “Thomas Anderson” age: 29 name: “Cypher” last name: “Reagan” KNOWS name: “Morpheus” KNOWS KNOWS rank: “Captain” CODED BY LOVES occupation: “Total badass” KNOWS KNOWS name: “Trinity” disclosure: “secret” name: “Agent Smith” version: “1.0b” since: “meeting the oracle” since: “a year before the movie” language: “C++” cooperates on: “The Nebuchadnezzar” import neo4j class Friends(neo4j.Traversal): # Traversals ! queries in Neo4j types = [ neo4j.Outgoing.KNOWS ] order = neo4j.BREDTH_FIRST stop = neo4j.STOP_AT_END_OF_GRAPH returnable = neo4j.RETURN_ALL_BUT_START_NODE for friend_node in Friends(mr_anderson): print “%s (@ depth=%s)” % ( friend_node[“name”], friend_node.depth ) 20 Sunday, February 21, 2010
  • 26. Graph traversals name: “The Architect” disclosure: “public” name: “Thomas Anderson” age: 29 name: “Cypher” last name: “Reagan” KNOWS name: “Morpheus” KNOWS KNOWS rank: “Captain” CODED BY LOVES occupation: “Total badass” KNOWS KNOWS name: “Trinity” disclosure: “secret” name: “Agent Smith” version: “1.0b” since: “meeting the oracle” since: “a year before the movie” language: “C++” cooperates on: “The Nebuchadnezzar” import neo4j class Friends(neo4j.Traversal): # Traversals ! queries in Neo4j types = [ neo4j.Outgoing.KNOWS ] order = neo4j.BREDTH_FIRST stop = neo4j.STOP_AT_END_OF_GRAPH returnable = neo4j.RETURN_ALL_BUT_START_NODE for friend_node in Friends(mr_anderson): print “%s (@ depth=%s)” % ( friend_node[“name”], friend_node.depth ) 20 Sunday, February 21, 2010
  • 27. Graph traversals name: “The Architect” disclosure: “public” name: “Thomas Anderson” age: 29 name: “Cypher” last name: “Reagan” KNOWS name: “Morpheus” KNOWS KNOWS rank: “Captain” CODED BY LOVES occupation: “Total badass” KNOWS KNOWS name: “Trinity” disclosure: “secret” name: “Agent Smith” version: “1.0b” since: “meeting the oracle” since: “a year before the movie” language: “C++” cooperates on: “The Nebuchadnezzar” import neo4j class Friends(neo4j.Traversal): # Traversals ! queries in Neo4j types = [ neo4j.Outgoing.KNOWS ] Morpheus (@ depth=1) order = neo4j.BREDTH_FIRST stop = neo4j.STOP_AT_END_OF_GRAPH returnable = neo4j.RETURN_ALL_BUT_START_NODE for friend_node in Friends(mr_anderson): print “%s (@ depth=%s)” % ( friend_node[“name”], friend_node.depth ) 20 Sunday, February 21, 2010
  • 28. Graph traversals name: “The Architect” disclosure: “public” name: “Thomas Anderson” age: 29 name: “Cypher” last name: “Reagan” KNOWS name: “Morpheus” KNOWS KNOWS rank: “Captain” CODED BY LOVES occupation: “Total badass” KNOWS KNOWS name: “Trinity” disclosure: “secret” name: “Agent Smith” version: “1.0b” since: “meeting the oracle” since: “a year before the movie” language: “C++” cooperates on: “The Nebuchadnezzar” import neo4j class Friends(neo4j.Traversal): # Traversals ! queries in Neo4j types = [ neo4j.Outgoing.KNOWS ] Morpheus (@ depth=1) order = neo4j.BREDTH_FIRST Trinity (@ depth=1) stop = neo4j.STOP_AT_END_OF_GRAPH returnable = neo4j.RETURN_ALL_BUT_START_NODE for friend_node in Friends(mr_anderson): print “%s (@ depth=%s)” % ( friend_node[“name”], friend_node.depth ) 20 Sunday, February 21, 2010
  • 29. Graph traversals name: “The Architect” disclosure: “public” name: “Thomas Anderson” age: 29 name: “Cypher” last name: “Reagan” KNOWS name: “Morpheus” KNOWS KNOWS rank: “Captain” CODED BY LOVES occupation: “Total badass” KNOWS KNOWS name: “Trinity” disclosure: “secret” name: “Agent Smith” version: “1.0b” since: “meeting the oracle” since: “a year before the movie” language: “C++” cooperates on: “The Nebuchadnezzar” import neo4j class Friends(neo4j.Traversal): # Traversals ! queries in Neo4j types = [ neo4j.Outgoing.KNOWS ] Morpheus (@ depth=1) order = neo4j.BREDTH_FIRST Trinity (@ depth=1) stop = neo4j.STOP_AT_END_OF_GRAPH Cypher (@ depth=2) returnable = neo4j.RETURN_ALL_BUT_START_NODE for friend_node in Friends(mr_anderson): print “%s (@ depth=%s)” % ( friend_node[“name”], friend_node.depth ) 20 Sunday, February 21, 2010
  • 30. Graph traversals name: “The Architect” disclosure: “public” name: “Thomas Anderson” age: 29 name: “Cypher” last name: “Reagan” KNOWS name: “Morpheus” KNOWS KNOWS rank: “Captain” CODED BY LOVES occupation: “Total badass” KNOWS KNOWS name: “Trinity” disclosure: “secret” name: “Agent Smith” version: “1.0b” since: “meeting the oracle” since: “a year before the movie” language: “C++” cooperates on: “The Nebuchadnezzar” import neo4j class Friends(neo4j.Traversal): # Traversals ! queries in Neo4j types = [ neo4j.Outgoing.KNOWS ] Morpheus (@ depth=1) order = neo4j.BREDTH_FIRST Trinity (@ depth=1) stop = neo4j.STOP_AT_END_OF_GRAPH Cypher (@ depth=2) returnable = neo4j.RETURN_ALL_BUT_START_NODE Agent Smith (@ depth=3) for friend_node in Friends(mr_anderson): print “%s (@ depth=%s)” % ( friend_node[“name”], friend_node.depth ) 20 Sunday, February 21, 2010
  • 31. Graph traversals name: “The Architect” disclosure: “public” name: “Thomas Anderson” age: 29 name: “Cypher” last name: “Reagan” KNOWS name: “Morpheus” KNOWS KNOWS rank: “Captain” CODED BY LOVES occupation: “Total badass” KNOWS KNOWS name: “Trinity” disclosure: “secret” name: “Agent Smith” version: “1.0b” since: “meeting the oracle” since: “a year before the movie” language: “C++” cooperates on: “The Nebuchadnezzar” import neo4j class Friends(neo4j.Traversal): # Traversals ! queries in Neo4j types = [ neo4j.Outgoing.KNOWS ] Morpheus (@ depth=1) order = neo4j.BREDTH_FIRST Trinity (@ depth=1) stop = neo4j.STOP_AT_END_OF_GRAPH Cypher (@ depth=2) returnable = neo4j.RETURN_ALL_BUT_START_NODE Agent Smith (@ depth=3) for friend_node in Friends(mr_anderson): print “%s (@ depth=%s)” % ( friend_node[“name”], friend_node.depth ) 20 Sunday, February 21, 2010
  • 32. Batteries for Django from neo4j.model import django_model as models class Movie(models.NodeModel): title = models.Property(indexed=True) year = models.Property() href = property(lambda self: ('/movie/%s/' % (self.node.id,))) def __unicode__(self): return self.title class Actor(models.NodeModel): name = models.Property(indexed=True) href = property(lambda self: ('/actor/%s/' % (self.node.id,))) def __unicode__(self): return self.name # etc. ... 21 Sunday, February 21, 2010
  • 33. “My ORM already does this” ๏ ORMs and model evolution is a hard problem • virtually unsupported in Django ๏ SQL is a “compatible” across many RDBMSs • data is still locked in ๏ Each ORM maps object models differently • Moving to another ORM == legacy schema support ‣except your legacy schema is strange auto-generated ๏ Object/Graph Mapping is always done the same • allows you to keep your data through application changes • or share data between multiple implementations 22 Sunday, February 21, 2010
  • 34. What your ORM doesn’t do ๏ Drop down to underlying graph model • Traversals • Graph algorithms • Shortest path(s) • etc. 23 Sunday, February 21, 2010
  • 35. Buzzword summary http://neo4j.org/ SPARQL AGPLv3 Open Source ACID Object mapping Shortest path NOSQL startup friendly whiteboard friendly Traversal Query language Embedded Beer Software Transactional Memory polyglot persistence Free Software Scaling to complexity 24 Sunday, February 21, 2010
  • 36. http://neotechnology.com Sunday, February 21, 2010