
Neo4j -- or why graph dbs kick ass

A presentation of the Neo4j graph database given at QCon SF 2008. It describes why relational databases are increasingly unfit for many applications today and why graphs may be a good fit. It also covers the fundamentals of how to program with Neo4j.

Published in: Technology, News & Politics

  1. Neo4j, or: why graph dbs kick ass. Emil Eifrem, CEO, Neo Technology
  2. Death? Community experimentation: CouchDB, SimpleDB, Hypertable, Cassandra, Scalaris, ...
  3. ?
  4. Trend 1: data is getting more connected. [Chart: information connectivity over time (1990, 2000, 2010, 2020), rising from text documents through hypertext, RSS, blogs, wikis, user-generated content, tagging, folksonomies, RDF and ontologies to the Giant Global Graph (GGG); eras labeled web 1.0, web 2.0, “web 3.0”]
  5. Trend 2: ... and more semi-structured. Individualization of content! In the salary lists of the 1970s, all elements had exactly one job. In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15? Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”).
  6. Relational database. [Chart: performance vs. information complexity, with points labeled salary list, majority of webapps, social network, semantic trading, and a bracket labeled “custom”]
  7. We = hackers! So that’s vCPU... what about vhackers?
  8. Whiteboard friendly? [Diagram: “Björn owns Big Car”, with further labels: build, transport, Kids & DayCare, Veggies]
  9. Alternative? A graph database
  10. A graph: a simple food web (image from vtaide.com)
  11. A big graph: part of the food web of the North Atlantic (image from jeffkenedyassociates.com)
  12. A social graph: LinkedIn, Facebook, Orkut, Hi5, Friendster, Dopplr, ...
  13. Your file system: files & folders; alias, links; read & write; roles, groups; ...
  14. Shut up and show us the code! Image credit: lost! (please don’t shoot me)
  15. The Graph DB model: representation. Core abstractions: nodes, relationships between nodes, properties on both. [Diagram: nodes 1, 2 and 3 with example properties: name = “Emil”, age = 29, sex = “yes”; type = KNOWS, time = 4 years; type = car, vendor = “SAAB”, model = “95 Aero”]
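The three core abstractions on this slide (nodes, relationships, properties on both) can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `SketchNode`, `SketchRelationship` and `relate` are hypothetical stand-ins and not the Neo4j API, and which properties belong to which element is my reading of the flattened diagram.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in types for illustration only; this is NOT the Neo4j API.
final class PropertyGraphSketch {

    /** A node is an identity plus an open-ended bag of properties. */
    static final class SketchNode {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        final List<SketchRelationship> outgoing = new ArrayList<>();

        SketchNode(long id) {
            this.id = id;
        }
    }

    /** A relationship is typed, directed, and carries its own properties. */
    static final class SketchRelationship {
        final String type;
        final SketchNode start;
        final SketchNode end;
        final Map<String, Object> properties = new HashMap<>();

        SketchRelationship(String type, SketchNode start, SketchNode end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Creates a directed relationship and registers it on the start node. */
    static SketchRelationship relate(SketchNode from, String type, SketchNode to) {
        SketchRelationship rel = new SketchRelationship(type, from, to);
        from.outgoing.add(rel);
        return rel;
    }

    /** Builds a small graph in the spirit of the slide (property placement is assumed). */
    static SketchNode buildSlideGraph() {
        SketchNode emil = new SketchNode(1);
        emil.properties.put("name", "Emil");
        emil.properties.put("age", 29);

        SketchNode friend = new SketchNode(2); // no properties: nothing is mandatory

        SketchRelationship knows = relate(emil, "KNOWS", friend);
        knows.properties.put("time", "4 years");
        return emil;
    }

    public static void main(String[] args) {
        SketchNode emil = buildSlideGraph();
        for (SketchRelationship rel : emil.outgoing) {
            System.out.println(emil.properties.get("name") + " -[" + rel.type + " "
                    + rel.properties + "]-> node " + rel.end.id);
        }
    }
}
```

Note that neither element declares a schema: a node without properties and a relationship with one are equally valid, which is the point the talk makes about semi-structured data.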
  16. Example: The Matrix. [Diagram: nodes 1, 2, 3, 7, 13 and 42 for Thomas Anderson (age = 29), Morpheus (rank = “Captain”, occupation = “Total badass”), Trinity, Cypher (last name = “Reagan”), Agent Smith (version = 1.0b, language = C++) and The Architect, connected by KNOWS and CODED_BY relationships; relationship properties shown include disclosure = public, disclosure = secret, age = 3 days and age = 6 months]
  17. Code (1): Building a node space

      NeoService neo = ... // Get factory

      // Create Thomas 'Neo' Anderson
      Node mrAnderson = neo.createNode();
      mrAnderson.setProperty( "name", "Thomas Anderson" );
      mrAnderson.setProperty( "age", 29 );

      // Create Morpheus
      Node morpheus = neo.createNode();
      morpheus.setProperty( "name", "Morpheus" );
      morpheus.setProperty( "rank", "Captain" );
      morpheus.setProperty( "occupation", "Total bad ass" );

      // Create a relationship representing that they know each other
      mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );

      // ...create Trinity, Cypher, Agent Smith, Architect similarly
  18. Code (1): Building a node space

      NeoService neo = ... // Get factory
      Transaction tx = neo.begin();

      // Create Thomas 'Neo' Anderson
      Node mrAnderson = neo.createNode();
      mrAnderson.setProperty( "name", "Thomas Anderson" );
      mrAnderson.setProperty( "age", 29 );

      // Create Morpheus
      Node morpheus = neo.createNode();
      morpheus.setProperty( "name", "Morpheus" );
      morpheus.setProperty( "rank", "Captain" );
      morpheus.setProperty( "occupation", "Total bad ass" );

      // Create a relationship representing that they know each other
      mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );

      // ...create Trinity, Cypher, Agent Smith, Architect similarly
      tx.commit();
  19. Code (1b): Defining RelationshipTypes

      // In package org.neo4j.api.core
      public interface RelationshipType
      {
          String name();
      }

      // In package org.yourdomain.yourapp
      // Example on how to roll dynamic RelationshipTypes
      class MyDynamicRelType implements RelationshipType
      {
          private final String name;
          MyDynamicRelType( String name ){ this.name = name; }
          public String name() { return this.name; }
      }

      // Example on how to kick it, static-RelationshipType-like
      enum MyStaticRelTypes implements RelationshipType
      {
          KNOWS,
          WORKS_FOR,
      }
  20. The Graph DB model: traversal. Traverser framework for high-performance traversing across the node space. [Diagram: the three-node example graph from slide 15]
  21. Example: Mr Anderson’s friends. [Diagram: the Matrix graph from slide 16]
  22. Code (2): Traversing a node space

      // Instantiate a traverser that returns Mr Anderson's friends
      Traverser friendsTraverser = mrAnderson.traverse(
          Traverser.Order.BREADTH_FIRST,
          StopEvaluator.END_OF_GRAPH,
          ReturnableEvaluator.ALL_BUT_START_NODE,
          RelTypes.KNOWS,
          Direction.OUTGOING );

      // Traverse the node space and print out the result
      System.out.println( "Mr Anderson's friends:" );
      for ( Node friend : friendsTraverser )
      {
          System.out.printf( "At depth %d => %s%n",
              friendsTraverser.currentPosition().getDepth(),
              friend.getProperty( "name" ) );
      }
  23. [Diagram: the Matrix graph from slide 16, shown next to the traverser call from slide 22.] Console output:

      $ bin/start-neo-example
      Mr Anderson's friends:
      At depth 1 => Morpheus
      At depth 1 => Trinity
      At depth 2 => Cypher
      At depth 3 => Agent Smith
      $
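To make the breadth-first semantics behind this output concrete, here is a minimal sketch in plain Java that walks outgoing KNOWS edges and reports each reachable person with its depth, skipping the start node. It does not use the Neo4j traverser; `FriendsTraversalSketch` and its methods are hypothetical names, and the edge list is an assumption read off the diagram (it is consistent with the depths printed on the slide).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a breadth-first traversal; NOT the Neo4j Traverser API.
final class FriendsTraversalSketch {

    /** Outgoing KNOWS edges, assumed from the slide's diagram. */
    static Map<String, List<String>> matrixKnowsGraph() {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("Thomas Anderson", Arrays.asList("Morpheus", "Trinity"));
        knows.put("Morpheus", Arrays.asList("Trinity", "Cypher"));
        knows.put("Cypher", Arrays.asList("Agent Smith"));
        return knows;
    }

    /** Returns one line per reachable node (start node excluded), in breadth-first order. */
    static List<String> friendsByDepth(Map<String, List<String>> knows, String start) {
        List<String> lines = new ArrayList<>();
        Map<String, Integer> depthOf = new HashMap<>(); // doubles as the visited set
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.addLast(start);

        while (!queue.isEmpty()) {
            String current = queue.removeFirst();
            int depth = depthOf.get(current);
            if (depth > 0) {
                lines.add("At depth " + depth + " => " + current);
            }
            for (String next : knows.getOrDefault(current, Collections.emptyList())) {
                if (!depthOf.containsKey(next)) { // first visit fixes the shortest depth
                    depthOf.put(next, depth + 1);
                    queue.addLast(next);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("Mr Anderson's friends:");
        for (String line : friendsByDepth(matrixKnowsGraph(), "Thomas Anderson")) {
            System.out.println(line);
        }
    }
}
```

Trinity is reachable both directly and via Morpheus; breadth-first order guarantees she is reported once, at the shorter depth 1.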
  24. Example: Friends in love? [Diagram: the Matrix graph from slide 16 with a LOVES relationship added]
  25. Code (3a): Custom traverser

      // Create a traverser that returns all “friends in love”
      Traverser loveTraverser = mrAnderson.traverse(
          Traverser.Order.BREADTH_FIRST,
          StopEvaluator.END_OF_GRAPH,
          new ReturnableEvaluator()
          {
              public boolean isReturnableNode( TraversalPosition pos )
              {
                  return pos.currentNode().hasRelationship(
                      RelTypes.LOVES, Direction.OUTGOING );
              }
          },
          RelTypes.KNOWS,
          Direction.OUTGOING );
  26. Code (3a): Custom traverser

      // Traverse the node space and print out the result
      System.out.println( "Who’s a lover?" );
      for ( Node person : loveTraverser )
      {
          System.out.printf( "At depth %d => %s%n",
              loveTraverser.currentPosition().getDepth(),
              person.getProperty( "name" ) );
      }
  27. [Diagram: the Matrix graph with the LOVES relationship from slide 24, shown next to the ReturnableEvaluator from slide 25.] Console output:

      $ bin/start-neo-example
      Who’s a lover?
      At depth 1 => Trinity
      $
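The idea of a custom returnable evaluator, where the traversal still follows KNOWS edges but only reports nodes that pass a filter, can be sketched with a `java.util.function.Predicate`. This is an illustration and not the Neo4j API: `LoveTraversalSketch` is a hypothetical name, and both the KNOWS edges and the single LOVES edge leaving Trinity are assumptions based on the diagram and the output above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Plain-Java illustration of "traverse everything, return only some"; NOT the Neo4j API.
final class LoveTraversalSketch {

    /** Breadth-first walk over KNOWS edges; a node is reported only if the filter accepts it. */
    static List<String> traverse(Map<String, List<String>> knows, String start,
                                 Predicate<String> returnable) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>(Collections.singleton(start));
        Deque<String> queue = new ArrayDeque<>(Collections.singleton(start));

        while (!queue.isEmpty()) {
            String current = queue.removeFirst();
            if (returnable.test(current)) {
                result.add(current);
            }
            // Expansion is independent of the filter: rejected nodes are still walked through.
            for (String next : knows.getOrDefault(current, Collections.emptyList())) {
                if (visited.add(next)) {
                    queue.addLast(next);
                }
            }
        }
        return result;
    }

    /** Runs the "friends in love" query on the assumed Matrix graph. */
    static List<String> friendsInLove() {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("Thomas Anderson", Arrays.asList("Morpheus", "Trinity"));
        knows.put("Morpheus", Arrays.asList("Trinity", "Cypher"));
        knows.put("Cypher", Arrays.asList("Agent Smith"));

        // Assumption: Trinity is the only node with an outgoing LOVES relationship.
        Set<String> hasOutgoingLoves = new HashSet<>(Collections.singleton("Trinity"));

        return traverse(knows, "Thomas Anderson", hasOutgoingLoves::contains);
    }

    public static void main(String[] args) {
        System.out.println("Who's a lover? " + friendsInLove());
    }
}
```

Keeping "which edges to follow" separate from "which nodes to return" mirrors the split between the relationship type arguments and the ReturnableEvaluator in the slide's code.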
  28. Bonus code: domain model. How do you implement your domain model? Use the delegator pattern, i.e. every domain entity wraps a Neo4j primitive:

      // In package org.yourdomain.yourapp
      class PersonImpl implements Person
      {
          private final Node underlyingNode;
          PersonImpl( Node node ){ this.underlyingNode = node; }

          public String getName()
          {
              // getProperty() returns Object, so a cast is needed here
              return (String) this.underlyingNode.getProperty( "name" );
          }
          public void setName( String name )
          {
              this.underlyingNode.setProperty( "name", name );
          }
      }
  29. Domain layer frameworks. Qi4j (www.qi4j.org): framework for doing DDD in pure Java5; defines Entities / Associations / Properties. Sound familiar? Nodes / Rel’s / Properties! Neo4j is an “EntityStore” backend. NeoWeaver (http://components.neo4j.org/neo-weaver): weaves Neo4j-backed persistence into domain objects in runtime (dynamic proxy / cglib based). Veeeery alpha, but veeery cool.
  30. Neo4j system characteristics. Disk-based: native graph storage engine with custom (“SSD-ready”) binary on-disk format. Transactional: JTA/JTS, XA, 2PC, Tx recovery, deadlock detection, etc. Scalable: several billions of nodes/rels/props on single JVM. Robust: 5+ years in 24/7 production.
  31. Social network pathExists(): ~1k persons; avg 50 friends per person; pathExists(a, b) limit depth 4; two backends; eliminate disk IO so warm up caches. [Diagram: example social graph with numbered nodes]
  32. Social network pathExists() [Diagram: example social graph with persons Emil, Mike, Kevin, John, Marcus, Bruce, Leigh]

      Backend                        # persons    Query time
      Relational database (MySQL)        1 000      2 000 ms
      Graph database (Neo4j)             1 000          2 ms
      Graph database (Neo4j)         1 000 000          2 ms
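The slides do not show how pathExists(a, b) was implemented on either backend. As a rough illustration of what a depth-limited reachability query means on a graph, here is a minimal breadth-first sketch in plain Java; `PathExistsSketch`, its signature and the tiny chain-shaped test graph are all assumptions, not the benchmark code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative depth-limited reachability check; NOT the benchmark's actual implementation.
final class PathExistsSketch {

    /** Returns true if b can be reached from a by following at most maxDepth friendships. */
    static boolean pathExists(Map<Integer, Set<Integer>> friends, int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>(Collections.singleton(a));
        List<Integer> frontier = new ArrayList<>(Collections.singleton(a));

        // Expand one level of the friendship graph per iteration.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> nextFrontier = new ArrayList<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, Collections.emptySet())) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        nextFrontier.add(friend);
                    }
                }
            }
            frontier = nextFrontier;
        }
        return false;
    }

    /** Builds an undirected chain 1 - 2 - ... - n as a small test graph. */
    static Map<Integer, Set<Integer>> chain(int n) {
        Map<Integer, Set<Integer>> friends = new HashMap<>();
        for (int i = 1; i < n; i++) {
            friends.computeIfAbsent(i, k -> new HashSet<>()).add(i + 1);
            friends.computeIfAbsent(i + 1, k -> new HashSet<>()).add(i);
        }
        return friends;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> friends = chain(6);
        System.out.println(pathExists(friends, 1, 5, 4)); // 4 hops away
        System.out.println(pathExists(friends, 1, 6, 4)); // 5 hops away
    }
}
```

The work done depends on how many people lie within the depth limit of the start person rather than on the total number of persons, which is one plausible reading of why the Neo4j timings on the slide stay flat between 1 000 and 1 000 000 persons.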
  33. Pros & Cons compared to RDBMS

      + No O/R impedance mismatch (whiteboard friendly)
      + Can easily evolve schemas
      + Can represent semi-structured info
      + Can represent graphs/networks (with performance)
      - Lacks in tool and framework support
      - No other implementations => potential lock in
      - No support for ad-hoc queries

  34. More consequences

      Ability to capture semi-structured information => allowing individualization of content
      No predefined schema => easier to evolve model => can capture ad-hoc relationships
      Can capture non-normative relations => easy to model specific links to specific sets
      All state is kept in transactional memory => improves application concurrency
  35. The Neo4j ecosystem. Neo4j is an embedded database: tiny teeny lil jar file. Component ecosystem: index-util, neo-meta, neo-utils, owl2neo, sparql-engine, ... See http://components.neo4j.org
  36. Example: NeoRDF. [Diagram: layered stack with a NeoRDF triple/quad store, OWL, SPARQL, RDF, Metamodel and Graph match on top of Neo4j]
  37. Future development. Productify RDF support (Neo4j 1.1). Tool support (Neo4j 1.1 and onwards). Language bindings: currently Jython, Python, Ruby; probably works well with Groovy, Beanshell, etc; tomorrow Scala? .NET? Erlang? Standalone server: experimental RemoteNeo in laboratory right now. How to standardize? REST API? SPARQL protocol?
  38. Future development. Distribution (Neo4j 2.0), current thoughts: best bet today: sharding on top of (Infiniflow) from Paremus: www.codecauldron.org. Fundamentals: CAP theorem; BASE (“ACID 2.0”); eventual consistency; separate HA and data partitioning. Generic clustering algorithm as base case, but give lots of knobs for developers.
  39. How ego are you? (aka other impls?) Franz’ AllegroGraph (http://agraph.franz.com): proprietary, Lisp, RDF-oriented but real graphdb. FreeBase graphd (http://blog.freebase.com/2008/04/09/a-brief-tour-of-graphd/): in-house at Metaweb. Kloudshare (http://whydoeseverythingsuck.com): graph database in the cloud, still stealth mode. Some academic papers from ~10 years ago: G = {V, E}
  40. Conclusion. Graphs && Neo4j => teh awesome! Available NOW under AGPLv3 / commercial license. AGPLv3: “if you’re open source, we’re open source”. If you have proprietary software? Must buy a commercial license. But up to 1M primitives it’s free for all uses! Download: http://neo4j.org. Feedback: http://lists.neo4j.org
  41. Questions? Image credit: lost again! Sorry :(
  42. http://neotechnology.com
  43. Neo4j architecture gotchas. Focus on the domain (whiteboard friendly, domain-first development). Purpose of the domain layer: “an adaptation of the generic node space to a type-safe, object-oriented abstraction expressed in the vocabulary of our domain” (!) Implementation mindset: assume the node space is always in memory; assume everything is automatically persistent; focus on logical transactions instead of artificial load/stores.
  44. Implementation pointers. Use the delegator pattern, i.e. every domain entity wraps a Neo4j primitive as follows:

      Actor actor = new ActorImpl( underlyingNode );

      Remember that the wrappers are extremely lightweight (they contain no state except for a reference), so create them freely! It is good practice to override equals/hashCode: delegate to underlying{Node,Relationship}’s equals() or hashCode() implementation.
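The advice to delegate equals/hashCode to the underlying primitive matters because wrappers are created freely, so two distinct wrapper objects often represent the same entity. A minimal sketch follows, with `SketchNode` as a hypothetical stand-in for the Neo4j node (assumed here to compare by id); only the shape of `ActorImpl` follows the slide.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrates equals/hashCode delegation in a lightweight wrapper; SketchNode is NOT the Neo4j Node.
final class WrapperEqualitySketch {

    /** Hypothetical stand-in for a graph node whose identity is its id. */
    static final class SketchNode {
        private final long id;

        SketchNode(long id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof SketchNode && ((SketchNode) o).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    /** Domain wrapper: no state of its own except the reference to the node. */
    static final class ActorImpl {
        private final SketchNode underlyingNode;

        ActorImpl(SketchNode underlyingNode) {
            this.underlyingNode = underlyingNode;
        }

        @Override
        public boolean equals(Object o) {
            // Two wrappers are equal exactly when they wrap the same node.
            return o instanceof ActorImpl
                    && underlyingNode.equals(((ActorImpl) o).underlyingNode);
        }

        @Override
        public int hashCode() {
            return underlyingNode.hashCode();
        }
    }

    /** Adds two fresh wrappers around the same node to a set and reports the set size. */
    static int distinctWrappersForSameNode() {
        SketchNode node = new SketchNode(42);
        Set<ActorImpl> actors = new HashSet<>();
        actors.add(new ActorImpl(node));
        actors.add(new ActorImpl(node));
        return actors.size();
    }

    public static void main(String[] args) {
        System.out.println("Distinct actors: " + distinctWrappersForSameNode());
    }
}
```

Without the two overrides on `ActorImpl`, the set would hold two entries for the same actor, which is the kind of surprise the slide's advice avoids.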
