Your SlideShare is downloading. ×
  • Like
  • Save
Neo4j And The Benefits Of Graph Dbs 3
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Now you can save presentations on your phone or tablet

Available for both IPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Neo4j And The Benefits Of Graph Dbs 3

  • 4,842 views
Published

Emil Eifrem, the CEO, founder and developer of neo4j in Mumbai at Directi giving a talk on NoSQL databases (his area of expertise) where he covers various NoSQL databases and then spends a bulk of …

Emil Eifrem, the CEO, founder and developer of neo4j in Mumbai at Directi giving a talk on NoSQL databases (his area of expertise) where he covers various NoSQL databases and then spends a bulk of time on neo4j.

Published in Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
4,842
On SlideShare
0
From Embeds
0
Number of Embeds
2

Actions

Shares
Downloads
0
Comments
0
Likes
13

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1.
  • 2. No SQL?
    Image credit: http://browsertoolkit.com/fault-tolerance.png
  • 3. Neo4j
    graph databases
    the benefits of
    #neo4j
    @emileifrem
    emil@neotechnology.com
    Emil Eifrem
    CEO, Neo Technology
  • 4. Trend 1:
    data set size
    40
    2007
    Source: IDC 2007
  • 5. 988
    Trend 1:
    data set size
    40
    2007
    2010
    Source: IDC 2007
  • 6. Trend 2: connectedness
    Giant
    Global
    Graph
    (GGG)
    Ontologies
    RDF
    Folksonomies
    Tagging
    Information connectivity
    User-
    generated
    Wikis
    content
    Blogs
    RSS
    Hypertext
    Text
    documents
    web 1.0
    web 2.0
    “web 3.0”
    1990
    2000
    2010
    2020
  • 7. Trend 3: semi-structure
    Individualization of content!
    In the salary lists of the 1970s, all elements had
    exactly one job
    In the salary lists of the 2000s, we need 5 job
    columns! Or 8? Or 15?
    Trend accelerated by the decentralization of
    content generation that is the hallmark of the age
    of participation (“web 2.0”)
  • 8. Aside: RDBMS performance
    Relational database
    Social network
    Semantic
    Trading
    custom
    Data complexity
    Salary List
    Performance
    Majority of
    Webapps
    }
  • 9. Trend 4: architecture
    1990s: Database as integration hub
  • 10. Trend 4: architecture
    2000s: (Slowly towards...)
    Decoupled services with own backend
  • 11. Why NoSQL 2009?
    Trend 1: Size.
    Trend 2: Connectivity.
    Trend 3: Semi-structure.
    Trend 4: Architecture.
  • 12. NoSQL
    overview
  • 13. First off: the name
    NoSQL is NOT “Never SQL”
    NoSQL is NOT “No To SQL”
  • 14. NOSQL
    is simply
    Not Only SQL!
  • 15. Four (emerging) NOSQL categories
    Key-value stores
    Based on Amazon's Dynamo paper
    Data model: (global) collection of K-V pairs
    Example: Dynomite, Voldemort, Tokyo*
    BigTable clones
    Based on Google's BigTable paper
    Data model: big table, column families
    Example: HBase, Hypertable, Cassandra
  • 16. Four (emerging) NOSQL categories
    Document databases
    Inspired by Lotus Notes
    Data model: collections of K-V collections
    Example: CouchDB, MongoDB
    Graph databases
    Inspired by Euler & graph theory
    Data model: nodes, rels, K-V on both
    Example: AllegroGraph, Sones, Neo4j
  • 17. NOSQL data models
    Key-value stores
    Data size
    Bigtable clones
    Document
    databases
    Graph databases
    Data complexity
  • 18. NOSQL data models
    Key-value stores
    Bigtable clones
    Document
    databases
    Graph databases
    (This is still billions of
    Data size
    nodes & relationships)
    Data complexity
    90%
    of
    use
    cases
  • 19. Graph DBs
    & Neo4j intro
  • 20. The Graph DB model: representation
    Core abstractions:
    Nodes
    name = “Emil”
    age = 29
    sex = “yes”
    Relationships between nodes
    Properties on both
    type = KNOWS
    time = 4 years
    1
    2
    type = car
    vendor = “SAAB”
    model = “95 Aero”
    3
  • 21. Example: The Matrix
    name = “The Architect”
    name = “Morpheus”
    rank = “Captain”
    occupation = “Total badass”
    42
    name = “Thomas Anderson”
    age = 29
    disclosure = public
    KNOWS
    KNOWS
    CODED_BY
    KNOW
    1
    3
    7
    S
    13
    KW
    KN
    S
    name = “Cypher”
    last name = “Reagan”
    OW
    NO
    S
    name = “Agent Smith”
    version = 1.0b
    language = C++
    disclosure = secret
    age = 6 months
    age = 3 days
    2
    name = “Trinity”
  • 22. Code (1): Building a node space
    GraphDatabaseService graphDb = ... // Get factory
    // Create Thomas 'Neo' Anderson
    Node mrAnderson = graphDb.createNode();
    mrAnderson.setProperty( "name", "Thomas Anderson" );
    mrAnderson.setProperty( "age", 29 );
    // Create Morpheus
    Node morpheus = graphDb.createNode();
    morpheus.setProperty( "name", "Morpheus" );
    morpheus.setProperty( "rank", "Captain" );
    morpheus.setProperty( "occupation", "Total bad ass" );
    // Create a relationship representing that they know each other
    mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
    // ...create Trinity, Cypher, Agent Smith, Architect similarly
  • 23. Code (1): Building a node space
    GraphDatabaseService graphDb = ... // Get factory
    Transaction tx = neo.beginTx();
    // Create Thomas 'Neo' Anderson
    Node mrAnderson = graphDb.createNode();
    mrAnderson.setProperty( "name", "Thomas Anderson" );
    mrAnderson.setProperty( "age", 29 );
    // Create Morpheus
    Node morpheus = graphDb.createNode();
    morpheus.setProperty( "name", "Morpheus" );
    morpheus.setProperty( "rank", "Captain" );
    morpheus.setProperty( "occupation", "Total bad ass" );
    // Create a relationship representing that they know each other
    mrAnderson.createRelationshipTo( morpheus, RelTypes.KNOWS );
    // ...create Trinity, Cypher, Agent Smith, Architect similarly
    tx.commit();
  • 24. Code (1b): Defining RelationshipTypes
    // In package org.neo4j.graphdb
    public interface RelationshipType
    {
    String name();
    }
    // In package org.yourdomain.yourapp
    // Example on how to roll dynamic RelationshipTypes
    class MyDynamicRelType implements RelationshipType
    {
    private final String name;
    MyDynamicRelType( String name ){ this.name = name; }
    public String name() { return this.name; }
    }
    // Example on how to kick it, static-RelationshipType-like
    enum MyStaticRelTypes implements RelationshipType
    {
    KNOWS,
    WORKS_FOR,
    }
  • 25. Whiteboard friendly
    owns
    Björn
    build
    Big Car
    drives
    DayCare
  • 26.
  • 27. The Graph DB model: traversal
    Traverser framework for
    high-performance traversing
    across the node space
    type = KNOWS
    time = 4 years
    name = “Emil”
    age = 31
    sex = “yes”
    1
    2
    type = car
    vendor = “SAAB”
    model = “95 Aero”
    3
  • 28. Example: Mr Anderson’s friends
    name = “The Architect”
    name = “Morpheus”
    rank = “Captain”
    occupation = “Total badass”
    42
    name = “Thomas Anderson”
    age = 29
    disclosure = public
    KNOWS
    KNOWS
    CODED_BY
    KNOW
    1
    3
    7
    S
    13
    KW
    KN
    S
    name = “Cypher”
    last name = “Reagan”
    OW
    NO
    S
    name = “Agent Smith”
    version = 1.0b
    language = C++
    disclosure = secret
    age = 6 months
    age = 3 days
    2
    name = “Trinity”
  • 29. Code (2): Traversing a node space
    // Instantiate a traverser that returns Mr Anderson's friends
    Traverser friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    RelTypes.KNOWS,
    Direction.OUTGOING );
    // Traverse the node space and print out the result
    System.out.println( "Mr Anderson's friends:" );
    for ( Node friend : friendsTraverser )
    {
    System.out.printf( "At depth %d => %s%n",
    friendsTraverser.currentPosition().getDepth(),
    friend.getProperty( "name" ) );
    }
  • 30. name = “The Architect”
    name = “Morpheus”
    rank = “Captain”
    occupation = “Total badass”
    42
    name = “Thomas Anderson”
    age = 29
    disclosure = public
    KNOWS
    KNOWS
    CODED_BY
    KNOW
    1
    7
    3
    S
    13
    KW
    KN
    S
    name = “Cypher”
    last name = “Reagan”
    OW
    NO
    S
    name = “Agent Smith”
    version = 1.0b
    language = C++
    disclosure = secret
    age = 6 months
    age = 3 days
    2
    $ bin/start-neo-example
    Mr Anderson's friends:
    name = “Trinity”
    =>
    =>
    =>
    =>
    Morpheus
    Trinity
    Cypher
    Agent Smith
    depth
    depth
    depth
    depth
    1
    1
    2
    3
    At
    At
    At
    At
    $
    friendsTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    RelTypes.KNOWS,
    Direction.OUTGOING );
  • 31. Example: Friends in love?
    name = “The Architect”
    name = “Morpheus”
    rank = “Captain”
    occupation = “Total badass”
    42
    name = “Thomas Anderson”
    age = 29
    disclosure = public
    KNOWS
    KNOWS
    CODED_BY
    1
    KNO
    7
    3
    WS
    13
    KW
    KN
    S
    name = “Cypher”
    last name = “Reagan”
    OW
    NO
    S
    name = “Agent Smith”
    version = 1.0b
    language = C++
    LO
    disclosure = secret
    age = 6 months
    VE
    S
    2
    name = “Trinity”
  • 32. Code (3a): Custom traverser
    // Create a traverser that returns all “friends in love”
    Traverser loveTraverser = mrAnderson.traverse(
    Traverser.Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    new ReturnableEvaluator()
    {
    public boolean isReturnableNode( TraversalPosition pos )
    {
    return pos.currentNode().hasRelationship(
    RelTypes.LOVES, Direction.OUTGOING );
    }
    },
    RelTypes.KNOWS,
    Direction.OUTGOING );
  • 33. Code (3a): Custom traverser
    // Traverse the node space and print out the result
    System.out.println( "Who’s a lover?" );
    for ( Node person : loveTraverser )
    {
    System.out.printf( "At depth %d => %s%n",
    loveTraverser.currentPosition().getDepth(),
    person.getProperty( "name" ) );
    }
  • 34. name = “The Architect”
    name = “Morpheus”
    rank = “Captain”
    occupation = “Total badass”
    42
    name = “Thomas Anderson”
    age = 29
    disclosure = public
    KNOWS
    KNOWS
    CODED_BY
    KNOW
    1
    7
    3
    S
    13
    KOW
    KN
    S
    name = “Cypher”
    last name = “Reagan”
    OW
    S
    N
    name = “Agent Smith”
    version = 1.0b
    language = C++
    LO
    disclosure = secret
    age = 6 months
    VE
    S
    2
    name = “Trinity”
    $ bin/start-neo-example
    Who’s a lover?
    new ReturnableEvaluator()
    {
    At depth 1 => Trinity
    $
    public boolean isReturnableNode(
    TraversalPosition pos)
    {
    return pos.currentNode().
    hasRelationship( RelTypes.LOVES,
    Direction.OUTGOING );
    }
    },
  • 35. Bonus code: domain model
    How do you implement your domain model?
    Use the delegator pattern, i.e. every domain entity
    wraps a Neo4j primitive:
    // In package org.yourdomain.yourapp
    class PersonImpl implements Person
    {
    private final Node underlyingNode;
    PersonImpl( Node node ){ this.underlyingNode = node; }
    public String getName()
    {
    return (String) this.underlyingNode.getProperty( "name" );
    }
    public void setName( String name )
    {
    this.underlyingNode.setProperty( "name", name );
    }
    }
  • 36. Domain layer frameworks
    Qi4j (www.qi4j.org)
    Framework for doing DDD in pure Java5
    Defines Entities / Associations / Properties
    Sound familiar? Nodes / Rel’s / Properties!
    Neo4j is an “EntityStore” backend
    Jo4neo (http://code.google.com/p/jo4neo)
    Annotation driven
    Weaves Neo4j-backed persistence into domain
    objects at runtime
  • 37. Neo4j system characteristics
    Disk-based
    Native graph storage engine with custom binary
    on-disk format
    Transactional
    JTA/JTS, XA, 2PC, Tx recovery, deadlock
    detection, MVCC, etc
    Scales up
    Many billions of nodes/rels/props on single JVM
    Robust
    6+ years in 24/7 production
  • 38. Social network pathExists()
    12
    ~1k persons
    3
    Avg 50 friends per
    1
    7
    person
    pathExists(a, b) limit
    depth 4
    Two backends
    36
    77
    41
    5
    Eliminate disk IO so
    warm up caches
  • 39. Social network pathExists()
    2
    Emil
    5
    Kevin
    1
    Mike
    7
    John
    3
    Marcus
    4
    Leigh
    # persons query time
    9
    Bruce
    Relational database
    Graph database (Neo4j)
    Graph database (Neo4j)
    1 000
    1 000
    1 000 000
    2 000 ms
    2 ms
    2 ms
  • 40.
  • 41.
  • 42. Pros & Cons compared to RDBMS
    + No O/R impedance mismatch (whiteboard friendly)
    + Can easily evolve schemas
    + Can represent semi-structured info
    + Can represent graphs/networks (with performance)
    - Lacks in tool and framework support
    - Few other implementations => potential lock in
    -+ No support for ad-hoc queries
  • 43. Query languages
    SPARQL – “SQL for linked data”
    Ex:
    ”SELECT ?person WHERE {
    ?person neo4j:KNOWS ?friend .
    ?friend neo4j:KNOWS ?foe .
    ?foe neo4j:name “Larry Ellison” .
    }”
    Gremlin – “perl for graphs”
    Ex:
    ”./outE[@label='KNOWS']/inV[@age > 30]/@name”
  • 44. The Neo4j ecosystem
    Neo4j is an embedded database
    Tiny teeny lil jar file
    Component ecosystem
    index
    meta-model
    graph-matching
    remote-graphdb
    sparql-engine
    ...
    See http://components.neo4j.org
  • 45. Example: Neo4j-RDF
    Neo4j-RDF triple/quad store
    SPARQL
    OWL
    RDF
    Metamodel
    Graph
    match
    Neo4j
  • 46. Language bindings
    Neo4j.py – bindings for Jython and CPython
    http://components.neo4j.org/neo4j.py
    Neo4jrb – bindings for JRuby (incl RESTful API)
    http://wiki.neo4j.org/content/Ruby
    Neo4jrb-simple
    http://github.com/mdeiters/neo4jr-simple
    Clojure
    http://wiki.neo4j.org/content/Clojure
    Scala (incl RESTful API)
    http://wiki.neo4j.org/content/Scala
  • 47.
  • 48.
  • 49. Grails Neoclipse screendump
  • 50. Scale out – replication
    Rolling out Neo4j HA... soon :)
    Master-slave replication, 1st configuration
    MySQL style... ish
    Except all instances can write, synchronously
    between writing slave & master (strong consistency)
    Updates are asynchronously propagated to the
    other slaves (eventual consistency)
    This can handle billions of entities...
    … but not 100B
  • 51. Scale out – partitioning
    Sharding possible today
    … but you have to do manual work
    … just as with MySQL
    Great option: shard on top of resilient, scalable
    OSS app server , see: www.codecauldron.org
    Transparent partitioning? Neo4j 2.0
    100B? Easy to say. Sliiiiightly harder to do.
    Fundamentals: BASE & eventual consistency
    Generic clustering algorithm as base case, but
    give lots of knobs for developers
  • 52. How ego are you? (aka other impls?)
    Franz’ AllegroGraph
    (http://agraph.franz.com)
    Proprietary, Lisp, RDF-oriented but real graphdb
    Sones graphDB
    (http://sones.com)
    Proprietary, .NET, cloud-only, req invite for test
    Kloudshare
    (http://kloudshare.com)
    Graph database in the cloud, still stealth mode
    Google Pregel
    (http://bit.ly/dP9IP)
    We are oh-so-secret
    Some academic papers from ~10 years ago
    G = {V, E}
    #FAIL
  • 53. Conclusion
    Graphs && Neo4j => teh awesome!
    Available NOW under AGPLv3 / commercial license
    AGPLv3: “if you’re open source, we’re open source”
    If you have proprietary software? Must buy a
    commercial license
    But the first one is free!
    Download
    http://neo4j.org
    Feedback
    http://lists.neo4j.org
  • 54. Party pooper slides
  • 55. Poop 1
    Key-value stores?
    => the awesome
    … if you have 1000s of BILLIONS records OR you
    don't care about programmer productivity
    What if you had no variables at all in your programs
    except a single globally accessible hashtable?
    Would your software be maintainable?
  • 56. Poop 2
    In a not-suck architecture...
    … the only thing that makes sense is to have an
    embedded database.
  • 57. Poop 3
    Exposing your data model on the wire is bad.
    Period.
    Adding a couple of buzzwords doesn't make it less
    bad.
    If it was bad with SQL-over-sockets (hint: it was)
    then – surprise! – it's still bad even tho you use
    Hype-compliant(tm) JSON-over-REST.
    We don't want to couple everything to a specific
    data model or schema again!
  • 58. Poop 4
    In-memory database
    What the hell?
    That's an oxymoron!
    Up next: ascii-only JPEG
    Up next: loopback-only web server
    If you're not durable, you're a variable!
    If you happen to asynchronously spill over to disk,
    you're a variable that asynchronously spills over to
    disk
  • 59. Ait
    so, srsly?
  • 60. Looking ahead: polyglot persistence
    NoSQL
    SQL
    &&
  • 61.
  • 62. Questions?
    Image credit: lost again! Sorry :(
  • 63. http://neotechnology.com