An overview of Neo4j Internals

Tobias Lindaaker, Hacker @ Neo Technology
tobias@neotechnology.com
twitter: @thobe, #neo4j (@neo4j)
web: neo4j.org neotechnology.com
my web: thobe.org




Monday, May 21, 2012
Outline
This is a rough structure of how the pieces of Neo4j fit together.

This talk will not cover how disks and file systems work; we just assume they do. Nor will it cover the “Core API”; you are assumed to know it.

  Traversals  |  Core API  |  Cypher
  Node/Relationship Object cache  |  Thread local diffs
  FS Cache  |  HA
  Record files  |  Transaction log
  Disk(s)

Outline
Let’s start at the bottom: the on-disk storage file layout.

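Your graph on disk
Simple sample graph. It all boils down to linked lists of fixed-size records on disk.

Properties are stored as a linked list of property records, each holding a key and a value. Each node and relationship references its first property record.
Each node also references the first relationship in its relationship chain.
Each relationship references its start and end node. It also references the previous and next relationship records in the chains of its start node and its end node respectively.

[Figure: a sample graph of four people connected by five KNOWS relationships: Alistair (Age: 34), Tobias (Age: 27, Nationality: Swedish), Ian (Age: 42) and Jim (Age: 37, Stuff: good). Each node's properties are drawn as a chain of records (Name, Age, ...), and each KNOWS relationship record is labelled SP/SN (previous/next relationship for the start node) and EP/EN (previous/next relationship for the end node).]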
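A minimal sketch of how these linked lists can be walked, using in-memory stand-ins for the records. `NodeRecord`, `RelRecord`, `PropRecord`, `NO_ID = -1` and the two helper functions are hypothetical names chosen for illustration, not Neo4j classes. The point it shows: a relationship record sits in two chains at once, so the walker must follow SN or EN depending on which end of the relationship the current node is.

```python
from dataclasses import dataclass

NO_ID = -1  # hypothetical sentinel meaning "no next record"


@dataclass
class NodeRecord:
    first_rel: int   # first relationship in this node's chain
    first_prop: int  # first property record


@dataclass
class RelRecord:
    first_node: int   # start node
    second_node: int  # end node
    first_prev: int   # SP: previous relationship in the start node's chain
    first_next: int   # SN: next relationship in the start node's chain
    second_prev: int  # EP: previous relationship in the end node's chain
    second_next: int  # EN: next relationship in the end node's chain


@dataclass
class PropRecord:
    key: str
    value: object
    next_prop: int


def relationships_of(node_id, nodes, rels):
    """Yield the ids of every relationship in a node's chain."""
    rel_id = nodes[node_id].first_rel
    while rel_id != NO_ID:
        yield rel_id
        rel = rels[rel_id]
        # Follow the "next" pointer that belongs to this node's side of the record.
        rel_id = rel.first_next if rel.first_node == node_id else rel.second_next


def properties_of(first_prop, props):
    """Collect a property chain into a dict."""
    result = {}
    prop_id = first_prop
    while prop_id != NO_ID:
        prop = props[prop_id]
        result[prop.key] = prop.value
        prop_id = prop.next_prop
    return result


# Three nodes (0, 1, 2) and three relationships: r0 = 0->1, r1 = 2->0, r2 = 1->2.
nodes = {0: NodeRecord(0, 0), 1: NodeRecord(0, NO_ID), 2: NodeRecord(1, NO_ID)}
rels = {
    0: RelRecord(0, 1, NO_ID, 1, NO_ID, 2),
    1: RelRecord(2, 0, NO_ID, 2, 0, NO_ID),
    2: RelRecord(1, 2, 0, NO_ID, 1, NO_ID),
}
props = {0: PropRecord("Name", "Alistair", 1), 1: PropRecord("Age", 34, NO_ID)}
```

Walking forward only needs the SN/EN pointers; the SP/EP pointers make each chain doubly linked.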

Store files
๏ Node store
๏ Relationship store
  • Relationship type store
๏ Property store
  • Property key store
  • (long) String store
  • (long) Array store

Short string and array values are inlined in the property store; long values are stored in separate store files.
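A rough sketch of the inline-versus-separate-store decision, under loud assumptions: a value is inlined if its UTF-8 encoding fits in the 24-byte propBlock of the 33-byte property record, otherwise it is split across 125-byte dynamic records, each carrying a 1-byte inUse flag, a 4-byte `next` pointer and 120 bytes of data. The real store uses more compact short-string encodings, so these thresholds are illustrative only; `place_string` and `to_dynamic_records` are hypothetical helpers.

```python
NO_ID = -1          # hypothetical "end of chain" marker
INLINE_LIMIT = 24   # assumed: size of propBlock in a 33-byte property record
DYNAMIC_DATA = 120  # assumed: 125-byte dynamic record minus inUse (1) and next (4)


def place_string(value):
    """Return ("inline", bytes) or ("dynamic", [chunk, ...])."""
    data = value.encode("utf-8")
    if len(data) <= INLINE_LIMIT:
        return ("inline", data)
    chunks = [data[i:i + DYNAMIC_DATA] for i in range(0, len(data), DYNAMIC_DATA)]
    return ("dynamic", chunks)


def to_dynamic_records(chunks, first_id):
    """Lay chunks out as (id, next, data) records linked through `next`.

    Ids are allocated consecutively here for simplicity.
    """
    records = []
    for i, chunk in enumerate(chunks):
        next_id = first_id + i + 1 if i + 1 < len(chunks) else NO_ID
        records.append((first_id + i, next_id, chunk))
    return records
```

For example, "Swedish" stays inline, while a 250-byte string becomes a chain of three dynamic records holding 120, 120 and 10 bytes.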

Neo4j Storage Record Layout
(byte ranges within each record, inclusive)

Node (9 bytes)
  [0] inUse | [1-4] nextRelId | [5-8] nextPropId

Relationship (33 bytes)
  [0] inUse | [1-4] firstNode | [5-8] secondNode | [9-12] relationshipType | [13-16] firstPrevRelId | [17-20] firstNextRelId | [21-24] secondPrevRelId | [25-28] secondNextRelId | [29-32] nextPropId

Relationship Type (5 bytes)
  [0] inUse | [1-4] typeBlockId

Property (33 bytes)
  [0] inUse | [1-2] type | [3-4] keyIndexId | [5-28] propBlock | [29-32] nextPropId

Property Index (9 bytes)
  [0] inUse | [1-4] propCount | [5-8] keyBlockId

Dynamic Store (125 bytes)
  [0] inUse | [1-4] next | [5-124] data

NeoStore (5 bytes)
  [0] inUse | [1-4] datum
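Because the records are fixed size, a record's id is enough to find it: the byte offset is just id × record size. The sketch below decodes node and relationship records with Python's `struct` module. It assumes big-endian byte order, treats the low bit of the first byte as the in-use flag, and reads ids as signed 32-bit values so that -1 can mean "none"; the real store packs extra bits into these fields, and `read_node` / `read_relationship` are hypothetical helpers.

```python
import struct

# Assumed big-endian, fixed-width fields; ids are signed 32-bit so -1 can mean "none".
NODE_FORMAT = ">Bii"  # inUse, nextRelId, nextPropId
REL_FORMAT = ">B8i"   # inUse, then eight 4-byte fields
NODE_SIZE = struct.calcsize(NODE_FORMAT)  # 9
REL_SIZE = struct.calcsize(REL_FORMAT)    # 33

REL_FIELDS = ("firstNode", "secondNode", "relationshipType",
              "firstPrevRelId", "firstNextRelId",
              "secondPrevRelId", "secondNextRelId", "nextPropId")


def record_offset(record_id, record_size):
    # Fixed-size records: a record's id is its index in the store file.
    return record_id * record_size


def read_node(store, node_id):
    in_use, next_rel, next_prop = struct.unpack_from(
        NODE_FORMAT, store, record_offset(node_id, NODE_SIZE))
    return {"inUse": bool(in_use & 0x1), "nextRelId": next_rel, "nextPropId": next_prop}


def read_relationship(store, rel_id):
    fields = struct.unpack_from(REL_FORMAT, store, record_offset(rel_id, REL_SIZE))
    record = dict(zip(REL_FIELDS, fields[1:]))
    record["inUse"] = bool(fields[0] & 0x1)
    return record
```

Under this layout, looking up node 1,000,000 is a single seek to byte 9,000,000 of the node store, with no index involved.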

Outline
Next: the two levels of cache in Neo4j. The low-level FS Cache for the record files, and the high-level Object cache, storing a structure more optimized for traversal.

The caches
๏ Filesystem cache:
  • Caches regions of the store files (divides each store file into equally sized regions)
  • The cache holds a fixed number of regions for each file
  • Regions are evicted based on an LFU-like policy (hit count vs. miss count, i.e. hits in non-cached regions)
  • Default implementation of regions uses OS mmap
๏ Node/Relationship cache
  • Caches a version more optimized for traversals
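One way to read "hit count vs. miss count" is sketched below: cached regions count their hits, uncached regions count their misses, and an uncached region replaces the least-hit cached region once it has been missed more often than that region has been hit. This is an illustrative reconstruction, not Neo4j's eviction code; `RegionCache`, `region_size` and `max_regions` are hypothetical, and a real region would be mmap'ed rather than just marked as loaded.

```python
class RegionCache:
    """Hypothetical LFU-like cache of equally sized regions of one store file."""

    def __init__(self, region_size, max_regions):
        self.region_size = region_size
        self.max_regions = max_regions
        self.cached = {}  # region index -> hits since it was loaded
        self.misses = {}  # region index -> misses while not cached

    def access(self, offset):
        """Record an access at a byte offset; return True on a cache hit."""
        region = offset // self.region_size
        if region in self.cached:
            self.cached[region] += 1
            return True
        self.misses[region] = self.misses.get(region, 0) + 1
        if len(self.cached) < self.max_regions:
            self._load(region)
        else:
            coldest = min(self.cached, key=self.cached.get)
            # Swap only when this region has been missed more often than
            # the least-hit cached region has been hit.
            if self.misses[region] > self.cached[coldest]:
                del self.cached[coldest]
                self._load(region)
        return False

    def _load(self, region):
        # A real implementation would map the region into memory here.
        self.cached[region] = 0
        self.misses.pop(region, None)
```

Comparing miss counts against hit counts keeps a single stray access from evicting a region that is in steady use.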

What we put in cache
The structure of the elements in the high-level object cache.

On disk, most of the information is contained in the relationship records, with the nodes just referencing their first relationship. In the cache this is turned around: the nodes hold references to all their relationships. The relationships are simple: apart from their start node, end node and type, they only hold their properties.

The relationships of each node are grouped by RelationshipType to allow fast traversal of a specific type.

All references (dotted arrows) are by ID, and traversals do indirect lookup through the cache.

[Figure: a cached Node holds its ID, its relationship ID references grouped by type (for each type an "in" list R1 ... Rn and an "out" list R1 ... Rn), and its properties (Key 1 ... Key n, referring to Val 1 ... Val n). A cached Relationship holds its ID, start, end and type, and its properties (Key 1 ... Key n, referring to Val 1 ... Val n).]
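A minimal sketch of that turned-around structure: `build_node` takes the relationship ids found by walking a node's on-disk chain and groups them by type and direction, and `neighbors` follows one type and direction through ID lookups in a relationship cache. The class names, the dict layout and the "LIKES" type are hypothetical illustrations, not Neo4j's cache classes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CachedRelationship:
    id: int
    start: int
    end: int
    type: str
    properties: dict = field(default_factory=dict)


def _empty_groups():
    # relationship type -> {"in": [rel ids], "out": [rel ids]}
    return defaultdict(lambda: {"in": [], "out": []})


@dataclass
class CachedNode:
    id: int
    rels: dict = field(default_factory=_empty_groups)
    properties: dict = field(default_factory=dict)


def build_node(node_id, rel_ids, rel_cache):
    """Turn a flat chain of relationship ids into per-type in/out lists."""
    node = CachedNode(node_id)
    for rel_id in rel_ids:
        rel = rel_cache[rel_id]
        direction = "out" if rel.start == node_id else "in"
        node.rels[rel.type][direction].append(rel_id)
    return node


def neighbors(node, rel_type, direction, rel_cache):
    """Follow only one type and direction; each hop is an ID lookup in rel_cache."""
    for rel_id in node.rels.get(rel_type, {}).get(direction, []):
        rel = rel_cache[rel_id]
        yield rel.end if direction == "out" else rel.start


rel_cache = {
    0: CachedRelationship(0, 1, 2, "KNOWS"),
    1: CachedRelationship(1, 3, 1, "KNOWS"),
    2: CachedRelationship(2, 1, 4, "LIKES"),
}
node = build_node(1, [0, 1, 2], rel_cache)
```

Asking for outgoing KNOWS relationships of node 1 touches only relationship 0; the LIKES relationship is never examined.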

Outline
So how do traversals work...

Traversals - how do they work?
The surface layer, the one you interact with.
๏ RelationshipExpanders: given (a path to) a node, return the Relationships to continue traversing from that node
๏ Evaluators: given (a path to) a node, return whether to:
  • Continue traversing on that branch (i.e. expand) or not
  • Include (the path to) the node in the result set or not
๏ Then a projection to Path, Node, or Relationship is applied to each Path in the result set.
... but also:
๏ Uniqueness level: policy for when it is OK to revisit a node that has already been visited
๏ Implemented on top of the Core API
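A minimal sketch of the expander / evaluator / uniqueness split, in Python rather than Neo4j's Java API. The function `traverse`, its parameters and the breadth-first order are assumptions made for illustration. The evaluator returns an (include, continue) pair, playing the role of Neo4j's `Evaluation` values such as `INCLUDE_AND_CONTINUE`, and `unique_nodes=True` approximates a "visit each node at most once" uniqueness policy. The KNOWS adjacency reuses names from the sample graph, but its edges are made up.

```python
from collections import deque


def traverse(start, expander, evaluator, unique_nodes=True):
    """Breadth-first traversal yielding the paths (tuples of nodes) the evaluator includes."""
    visited = {start}
    queue = deque([(start,)])
    while queue:
        path = queue.popleft()
        include, keep_going = evaluator(path)
        if include:
            yield path
        if not keep_going:
            continue  # prune: do not expand this branch
        for nxt in expander(path):
            if unique_nodes:
                if nxt in visited:
                    continue  # uniqueness policy: never revisit a node
                visited.add(nxt)
            queue.append(path + (nxt,))


# Hypothetical KNOWS adjacency and a "friends of friends" query.
KNOWS = {"Alistair": ["Tobias", "Ian"], "Tobias": ["Jim"], "Ian": ["Jim"], "Jim": []}


def expand_knows(path):
    return KNOWS[path[-1]]


def depth_two(path):
    depth = len(path) - 1
    return depth == 2, depth < 2  # (include, continue)


paths = list(traverse("Alistair", expand_knows, depth_two))
end_nodes = [p[-1] for p in paths]  # projection from Path to Node
```

With uniqueness on, Jim is reached once (via Tobias); with `unique_nodes=False` the path via Ian is returned as well.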

More on Traversals
This is what happens under the hood.
๏ Fetch node data from cache: non-blocking access
  • If not in cache, retrieve from storage into cache
    ‣ If the region is in the FS cache: blocking, but short-duration access
    ‣ If the region is outside the FS cache: blocking, slower access
๏ Get relationships from the cached node
  • If not yet fetched, retrieve from storage by following the chains
๏ Expand relationship(s) to end up on the next node(s)
  • The relationship knows the node; no need to fetch it yet
๏ Evaluate
  • possibly emitting a Path into the result set
๏ Repeat
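The first step above can be sketched as a three-tier lookup: object cache, then FS cache, then disk. `NodeLoader`, `records_per_region` and the string "records" are hypothetical stand-ins, and the sketch assumes a region read from disk stays in the FS cache afterwards (eviction is ignored). Lazy loading of relationships is not modelled.

```python
class NodeLoader:
    """Hypothetical object cache -> FS cache -> disk lookup for node records."""

    def __init__(self, disk, fs_regions, records_per_region):
        self.disk = disk                    # node id -> record; stands in for the store file
        self.fs_regions = set(fs_regions)   # regions currently held by the FS cache
        self.records_per_region = records_per_region
        self.object_cache = {}              # node id -> record, the high-level cache
        self.log = []                       # which tier served each request

    def get_node(self, node_id):
        if node_id in self.object_cache:
            self.log.append("object cache")  # non-blocking
            return self.object_cache[node_id]
        region = node_id // self.records_per_region
        if region in self.fs_regions:
            self.log.append("fs cache")      # blocking, short duration
        else:
            self.log.append("disk")          # blocking, slower
            self.fs_regions.add(region)      # assumed: region stays cached afterwards
        record = self.disk[node_id]
        self.object_cache[node_id] = record
        return record


loader = NodeLoader({0: "Alistair", 1: "Tobias", 5: "Jim"}, fs_regions={0}, records_per_region=4)
```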

Outline
How is Cypher different? And how does it work?

Cypher - Just convenient traversal descriptions?
๏ Builds on the same infrastructure as Traversals (Expanders)
  • but not on the full Traversal system
๏ Uses graph pattern matching for traversing the graph
  • Recursive matching with backtracking

    START x=...
    MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b

[Figure: the pattern graph (red) matched against the actual graph (blue), starting from the start node (green); matches are shown in purple.]
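A toy version of recursive pattern matching with backtracking, to make the idea concrete for the pattern above. It is not Cypher's implementation: `match` is a hypothetical helper, pattern edges are plain (a, b) pairs meaning a-->b, it always extends from an already-bound variable (roughly what an expander does), and it ignores Cypher's rule that one relationship is not matched twice in a single result. The graph is made-up sample data.

```python
def match(pattern, graph, start):
    """Return every binding of pattern variables to nodes, found by backtracking.

    pattern: list of (a, b) pairs, each meaning a-->b
    graph:   dict mapping a node to the set of nodes it points to
    start:   initial bindings, playing the role of START
    """
    incoming = {}
    for src, targets in graph.items():
        for dst in targets:
            incoming.setdefault(dst, set()).add(src)

    def step(remaining, bindings):
        if not remaining:
            yield dict(bindings)
            return
        # Always extend from a variable that is already bound.
        for i, (a, b) in enumerate(remaining):
            if a in bindings or b in bindings:
                break
        else:
            raise ValueError("pattern is not connected to the start node")
        rest = remaining[:i] + remaining[i + 1:]
        if a in bindings and b in bindings:
            # Both ends fixed: check the edge exists, otherwise backtrack.
            if bindings[b] in graph.get(bindings[a], ()):
                yield from step(rest, bindings)
        elif a in bindings:
            for node in sorted(graph.get(bindings[a], ())):
                yield from step(rest, {**bindings, b: node})
        else:
            for node in sorted(incoming.get(bindings[b], ())):
                yield from step(rest, {**bindings, a: node})

    return list(step(list(pattern), dict(start)))


# START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
PATTERN = [("x", "y"), ("x", "z"), ("y", "z"), ("z", "a"), ("a", "b"), ("z", "b")]
GRAPH = {1: {2, 3}, 2: {3}, 3: {4, 5, 6}, 4: {5}, 5: set(), 6: {5}}
matches = match(PATTERN, GRAPH, {"x": 1})
```

Starting from x=1, binding y=3 fails at the y-->z check and is abandoned, and a=5 fails at a-->b; the search backs up and finds the two matches with y=2, z=3, b=5 and a=4 or a=6.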

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
What about Gremlin?
        ๏ Gremlin is a third party language, built by Marko Rodriguez of
                   Tinkerpop (a group of people who like to hack on graphs)
        ๏ Originally based on the idea of using xpath to describe traversals:
                   ./HAS_CART/CONTAINS_ITEM/PURCHASED/PURCHASED
                   but bastardized to distinguish between nodes and relationships:
                   ./outE[label=HAS_CART]/inV
                    /outE[label=CONTAINS_ITEM]/inV
                    /inE[label=PURCHASED]/outV
                    /outE[label=PURCHASED]/inV
          (Traversals are close to xpath, which is why xpath-like descriptions
           of traversals seemed like a good idea.)
        ๏ xpath is not expressive enough to describe full algorithms; it needs a
                   host language, and Gremlin originally defined its own.
                   This changed to Groovy as a more complete host language,
                   abandoning xpath in favor of method chaining
                   [ replace ‘/’ with ‘.’ ]
                                                                          15

Monday, May 21, 2012
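To make the xpath-style steps concrete, here is a hedged Java sketch (not Gremlin's API) that evaluates the chain ./outE[label=HAS_CART]/inV/outE[label=CONTAINS_ITEM]/inV/inE[label=PURCHASED]/outV/outE[label=PURCHASED]/inV over a tiny in-memory graph. Each `outE`/`inE` step selects relationships with a given label, and `inV`/`outV` moves to their end or start node. The `Rel` record and the `Steps` helper are hypothetical names invented for this illustration.

```java
import java.util.*;

// Hypothetical relationship record: start -[label]-> end.
record Rel(String start, String label, String end) {}

// Hypothetical sketch of xpath/Gremlin-style step chaining; not Gremlin's API.
class Steps {
    private final List<Rel> rels;

    Steps(List<Rel> rels) { this.rels = rels; }

    // outE[label=X]/inV: from each current node, follow outgoing X relationships to their end nodes.
    List<String> outEInV(List<String> nodes, String label) {
        List<String> next = new ArrayList<>();
        for (String n : nodes)
            for (Rel r : rels)
                if (r.label().equals(label) && r.start().equals(n)) next.add(r.end());
        return next;
    }

    // inE[label=X]/outV: from each current node, follow incoming X relationships to their start nodes.
    List<String> inEOutV(List<String> nodes, String label) {
        List<String> next = new ArrayList<>();
        for (String n : nodes)
            for (Rel r : rels)
                if (r.label().equals(label) && r.end().equals(n)) next.add(r.start());
        return next;
    }
}
```

Each call corresponds to one `/outE[...]/inV` or `/inE[...]/outV` pair, and nesting the calls spells out how to reach the data step by step, which is the imperative style the next slide contrasts with Cypher.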
Gremlin compared to Cypher
        ๏ start me=node:people(name={myname})
          match me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item,
                item<-[:PURCHASED]-user-[:PURCHASED]->recommendation
          return recommendation

        ๏ Cypher is declarative, describes what data to get - its shape
        ๏ Gremlin is imperative, prescribes how to get the data
        ๏ Cypher has more opportunities for optimization by the engine
        ๏ Gremlin can implement pagerank, Cypher can’t (yet?)



                                                                     16

Monday, May 21, 2012
Outline

                       Traversals      Core API       Cypher


                       Node/Relationship
                                             Thread local diffs
                         Object cache

                           FS Cache                         HA


                         Record files          Transaction log


                                       Disk(s)

       Transactions involve two parts: the (thread local) changes being
       done by an active transaction, and the transaction replay log for
       recovery.



                                                                               17

Monday, May 21, 2012
Transaction Isolation
        ๏ Mutating operations are not written when performed
        ๏ They are stored in a thread confined transaction state object
        ๏ This prevents threads from seeing uncommitted changes made by
                   transactions in other threads
        ๏ When Transaction.finish() is invoked the transaction is either
                   committed or rolled back, depending on whether
                   Transaction.success() was called
        ๏ Rollback is simple: discard the transaction state object



                                                                          18

Monday, May 21, 2012
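A minimal Java sketch of thread-confined transaction state, assuming a simple key/value map standing in for the record files: mutations go into a per-thread diff held in a `ThreadLocal`, reads consult the calling thread's own diff before the committed state, and rollback just discards the diff. `TxStore`, `begin`, `set`, `get` and `finish(boolean)` are hypothetical names, not the Neo4j API.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: committed data plus a per-thread, uncommitted diff.
class TxStore {
    private final Map<String, String> committed = new ConcurrentHashMap<>();
    private final ThreadLocal<Map<String, String>> diff = new ThreadLocal<>();

    void begin() { diff.set(new HashMap<>()); }

    // Writes are not applied to the shared state; they are buffered for this thread only.
    void set(String key, String value) { diff.get().put(key, value); }

    // Reads see this thread's own uncommitted changes first, then the committed state.
    String get(String key) {
        Map<String, String> local = diff.get();
        if (local != null && local.containsKey(key)) return local.get(key);
        return committed.get(key);
    }

    // finish(true) commits the buffered changes; finish(false) rolls back by discarding them.
    void finish(boolean success) {
        Map<String, String> local = diff.get();
        diff.remove();
        // putAll is not atomic: this sketch ignores isolation during the commit itself.
        if (success && local != null) committed.putAll(local);
    }
}
```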
Transactions & Durability
        ๏ Commit is:
                • Changes made in the transaction are collected as commands
                • Commands are sorted to get predictable update order
                       ‣This prevents concurrent readers from seeing inconsistent
                        data when the changes are applied to the store
                • Write changes (in sorted order) to the transaction log
                • Mark the transaction as committed in the log
                • Apply the changes (in sorted order) to the store files
                                                                              19

Monday, May 21, 2012
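The commit sequence above can be sketched in Java as follows. The `Command` record, the sort key (record type, then record id) and the in-memory `log` list standing in for the transaction log file are assumptions made for illustration; the slide only states that commands are sorted into a predictable order, written to the log, marked committed, and then applied to the store in that same order.

```java
import java.util.*;

// Hypothetical command: "set record <recordType, id> to value" (it dictates state).
record Command(String recordType, long id, String value) {}

class Committer {
    final List<String> log = new ArrayList<>();        // stands in for the transaction log
    final Map<String, String> store = new HashMap<>(); // stands in for the record files

    // Assumed predictable order: by record type, then by record id.
    static final Comparator<Command> ORDER =
            Comparator.comparing(Command::recordType).thenComparingLong(Command::id);

    void commit(long txId, Collection<Command> changes) {
        List<Command> commands = new ArrayList<>(changes);
        commands.sort(ORDER);                          // 1. sort into a predictable order
        for (Command c : commands)                     // 2. write commands to the log
            log.add(txId + ":" + c.recordType() + "#" + c.id() + "=" + c.value());
        log.add(txId + ":COMMIT");                     // 3. mark the tx as committed
        // A real implementation would force the log to disk before this point.
        for (Command c : commands)                     // 4. apply to the store, same order
            store.put(c.recordType() + "#" + c.id(), c.value());
    }
}
```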
Recovery
        ๏ Transaction commands dictate state; they don’t modify state
                • e.g. SET property "count" to 5
                • rather than ADD 1 to property "count"
        ๏ Thus: Applying the same command twice yields the same state
        ๏ Recovery simply replays all transactions since the last safe point
        ๏ If tx A mutates node1.name and a later tx B also mutates
                   node1.name, replaying A after B's change already reached the
                   store doesn't matter: the database is not recovered until all
                   transactions have been replayed, and replaying B restores
                   its value


                                                                         20

Monday, May 21, 2012
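The difference between commands that dictate state and commands that modify it can be shown with a small Java sketch. `SetCommand` and `AddCommand` are hypothetical; only the SET form corresponds to what the slide describes, and the ADD form is included purely for contrast. Replaying a log of SET commands, even over a store that already contains some of their effects, converges to the same final state.

```java
import java.util.*;

class Replay {
    // Dictates state: idempotent, safe to apply more than once.
    record SetCommand(String key, int value) {
        void apply(Map<String, Integer> store) { store.put(key, value); }
    }

    // Modifies state: NOT idempotent, shown only for contrast.
    record AddCommand(String key, int delta) {
        void apply(Map<String, Integer> store) { store.merge(key, delta, Integer::sum); }
    }

    // Recovery: replay every logged command since the last safe point, in log order.
    static void recover(List<SetCommand> log, Map<String, Integer> store) {
        for (SetCommand c : log) c.apply(store);
    }
}
```

In the test, tx A sets `count` to 5 and tx B later sets it to 7; B had already reached the store before the crash, yet replaying both (even twice) still ends at 7, while applying the same ADD twice does not.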
Outline

                       Traversals      Core API       Cypher


                       Node/Relationship
                                             Thread local diffs
                         Object cache

                           FS Cache                         HA


                         Record files          Transaction log


                                       Disk(s)

       High Availability in Neo4j builds on top of the transaction replay



                                                                              21

Monday, May 21, 2012
Outline

                               Traversals      Core API       Cypher


                       Local   Node/Relationship
                                                     Thread local diffs
                                 Object cache

                                   FS Cache                         HA


                                 Record files          Transaction log     Shared

                                               Disk(s)

       The transaction logs are shared between all instances in a High
       Availability setup; all other parts operate on the local data just
       like in the standalone case.



                                                                                       22

Monday, May 21, 2012
HA - the parts to it:
        ๏ Based on streaming transactions between servers
        ๏ All transactions are committed through the master
                • Then (eventually) applied to the slaves
                • Eventuality defined by the update interval, or when
                    synchronization is mandated by interaction
        ๏ When writing to a slave:
                • Locks coordinated through the master
                • Transaction data buffered on the slave,
                    applied first on the master to get a txid,
                    then applied with the same txid on the slave
                                                                             23

Monday, May 21, 2012
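The slave-side write path can be sketched in Java with a simplified in-process `Master` and `Slave` (hypothetical classes, not Neo4j's HA protocol; networking and lock coordination are omitted). The slave buffers the transaction, the master applies it first and hands out the next txid, and the slave then applies the same transaction under that txid, so both logs agree.

```java
import java.util.*;

// Hypothetical in-process model of the HA write path; networking and locks omitted.
class Master {
    private long lastTxId = 0;
    final Map<Long, List<String>> log = new TreeMap<>();

    // The master applies the transaction first and is the only one assigning txids.
    synchronized long commit(List<String> commands) {
        long txId = ++lastTxId;
        log.put(txId, List.copyOf(commands));
        return txId;
    }
}

class Slave {
    private final Master master;
    final Map<Long, List<String>> log = new TreeMap<>();
    private final List<String> buffer = new ArrayList<>();

    Slave(Master master) { this.master = master; }

    void write(String command) { buffer.add(command); } // buffered locally until commit

    long commit() {
        long txId = master.commit(buffer);  // 1. applied on the master to get a txid
        log.put(txId, List.copyOf(buffer)); // 2. applied on the slave with the same txid
        buffer.clear();
        // Transactions committed via other slaves would reach this slave at sync points.
        return txId;
    }
}
```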
Creating new Nodes and Relationships
        ๏ New Nodes/Relationships don’t need locks, so creating them does not
                   require syncing the transaction with the master until commit
        ๏ They do need an ID that is unique and identical on all instances
        ๏ Each instance allocates IDs in blocks from the master, then assigns
                   them to new Nodes/Relationships locally
                • This batch allocation can be seen in (Enterprise) WebAdmin as
                    Node/Relationship counts jumping in steps of 1000




                                                                           24

Monday, May 21, 2012
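Block-wise ID allocation can be sketched in Java as below. The block size of 1000 matches the step size mentioned above; `IdMaster`, `IdAllocator` and their methods are hypothetical names, and the actual allocation protocol between instances is not shown on the slide. The master hands out contiguous, non-overlapping ranges, and each instance draws IDs from its current range without contacting the master until the range runs out.

```java
// Hypothetical: the master hands out contiguous, non-overlapping ID ranges.
class IdMaster {
    static final int BLOCK_SIZE = 1000;
    private long nextFree = 0;

    synchronized long[] allocateBlock() {
        long start = nextFree;
        nextFree += BLOCK_SIZE;
        return new long[] {start, nextFree}; // half-open range [start, end)
    }
}

// Hypothetical per-instance allocator: assigns IDs locally from its current block.
class IdAllocator {
    private final IdMaster master;
    private long next, end; // both 0 initially, so the first call fetches a block

    IdAllocator(IdMaster master) { this.master = master; }

    synchronized long nextId() {
        if (next == end) {                   // block exhausted: ask the master for another
            long[] block = master.allocateBlock();
            next = block[0];
            end = block[1];
        }
        return next++;
    }
}
```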
HA synchronization points
        ๏ Transactions are uniquely identified by a monotonically increasing ID
        ๏ Every request from a slave includes the latest txid on that slave
        ๏ Responses from the master send back a stream of the transactions that
                   have occurred since then, along with the actual result
        ๏ Transactions are replayed just like when committed / recovered
        ๏ Nodes/Relationships touched during this application are evicted
                   from cache to avoid cache staleness
        ๏ Transaction commands are sorted only once, when created and before
                   being stored/transmitted, so the same consistent order is
                   preserved in every application phase
                                                                              25

Monday, May 21, 2012
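The request/response piggybacking can be sketched in Java as follows, assuming an in-memory master log keyed by txid. `Tx`, `Response`, `SyncMaster` and `SyncSlave` are hypothetical names, and replaying the commands against the store is left out. Every request carries the slave's latest txid, the response carries the missing transactions in txid order alongside the result, and the slave evicts the touched nodes/relationships from its cache as it catches up.

```java
import java.util.*;

// Hypothetical transaction summary: its txid and the node/relationship ids it touched.
record Tx(long id, Set<Long> touched) {}

// Hypothetical response: the actual result plus the transactions the slave is missing.
record Response(String result, List<Tx> missed) {}

class SyncMaster {
    final NavigableMap<Long, Tx> log = new TreeMap<>();

    Response handle(long slaveLatestTxId, String request) {
        String result = "handled " + request; // placeholder for the real work
        // Everything committed after the slave's latest txid, in txid order.
        List<Tx> missed = new ArrayList<>(log.tailMap(slaveLatestTxId, false).values());
        return new Response(result, missed);
    }
}

class SyncSlave {
    long latestTxId = 0;
    final Set<Long> cache = new HashSet<>(); // ids of cached nodes/relationships

    String request(SyncMaster master, String request) {
        Response r = master.handle(latestTxId, request); // every request carries our txid
        for (Tx tx : r.missed()) {
            // Replaying the commands against the store is omitted in this sketch.
            cache.removeAll(tx.touched());   // evict touched entities to avoid staleness
            latestTxId = tx.id();
        }
        return r.result();
    }
}
```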
Locking semantics of HA
        ๏ To be granted a lock, the slave must have the latest version of the
                   Node/Relationship it is trying to lock
                • This ensures consistency
                • The implementation of “Latest version of the Node/Relationship”
                    is “Latest version of the entire graph”
                • The slave must thus sync transactions from the master


                                                                          26

Monday, May 21, 2012
Master election
        ๏ Each instance communicates/coordinates:
              • its latest transaction id (including the master id for that tx)
              • the id for that instance
              • the (logical) clock value for when the txid was written
        ๏ Election chooses:
           1. The instance with the highest txid
           2. IF multiple: The instance that was master for that tx
           3. IF unavailable: The instance with the lowest clock value
           4. IF multiple: The instance with the lowest id
        ๏ Election happens when the current master cannot be reached
              • Any instance can choose to re-elect
              • Each instance runs the election protocol individually
              • Instances notify the others when the election chooses a new master
        ๏ When elected, the new master broadcasts to all instances,
             forcing them to bind to the new master
                                                                              27
Monday, May 21, 2012
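The election rules can be expressed as a small selection function. This Java sketch assumes each reachable instance advertises the fields listed above in a hypothetical `Candidate` record, and reads rule 3 as applying when the instance that was master for that txid is not among the candidates with the highest txid. It illustrates the ordering only; it is not Neo4j's election code.

```java
import java.util.*;

// Hypothetical view of what each instance advertises during an election.
record Candidate(int instanceId, long latestTxId, int masterIdForTx, long clock) {}

class Election {
    static Candidate choose(List<Candidate> reachable) {
        // 1. Only instances with the highest txid are eligible.
        long maxTx = reachable.stream().mapToLong(Candidate::latestTxId).max().orElseThrow();
        List<Candidate> best = reachable.stream().filter(c -> c.latestTxId() == maxTx).toList();
        // 2. Prefer the instance that was master when that tx was committed.
        for (Candidate c : best)
            if (c.instanceId() == c.masterIdForTx()) return c;
        // 3. Otherwise the lowest clock value; 4. ties broken by the lowest instance id.
        return best.stream()
                .min(Comparator.comparingLong(Candidate::clock)
                        .thenComparingInt(Candidate::instanceId))
                .orElseThrow();
    }
}
```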
Thank you for listening!

                                    tobias@neotechnology.com
 Tobias Lindaaker                   twitter: @thobe, #neo4j (@neo4j)
                                    web: neo4j.org neotechnology.com
 Hacker @ Neo Technology            my web: thobe.org




Monday, May 21, 2012


An overview of Neo4j Internals

  • 1. An overview of Neo4j Internals. Tobias Lindaaker, Hacker @ Neo Technology. Email: tobias@neotechnology.com; twitter: @thobe, #neo4j (@neo4j); web: neo4j.org, neotechnology.com; my web: thobe.org. Monday, May 21, 2012
  • 2. Outline. This is a rough structure of how the pieces of Neo4j fit together. This talk will not cover how disks/file systems work; we just assume they do. Nor will it cover the “Core API”; you are assumed to know it.
      [Figure: layered architecture diagram. Top: Traversals, Core API, Cypher. Below: Node/Relationship object cache, Thread local diffs. Below: FS Cache, HA. Below: Record files, Transaction log. Bottom: Disk(s).]
  • 3. Outline. Let’s start at the bottom: the on-disk storage file layout.
      [Figure: the same layered architecture diagram]
  • 4. Your graph on disk. Simple sample graph. It all boils down to linked lists of fixed-size records on disk. Properties are stored as a linked list of property records, each holding a key and a value. Each node/relationship references its first property record. Each node also references the first relationship in its relationship chain. Each relationship references its start and end node. It also references the previous/next relationship record for the start and end node respectively.
      [Figure: sample graph of four nodes (Name: Alistair, Age: 34; Name: Tobias, Age: 27, Nationality: Swedish; Name: Ian, Age: 42; Name: Jim, Age: 37) connected by KNOWS relationships, plus a property Stuff: good]
  • 5. [Figure: the same sample graph with each node's properties drawn as a chain of separate key/value property records: Name/Alistair, Age/34; Name/Tobias, Age/27, Nationality/Swedish; Name/Jim, Age/37; Name/Ian, Age/42; Stuff/good]
  • 7. [Figure: the same sample graph with each KNOWS relationship record annotated SP/EP (previous relationship in the start/end node's chain) and SN/EN (next relationship in the start/end node's chain)]
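A minimal Java sketch of the relationship chains described above, assuming in-memory arrays in place of the record files and -1 as the "no record" marker; `RelRecord`, `ChainStore` and their methods are hypothetical names, not Neo4j classes. Each new relationship is linked in at the head of both of its nodes' chains, and walking a node's chain follows the first-node or second-node pointers depending on which end of the relationship the node sits at.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of the on-disk relationship chains; -1 means "no record".
final class RelRecord {
    final int firstNode, secondNode;
    int firstPrevRelId = -1, firstNextRelId = -1;   // links in firstNode's chain
    int secondPrevRelId = -1, secondNextRelId = -1; // links in secondNode's chain

    RelRecord(int firstNode, int secondNode) {
        this.firstNode = firstNode;
        this.secondNode = secondNode;
    }
}

final class ChainStore {
    final int[] nodeNextRelId;                      // node record: first relationship in its chain
    final List<RelRecord> rels = new ArrayList<>(); // list index = relationship id

    ChainStore(int nodeCount) {
        nodeNextRelId = new int[nodeCount];
        Arrays.fill(nodeNextRelId, -1);
    }

    /** Creates a relationship (from != to assumed) and links it at the head of both chains. */
    int createRelationship(int from, int to) {
        int id = rels.size();
        RelRecord r = new RelRecord(from, to);
        rels.add(r);
        r.firstNextRelId = link(from, id);
        r.secondNextRelId = link(to, id);
        return id;
    }

    // Makes relId the new head of the node's chain and returns the previous head.
    private int link(int node, int relId) {
        int oldHead = nodeNextRelId[node];
        if (oldHead != -1) {
            RelRecord old = rels.get(oldHead);
            if (old.firstNode == node) old.firstPrevRelId = relId; else old.secondPrevRelId = relId;
        }
        nodeNextRelId[node] = relId;
        return oldHead;
    }

    /** Walks a node's chain, following the pointers that belong to that node's end. */
    List<Integer> relationshipsOf(int node) {
        List<Integer> result = new ArrayList<>();
        for (int id = nodeNextRelId[node]; id != -1; ) {
            result.add(id);
            RelRecord r = rels.get(id);
            id = (r.firstNode == node) ? r.firstNextRelId : r.secondNextRelId;
        }
        return result;
    }
}
```

The prev pointers are maintained but not read here; in a doubly linked chain they are what allows a relationship in the middle of a chain to be unlinked without walking the chain from its head.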
  • 11. Store files
      ๏ Node store
      ๏ Relationship store
          • Relationship type store
      ๏ Property store
          • Property key store
          • (long) String store
          • (long) Array store
      Short string and array values are inlined in the property store; long values are stored in separate store files.
  • 12. Neo4j Storage Record Layout (byte ranges within each record, inclusive)
      Node (9 bytes): inUse 0, nextRelId 1-4, nextPropId 5-8
      Relationship (33 bytes): inUse 0, firstNode 1-4, secondNode 5-8, relationshipType 9-12, firstPrevRelId 13-16, firstNextRelId 17-20, secondPrevRelId 21-24, secondNextRelId 25-28, nextPropId 29-32
      Relationship Type (5 bytes): inUse 0, typeBlockId 1-4
      Property (33 bytes): inUse 0, type 1-2, keyIndexId 3-4, propBlock 5-28, nextPropId 29-32
      Property Index (9 bytes): inUse 0, propCount 1-4, keyBlockId 5-8
      Dynamic Store (125 bytes): inUse 0, next 1-4, data 5-124
      NeoStore (5 bytes): inUse 0, datum 1-4
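Because every record type has a fixed size, a record's position in its file is simply id × record size. A minimal Java sketch of reading and writing node and relationship records at the offsets above, using `java.nio.ByteBuffer` as a stand-in for the store file; big-endian byte order (ByteBuffer's default) and 4-byte id fields are assumptions, and `RecordLayout` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Sketch of fixed-size records at the slide's offsets; byte order and int ids are assumptions.
final class RecordLayout {
    static final int NODE_SIZE = 9;  // inUse(1) nextRelId(4) nextPropId(4)
    static final int REL_SIZE = 33;  // inUse(1) + eight 4-byte fields

    // Fixed-size records: record n starts at byte n * size.
    static int offset(long id, int size) { return Math.toIntExact(id * size); }

    static void writeNode(ByteBuffer file, long id, boolean inUse, int nextRelId, int nextPropId) {
        int pos = offset(id, NODE_SIZE);
        file.put(pos, (byte) (inUse ? 1 : 0));
        file.putInt(pos + 1, nextRelId);
        file.putInt(pos + 5, nextPropId);
    }

    /** Returns {inUse, nextRelId, nextPropId}. */
    static int[] readNode(ByteBuffer file, long id) {
        int pos = offset(id, NODE_SIZE);
        return new int[] { file.get(pos), file.getInt(pos + 1), file.getInt(pos + 5) };
    }

    /** fields: firstNode, secondNode, relationshipType, firstPrevRelId, firstNextRelId,
     *  secondPrevRelId, secondNextRelId, nextPropId (stored at offsets 1, 5, 9, ..., 29). */
    static void writeRelationship(ByteBuffer file, long id, boolean inUse, int... fields) {
        if (fields.length != 8) throw new IllegalArgumentException("expected 8 fields");
        int pos = offset(id, REL_SIZE);
        file.put(pos, (byte) (inUse ? 1 : 0));
        for (int i = 0; i < fields.length; i++) file.putInt(pos + 1 + 4 * i, fields[i]);
    }

    static int readRelationshipField(ByteBuffer file, long id, int fieldIndex) {
        return file.getInt(offset(id, REL_SIZE) + 1 + 4 * fieldIndex);
    }
}
```

No index or lookup table is needed to find a record; that is the payoff of keeping variable-length data (long strings, arrays) out of these files.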
  • 13. Outline. Next: the two levels of cache in Neo4j. The low-level FS cache for the record files, and the high-level object cache, storing a structure more optimized for traversal.
      [Figure: the same layered architecture diagram]
  • 14. The caches
      ๏ Filesystem cache:
          • Caches regions of the store files (divides each store file into equally sized regions)
          • The cache holds a fixed number of regions for each file
          • Regions are evicted based on an LFU-like policy (hit count vs. miss count, i.e. hits in a non-cached region)
          • The default implementation of regions uses OS mmap
      ๏ Node/Relationship cache:
          • Caches a version more optimized for traversals
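A rough Java sketch of a region cache with an LFU-like policy. The slide gives only the outline, so the region size, the capacity and the exact eviction rule (a non-cached region replaces the least-hit cached region once its miss count exceeds that region's hit count) are assumptions; `RegionCache` is hypothetical and `load` stands in for mmap-ing the region.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical region cache with an LFU-like hit-vs-miss eviction rule.
final class RegionCache {
    private final long regionSize;
    private final int maxRegions;
    private final Map<Long, Integer> hits = new HashMap<>();   // cached region -> hit count
    private final Map<Long, Integer> misses = new HashMap<>(); // uncached region -> miss count

    RegionCache(long regionSize, int maxRegions) {
        this.regionSize = regionSize;
        this.maxRegions = maxRegions;
    }

    /** Returns true if the byte at this offset was served from a cached region. */
    boolean access(long offset) {
        long region = offset / regionSize;
        Integer h = hits.get(region);
        if (h != null) {
            hits.put(region, h + 1);
            return true;
        }
        int m = misses.merge(region, 1, Integer::sum);
        if (hits.size() < maxRegions) {
            load(region);                  // free slot: cache the region right away
        } else {
            long victim = leastHitRegion();
            if (m > hits.get(victim)) {    // the uncached region is now "hotter"
                hits.remove(victim);
                load(region);
            }
        }
        return false;                      // this access itself went to the file
    }

    boolean isCached(long region) { return hits.containsKey(region); }

    private void load(long region) {       // stand-in for mapping the region into memory
        misses.remove(region);
        hits.put(region, 0);
    }

    private long leastHitRegion() {
        long victim = -1;
        int min = Integer.MAX_VALUE;
        for (Map.Entry<Long, Integer> e : hits.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                victim = e.getKey();
            }
        }
        return victim;
    }
}
```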
  • 15. What we put in cache
      The structure of the elements in the high-level object cache. On disk, most of the information is contained in the relationship records, with the nodes just referencing their first relationship. In the node cache this is turned around: each node holds references to all of its relationships. The relationships are simple, holding only their properties besides their ID, start node, end node and type. The relationships of each node are grouped by RelationshipType to allow fast traversal of a specific type. All references (dotted arrows in the figure) are by ID, and traversals do indirect lookups through the cache.
      [Figure: Node: ID; relationship ID refs grouped by type (type 1: in R1 ... Rn, out R1 ... Rn; type 2: in R1 ... Rn, out R1 ... Rn; ...); properties Key 1 ... Key n with Val 1 ... Val n. Relationship: ID, start, end, type; properties Key 1 ... Key n with Val 1 ... Val n.]
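A sketch of what cached node and relationship entries could look like, with relationship ids grouped by type and direction so that expanding one type never scans the others. All class names are hypothetical; references are plain ids resolved through maps, mirroring the indirect lookups described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical object-cache entries: nodes group relationship ids by type and direction.
final class CachedNode {
    final long id;
    final Map<String, Object> properties = new HashMap<>();
    private final Map<String, List<Long>> outgoing = new HashMap<>();
    private final Map<String, List<Long>> incoming = new HashMap<>();

    CachedNode(long id) { this.id = id; }

    void addRelationship(String type, long relId, boolean isOutgoing) {
        (isOutgoing ? outgoing : incoming)
                .computeIfAbsent(type, t -> new ArrayList<>())
                .add(relId);
    }

    /** Ids of one type and direction; relationships of other types are never scanned. */
    List<Long> relationships(String type, boolean isOutgoing) {
        return (isOutgoing ? outgoing : incoming).getOrDefault(type, List.of());
    }
}

// A cached relationship only holds ids of its endpoints, its type and its properties.
final class CachedRelationship {
    final long id, startNode, endNode;
    final String type;
    final Map<String, Object> properties = new HashMap<>();

    CachedRelationship(long id, long startNode, long endNode, String type) {
        this.id = id;
        this.startNode = startNode;
        this.endNode = endNode;
        this.type = type;
    }
}

final class ObjectCache {
    final Map<Long, CachedNode> nodes = new HashMap<>();
    final Map<Long, CachedRelationship> relationships = new HashMap<>();

    /** One outgoing expansion step: relationship ids, then end-node ids, via the cache. */
    List<Long> neighbours(long nodeId, String type) {
        List<Long> result = new ArrayList<>();
        for (long relId : nodes.get(nodeId).relationships(type, true)) {
            result.add(relationships.get(relId).endNode);
        }
        return result;
    }
}
```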
  • 16. Outline. So how do traversals work?
      [Figure: the same layered architecture diagram]
  • 17. Traversals - how do they work? (The surface layer, the part you interact with.)
      ๏ RelationshipExpanders: given (a path to) a node, return the Relationships to continue traversing from that node
      ๏ Evaluators: given (a path to) a node, return whether to:
          • Continue traversing on that branch (i.e. expand) or not
          • Include (the path to) the node in the result set or not
      ๏ Then a projection to Path, Node, or Relationship is applied to each Path in the result set.
      ... but also:
      ๏ Uniqueness level: policy for when it is OK to revisit a node that has already been visited
      ๏ Implemented on top of the Core API
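The pieces on this slide (expander, evaluator, uniqueness, projection) can be illustrated with a small self-contained Java sketch. It is not Neo4j's traversal API: `MiniTraversal` and its `Evaluation` values are hypothetical, paths are lists of node ids, and the uniqueness policy shown is "visit each node at most once".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical mini traversal framework mirroring the slide's concepts (not Neo4j's API).
final class MiniTraversal {
    // The four include/expand combinations an evaluator can return.
    enum Evaluation {
        INCLUDE_AND_CONTINUE(true, true), INCLUDE_AND_PRUNE(true, false),
        EXCLUDE_AND_CONTINUE(false, true), EXCLUDE_AND_PRUNE(false, false);

        final boolean include, expand;

        Evaluation(boolean include, boolean expand) {
            this.include = include;
            this.expand = expand;
        }
    }

    /** Breadth-first traversal; uniqueness: each node is visited at most once. */
    static List<List<Integer>> traverse(int start,
                                        Function<List<Integer>, List<Integer>> expander,
                                        Function<List<Integer>, Evaluation> evaluator) {
        List<List<Integer>> result = new ArrayList<>();
        Set<Integer> visited = new HashSet<>(List.of(start));
        Deque<List<Integer>> queue = new ArrayDeque<>();
        queue.add(List.of(start));
        while (!queue.isEmpty()) {
            List<Integer> path = queue.poll();
            Evaluation e = evaluator.apply(path);
            if (e.include) result.add(path);
            if (!e.expand) continue;
            for (int next : expander.apply(path)) {
                if (!visited.add(next)) continue;   // already visited: not allowed by this policy
                List<Integer> longer = new ArrayList<>(path);
                longer.add(next);
                queue.add(longer);
            }
        }
        return result;
    }

    /** Projection step: the end node of each included path. */
    static List<Integer> endNodes(List<List<Integer>> paths) {
        List<Integer> ends = new ArrayList<>();
        for (List<Integer> p : paths) ends.add(p.get(p.size() - 1));
        return ends;
    }
}
```

With an evaluator that includes and prunes at depth 2 (paths of three nodes) and excludes-and-continues otherwise, only paths of exactly that depth are returned, and the uniqueness policy stops a second path from reaching an already visited node.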
  • 18. More on Traversals (this is what happens under the hood)
      ๏ Fetch node data from cache - non-blocking access
          • If not in cache, retrieve from storage into cache
              ‣ If the region is in the FS cache: blocking but short-duration access
              ‣ If the region is outside the FS cache: blocking, slower access
      ๏ Get relationships from the cached node
          • If not fetched yet, retrieve them from storage by following the chains
      ๏ Expand relationship(s) to end up on the next node(s)
          • The relationship knows the node; no need to fetch it yet
      ๏ Evaluate
          • possibly emitting a Path into the result set
      ๏ Repeat
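The lookup order above (object cache first, storage only on a miss, relationship chains loaded lazily the first time a node is expanded) might be sketched like this. `NodeLookup` is hypothetical, and the loader function stands in for reading records through the FS cache.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch: cache first, storage on a miss, relationship chain followed lazily.
final class NodeLookup {
    static final class Node {
        final long id;
        List<Long> relationships;   // null until the on-disk chain has been followed

        Node(long id) { this.id = id; }
    }

    private final Map<Long, Node> cache = new HashMap<>();
    private final LongFunction<List<Long>> chainLoader;   // stand-in for walking the record chain
    int storageReads = 0;                                  // counts trips below the object cache

    NodeLookup(LongFunction<List<Long>> chainLoader) { this.chainLoader = chainLoader; }

    Node node(long id) {
        return cache.computeIfAbsent(id, k -> {
            storageReads++;             // miss: read the node record (via the FS cache)
            return new Node(k);
        });
    }

    List<Long> relationships(long id) {
        Node n = node(id);
        if (n.relationships == null) {
            storageReads++;             // not fetched yet: follow the relationship chain
            n.relationships = chainLoader.apply(id);
        }
        return n.relationships;
    }
}
```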
  • 19. Outline. How is Cypher different, and how does it work?
      [Figure: the same layered architecture diagram]
  • 20. Cypher - just convenient traversal descriptions?
      ๏ Builds on the same infrastructure as Traversals (Expanders)
          • but not on the full Traversal system
      ๏ Uses graph pattern matching for traversing the graph
          • Recursive matching with backtracking
      Example pattern: START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b
      [Figure: Red: pattern graph; Blue: actual graph; Green: start node; Purple: matches]
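A toy Java version of recursive, backtracking pattern matching for the example pattern (x-->y, x-->z, y-->z, z-->a-->b, z-->b, starting from a bound x). It is an illustration only, not Cypher's matcher: `PatternMatcher` is hypothetical, it ignores relationship types, properties and uniqueness rules, and it expects the pattern edges ordered so that each edge's source variable is already bound.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy recursive, backtracking pattern matcher (hypothetical; not Cypher's implementation).
// Graph: node id -> ids of nodes reachable over one outgoing relationship.
final class PatternMatcher {
    /** A directed pattern edge between two pattern variables, e.g. x-->y. */
    static final class Edge {
        final String from, to;

        Edge(String from, String to) {
            this.from = from;
            this.to = to;
        }
    }

    static List<Map<String, Integer>> match(Map<Integer, List<Integer>> graph, List<Edge> pattern,
                                            String startVar, int startNode) {
        List<Map<String, Integer>> results = new ArrayList<>();
        Map<String, Integer> bindings = new HashMap<>();
        bindings.put(startVar, startNode);
        extend(graph, pattern, 0, bindings, results);
        return results;
    }

    private static void extend(Map<Integer, List<Integer>> graph, List<Edge> pattern, int i,
                               Map<String, Integer> bindings, List<Map<String, Integer>> results) {
        if (i == pattern.size()) {               // every pattern edge matched: record a copy
            results.add(new HashMap<>(bindings));
            return;
        }
        Edge e = pattern.get(i);
        Integer from = bindings.get(e.from);
        Integer to = bindings.get(e.to);
        if (from == null) throw new IllegalArgumentException("bind '" + e.from + "' before edge " + i);
        List<Integer> out = graph.getOrDefault(from, List.of());
        if (to != null) {                        // both ends bound: only check that the edge exists
            if (out.contains(to)) extend(graph, pattern, i + 1, bindings, results);
            return;
        }
        for (int candidate : out) {              // bind, recurse, then backtrack
            bindings.put(e.to, candidate);
            extend(graph, pattern, i + 1, bindings, results);
            bindings.remove(e.to);
        }
    }
}
```

Applied to a graph with edges 1->2, 1->3, 2->3, 3->4, 4->5 and 3->5 and starting at x = 1, the matcher finds the single binding x=1, y=2, z=3, a=4, b=5; the branch y=3 is abandoned as soon as no z satisfies both x-->z and y-->z.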
  • 29. What about gremlin?
      ๏ gremlin is a third-party language, built by Marko Rodriguez of Tinkerpop (a group of people who like to hack on graphs)
      ๏ Originally based on the idea of using xpath to describe traversals (traversals are close to xpath, which is why xpath-like descriptions of traversals seemed like a good idea):
            ./HAS_CART/CONTAINS_ITEM/PURCHASED/PURCHASED
        but bastardized to distinguish between nodes and relationships:
            ./outE[label=HAS_CART]/inV
            /outE[label=CONTAINS_ITEM]/inV
            /inE[label=PURCHASED]/outV
            /outE[label=PURCHASED]/inV
      ๏ xpath is not complete enough to express full algorithms; it needs a host language, and gremlin originally defined its own. This changed: gremlin adopted Groovy as a more complete host language and abandoned xpath in favor of method chaining [replace ‘/’ with ‘.’]
  • 30. Gremlin compared to Cypher
      ๏ start me=node:people(name={myname})
        match me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item,
              item<-[:PURCHASED]-user-[:PURCHASED]->recommendation
        return recommendation
      ๏ Cypher is declarative: it describes what data to get (its shape)
      ๏ Gremlin is imperative: it prescribes how to get the data
      ๏ Cypher has more opportunities for optimization by the engine
      ๏ Gremlin can implement pagerank, Cypher can’t (yet?)
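To make the declarative/imperative contrast concrete, here is a hypothetical imperative Java version of the Cypher query above over a toy in-memory graph. It is not Gremlin, just the same idea of prescribing each hop; `Recommendations` is a made-up name, and skipping `rec == item` is this sketch's stand-in for not walking back along the same PURCHASED relationship.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical imperative version of the recommendation query: one explicit loop per hop.
final class Recommendations {
    private final Map<Integer, Map<String, List<Integer>>> outgoing = new HashMap<>();
    private final Map<Integer, Map<String, List<Integer>>> incoming = new HashMap<>();

    void relate(int from, String type, int to) {
        outgoing.computeIfAbsent(from, k -> new HashMap<>()).computeIfAbsent(type, k -> new ArrayList<>()).add(to);
        incoming.computeIfAbsent(to, k -> new HashMap<>()).computeIfAbsent(type, k -> new ArrayList<>()).add(from);
    }

    private static List<Integer> hop(Map<Integer, Map<String, List<Integer>>> dir, int node, String type) {
        return dir.getOrDefault(node, Map.of()).getOrDefault(type, List.of());
    }

    // me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item<-[:PURCHASED]-user-[:PURCHASED]->recommendation
    Set<Integer> recommend(int me) {
        Set<Integer> result = new LinkedHashSet<>();
        for (int cart : hop(outgoing, me, "HAS_CART"))
            for (int item : hop(outgoing, cart, "CONTAINS_ITEM"))
                for (int user : hop(incoming, item, "PURCHASED"))
                    for (int rec : hop(outgoing, user, "PURCHASED"))
                        if (rec != item) result.add(rec);   // don't walk back along the same edge
        return result;
    }
}
```

The loop nesting fixes the evaluation order by hand; a declarative engine is free to pick a different order (for example, starting from the users) without the query changing.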
  • 31. Outline. Transactions involve two parts: the (thread-local) changes being done by an active transaction, and the transaction replay log for recovery.
      [Figure: the same layered architecture diagram]
  • 32. Transaction Isolation
      ๏ Mutating operations are not written when performed
      ๏ They are stored in a thread-confined transaction state object
      ๏ This prevents threads from seeing uncommitted changes made by the transactions of other threads
      ๏ When Transaction.finish() is invoked, the transaction is either committed or rolled back
      ๏ Rollback is simple: discard the transaction state object
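A minimal sketch of thread-confined transaction state using `ThreadLocal`. `TxStore` is hypothetical, and `finish(boolean)` compresses "mark successful, then finish" into one call; uncommitted writes live only in the calling thread's diff until a successful finish copies them into the shared store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: per-thread diffs, applied to the shared store only on commit.
final class TxStore {
    private final Map<String, Object> committed = new HashMap<>();
    private final ThreadLocal<Map<String, Object>> txState = new ThreadLocal<>();

    void begin() { txState.set(new HashMap<>()); }

    /** Mutations are only recorded in this thread's state object (begin() must come first). */
    void set(String key, Object value) { txState.get().put(key, value); }

    /** A thread sees its own uncommitted changes; everyone else sees committed data only. */
    Object get(String key) {
        Map<String, Object> diff = txState.get();
        if (diff != null && diff.containsKey(key)) return diff.get(key);
        synchronized (committed) { return committed.get(key); }
    }

    /** Commit applies the diff; rollback just throws the state object away. */
    void finish(boolean success) {
        Map<String, Object> diff = txState.get();
        txState.remove();
        if (success && diff != null) {
            synchronized (committed) { committed.putAll(diff); }
        }
    }
}
```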
  • 33. Transactions & Durability
      ๏ Commit is:
          • Changes made in the transaction are collected as commands
          • Commands are sorted to get a predictable update order
              ‣ This prevents concurrent readers from seeing inconsistent data when the changes are applied to the store
          • Write the changes (in sorted order) to the transaction log
          • Mark the transaction as committed in the log
          • Apply the changes (in sorted order) to the store files
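The commit steps can be sketched as follows. The command shape and the sort key (record kind, then record id) are assumptions chosen for illustration, and `CommitSketch` is hypothetical; what matters is that the log and the store see the changes in the same predictable order, and that the commit mark is written before the store is touched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the commit steps: collect, sort, log, mark committed, apply.
final class CommitSketch {
    static final class Command {
        final String kind;
        final long id;
        final String value;

        Command(String kind, long id, String value) {
            this.kind = kind;
            this.id = id;
            this.value = value;
        }

        @Override public String toString() { return kind + ":" + id + "=" + value; }
    }

    final List<String> log = new ArrayList<>();         // stand-in for the transaction log
    final Map<String, String> store = new TreeMap<>();  // stand-in for the record files

    void commit(long txId, List<Command> changes) {
        List<Command> sorted = new ArrayList<>(changes);
        sorted.sort(Comparator.comparing((Command c) -> c.kind).thenComparingLong(c -> c.id));
        for (Command c : sorted) log.add("tx" + txId + " " + c);        // 1. write to the log
        log.add("tx" + txId + " COMMIT");                                 // 2. mark as committed
        for (Command c : sorted) store.put(c.kind + ":" + c.id, c.value); // 3. apply to the store
    }
}
```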
  • 34. Recovery
      ๏ Transaction commands dictate state; they don’t modify state
          • i.e. SET property "count" to 5
          • rather than ADD 1 to property "count"
      ๏ Thus: applying the same command twice yields the same state
      ๏ Recovery simply replays all transactions since the last safe point
      ๏ If tx A mutates node1.name and then tx B also mutates node1.name, that doesn’t matter, because the database is not recovered until all transactions have been replayed
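Why replay is safe can be shown in a few lines: a SET-style command gives the same state no matter how many times it is applied, while an ADD-style command does not. The command classes below are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical commands: SET is idempotent, ADD is not, so only SET-style commands are safe to replay.
final class ReplaySketch {
    interface Command {
        void applyTo(Map<String, Integer> props);
    }

    static final class SetProperty implements Command {
        final String key;
        final int value;

        SetProperty(String key, int value) { this.key = key; this.value = value; }

        @Override public void applyTo(Map<String, Integer> props) { props.put(key, value); }
    }

    static final class AddToProperty implements Command {   // the style the log avoids
        final String key;
        final int delta;

        AddToProperty(String key, int delta) { this.key = key; this.delta = delta; }

        @Override public void applyTo(Map<String, Integer> props) { props.merge(key, delta, Integer::sum); }
    }

    /** Replays the logged commands in order on a copy of the given state. */
    static Map<String, Integer> replay(Map<String, Integer> state, List<Command> log) {
        Map<String, Integer> result = new HashMap<>(state);
        for (Command c : log) c.applyTo(result);
        return result;
    }
}
```

Replaying a log of two SETs on the same property (tx A, then tx B) once or twice ends with tx B's value either way, which is the point of the last bullet above.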
  • 35. Outline. High Availability in Neo4j builds on top of the transaction replay.
      [Figure: the same layered architecture diagram]
  • 36. Outline. The transaction logs are shared between all instances in a High Availability setup; all other parts operate on the local data, just like in the standalone case.
      [Figure: the same layered architecture diagram, with the transaction log marked "Shared" and all other layers marked "Local"]
  • 37. HA - the parts to it:
      ๏ Based on streaming transactions between servers
      ๏ All transactions are committed through the master
          • Then (eventually) applied to the slaves
          • Eventuality is defined by the update interval, or by when synchronization is mandated by interaction
      ๏ When writing to a slave:
          • Locks are coordinated through the master
          • Transaction data is buffered on the slave, applied first on the master to get a txid, then applied with the same txid on the slave
  • 38. Creating new Nodes and Relationships
      ๏ New Nodes/Relationships don’t need locks, so they don’t need a transaction synced with the master until the transaction commits
      ๏ They do need an ID that is unique and equal among all instances
      ๏ Each instance allocates IDs in blocks from the master, then assigns them to new Nodes/Relationships locally
          • This batch allocation can be seen in (Enterprise) WebAdmin as Node/Relationship counts jumping in steps of 1000
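A sketch of block-wise ID allocation, assuming a block size of 1000 to match the steps visible in WebAdmin. `IdMaster` and `InstanceIdGenerator` are hypothetical names, and the call to the master would be a network request in a real cluster.

```java
// Hypothetical sketch: the master hands out id ranges, and each instance assigns ids
// from its current range without contacting the master again until it runs out.
final class IdMaster {
    private long nextFree = 0;

    /** Reserves [first, first + size) and returns {first, endExclusive}. */
    synchronized long[] allocateBlock(int size) {
        long first = nextFree;
        nextFree += size;
        return new long[] { first, nextFree };
    }
}

final class InstanceIdGenerator {
    static final int BLOCK_SIZE = 1000;   // assumption, matching the steps seen in WebAdmin
    private final IdMaster master;        // a remote call in a real cluster
    private long next = 0, end = 0;       // current block is [next, end)

    InstanceIdGenerator(IdMaster master) { this.master = master; }

    synchronized long nextId() {
        if (next == end) {                // no block yet, or block used up: ask the master
            long[] block = master.allocateBlock(BLOCK_SIZE);
            next = block[0];
            end = block[1];
        }
        return next++;
    }
}
```

Two instances creating nodes concurrently therefore draw ids from disjoint ranges (0-999 and 1000-1999 for the first two blocks), which is why the visible counts move in steps of the block size.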
  • 39. HA synchronization points
      ๏ Transactions are uniquely identified by a monotonically increasing ID
      ๏ All requests from a slave send the current latest txid on that slave
      ๏ Responses from the master send back a stream of transactions that have occurred since then, along with the actual result
      ๏ Each transaction is replayed just like when committed/recovered
      ๏ Nodes/Relationships touched during this application are evicted from the cache to avoid cache staleness
      ๏ Transaction commands are only sorted when created, before being stored/transmitted; thus consistency is preserved during all application phases
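A sketch of these synchronization points: each slave request carries the slave's latest txid, and the master answers with every transaction committed after it. Transactions are plain strings and txids are 1-based log positions here; `HaMaster` and `HaSlave` are hypothetical, and cache eviction for touched records is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: txids are 1-based positions in the master's log.
final class HaMaster {
    private final List<String> txLog = new ArrayList<>();

    /** Commits a transaction and returns its txid (monotonically increasing). */
    synchronized long commit(String tx) {
        txLog.add(tx);
        return txLog.size();
    }

    /** Everything committed after the slave's latest txid, in commit order. */
    synchronized List<String> transactionsSince(long slaveTxId) {
        return new ArrayList<>(txLog.subList((int) slaveTxId, txLog.size()));
    }
}

final class HaSlave {
    final List<String> applied = new ArrayList<>();

    long latestTxId() { return applied.size(); }

    /** Every request to the master carries latestTxId() and pulls the missing transactions. */
    void syncWith(HaMaster master) {
        for (String tx : master.transactionsSince(latestTxId())) {
            applied.add(tx);   // replayed like a local commit (cache eviction omitted)
        }
    }
}
```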
  • 40. Locking semantics of HA
      ๏ To be granted a lock, the slave must have the latest version of the Node/Relationship it is trying to lock
          • This ensures consistency
          • The implementation of “latest version of the Node/Relationship” is “latest version of the entire graph”
          • The slave must thus sync transactions from the master
  • 41. Master election
      ๏ Each instance communicates/coordinates:
          • its latest transaction id (including the master id for that tx)
          • the id for that instance
          • the (logical) clock value for when the txid was written
      ๏ Election chooses:
          1. The instance with the highest txid
          2. IF multiple: the instance that was master for that tx
          3. IF unavailable: the instance with the lowest clock value
          4. IF multiple: the instance with the lowest id
      ๏ Election happens when the current master cannot be reached
          • Any instance can choose to re-elect
          • Each instance runs the election protocol individually
          • Notify others when the election chooses a new master
      ๏ When elected, the new master broadcasts to all instances, forcing them to bind to the new master
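The election rules above, written as a Java function over what each reachable instance reports. `Election.Candidate` and its fields are hypothetical, and the sketch assumes that all candidates at the highest txid report the same master id for that transaction.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the four election rules, applied to the reachable instances.
final class Election {
    static final class Candidate {
        final int instanceId;
        final long latestTxId;
        final int masterIdForTx;   // which instance was master when latestTxId was committed
        final long clock;          // logical clock value for when that txid was written

        Candidate(int instanceId, long latestTxId, int masterIdForTx, long clock) {
            this.instanceId = instanceId;
            this.latestTxId = latestTxId;
            this.masterIdForTx = masterIdForTx;
            this.clock = clock;
        }
    }

    static Candidate elect(List<Candidate> reachable) {
        // 1. the highest txid
        long maxTx = reachable.stream().mapToLong(c -> c.latestTxId).max().getAsLong();
        List<Candidate> best = reachable.stream()
                .filter(c -> c.latestTxId == maxTx)
                .collect(Collectors.toList());
        if (best.size() == 1) return best.get(0);
        // 2. if multiple: the instance that was master for that tx, if it is among them
        for (Candidate c : best) {
            if (c.instanceId == c.masterIdForTx) return c;
        }
        // 3. otherwise the lowest clock value, 4. then the lowest instance id
        return best.stream()
                .min(Comparator.comparingLong((Candidate c) -> c.clock).thenComparingInt(c -> c.instanceId))
                .get();
    }
}
```

Because every instance runs the same deterministic function over the same reported values, they should all arrive at the same choice independently.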
  • 42. Thank you for listening! Tobias Lindaaker, Hacker @ Neo Technology. Email: tobias@neotechnology.com; twitter: @thobe, #neo4j (@neo4j); web: neo4j.org, neotechnology.com; my web: thobe.org