Neo4j High Availability
New Auto-Cluster

Michael Hunger - @mesirii
High Availability Cluster
๏Neo4j Enterprise
๏Master-Slave Replication
๏read scaling and fault tolerance
๏eventual consistency
  • write to master (pushed to slaves according to push_factor)
  • write to slaves
3 Separate Concerns (I)
๏Cluster Management
  • Members join/leave/heartbeat
๏Failover
  • Master Election
  • Distribution of Master-Status
3 Separate Concerns (II)
๏Replication
  • synchronized id-generation
  • distributed locks
  • pull, push of transactions
  • initial store synchronization
Pre 1.9 - ZooKeeper
Pre 1.9
๏Apache ZooKeeper took care of these concerns
  • Cluster Management
      ‣new members register with ZK
  • Failover
      ‣ZK stores Master and last TX-Id
      ‣ZK uses ZAB to determine the new Master and distribute that information
HA Cluster
[Diagram: one Master, two Slaves, one RO-Slave, three Coordinators]
Pre 1.9 - Problems
๏additional setup and operation of a separate component
๏unreliable operation / hiccups
๏long-term stability
๏no dynamic reconfiguration of the ZK cluster (important for cloud setups)
Post 1.9 - Neo4j Auto Cluster
Replace ZooKeeper!?
๏implement Multi-Paxos ourselves
๏simple, testable code
๏only covers
  • cluster management
  • master election
HA Cluster




What is Paxos?
๏reliable consensus making
๏broadcasting
๏works even with unreliable communication
  • lost messages
  • delays, messages arriving out of order
๏does not guarantee progress
What is Paxos?




Implementation
๏everything is a state machine
  • SM = stateless enums + context
  • Message = type enum + payload
  • State = enum instance
  • Transition = handle(message): switch on msg-type, implement logic
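The enum-based design above could look roughly like the following sketch. All names here (`MessageType`, `Message`, `Context`, `DemoState`) are hypothetical and not taken from the Neo4j code base, and the three states model a toy join/leave protocol rather than Paxos itself.

```java
import java.util.ArrayList;
import java.util.List;

// Message = type enum + payload (hypothetical message types)
enum MessageType { JOIN, JOINED, LEAVE }

final class Message {
    final MessageType type;
    final Object payload;

    Message(MessageType type, Object payload) {
        this.type = type;
        this.payload = payload;
    }
}

// All mutable data lives in the context, so the state enums stay stateless.
final class Context {
    final List<String> members = new ArrayList<>();
}

// State = enum instance; handle() switches on the message type
// and returns the next state (the transition).
enum DemoState {
    START {
        @Override
        DemoState handle(Context ctx, Message msg) {
            switch (msg.type) {
                case JOIN:
                    ctx.members.add((String) msg.payload);
                    return JOINING;
                default:
                    return this; // ignore messages that make no sense here
            }
        }
    },
    JOINING {
        @Override
        DemoState handle(Context ctx, Message msg) {
            switch (msg.type) {
                case JOINED:
                    return ENTERED;
                default:
                    return this;
            }
        }
    },
    ENTERED {
        @Override
        DemoState handle(Context ctx, Message msg) {
            switch (msg.type) {
                case LEAVE:
                    ctx.members.remove(msg.payload);
                    return START;
                default:
                    return this;
            }
        }
    };

    abstract DemoState handle(Context ctx, Message msg);
}
```

Because the enum constants hold no fields, one `Context` per state machine instance is the only thing a test has to set up or inspect.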
Implementation (II)
๏everything is a state machine
  • use timeouts for reliability
  • handle failing messages
  • decouple network and time
      ‣for testability
  • listeners interact on messages with the outside world, sync or async
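One way to decouple the state machines from the real network is to route all messages through an in-memory stand-in that a test can control. The sketch below is an assumption about how that could be done, not the actual Neo4j test infrastructure; `FakeNetwork` and its methods are hypothetical names, messages are plain strings for brevity, and only dropped messages are modelled (no delays).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory network: nothing is sent over a socket,
// the test decides which messages get lost and when delivery happens.
final class FakeNetwork {
    private final Deque<String> inFlight = new ArrayDeque<>();
    private final List<String> deliveredLog = new ArrayList<>();
    private Predicate<String> dropRule = message -> false;

    // Configure which messages should be silently lost.
    void dropIf(Predicate<String> rule) {
        this.dropRule = rule;
    }

    // Queue a message unless the drop rule says it is lost.
    void send(String message) {
        if (!dropRule.test(message)) {
            inFlight.addLast(message);
        }
    }

    // Deliver exactly one queued message; returns false when the queue is empty.
    boolean tick() {
        String message = inFlight.pollFirst();
        if (message == null) {
            return false;
        }
        deliveredLog.add(message);
        return true;
    }

    List<String> delivered() {
        return deliveredLog;
    }
}
```

Delivery only happens when the test calls `tick()`, so message interleavings are deterministic and repeatable.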
Implementation (II)
๏Paxos (3 roles)
  • Proposer-SM
  • Acceptor-SM
  • Learner-SM
๏Cluster
  • Heartbeat
[Diagram: Paxos (Proposer, Acceptor, Learner), ClusterState, Heartbeat]
Multi-Paxos (happy path)
[Diagram: message flow between Learner, Proposer and Acceptors (2 * f + 1)]
  • Proposer: sends PREPARE, starts TIMEOUT
  • Acceptor: value match -> PROMISE, no match -> REJECT
  • Proposer: sends ACCEPT, starts TIMEOUT
  • Acceptor: matches promise? -> store value, ACCEPTED; no -> REJECTED
  • Proposer: check and store responses; if quorum met, cancel timeout, send LEARN
  • Learner: store value, out-of-order message handling, deliver all valid values (atomic broadcast)
  • Learner: a value is missing and we still don't know it -> LEARN TIMEOUT
Multi-Paxos (happy path)
[Diagram: later part of the message flow (PROMISE, ACCEPT, ACCEPTED or REJECTED, LEARN), extended with learner catch-up]
  • Acceptor: matches promise? -> store value, ACCEPTED; no -> REJECTED
  • Proposer: check and store responses; if quorum met, cancel timeout, send LEARN
  • Learner: store value, out-of-order message handling, deliver all valid values (atomic broadcast)
  • Learner: a value is missing and we still don't know it -> LEARN TIMEOUT, send LEARN REQ to another Learner
  • other Learner: have value -> LEARN; don't know -> LEARN FAIL
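The learner side of the flow (store value, handle out-of-order messages, deliver all valid values, detect a missing value) can be sketched as follows. This is a minimal illustration under assumptions: `OrderedLearner` and its methods are hypothetical names, instances are numbered from 0, and values are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical learner that delivers learned values strictly in instance order.
final class OrderedLearner {
    private final Map<Long, String> pending = new HashMap<>();
    private final List<String> delivered = new ArrayList<>();
    private long nextInstance = 0;

    // Called when a LEARN message for some instance arrives, in any order.
    void learn(long instanceId, String value) {
        if (instanceId < nextInstance) {
            return; // already delivered, ignore the duplicate
        }
        pending.put(instanceId, value);
        // Deliver every consecutive value we now hold.
        String next;
        while ((next = pending.remove(nextInstance)) != null) {
            delivered.add(next);
            nextInstance++;
        }
    }

    // The instance to request from another learner on LEARN TIMEOUT,
    // or -1 if no stored value is waiting for a gap to be filled.
    long missingInstance() {
        return pending.isEmpty() ? -1 : nextInstance;
    }

    List<String> delivered() {
        return delivered;
    }
}
```

A caller would check `missingInstance()` when the learn timeout fires and, if it is not -1, send a LEARN REQ for that instance to another learner.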
Acceptor State Machine




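The state machine diagram from this slide is not preserved here. As a rough stand-in, the following sketch shows the textbook acceptor rules for a single Paxos instance: promise only higher ballots, accept only if the ballot is at least as high as the promise. It is not the Neo4j implementation; `Acceptor`, `Reply` and the ballot numbering are assumptions for illustration.

```java
// Replies named after the labels in the message flow diagram.
enum Reply { PROMISE, REJECT, ACCEPTED, REJECTED }

// Hypothetical single-instance acceptor following textbook Paxos.
final class Acceptor {
    private long promisedBallot = -1;
    private long acceptedBallot = -1;
    private String acceptedValue = null;

    // Phase 1: PREPARE(ballot). Promise only ballots higher than any seen so far.
    Reply prepare(long ballot) {
        if (ballot > promisedBallot) {
            promisedBallot = ballot;
            return Reply.PROMISE;
        }
        return Reply.REJECT;
    }

    // Phase 2: ACCEPT(ballot, value). Store the value unless a higher ballot was promised.
    Reply accept(long ballot, String value) {
        if (ballot >= promisedBallot) {
            promisedBallot = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return Reply.ACCEPTED;
        }
        return Reply.REJECTED;
    }

    long acceptedBallot() {
        return acceptedBallot;
    }

    String acceptedValue() {
        return acceptedValue;
    }
}
```

In full Paxos the PROMISE reply also carries the previously accepted ballot and value so that a new proposer can adopt them; the accessors above expose that data but the reply type is kept minimal.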
Heartbeat State Machine




Implementation (III)
๏HA implementation uses state machines as infrastructure
๏notifications via listeners
๏piggyback heartbeat on messages
๏master election
  • (all - failed) have to agree
  • Paxos broadcast (BC) needs a quorum of the total members
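The two thresholds on this slide can be made concrete with a little arithmetic. The sketch below is one reading of the bullets, not confirmed by the slides: the election needs a vote from every member not marked as failed, and the Paxos broadcast carrying the result additionally needs a majority of the total cluster size (f + 1 out of 2 * f + 1). `Quorums` and its methods are hypothetical names.

```java
// Hypothetical helper for the quorum arithmetic; not part of Neo4j.
final class Quorums {
    private Quorums() {
    }

    // Majority of the total cluster size; for n = 2f + 1 members this is f + 1.
    static int majority(int totalMembers) {
        return totalMembers / 2 + 1;
    }

    // Number of failed members a cluster of this size can tolerate.
    static int toleratedFailures(int totalMembers) {
        return (totalMembers - 1) / 2;
    }

    // Assumed rule: all non-failed members voted, and they still form a majority of the total.
    static boolean electionCanSucceed(int totalMembers, int failedMembers, int votes) {
        int alive = totalMembers - failedMembers;
        return votes == alive && alive >= majority(totalMembers);
    }
}
```

For a three-member cluster this means one failure is tolerated: two live members that both vote are enough, while a single survivor cannot elect itself.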
Multi-Paxos
๏everything is a state machine
  • use timeouts for reliability
  • handle failing messages
  • decouple network and time
      ‣for testability
  • listeners interact on messages with the outside world, sync or async
Unit-Testing
• Mock Time
    ‣fast-running tests despite timeouts
• Mock Network
    ‣simulate delays, failing messages
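Mocking time can be as simple as hiding the clock behind an interface, so a test advances time by hand instead of sleeping. The sketch below applies that to a heartbeat timeout; `TimeSource`, `FakeTime` and `HeartbeatMonitor` are hypothetical names chosen for this example, not classes from Neo4j.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Abstraction over time so production code never reads the system clock directly.
interface TimeSource {
    long now();
}

// Test double: time only moves when the test says so.
final class FakeTime implements TimeSource {
    private long millis = 0;

    @Override
    public long now() {
        return millis;
    }

    void advance(long deltaMillis) {
        millis += deltaMillis;
    }
}

// Marks a member as failed when no heartbeat was seen within the timeout.
final class HeartbeatMonitor {
    private final TimeSource time;
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(TimeSource time, long timeoutMillis) {
        this.time = time;
        this.timeoutMillis = timeoutMillis;
    }

    void heartbeat(String member) {
        lastSeen.put(member, time.now());
    }

    Set<String> failedMembers() {
        Set<String> failed = new HashSet<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (time.now() - entry.getValue() > timeoutMillis) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }
}
```

A test covering a five-second timeout then runs in microseconds: it calls `advance(6000)` on the fake and checks `failedMembers()` immediately.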
Unit-Test-Example




Setup
  • Config
  • Video
  • Auto-Setup Script (Demo)
Thank You - Questions?



