Curator
The Netflix ZooKeeper Client Library




                           Jordan Zimmerman
                                   Senior Platform Engineer
                                                 Netflix, Inc.
                                 jzimmerman@netflix.com
                                                  @randgalt
Agenda
• Background
• Overview of Curator
• The Recipes
• Some Low-Level Details
• Q&A
Background
What’s wrong with this code?


ZooKeeper client = new ZooKeeper(...);

client.create("/foo", data, ...);
ZooKeeper Surprise
• Almost no ZK client call is safe
• You cannot assume success
• You must handle exceptions
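
For example, create() can throw a recoverable ConnectionLossException even though the server already created the node. A minimal sketch (not Curator code) of what "safe" usage of the raw client looks like for a persistent node; the retry limit and sleep interval are illustrative:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public static String safeCreate(ZooKeeper client, String path, byte[] data)
    throws Exception
{
    for ( int retry = 0; retry < 3; ++retry )   // illustrative retry limit
    {
        try
        {
            return client.create(path, data,
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        catch ( KeeperException.ConnectionLossException e )
        {
            // the create may or may not have reached the server;
            // for a persistent node we can check before retrying
            if ( client.exists(path, false) != null )
            {
                return path;
            }
            Thread.sleep(1000);                 // illustrative backoff
        }
    }
    throw new KeeperException.ConnectionLossException();
}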
The Recipes Are Hard
Locks
Fully distributed locks that are globally synchronous, meaning at any snapshot in time no two clients think they hold the same lock. These can be implemented using ZooKeeper. As with priority queues, first define a lock node.

Note
There now exists a Lock implementation in ZooKeeper recipes directory. This is distributed with the release -- src/recipes/lock directory of the release artifact.

Clients wishing to obtain a lock do the following:

  1. Call create() with a pathname of "_locknode_/guid-lock-" and the sequence and ephemeral flags set. The guid is needed in case the create() result is missed. See the note below.
  2. Call getChildren() on the lock node without setting the watch flag (this is important to avoid the herd effect).
  3. If the pathname created in step 1 has the lowest sequence number suffix, the client has the lock and the client exits the protocol.
  4. The client calls exists() with the watch flag set on the path in the lock directory with the next lowest sequence number.
  5. If exists() returns false, go to step 2. Otherwise, wait for a notification for the pathname from the previous step before going to step 2.
The unlock protocol is very simple: clients wishing to release a lock simply delete the node they created in step 1.

Here are a few things to notice:

  • The removal of a node will only cause one client to wake up since each node is watched by exactly one client. In this way, you avoid the herd effect.
  • There is no polling or timeouts.
  • Because of the way you implement locking, it is easy to see the amount of lock contention, break locks, debug locking problems, etc.
Recoverable Errors and the GUID
  • If a recoverable error occurs calling create(), the client should call getChildren() and check for a node containing the guid used in the path name. This handles the case (noted above) of the create() succeeding on the server but the server crashing before returning the name of the new node.
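
A bare-bones sketch of the acquire protocol above against the raw client (the parent path and method name are assumptions for illustration; the recoverable-error handling just described is exactly what this sketch omits):

import java.util.Comparator;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// assumes "/_locknode_" already exists; no error/session handling
public static String acquireLock(ZooKeeper zk) throws Exception
{
    // Step 1: guid-prefixed ephemeral-sequential node
    String guid = UUID.randomUUID().toString();
    String ourPath = zk.create("/_locknode_/" + guid + "-lock-", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    String ourName = ourPath.substring("/_locknode_/".length());

    while ( true )
    {
        // Step 2: children without a watch, avoiding the herd effect
        List<String> children = zk.getChildren("/_locknode_", false);
        // sort by the zero-padded sequence suffix, not the guid prefix
        children.sort(Comparator.comparing(s -> s.substring(s.lastIndexOf('-') + 1)));

        // Step 3: lowest sequence number means we hold the lock
        int ourIndex = children.indexOf(ourName);
        if ( ourIndex == 0 )
        {
            return ourPath;
        }

        // Steps 4/5: watch only the node immediately below ours
        String previous = "/_locknode_/" + children.get(ourIndex - 1);
        CountDownLatch deleted = new CountDownLatch(1);
        if ( zk.exists(previous, event -> deleted.countDown()) != null )
        {
            deleted.await();    // notified; loop back to step 2
        }
        // if exists() returned null, the previous node is already gone - loop
    }
}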
Even the Distribution Has Issues
from org.apache.zookeeper.recipes.lock.WriteLock
if (id == null) {
    long sessionId = zookeeper.getSessionId();
    String prefix = "x-" + sessionId + "-";
    // lets try look up the current ID if we failed
    // in the middle of creating the znode
    findPrefixInChildren(prefix, zookeeper, dir);
    idName = new ZNodeName(id);
}


Bad handling of Ephemeral-Sequential issue!
What About ZKClient?
• Unclear if it’s still being supported
  (eleven open issues going back to 10/1/2009)
• README: “+ TBD”
• No docs
• Little or no retries
• Design problems:
  • All exceptions converted to RuntimeException
  • Recipes/management code highly coupled
  • Lots of foreground synchronization
  • Small number of tests
  • ... etc ...
• ...
Introducing Curator
Curator n ˈkyo͝orˌātər: a keeper or custodian of a museum or other collection - A ZooKeeper Keeper
Three components:
  Client - A replacement/wrapper for the bundled ZooKeeper class

  Framework - A high-level API that greatly simplifies using ZooKeeper

  Recipes - Implementations of some of the common ZooKeeper "recipes" built on top of the Curator Framework
Overview of Curator
The Curator Stack
• Client
• Framework
• Recipes
• Extensions

                Curator Recipes

               Curator Framework

                Curator Client

                  ZooKeeper
Curator is a platform for writing ZooKeeper Recipes
Curator Client manages the ZooKeeper Connection
Curator Framework uses retry for all operations and provides a friendlier API
Curator Recipes: implementations of all recipes listed on the ZK website (and more)
The Recipes
• Leader Selector
• Distributed Locks
• Queues
• Barriers
• Counters
• Atomics
• ...
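
For instance, one of the counters from this list might be used like so (the path and retry settings are illustrative, not canonical):

import com.netflix.curator.framework.recipes.atomic.AtomicValue;
import com.netflix.curator.framework.recipes.atomic.DistributedAtomicLong;
import com.netflix.curator.retry.ExponentialBackoffRetry;

DistributedAtomicLong counter = new DistributedAtomicLong(
    client, "/examples/counter", new ExponentialBackoffRetry(1000, 3));

AtomicValue<Long> result = counter.increment();
if ( result.succeeded() )       // atomics can fail; the caller must check
{
    System.out.println("new value: " + result.postValue());
}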
CuratorFramework Instance

CuratorFrameworkFactory.newClient(...)
---------------------
CuratorFrameworkFactory.builder()
   .connectString("...")
   ...
   .build()


Usually injected as a singleton
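
A fuller sketch of building the singleton; the connect string, timeouts, and retry policy shown are illustrative values, not recommendations:

import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.retry.ExponentialBackoffRetry;

CuratorFramework client = CuratorFrameworkFactory.builder()
    .connectString("host1:2181,host2:2181,host3:2181")
    .sessionTimeoutMs(60000)
    .connectionTimeoutMs(15000)
    .retryPolicy(new ExponentialBackoffRetry(1000, 3))
    .build();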
Must Be Started

client.start();


// client is now ready for use
Leader Selector
By far the most common usage of ZooKeeper


Distributed lock with a notification mechanism
Sample
public class CleanupLeader implements LeaderSelectorListener
{
    ...
    @Override
    public void takeLeadership(CuratorFramework client) throws Exception
    {
        while ( !Thread.currentThread().isInterrupted() )
        {
            sleepUntilNextPeriod();
            doPeriodicCleanup();
        }
    }
}




...
LeaderSelector leaderSelector =
    new LeaderSelector(client, path, new CleanupLeader());
leaderSelector.start();
Distributed Locks
• InterProcessMutex
• InterProcessReadWriteLock
• InterProcessMultiLock
• InterProcessSemaphore

            Very similar to JDK locks
Sample
InterProcessMutex mutex =
    new InterProcessMutex(client, lockPath);

mutex.acquire();
try
{
    // do work in critical section
}
finally
{
    mutex.release();
}
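
The other lock types follow the same acquire/release pattern. For example, a sketch using the read side of a read-write lock (the lock path is an assumption):

InterProcessReadWriteLock rwLock =
    new InterProcessReadWriteLock(client, "/locks/rw");

rwLock.readLock().acquire();    // multiple readers may hold this at once
try
{
    // read shared state
}
finally
{
    rwLock.readLock().release();
}
// rwLock.writeLock() is used the same way but is exclusive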
Low-Level Details
public void process(WatchedEvent event)
{
    boolean wasConnected = isConnected.get();
    boolean newIsConnected = wasConnected;
    if ( event.getType() == Watcher.Event.EventType.None )
    {
        newIsConnected = (event.getState() == Event.KeeperState.SyncConnected);
        if ( event.getState() == Event.KeeperState.Expired )
        {
            handleExpiredSession();
        }
    }

    if ( newIsConnected != wasConnected )
    {
        isConnected.set(newIsConnected);
        connectionStartMs = System.currentTimeMillis();
    }

     ...
}
public static boolean      shouldRetry(int rc)
{
    return (rc == KeeperException.Code.CONNECTIONLOSS.intValue()) ||
        (rc == KeeperException.Code.OPERATIONTIMEOUT.intValue()) ||
        (rc == KeeperException.Code.SESSIONMOVED.intValue()) ||
        (rc == KeeperException.Code.SESSIONEXPIRED.intValue());
}




public void         takeException(Exception exception) throws Exception
{
    boolean     rethrow = true;
    if ( isRetryException(exception) )
    {
        if ( retryPolicy.allowRetry(retryCount++, System.currentTimeMillis() - startTimeMs) )
        {
            rethrow = false;
        }
    }

    if ( rethrow )
    {
        throw exception;
    }
}
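
RetryPolicy itself is a small interface. A toy policy matching the two-argument allowRetry() signature used above (the limits are arbitrary):

RetryPolicy cappedRetry = new RetryPolicy()
{
    @Override
    public boolean allowRetry(int retryCount, long elapsedTimeMs)
    {
        // at most 5 attempts and no more than 10 seconds overall
        return (retryCount < 5) && (elapsedTimeMs < 10000);
    }
};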
byte[]      responseData = RetryLoop.callWithRetry
(
    client.getZookeeperClient(),
    new Callable<byte[]>()
    {
        @Override
        public byte[] call() throws Exception
        {
            byte[]      responseData;
            responseData = client.getZooKeeper().getData(path, ...);
            return responseData;
        }
    }
);
return responseData;
client.create().withProtectedEphemeralSequential()
final AtomicBoolean     firstTime = new AtomicBoolean(true);
String                  returnPath = RetryLoop.callWithRetry
(
    client.getZookeeperClient(),
    new Callable<String>()
    {
        @Override
        public String call() throws Exception
        {
           ...

               String createdPath = null;
               if ( !firstTime.get() && doProtectedEphemeralSequential )
               {
                   createdPath = findProtectedNodeInForeground(localPath);
               }
             ...
         }
     }
);
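
From user code, protection is requested through the create builder; a minimal usage sketch (path and payload are illustrative):

// the returned path includes the server-assigned sequence number
String actualPath = client.create()
    .withProtectedEphemeralSequential()
    .forPath("/locks/lock-", new byte[0]);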
public interface ConnectionStateListener
{
    public void stateChanged(CuratorFramework
        client, ConnectionState newState);
}

public enum ConnectionState
{
    SUSPENDED,
    RECONNECTED,
    LOST
}
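
Listeners are registered on the framework instance; a minimal sketch:

client.getConnectionStateListenable().addListener
(
    new ConnectionStateListener()
    {
        @Override
        public void stateChanged(CuratorFramework client, ConnectionState newState)
        {
            if ( newState == ConnectionState.LOST )
            {
                // session expired: locks, watches and ephemeral nodes are gone
            }
        }
    }
);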
if ( e instanceof KeeperException.ConnectionLossException )
{
    connectionStateManager.addStateChange(ConnectionState.LOST);
}



private void validateConnection(CuratorEvent curatorEvent)
{
    if ( curatorEvent.getType() == CuratorEventType.WATCHED )
    {
        if ( curatorEvent.getWatchedEvent().getState() ==
          Watcher.Event.KeeperState.Disconnected )
        {
            connectionStateManager.addStateChange(ConnectionState.SUSPENDED);
            internalSync(this, "/", null);
        }
        else if ( curatorEvent.getWatchedEvent().getState() ==
          Watcher.Event.KeeperState.Expired )
        {
            connectionStateManager.addStateChange(ConnectionState.LOST);
        }
        else if ( curatorEvent.getWatchedEvent().getState() ==
          Watcher.Event.KeeperState.SyncConnected )
        {
            connectionStateManager.addStateChange(ConnectionState.RECONNECTED);
        }
    }
}
Testing Utilities
• TestingServer: manages an internally
  running ZooKeeper server
  // Create the server using a random port
  public TestingServer()




• TestingCluster: manages an internally
  running ensemble of ZooKeeper servers.
  // Creates an ensemble comprised of n servers.
  // Each server will use a temp directory and
  // random ports
  public TestingCluster(int instanceQty)
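
A sketch of a unit test built on TestingServer (the retry policy and test path are illustrative):

TestingServer server = new TestingServer();     // random port
try
{
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        server.getConnectString(), new RetryOneTime(1));
    client.start();

    client.create().forPath("/test", "data".getBytes());
    // ... assertions against the real (in-process) server ...

    client.close();
}
finally
{
    server.stop();
}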
Extensions
• Discovery
• Discovery REST Server
• Exhibitor
• ???

                Curator Recipes

               Curator Framework
                                    Extensions
                Curator Client

                  ZooKeeper
Exhibitor
Sneak Peek




 March or April 2012
Open Source on Github
Netflix Github




Netflix’s home for Open Source
Maven Central
Binaries pushed to Maven Central

  <dependency>
      <groupId>com.netflix.curator</groupId>
      <artifactId>curator-recipes</artifactId>
      <version>1.1.0</version>
  </dependency>
Much younger – much thinner


        Jordan Zimmerman
 jzimmerman@netflix.com
                @randgalt
Q&A


 Much%younger%–%much%thinner0


         Jordan Zimmerman
  jzimmerman@netflix.com
                 @randgalt

