Exhibitor
Netflix’s ZooKeeper Management System




Jordan Zimmerman
Senior Platform Engineer
Netflix, Inc.
jzimmerman@netflix.com
@rangalt
The Problem
• ZooKeeper is statically configured
• Limited tools for managing the ensemble
• Backup/restore is sometimes needed
• Visualization is desperately needed
• Prior to 3.4.x, periodic cleanup needed
The Goal
Chaos Monkey-able
• See http://techblog.netflix.com/2011/07/netflix-simian-army.html
  A tool that randomly disables our production instances to make sure we can
  survive common types of failure without any customer impact.
• Completely unmanned
• Bringing up a new ensemble should be turn-key/push-button
Features
Instance Monitoring
Each Exhibitor instance monitors the
ZooKeeper server running on the same
server. If ZooKeeper is not running, Exhibitor
will write the zoo.cfg file, etc. and start it. If
ZooKeeper crashes for some reason,
Exhibitor will restart it.
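The monitoring behavior above can be sketched roughly as follows. This is an illustration, not Exhibitor's actual code: `render_zoo_cfg`, `monitor_once` and the injected callables are hypothetical names, and the timing values and the 2888/3888 ports are just the conventional defaults from ZooKeeper's sample configuration.

```python
from typing import Callable, Dict


def render_zoo_cfg(servers: Dict[int, str], data_dir: str, client_port: int = 2181) -> str:
    """Build zoo.cfg text for an ensemble given {server_id: hostname}."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
    ]
    # One server.N line per ensemble member (quorum port, then election port).
    for server_id in sorted(servers):
        lines.append(f"server.{server_id}={servers[server_id]}:2888:3888")
    return "\n".join(lines) + "\n"


def monitor_once(
    is_running: Callable[[], bool],
    write_config: Callable[[], None],
    start: Callable[[], None],
) -> str:
    """One pass of the monitor: if ZooKeeper is down, rewrite config and start it."""
    if is_running():
        return "ok"
    write_config()  # always rewrite zoo.cfg before starting
    start()
    return "restarted"
```

A real monitor would call `monitor_once` on a timer, with `is_running` probing the local ZooKeeper process.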
Log Cleanup
In versions prior to ZooKeeper 3.4.x, log file
maintenance is necessary. Exhibitor will
periodically do this maintenance.
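A simplified sketch of what such maintenance might decide, assuming ZooKeeper's `snapshot.<hex zxid>` and `log.<hex zxid>` file naming. The function names and the retention policy here are illustrative assumptions, not Exhibitor's or ZooKeeper's actual purge logic.

```python
from typing import List


def zxid_of(name: str) -> int:
    """Parse the hexadecimal zxid suffix of a 'snapshot.<hex>' or 'log.<hex>' name."""
    return int(name.split(".", 1)[1], 16)


def files_to_purge(names: List[str], retain: int = 3) -> List[str]:
    """Return the file names that can be deleted, keeping the newest `retain` snapshots."""
    if retain < 1:
        raise ValueError("retain must be at least 1")
    snapshots = sorted((n for n in names if n.startswith("snapshot.")), key=zxid_of)
    logs = sorted((n for n in names if n.startswith("log.")), key=zxid_of)
    if len(snapshots) <= retain:
        return []
    oldest_kept = zxid_of(snapshots[-retain])
    purge = snapshots[:-retain]
    # A log named log.<z> starts at zxid z. Of the logs starting at or before
    # the oldest kept snapshot, keep only the last one: it may still hold
    # transactions newer than that snapshot.
    older_logs = [n for n in logs if zxid_of(n) <= oldest_kept]
    purge += older_logs[:-1]
    return sorted(purge)
```

The function only computes the list; actually deleting the files is left to the caller.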
Backup/Restore
Backups in a ZooKeeper ensemble are more complicated
than for a traditional data store (e.g. an RDBMS). Generally,
most of the data in ZooKeeper is ephemeral, so it would be
harmful to blindly restore an entire ZooKeeper data set.
What is needed is selective restoration of a subset of the
data set, which prevents accidental damage. Exhibitor enables this.
Exhibitor periodically backs up the ZooKeeper transaction
files. Once backed up, any of these transaction files can be
indexed. Once indexed, you can search for individual transactions
and “replay” them to restore a given ZNode to ZooKeeper.
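The index, search and replay steps can be sketched with an in-memory model. Everything here is hypothetical (the `Txn` record, `build_index`, `replay`, and a plain dict standing in for the live ZooKeeper tree); it only illustrates why restoring one ZNode leaves the rest of the data untouched.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Txn:
    zxid: int
    op: str  # "create", "setData" or "delete"
    path: str
    data: Optional[bytes] = None


def build_index(txns: Iterable[Txn]) -> Dict[str, List[Txn]]:
    """Index backed-up transactions by ZNode path, ordered by zxid."""
    index: Dict[str, List[Txn]] = defaultdict(list)
    for txn in sorted(txns, key=lambda t: t.zxid):
        index[txn.path].append(txn)
    return index


def replay(index: Dict[str, List[Txn]], path: str, tree: Dict[str, Optional[bytes]],
           up_to_zxid: Optional[int] = None) -> int:
    """Replay only the transactions for `path` into `tree`; return how many were applied."""
    applied = 0
    for txn in index.get(path, []):
        if up_to_zxid is not None and txn.zxid > up_to_zxid:
            break
        if txn.op == "delete":
            tree.pop(txn.path, None)
        else:
            tree[txn.path] = txn.data
        applied += 1
    return applied
```

For example, replaying `/app/config` up to the transaction just before an accidental delete restores that node without resurrecting stale ephemeral nodes elsewhere in the backup.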
Cluster-wide Configuration
Exhibitor presents a single console for your
entire ZooKeeper ensemble. Configuration
changes made in Exhibitor will be applied to
the entire ensemble.
Rolling Ensemble Changes
Exhibitor can update the servers in the
ensemble in a rolling fashion so that the
ZooKeeper ensemble can stay up and in
quorum while the changes are being made.
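A rough sketch of the rolling idea, under the assumption that `restart` blocks until the server has rejoined the ensemble. `rolling_update`, `restart` and `is_serving` are hypothetical names, not Exhibitor's API.

```python
from typing import Callable, List, Sequence


def quorum_size(ensemble_size: int) -> int:
    """A ZooKeeper quorum is a strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def rolling_update(
    servers: Sequence[str],
    restart: Callable[[str], None],
    is_serving: Callable[[str], bool],
) -> List[str]:
    """Restart servers one at a time, refusing to proceed if quorum would be lost."""
    updated: List[str] = []
    for server in servers:
        others_up = sum(1 for other in servers if other != server and is_serving(other))
        if others_up < quorum_size(len(servers)):
            raise RuntimeError(f"restarting {server} would break quorum")
        restart(server)  # assumed to block until the server is serving again
        updated.append(server)
    return updated
```

With three servers the quorum is two, so each server can be taken down in turn only while the other two are serving.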
Visualizer
Exhibitor provides a graphical tree view of
the ZooKeeper ZNode hierarchy.
ZooKeeper Data Mutation
When enabled, Exhibitor can create/update/delete
nodes in the ZooKeeper hierarchy.
Curator Integration
Exhibitor and Curator (Cur/Ex!) can be
configured to work together so that Curator
instances are updated for changes in the
ensemble.
[Diagram: Curator clients periodically query the Exhibitor instances (A, B, ...) round-robin for the servers list.]
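The round-robin polling in the diagram might look like the sketch below. It does not reproduce Curator's real classes: `ServerListPoller` and the injected `fetch` callable (which would wrap a call to Exhibitor's REST API) are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterable, List


class ServerListPoller:
    """Asks Exhibitor instances in rotation for the current ZooKeeper servers."""

    def __init__(self, exhibitors: Iterable[str],
                 fetch: Callable[[str], List[str]], client_port: int = 2181) -> None:
        self._hosts = list(exhibitors)
        self._rotation = cycle(self._hosts)
        self._fetch = fetch  # hypothetical: returns ZooKeeper hostnames
        self._port = client_port
        self.connection_string = ""

    def poll(self) -> bool:
        """Try each Exhibitor at most once; return True if the server list changed."""
        for _ in range(len(self._hosts)):
            host = next(self._rotation)
            try:
                servers = self._fetch(host)
            except OSError:
                continue  # this Exhibitor is unreachable, try the next one
            new = ",".join(f"{s}:{self._port}" for s in sorted(servers))
            changed = new != self.connection_string
            self.connection_string = new
            return changed
        return False
```

A client would call `poll()` on a timer and reconnect with the new connection string whenever it returns `True`.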
How it Works
Shared Configuration
Backed by:
• S3
• File System
• Etc.
[Diagram: Exhibitor instances A, B, ... all connect to one shared config.]
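One way to picture the file-system variant is the sketch below. It is illustrative only: the JSON layout, the `version` field and the `Instance` class are assumptions, not Exhibitor's actual storage format.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_shared_config(path: Path, config: dict) -> None:
    """Write atomically so other instances never read a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w") as handle:
        json.dump(config, handle)
    os.replace(tmp_name, path)


class Instance:
    """An Exhibitor-like instance that applies the shared config when it changes."""

    def __init__(self, name: str, path: Path) -> None:
        self.name = name
        self.path = path
        self.applied_version: Optional[int] = None
        self.config: dict = {}

    def check(self) -> bool:
        """Re-read the shared config; return True if a new version was applied."""
        shared = json.loads(self.path.read_text())
        if shared["version"] == self.applied_version:
            return False
        self.config = shared
        self.applied_version = shared["version"]
        return True
```

Because every instance reads the same file, a change saved once is picked up by all of them on their next check.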
Coming Soon...
• Auto-register new instances
• Auto-remove old instances
• Alerting
• ???
Using / Integration
• Stand alone application
  - or -
• Library/JAR
REST API
• https://github.com/Netflix/exhibitor/wiki/REST-Introduction
Demos

Q&A
