Building Distributed
Applications with Apache
Zookeeper
Alex Ehrnschwender | Game Server Engineer at DeNA
What is Zookeeper?
“ZooKeeper is a centralized service for
maintaining configuration information,
naming, providing distributed
synchronization, and providing group
services.”
Zookeeper Wiki
ZooKeeper: A Coordination Service for Distributed Applications
● Coordination & synchronization for distributed processes
● Logical namespacing implemented by a hierarchy (tree) of znodes
● Replicated in-memory over multiple hosts for reliability, availability, and performance
● Simple API of CRUD & basic tree operations for client integration
Zookeeper: Reliability & Consistency
● Distributed ensemble with automatic leader election through quorum
● Replicated in-memory on every instance with snapshot writes to disk
● Client TCP connection maintained to any node with failover support
● Guaranteed atomicity & sequential consistency
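The quorum requirement behind leader election can be made concrete with a little arithmetic. The sketch below assumes the default majority-quorum setup, where a strict majority of the ensemble must be reachable. The class and method names are hypothetical and not part of the ZooKeeper API.

```java
// Hypothetical helper: majority-quorum arithmetic for an ensemble of n servers.
// Assumes default majority quorums (no weighted or hierarchical groups).
final class QuorumMath {
    // Smallest number of servers that forms a strict majority.
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble must have at least one server");
        }
        return ensembleSize / 2 + 1;
    }

    // Number of servers that may fail while a majority still remains.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5, 7}) {
            System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

An even-sized ensemble tolerates no more failures than the next smaller odd size, which is why odd ensemble sizes are the usual choice.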
Zookeeper: Watches & Ephemeral nodes
Each znode carries a stat structure with version numbers (version, cversion, aversion) &
timestamps (ctime, mtime)
Watches
● Client-initiated subscriptions to znodes
● Changes to a watched znode trigger a notification to subscribed clients (standard watches fire once and must be re-registered)
Ephemeral Nodes
● Backed by a client session and deleted when client session ends
● Cannot have children
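The two behaviours above can be illustrated with a toy in-memory model. This is deliberately NOT the ZooKeeper client API: `ToyZnodeStore` and all of its methods are hypothetical, and the sketch only mimics a watch that is consumed after one notification and an ephemeral node that disappears when its owning session ends.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Toy in-memory model, not the real client library.
final class ToyZnodeStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();
    private final Map<Long, Set<String>> ephemeralsBySession = new HashMap<>();

    // Create a persistent node (sessionId == null) or an ephemeral node owned by a session.
    void create(String path, String value, Long sessionId) {
        data.put(path, value);
        if (sessionId != null) {
            ephemeralsBySession.computeIfAbsent(sessionId, id -> new HashSet<>()).add(path);
        }
        fire(path, "created");
    }

    void setData(String path, String value) {
        if (!data.containsKey(path)) {
            throw new IllegalStateException("no node: " + path);
        }
        data.put(path, value);
        fire(path, "changed");
    }

    // Register a watch that is delivered at most once, then discarded.
    void watch(String path, Consumer<String> callback) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(callback);
    }

    // Ending a session deletes every ephemeral node that session created.
    void closeSession(long sessionId) {
        Set<String> owned = ephemeralsBySession.remove(sessionId);
        if (owned == null) {
            return;
        }
        for (String path : owned) {
            data.remove(path);
            fire(path, "deleted");
        }
    }

    boolean exists(String path) {
        return data.containsKey(path);
    }

    private void fire(String path, String event) {
        // Remove the registrations before notifying, so each watch fires only once.
        List<Consumer<String>> registered = watches.remove(path);
        if (registered == null) {
            return;
        }
        for (Consumer<String> callback : registered) {
            callback.accept(event + " " + path);
        }
    }

    public static void main(String[] args) {
        ToyZnodeStore store = new ToyZnodeStore();
        store.create("/workers", "", null);
        store.create("/workers/w1", "host-a", 42L);
        store.watch("/workers/w1", event -> System.out.println("notified: " + event));
        store.setData("/workers/w1", "host-b"); // fires the watch
        store.setData("/workers/w1", "host-c"); // no notification: the watch was consumed
        store.closeSession(42L);                // removes the ephemeral /workers/w1
        System.out.println("w1 exists: " + store.exists("/workers/w1"));
    }
}
```

A client that wants continuous updates re-registers its watch inside the callback, usually while re-reading the znode so that no change is missed in between.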
Zookeeper: But… why?
“Because of the difficulty of implementing
these kinds of services, applications initially
usually skimp on them, which make them
brittle in the presence of change and
difficult to manage. Even when done
correctly, different implementations of
these services lead to management
complexity when the applications are
deployed.”
Zookeeper Wiki
Zookeeper: Advantages for Backing a Server Cluster
● Server workers can become cluster-aware
● Much of what a custom solution would have to duplicate comes out of the box
● Extremely fast reads; performs best on read-dominant workloads (read:write ratios around 10:1)
● Small footprint: an ensemble of only 5-7 ZK instances can serve the coordination needs of several large production applications
● Centralized event broadcasting & failure detection (heartbeat)
Zookeeper: Common Use Cases
● Configuration Management
● Service Discovery
● Distributed Cloud-Based File Systems
● Internal DNS Management
● Master (Leader) Election and Voting
● Messaging Queue
● Event Broadcasting & Notification
Use Case Example #1 - Managing Redis Shards
ZK Use Case Example #1 - Pinterest
● Pinterest stores its entire follower model in sharded Redis instances (~9000 Redis shards, multiple instances per core)
● Shard configuration is stored and managed by ZooKeeper
● Clients look up and watch shard locations, then retrieve the data
● Master-slave failover triggers updates to the znode representation (slave address replaces master)
● Vertical splitting of data is broadcast to watching clients
Use Case Example #2 - HBase Cluster Configuration
Code Examples
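import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;

// Methods of a client class whose field zk is a connected org.apache.zookeeper.ZooKeeper handle.

// Join a group by creating an ephemeral member znode; it is removed when the session ends.
public void join(String groupName, String memberName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName + "/" + memberName;
    String createdPath = zk.create(path,
            null /* data */,
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL);
    System.out.println("Created " + createdPath);
}

// Create the group itself as a persistent znode that outlives the creating session.
public void create(String groupName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    String createdPath = zk.create(path,
            null /* data */,
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT);
    System.out.println("Created " + createdPath);
}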
Code Examples (cont.)
import java.util.List;

// Delete a group and all of its members. Version -1 matches any znode version.
public void delete(String groupName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    try {
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            zk.delete(path + "/" + child, -1); /* child */
        }
        zk.delete(path, -1); /* parent */
    } catch (KeeperException.NoNodeException e) {
        System.out.printf("Group %s does not exist\n", groupName);
    }
}

// List the members (child znodes) of a group without setting a watch.
public void list(String groupName)
        throws KeeperException, InterruptedException {
    String path = "/" + groupName;
    try {
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            System.out.println(child);
        }
    } catch (KeeperException.NoNodeException e) {
        System.out.printf("Group %s does not exist\n", groupName);
    }
}
Performance
[Charts: standalone ops/sec vs. 3-node ensemble ops/sec]
Reference: https://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
Sample Configuration (zoo.cfg)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
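The tick-based settings in this file are easier to reason about once converted to wall-clock time. tickTime is in milliseconds, while initLimit and syncLimit are counted in ticks: initLimit bounds how long a follower may take to connect and sync to the leader, and syncLimit bounds how far a follower may fall behind. In each server.N line, the first port carries follower-to-leader traffic and the second is used for leader election. The sketch below reads the sample with java.util.Properties; the `ZooCfgTimings` class and its methods are hypothetical helpers, not part of ZooKeeper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper that derives wall-clock limits from the sample zoo.cfg.
final class ZooCfgTimings {
    static final String SAMPLE = String.join("\n",
            "tickTime=2000",
            "dataDir=/var/lib/zookeeper",
            "clientPort=2181",
            "initLimit=5",
            "syncLimit=2",
            "server.1=zoo1:2888:3888",
            "server.2=zoo2:2888:3888",
            "server.3=zoo3:2888:3888");

    static Properties parse(String cfg) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfg));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory string
        }
        return props;
    }

    // Converts a limit expressed in ticks into milliseconds.
    static long ticksToMillis(Properties props, String key) {
        long tickTime = Long.parseLong(props.getProperty("tickTime"));
        return tickTime * Long.parseLong(props.getProperty(key));
    }

    // Counts the server.N entries, i.e. the ensemble size.
    static long ensembleSize(Properties props) {
        return props.stringPropertyNames().stream()
                .filter(name -> name.startsWith("server."))
                .count();
    }

    public static void main(String[] args) {
        Properties props = parse(SAMPLE);
        System.out.println("initLimit ms: " + ticksToMillis(props, "initLimit"));
        System.out.println("syncLimit ms: " + ticksToMillis(props, "syncLimit"));
        System.out.println("ensemble size: " + ensembleSize(props));
    }
}
```

With the sample values, a follower has 5 × 2000 ms = 10 s to sync at startup and may lag the leader by at most 2 × 2000 ms = 4 s.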
Exhibitor: A ZK Monitoring & Administration Tool from Netflix
● Centralization & externalization of ZK ensemble configuration* (S3/remote FS)
● Web UI & REST API for ease of management
● Instance monitoring with automatic configuration updates
● Rolling ensemble changes while maintaining quorum
● Miscellaneous administration tasks (backup/restore, log & snapshot cleanup)
* Configuration management for a configuration manager.... so meta!
Questions?
Appendix
Zookeeper Atomic Broadcast (ZAB) Algorithm
● Protocol for managing atomic updates to replicas
● Responsible for:
o Agreeing on an ensemble leader
o Synchronizing replicas
o Managing transactions and broadcasts
o Recovery of state
● ZXIDs & transactional ordering
● Guarantees:
o Local & global primary order
o Primary integrity
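Transactional ordering by ZXID can be sketched in a few lines. A zxid is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter within that epoch, so comparing two zxids orders transactions first by epoch and then by counter. The `Zxid` class below is a hypothetical illustration, not a type from the ZooKeeper codebase.

```java
// Hypothetical helper illustrating the zxid layout (epoch in the high 32 bits,
// per-epoch counter in the low 32 bits).
final class Zxid {
    static long compose(int epoch, int counter) {
        return ((long) epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    // Leader epoch: the high 32 bits.
    static int epoch(long zxid) {
        return (int) (zxid >>> 32);
    }

    // Counter within the epoch: the low 32 bits.
    static int counter(long zxid) {
        return (int) zxid;
    }

    public static void main(String[] args) {
        long lastOfOldLeader = compose(1, 500);
        long firstOfNewLeader = compose(2, 1);
        System.out.println("old: epoch=" + epoch(lastOfOldLeader)
                + " counter=" + counter(lastOfOldLeader));
        System.out.println("new: epoch=" + epoch(firstOfNewLeader)
                + " counter=" + counter(firstOfNewLeader));
        // Any transaction from a later epoch orders after all transactions of earlier epochs.
        System.out.println("old < new: " + (lastOfOldLeader < firstOfNewLeader));
    }
}
```

Because a new leader starts a new epoch, its first proposal compares greater than everything proposed by the previous leader, regardless of how large the old counter had grown.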
References
● http://engineering.pinterest.com/post/55272557617/building-a-follower-model-from-scratch
● http://zookeeper.apache.org/doc/trunk/zookeeperOver.html
● http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html
● https://github.com/Netflix/exhibitor/wiki
● http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf
● http://web.stanford.edu/class/cs347/reading/zab.pdf
● http://highscalability.com/blog/2008/7/15/zookeeper-a-reliable-scalable-distributed-coordination-syste.html
● https://wiki.apache.org/solr/SolrCloud
● http://www.slideshare.net/scottleber/apache-zookeeper

Distributed Applications with Apache Zookeeper

  • 1. Building Distributed Applications with Apache Zookeeper Alex Ehrnschwender | Game Server Engineer at DeNA
  • 2. What is Zookeeper? “ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.” Zookeeper Wiki
  • 3. ZooKeeper: A Coordination Service for Distributed Applications Coordination & synchronization for distributed processes Logical namespacing implemented by a hierarchy (tree) of znodes Replicated in-memory over multiple hosts for reliability, availability, and performance Simple API of CRUD & basic tree operations for client integration
  • 4. Zookeeper: Reliability & Consistency Distributed ensemble with automatic leader election through quorum Replicated in-memory on every instance with snapshot writes to disk Client TCP connection maintained to any node with failover support Guaranteed atomicity & sequential consistency
  • 5. Zookeeper: Watches & Ephemeral nodes Underlying znodes have a data structure consisting of version numbers (cversion, aversion) & timestamps Watches ● Client-initiated subscriptions to znodes ● Changes to a watched znode trigger notification to subscribed clients Ephemeral Nodes ● Backed by a client session and deleted when client session ends ● Cannot have children
  • 6. Zookeeper: But… why? “Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which make them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.” Zookeeper Wiki
  • 7. Zookeeper: Advantages for Backing a Server Cluster Server workers can become cluster-aware So much out-of-the-box that would be duplicated with a custom solution Extremely fast reads (10:1 performance against writes) Small footprint - An ensemble of only 5-7 zk instances can serve the coordination needs of several large production applications Centralized event broadcasting & failure detection (heartbeat)
  • 8. Zookeeper: Common Use Cases ● Configuration Management ● Service Discovery ● Distributed Cloud-Based File Systems ● Internal DNS Management ● Master (Leader) Election and Voting ● Messaging Queue ● Event Broadcasting & Notification
  • 9. Use Case Example #1 - Managing Redis Shards
  • 10. ZK Use Case Example #1 - Pinterest Pinterest stores their entire follower model inside sharded Redis instances (~9000 Redis shards, multiple instances per core) Shard configuration is stored and managed by Zookeeper Client lookups and watches for shard location & subsequent data retrieval Master-slave failover triggers updates to znode representation (slave address replaces master) Vertical splitting of data broadcast to watching clients
  • 11. Use Case Example #2 - HBase Cluster Configuration
  • 12. Code Examples

        public void join(String groupName, String memberName)
                throws KeeperException, InterruptedException {
            String path = "/" + groupName + "/" + memberName;
            String createdPath = zk.create(path, null /* data */,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("Created " + createdPath);
        }

        public void create(String groupName)
                throws KeeperException, InterruptedException {
            String path = "/" + groupName;
            String createdPath = zk.create(path, null /* data */,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            System.out.println("Created " + createdPath);
        }
  • 13. Code Examples (cont.)

        public void delete(String groupName)
                throws KeeperException, InterruptedException {
            String path = "/" + groupName;
            try {
                List<String> children = zk.getChildren(path, false);
                for (String child : children) {
                    zk.delete(path + "/" + child, -1); /* child */
                }
                zk.delete(path, -1); /* parent */
            } catch (KeeperException.NoNodeException e) {
                System.out.printf("Group %s does not exist\n", groupName);
            }
        }

        public void list(String groupName)
                throws KeeperException, InterruptedException {
            String path = "/" + groupName;
            try {
                List<String> children = zk.getChildren(path, false);
                for (String child : children) {
                    System.out.println(child);
                }
            } catch (KeeperException.NoNodeException e) {
                System.out.printf("Group %s does not exist\n", groupName);
            }
        }
  • 14. Performance [Charts: standalone ops/sec; 3-node ensemble ops/sec] Reference: https://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
  • 16. Exhibitor: A ZK Monitoring & Administration Tool from Netflix Centralization & externalization of zk ensemble configuration* (S3/remote FS) Web UI & REST API for ease of management Instance monitoring with automatic configuration updates Rolling ensemble changes while maintaining quorum Miscellaneous administration tasks (backup/restore, log & snapshot cleanup) * Configuration management for a configuration manager.... so meta!
  • 19. Zookeeper Atomic Broadcast (ZAB) Algorithm ● Protocol for managing atomic updates to replicas ● Responsible for: o Agreeing on an ensemble leader o Synchronizing replicas o Managing transactions and broadcasts o Recovery of state ● ZXIDs & transactional ordering ● Guarantees: o Local & global primary order o Primary integrity
  • 21. Performance [Charts: standalone ops/sec; 3-node ensemble ops/sec] Reference: https://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
  • 23. References ● http://engineering.pinterest.com/post/55272557617/building-a-follower-model-from-scratch ● http://zookeeper.apache.org/doc/trunk/zookeeperOver.html ● http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html ● https://github.com/Netflix/exhibitor/wiki ● http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf ● http://web.stanford.edu/class/cs347/reading/zab.pdf ● http://highscalability.com/blog/2008/7/15/zookeeper-a-reliable-scalable-distributed-coordination-syste.html ● https://wiki.apache.org/solr/SolrCloud ● http://www.slideshare.net/scottleber/apache-zookeeper

Editor's Notes

  1. Wears a lot of hats. Can serve multiple purposes, all related to coordination of a distributed system. More concisely, Zookeeper is a coordination service for distributed applications. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal. Wikipedia
  2. Digging a bit deeper: simply put, it backs the nodes of your distributed system through a tree structure of znodes (more in upcoming slides). It is the central nervous system for your distributed application. It is centralized and replicated across an odd number of hosts, which together make up an “ensemble”. Data is kept in memory and is backed up to a log for reliability. By using memory, ZooKeeper is very fast and can handle the high loads typically seen in chatty coordination protocols across huge numbers of processes. It favors read-heavy applications. Clients connect to any zk node in the ensemble and maintain that connection. The API covers create, read, update, and delete (plus watches, more in later slides).
  3. Writes are forwarded to the one elected leader, and a quorum (a majority of the ensemble) must confirm each update, which is why there should be an odd number of instances. Updates are submitted concurrently by clients and committed in FIFO order. After an update, incremental state changes are broadcast to replicas using Zookeeper Atomic Broadcast (ZAB). Each state change is incremental with respect to the previous state, so there is an implicit dependence on the order of the state changes. This guarantees: Sequential Consistency - Updates from a client will be applied in the order that they were sent. Atomicity - Updates either succeed or fail. No partial results.
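The preference for an odd number of instances follows from majority arithmetic. Below is a minimal sketch of that arithmetic; `QuorumMath` and its method names are hypothetical helpers written for this note, not part of the ZooKeeper API.

```java
/** Majority-quorum arithmetic for an ensemble (illustrative helper, not a ZooKeeper API). */
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int quorumSize(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensemble must have at least one server");
        }
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can be lost while a majority can still confirm updates. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }
}
```

With these definitions, a 3-node and a 4-node ensemble both tolerate one failure, and a 5-node and a 6-node ensemble both tolerate two, so the extra even-numbered server adds confirmation work without adding fault tolerance.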
  4. Znodes maintain a stat structure that includes version numbers for data changes and ACL changes. The stat structure also has timestamps. The version number, together with the timestamp, allows ZooKeeper to validate the cache and to coordinate updates. Each time a znode's data changes, the version number increases.
     Watches: Clients can set watches on znodes. Changes to that znode trigger the watch and then clear the watch. When a watch triggers, ZooKeeper sends the client a notification.
     Ephemeral Nodes: ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends, the znode is deleted. Because of this behavior, ephemeral znodes are not allowed to have children.
     Time in ZooKeeper: ZooKeeper tracks time multiple ways.
     Zxid: Every change to the ZooKeeper state receives a stamp in the form of a zxid (ZooKeeper Transaction Id). This exposes the total ordering of all changes to ZooKeeper. Each change will have a unique zxid, and if zxid1 is smaller than zxid2 then zxid1 happened before zxid2.
     Version numbers: Every change to a node will cause an increase to one of the version numbers of that node. The three version numbers are version (number of changes to the data of a znode), cversion (number of changes to the children of a znode), and aversion (number of changes to the ACL of a znode).
     Ticks: When using multi-server ZooKeeper, servers use ticks to define timing of events such as status uploads, session timeouts, connection timeouts between peers, etc. The tick time is only indirectly exposed through the minimum session timeout (2 times the tick time); if a client requests a session timeout less than the minimum session timeout, the server will tell the client that the session timeout is actually the minimum session timeout.
     Real time: ZooKeeper doesn't use real time, or clock time, at all except to put timestamps into the stat structure on znode creation and znode modification.
A critical component of ZooKeeper is Zab, the ZooKeeper Atomic Broadcast algorithm, which is the protocol that manages atomic updates to the replicas. It is responsible for agreeing on a leader in the ensemble, synchronizing the replicas, managing update transactions to be broadcast, as well as recovering from a crashed state to a valid state.
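The tick-based session timeout rule in the note above can be written down as a small clamp. This is a sketch only: `SessionTimeouts.negotiate` is a hypothetical helper, not a ZooKeeper call. The lower bound (2 × tick time) comes from the note; the upper bound of 20 × tick time is the server default described in the ZooKeeper administrator's guide, is not mentioned in these notes, and can be overridden in server configuration.

```java
/** Sketch of how a server could settle on a session timeout (illustrative, not the ZooKeeper API). */
final class SessionTimeouts {
    private SessionTimeouts() {}

    /**
     * Clamps the client's requested timeout into [2 * tickTime, 20 * tickTime].
     * The factor 2 is from the note above; the factor 20 is the documented server
     * default and is an addition to what the note states.
     */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.min(Math.max(requestedMs, minMs), maxMs);
    }
}
```

For example, with a tick time of 2000 ms, a client asking for 1000 ms would be told its session timeout is 4000 ms.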
  5. Why not simply use a database? Because of the guarantees
  6. Fast, especially in read-heavy workloads (10:1 performance against writes). More performance data is in the backup slides.
  7. Service Discovery - watch thru heartbeat
     Let ELECTION be a path of choice of the application. To volunteer to be a leader:
       1. Create znode z with path "ELECTION/guid-n_" with both SEQUENCE and EPHEMERAL flags.
       2. Let C be the children of "ELECTION", and i be the sequence number of z.
       3. Watch for changes on "ELECTION/guid-n_j", where j is the largest sequence number such that j < i and n_j is a znode in C.
     Upon receiving a notification of znode deletion:
       1. Let C be the new set of children of ELECTION.
       2. If z is the smallest node in C, then execute leader procedure.
       3. Otherwise, watch for changes on "ELECTION/guid-n_j", where j is the largest sequence number such that j < i and n_j is a znode in C.
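The decision at the heart of this recipe (am I the leader, and if not, which znode do I watch?) can be sketched without a live ensemble. The class and method names below are hypothetical, and the child names in the usage note are made up; the sketch assumes each child name ends in `_` followed by the zero-padded decimal sequence number that ZooKeeper appends to sequential znodes.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** One step of the leader-election recipe, as pure logic (illustrative sketch). */
final class ElectionStep {
    private ElectionStep() {}

    /** Parses the sequence number that follows the last '_' in a child name. */
    static int sequenceOf(String znodeName) {
        return Integer.parseInt(znodeName.substring(znodeName.lastIndexOf('_') + 1));
    }

    /**
     * Returns the child with the largest sequence number j such that j < i,
     * where i is the sequence number of self. An empty result means self is the
     * smallest node in C and should execute the leader procedure.
     */
    static Optional<String> nodeToWatch(List<String> children, String self) {
        int i = sequenceOf(self);
        return children.stream()
                .filter(child -> sequenceOf(child) < i)
                .max(Comparator.comparingInt(ElectionStep::sequenceOf));
    }
}
```

Given children `a-n_0000000002`, `b-n_0000000000`, and `c-n_0000000001`, the client that owns `a-n_0000000002` would watch `c-n_0000000001`, while the owner of `b-n_0000000000` gets an empty result and acts as leader. Watching only the immediate predecessor, rather than the whole ELECTION node, means a single deletion wakes a single client.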
  8. http://engineering.pinterest.com/post/55272557617/building-a-follower-model-from-scratch When a Redis machine reaches either the memory or CPU thresholds, we split it either horizontally or vertically. Vertical sharding a Redis machine is simply cutting the number of running Redis instances on the machine by half. We bring up a new master as a slave of the existing master and once the slaving is complete, we make it the new master for half of the Redis instances leaving the old master as the master for the other half. The entire user id space is split into 8192 virtual shards. We place one virtual shard per Redis DB, and run multiple Redis instances (ranging from 8 to 32) on each machine depending on the memory and CPU consumption of the shards on those instances. Similarly, we run multiple Redis DBs per Redis instance.
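A client-side view of this setup might look like the sketch below. Several things here are assumptions rather than details from the Pinterest post: the modulo mapping from user id to virtual shard, the `ShardDirectory` class itself, and the idea that a ZooKeeper watch callback would call `onConfigChange` whenever the shard configuration znode changes. Only the figure of 8192 virtual shards comes from the note.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for shard configuration that would be read from ZooKeeper (hypothetical). */
final class ShardDirectory {
    static final int VIRTUAL_SHARDS = 8192; // from the note above

    // virtual shard id -> address of the current master for that shard
    private final Map<Integer, String> masterByShard = new ConcurrentHashMap<>();

    /** ASSUMPTION: a simple modulo maps user ids onto virtual shards. */
    static int virtualShardFor(long userId) {
        return (int) Math.floorMod(userId, (long) VIRTUAL_SHARDS);
    }

    /** Would be driven by a watch notification; on failover the slave address replaces the master. */
    void onConfigChange(int virtualShard, String masterAddress) {
        masterByShard.put(virtualShard, masterAddress);
    }

    /** Looks up where to send reads and writes for this user. */
    String masterFor(long userId) {
        String address = masterByShard.get(virtualShardFor(userId));
        if (address == null) {
            throw new IllegalStateException("no master known for shard " + virtualShardFor(userId));
        }
        return address;
    }
}
```

Because clients resolve addresses through the directory on every lookup, a failover or a vertical split only needs to update the configuration once; watching clients pick up the new address on their next call.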
  9. Once a part of Hadoop. Currently maintained by Yahoo! and the Apache Software Foundation. HBase is the Hadoop database, a distributed, scalable, big data store. HBase supports hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Currently, hbase clients find the cluster to connect to by asking zookeeper. The only configuration a client needs is the zk quorum to connect to. Masters and hbase slave nodes (regionservers) all register themselves with zk. If their znode evaporates, the master or regionserver is considered lost and repair begins. HBase currently will default to manage the zookeeper cluster. It does this in an attempt at not burdening users with yet another technology to figure; things are bad enough for the hbase noob what with hbase, hdfs, and mapreduce. Part of hbase's management of zk includes being able to see zk configuration in the hbase configuration files. Anything that has the hbase.zookeeper prefix will have its suffix mapped to the corresponding zoo.cfg setting (HBase parses its config. and feeds the relevant zk configurations to zk on start).
  10. Using Java as it’s very readable and shows type information. The same can easily be achieved with Node.js, Perl, or your language of choice. Create - First we create a persistent node to act as a group (parent) for child nodes. Join - Second we create and join ephemeral child nodes to that parent node. Each example creates with “unsafe acl”, so anyone who interacts with ZK can perform an update to that node. An alternative is CREATOR_ALL_ACL where only the creator can control the node.
  11. List: think of use cases (active sessions, active system workers, etc). Delete: passing -1 as the version deletes unconditionally. You can also pass a specific version, in which case the node is not deleted unless its current version matches the one specified. A delete is broadcast to any clients watching that node. Because operations act on simple paths, each one must catch the error raised when the node does not exist.
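The version argument to delete behaves like an optimistic lock. The sketch below models that rule in memory; `VersionedStore` is a hypothetical stand-in, not the ZooKeeper client. It returns `false` on a version mismatch and throws `IllegalArgumentException` for a missing node, where the real client throws `KeeperException.BadVersionException` and `KeeperException.NoNodeException` respectively.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of versioned deletes (illustrative, not the ZooKeeper client). */
final class VersionedStore {
    // path -> data version; a freshly created node starts at version 0
    private final Map<String, Integer> versions = new HashMap<>();

    void create(String path) {
        versions.put(path, 0);
    }

    boolean exists(String path) {
        return versions.containsKey(path);
    }

    /** Models a data change: bumps the node's version and returns the new value. */
    int setData(String path) {
        int next = currentVersion(path) + 1;
        versions.put(path, next);
        return next;
    }

    /** Deletes if expectedVersion is -1 (unconditional) or equals the current version. */
    boolean delete(String path, int expectedVersion) {
        int current = currentVersion(path);
        if (expectedVersion != -1 && expectedVersion != current) {
            return false; // the real client would throw BadVersionException here
        }
        versions.remove(path);
        return true;
    }

    private int currentVersion(String path) {
        Integer current = versions.get(path);
        if (current == null) {
            throw new IllegalArgumentException("no node at " + path); // NoNodeException in the real client
        }
        return current;
    }
}
```

A caller that read the node at version 0 and then tries `delete(path, 0)` after someone else has changed the data is refused, which is the protection that passing -1 gives up.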
  12. http://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf http://web.stanford.edu/class/cs347/reading/zab.pdf