Apache ZooKeeper
Leo’s Notes
• These slides are Leopold Gault’s notes, taken while reading the Overview section of the
doc: https://zookeeper.apache.org/doc/r3.4.8/zookeeperOver.html
• I am not a ZooKeeper expert; these notes are just my understanding of the
aforementioned section of the doc
In a nutshell
What problem it addresses
(purple = Leo’s assumptions)
Distributed applications such as Hadoop and Kafka need to be able to share their configuration, and
coordinate their tasks.
The nodes of a distributed application would therefore need to use a “common TODO list”, that
each server could edit (a bit like a Google Doc). On this “common TODO list”, they would write the
tasks they should distribute between each other. They’d use it to let the other nodes know when
they pick up a task, and let them know which task they have already completed.
This common TODO list could be centralized on a single machine, but it would end up being a single
point of failure for our distributed app. So this TODO list needs to be distributed too.
And sure, instead of using a common TODO list, the servers of the distributed app could use a
messaging service, in order to let each other know what they are doing. Actually, that’s probably
what would happen under the hood of such “common TODO list”.
ZooKeeper implements such a “common TODO list”, and provides developers with a very simple API to
read and write in it.
What it is
ZooKeeper maintains a distributed store (a tree of nodes) and provides a very simple
programming interface (a client-side API) to interact with this store.
[Figure: logical view (the API in front of the distributed store) and “under the hood” view]
What it is for
This store is meant to be used by distributed apps (e.g. Kafka
cluster, Hadoop cluster, etc.) for coordination of tasks and
synchronization of configuration.
My understanding is that each node of the distributed app:
• writes in the ZooKeeper store which task it is going to work on,
• may notify the others when it has completed its task (e.g. by releasing a “lock” in
the store; see the lock use case further down),
• reads which tasks the others are working on,
• reads and writes its own configuration, and may modify the others’ configuration.
But it is up to the distributed app to implement the aforementioned use cases. All
that the ZooKeeper service provides is a distributed store with a simple client-side
API to interact with it. The ZooKeeper clients (i.e. distributed apps) do whatever
they want with this distributed-store service.
The tree is kept in memory, so read operations are fast, but the size of the store is
limited by the size of the RAM. This makes it well suited to the uses described above.
[Figure: a distributed app (e.g. Kafka cluster, Hadoop cluster) distributed on 4 nodes, each node using the API to reach the distributed store of the ZooKeeper Service; logical view and “under the hood” view]
More details
Simple API
The API is very simple, and supports only these operations:
• create: creates a znode at a location in the tree
• delete: deletes a znode
• exists: tests if a znode exists at a location
• get data: reads the data from a znode
• set data: writes data to a znode
• get children: retrieves a list of children of a znode
• sync: waits for data to be propagated
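To make the operations listed above concrete, here is a minimal sketch of a toy, in-memory znode tree. It is not the ZooKeeper client: the class `ZnodeTree` and its method names are hypothetical, everything lives in one Python process, and `sync` is left out because nothing is replicated here.

```python
# Toy, single-process model of a znode tree (hypothetical names, NOT the real client API).

class ZnodeTree:
    def __init__(self):
        # Map each absolute path to its data and version number; the root always exists.
        self._nodes = {"/": {"data": b"", "version": 0}}

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        if self._parent(path) not in self._nodes:
            raise KeyError(f"parent of {path} does not exist")
        self._nodes[path] = {"data": data, "version": 0}

    def delete(self, path):
        if self.get_children(path):
            raise ValueError(f"{path} still has children")
        del self._nodes[path]

    def exists(self, path):
        return path in self._nodes

    def get_data(self, path):
        return self._nodes[path]["data"]

    def set_data(self, path, data):
        node = self._nodes[path]
        node["data"] = data
        node["version"] += 1  # each write bumps the version number

    def get_children(self, path):
        prefix = path if path.endswith("/") else path + "/"
        return sorted(
            p[len(prefix):] for p in self._nodes
            if p.startswith(prefix) and p != path and "/" not in p[len(prefix):]
        )


tree = ZnodeTree()
tree.create("/znode3", b"config-v1")
tree.create("/znode3/sub1")
tree.create("/znode3/sub2")
tree.set_data("/znode3", b"config-v2")
print(tree.get_children("/znode3"))
print(tree.get_data("/znode3"))
```

Note that `/znode3` holds data and has children at the same time, which is the property the z-node tree slide insists on.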
Z-node tree
Each znode holds:
• ACL
• Timestamp
• Version number
• Data
• Sub-znodes
Example tree:
/
/znode1
/znode2
/znode3
/znode3/sub1
/znode3/sub2
Unlike in a typical file system, a znode can both contain data and have children (as if a
file could also be a directory), so a leaf can later become a parent node.
ACL: Access Control List
What all ZooKeeper servers maintain
Each ZooKeeper server maintains (I think):
• a transaction log
• snapshots
The ZooKeeper Service (cluster)
Servers all know about each other’s existence, and a majority of them have to be running for the ZooKeeper service to work.
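The majority rule above is plain arithmetic, which can be sketched as follows. The function names are mine (hypothetical), and this only counts servers; it says nothing about how ZooKeeper actually detects failures.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With this rule, 3 servers and 4 servers both tolerate a single failure, which is why a cluster is usually sized with an odd number of servers.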
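The messaging layer takes care of:
• replacing leaders on failures,
• syncing followers with leaders.
[Figure: the ZooKeeper service, with one leader and several followers. The slide shows all client connections going to followers; by default the leader can also accept client connections.]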
Every client is connected to only one server at a time. Those connections are TCP connections (is there no
application-level protocol on top? I think not).
When the client connects to the server, a session is established. The client keeps this session open by sending
heartbeat messages to the server. If the server has not heard from the client after some amount of idle time, it
closes the session.
On the client side, if the connection breaks, the client will connect to a different server.
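The heartbeat rule can be sketched from the server's point of view as below. This is only an illustration: the `Session` class is hypothetical, and time is passed in explicitly as a number so that the example stays deterministic.

```python
class Session:
    """Toy server-side view of one client session (hypothetical names)."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_heard = now

    def heartbeat(self, now):
        # Any message from the client, including a heartbeat, refreshes the session.
        self.last_heard = now

    def is_expired(self, now):
        # The server closes the session once it has been idle for longer than the timeout.
        return now - self.last_heard > self.timeout


session = Session(timeout=10, now=0)
session.heartbeat(now=8)
print(session.is_expired(now=15))
print(session.is_expired(now=19))
```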
Read operations are answered directly by the server the client is connected to, using its in-memory replica of the store.
[Figure: the client sends a read request through the API to its server in the ZooKeeper Service, and gets a direct response]
All write operations must be negotiated with the leader.
[Figure, steps numbered 1 to 5: write request from the client through the API, negotiation between the servers and the leader, acknowledgement (?)]
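The difference between reads and writes can be sketched with a very simplified model. All names here (`Server`, `Ensemble`) are hypothetical. I am assuming a write is accepted once a majority of servers is available to acknowledge it; ZooKeeper's real agreement protocol is considerably more involved than this.

```python
class Server:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.store = {}  # this server's in-memory replica of the store


class Ensemble:
    def __init__(self, servers, leader):
        self.servers = servers
        self.leader = leader

    def read(self, server, path):
        # Reads are answered locally by whichever server the client is connected to.
        return server.store.get(path)

    def write(self, path, data):
        # Writes go through the leader, which needs a majority of servers to acknowledge.
        if not self.leader.up:
            return False
        acks = [s for s in self.servers if s.up]
        if len(acks) <= len(self.servers) // 2:
            return False  # no majority: the write is rejected
        for s in acks:
            s.store[path] = data
        return True


a, b, c = Server("a"), Server("b"), Server("c", up=False)
ensemble = Ensemble([a, b, c], leader=a)
print(ensemble.write("/znode1", b"x"))
print(ensemble.read(b, "/znode1"))
```

In this toy model a server that was down during a write simply keeps a stale replica; catching it up again is the job of the syncing mentioned on the cluster slide.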
Roles of clients
A client has a TCP connection with only one server at a time; this server is usually a follower,
although by default the leader can also serve clients.
Clients:
• send requests (for reads, writes, or to set up watch events),
• get responses,
• get watch events,
• send heartbeats (to keep the session alive with the server).
If the TCP connection to the server breaks, the client will connect to a different server.
About watch events
Every client can set up an event listener on a specific znode, to be notified when the
znode’s state changes.
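A watch can be sketched as a callback registered while reading a znode. The `WatchedStore` class below is a hypothetical, in-memory toy. It follows the behaviour described in the 3.4 overview, where a watch is triggered once and then removed, so a client that wants further notifications has to set it again.

```python
class WatchedStore:
    """Toy store with one-time watches (hypothetical names, not the real client)."""

    def __init__(self):
        self._data = {}
        self._watches = {}  # path -> list of callbacks waiting for a change

    def get_data(self, path, watch=None):
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data.get(path)

    def set_data(self, path, data):
        self._data[path] = data
        # Fire each registered watch once, then forget it (one-time trigger).
        for callback in self._watches.pop(path, []):
            callback(path)


events = []
store = WatchedStore()
store.get_data("/znode1", watch=events.append)
store.set_data("/znode1", b"first")   # triggers the watch
store.set_data("/znode1", b"second")  # no watch is registered any more
print(events)
```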
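About Ephemeral znodes
An ephemeral znode stays alive as long as the session of the client who created it is active, and is
deleted when that session ends. The slide's figure ties it to the client's TCP connection; strictly, a
client can reconnect to another server within the same session without losing the znode.
[Figure: logical view and “under the hood” view of an ephemeral znode and the client who created it]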
Leo’s assumptions about some use-cases
This section is no longer based on the documentation
Use case: implementation of a lock
With an ephemeral znode and watches
Lock znode (ephemeral): Data = “locked_znodes:{/znode1, /znode2}”
[Figure: logical view and “under the hood” view; the client who created the ephemeral lock and the other clients each hold a TCP connection to a server of the ZooKeeper Service]
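The lock idea above can be sketched with a toy model, under my own assumptions: the lock is a single ephemeral znode at a hypothetical path `/lock`, a client that fails to create it sets a watch on its deletion, and closing the owner's session deletes the znode. `ToyZooKeeper` and `try_lock` are made-up names, not ZooKeeper APIs.

```python
class ToyZooKeeper:
    """In-memory stand-in for the service (hypothetical, single process)."""

    def __init__(self):
        self.ephemerals = {}  # path -> id of the session that created it
        self.watches = {}     # path -> callbacks to fire when the path is deleted

    def create_ephemeral(self, path, session_id):
        if path in self.ephemerals:
            return False  # someone else already holds this znode
        self.ephemerals[path] = session_id
        return True

    def watch_deletion(self, path, callback):
        self.watches.setdefault(path, []).append(callback)

    def close_session(self, session_id):
        # Ending a session deletes every ephemeral znode that it created.
        owned = [p for p, s in self.ephemerals.items() if s == session_id]
        for path in owned:
            del self.ephemerals[path]
            for callback in self.watches.pop(path, []):
                callback(path)


def try_lock(zk, session_id, on_release):
    """Return True if the lock was acquired; otherwise watch for its release."""
    if zk.create_ephemeral("/lock", session_id):
        return True
    zk.watch_deletion("/lock", on_release)
    return False


zk = ToyZooKeeper()
released = []
print(try_lock(zk, "client-A", released.append))
print(try_lock(zk, "client-B", released.append))
zk.close_session("client-A")  # client A finishes or crashes: the lock disappears
print(released)
print(try_lock(zk, "client-B", released.append))
```

Because the znode is ephemeral, a crashed lock holder cannot block the others forever. With many waiters this single-znode approach wakes all of them at once; the lock recipe in the ZooKeeper documentation uses ephemeral sequential znodes to avoid that.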
Example of use: using ZooKeeper as a broker enabling the transactional aspect of
writes on a cluster of a NoSQL DB that doesn’t support transactions
[Figure: an application, the ZooKeeper Service (logical view and “under the hood” view), and a cluster of a NoSQL DB instance]
1. An application willing to perform a transaction in the NoSQL DB instance adds a znode (containing the
write instructions for the NoSQL DB), and waits for the acknowledgement of the propagation of this
znode in the ZooKeeper cluster.
2. Acknowledgement of the propagation.
4. Each NoSQL node applies the write instructions contained in the znode,
without making sure before committing that the other NoSQL nodes
successfully simulated the instructions.

More Related Content

What's hot

Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
Joe Stein
 
Zookeeper Architecture
Zookeeper ArchitectureZookeeper Architecture
Zookeeper Architecture
Prasad Wali
 
How to manage large amounts of data with akka streams
How to manage large amounts of data with akka streamsHow to manage large amounts of data with akka streams
How to manage large amounts of data with akka streams
Igor Mielientiev
 
Developing distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka ClusterDeveloping distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka Cluster
Konstantin Tsykulenko
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
confluent
 
Dive into spark2
Dive into spark2Dive into spark2
Dive into spark2
Gal Marder
 
Designing for Distributed Systems with Reactor and Reactive Streams
Designing for Distributed Systems with Reactor and Reactive StreamsDesigning for Distributed Systems with Reactor and Reactive Streams
Designing for Distributed Systems with Reactor and Reactive Streams
Stéphane Maldini
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know About
Yaroslav Tkachenko
 
What’s expected in Java 9
What’s expected in Java 9What’s expected in Java 9
What’s expected in Java 9
Gal Marder
 
Introduction to Apache Kafka- Part 2
Introduction to Apache Kafka- Part 2Introduction to Apache Kafka- Part 2
Introduction to Apache Kafka- Part 2
Knoldus Inc.
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Yaroslav Tkachenko
 
Nio
NioNio
Nio
nextlib
 
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Data Con LA
 
Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey
J On The Beach
 
Why Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For MicroservicesWhy Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For Microservices
Yaroslav Tkachenko
 
Reactive Streams 1.0 and Akka Streams
Reactive Streams 1.0 and Akka StreamsReactive Streams 1.0 and Akka Streams
Reactive Streams 1.0 and Akka Streams
Dean Wampler
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
Abhinav Singh
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
Kevin Webber
 
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Lightbend
 
Reactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive WayReactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive Way
Roland Kuhn
 

What's hot (20)

Real-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache KafkaReal-time streaming and data pipelines with Apache Kafka
Real-time streaming and data pipelines with Apache Kafka
 
Zookeeper Architecture
Zookeeper ArchitectureZookeeper Architecture
Zookeeper Architecture
 
How to manage large amounts of data with akka streams
How to manage large amounts of data with akka streamsHow to manage large amounts of data with akka streams
How to manage large amounts of data with akka streams
 
Developing distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka ClusterDeveloping distributed applications with Akka and Akka Cluster
Developing distributed applications with Akka and Akka Cluster
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
 
Dive into spark2
Dive into spark2Dive into spark2
Dive into spark2
 
Designing for Distributed Systems with Reactor and Reactive Streams
Designing for Distributed Systems with Reactor and Reactive StreamsDesigning for Distributed Systems with Reactor and Reactive Streams
Designing for Distributed Systems with Reactor and Reactive Streams
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know About
 
What’s expected in Java 9
What’s expected in Java 9What’s expected in Java 9
What’s expected in Java 9
 
Introduction to Apache Kafka- Part 2
Introduction to Apache Kafka- Part 2Introduction to Apache Kafka- Part 2
Introduction to Apache Kafka- Part 2
 
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
Building Scalable and Extendable Data Pipeline for Call of Duty Games: Lesson...
 
Nio
NioNio
Nio
 
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
 
Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey Low latency in java 8 by Peter Lawrey
Low latency in java 8 by Peter Lawrey
 
Why Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For MicroservicesWhy Actor-Based Systems Are The Best For Microservices
Why Actor-Based Systems Are The Best For Microservices
 
Reactive Streams 1.0 and Akka Streams
Reactive Streams 1.0 and Akka StreamsReactive Streams 1.0 and Akka Streams
Reactive Streams 1.0 and Akka Streams
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
Build Real-Time Streaming ETL Pipelines With Akka Streams, Alpakka And Apache...
 
Reactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive WayReactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive Way
 

Similar to Leo's Notes about Apache Kafka

Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
Ding Li
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introduce
jhao niu
 
Zookeeper big sonata
Zookeeper  big sonataZookeeper  big sonata
Zookeeper big sonata
Anh Le
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Dibyendu Bhattacharya
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 
Notes leo kafka
Notes leo kafkaNotes leo kafka
Notes leo kafka
Léopold Gault
 
Node js internal
Node js internalNode js internal
Node js internal
Chinh Ngo Nguyen
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
Data Con LA
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf
 
zookeeperProgrammers
zookeeperProgrammerszookeeperProgrammers
zookeeperProgrammers
Hiroshi Ono
 
Zookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersZookeeper Tutorial for beginners
Zookeeper Tutorial for beginners
jeetendra mandal
 
101 mistakes FINN.no has made with Kafka (Baksida meetup)
101 mistakes FINN.no has made with Kafka (Baksida meetup)101 mistakes FINN.no has made with Kafka (Baksida meetup)
101 mistakes FINN.no has made with Kafka (Baksida meetup)
Henning Spjelkavik
 
Parsing XML in J2ME
Parsing XML in J2MEParsing XML in J2ME
Parsing XML in J2ME
Rohan Chandane
 
Reactive programming intro
Reactive programming introReactive programming intro
Reactive programming intro
Ahmed Ehab AbdulAziz
 
Low latency in java 8 v5
Low latency in java 8 v5Low latency in java 8 v5
Low latency in java 8 v5
Peter Lawrey
 
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesApache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Binu George
 
Akka for big data developers
Akka for big data developersAkka for big data developers
Akka for big data developers
Taras Fedorov
 
Low level java programming
Low level java programmingLow level java programming
Low level java programming
Peter Lawrey
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
Timothy Spann
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
Ulf Wendel
 

Similar to Leo's Notes about Apache Kafka (20)

Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introduce
 
Zookeeper big sonata
Zookeeper  big sonataZookeeper  big sonata
Zookeeper big sonata
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Notes leo kafka
Notes leo kafkaNotes leo kafka
Notes leo kafka
 
Node js internal
Node js internalNode js internal
Node js internal
 
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­ticaA noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
A noETL Parallel Streaming Transformation Loader using Spark, Kafka­ & Ver­tica
 
Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7Jack Gudenkauf sparkug_20151207_7
Jack Gudenkauf sparkug_20151207_7
 
zookeeperProgrammers
zookeeperProgrammerszookeeperProgrammers
zookeeperProgrammers
 
Zookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersZookeeper Tutorial for beginners
Zookeeper Tutorial for beginners
 
101 mistakes FINN.no has made with Kafka (Baksida meetup)
101 mistakes FINN.no has made with Kafka (Baksida meetup)101 mistakes FINN.no has made with Kafka (Baksida meetup)
101 mistakes FINN.no has made with Kafka (Baksida meetup)
 
Parsing XML in J2ME
Parsing XML in J2MEParsing XML in J2ME
Parsing XML in J2ME
 
Reactive programming intro
Reactive programming introReactive programming intro
Reactive programming intro
 
Low latency in java 8 v5
Low latency in java 8 v5Low latency in java 8 v5
Low latency in java 8 v5
 
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API ExamplesApache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
Apache Zookeeper Explained: Tutorial, Use Cases and Zookeeper Java API Examples
 
Akka for big data developers
Akka for big data developersAkka for big data developers
Akka for big data developers
 
Low level java programming
Low level java programmingLow level java programming
Low level java programming
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
 

More from Léopold Gault

OAuth OpenID Connect
OAuth OpenID ConnectOAuth OpenID Connect
OAuth OpenID Connect
Léopold Gault
 
SAML
SAMLSAML
Containers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes LeoContainers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes Leo
Léopold Gault
 
NoSQL - Leo's notes
NoSQL - Leo's notesNoSQL - Leo's notes
NoSQL - Leo's notes
Léopold Gault
 
Leo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 DaysLeo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 Days
Léopold Gault
 
Application Continuity with Oracle DB 12c
Application Continuity with Oracle DB 12c Application Continuity with Oracle DB 12c
Application Continuity with Oracle DB 12c
Léopold Gault
 

More from Léopold Gault (6)

OAuth OpenID Connect
OAuth OpenID ConnectOAuth OpenID Connect
OAuth OpenID Connect
 
SAML
SAMLSAML
SAML
 
Containers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes LeoContainers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes Leo
 
NoSQL - Leo's notes
NoSQL - Leo's notesNoSQL - Leo's notes
NoSQL - Leo's notes
 
Leo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 DaysLeo's notes - Oracle DBA 2 Days
Leo's notes - Oracle DBA 2 Days
 
Application Continuity with Oracle DB 12c
Application Continuity with Oracle DB 12c Application Continuity with Oracle DB 12c
Application Continuity with Oracle DB 12c
 

Recently uploaded

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 

Recently uploaded (20)

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU

Leo's Notes about Apache Kafka

  • 1. Apache ZooKeeper: Leo’s Notes • These slides are Leopold Gault’s notes, taken while reading the Overview section of the doc: https://zookeeper.apache.org/doc/r3.4.8/zookeeperOver.html • I am not a ZooKeeper expert; these notes are just my understanding of the aforementioned section of the doc
  • 4. What problem it addresses (purple = Leo’s assumptions) Distributed applications such as Hadoop and Kafka need to share their configuration and coordinate their tasks. The nodes of a distributed application would therefore need a “common TODO list” that each server could edit (a bit like a Google Doc). On this “common TODO list”, they would write the tasks to be distributed among them. They would use it to let the other nodes know when they pick up a task, and which tasks they have already completed. This common TODO list could be centralized on a single machine, but it would then be a single point of failure for our distributed app. So this TODO list needs to be distributed too. Admittedly, instead of using a common TODO list, the servers of the distributed app could use a messaging service to let each other know what they are doing. Actually, that is probably what happens under the hood of such a “common TODO list”. ZooKeeper implements such a “common TODO list”, and provides developers with a very simple API to read from it and write to it.
  • 5. What it is: ZooKeeper maintains a distributed store (this store is a tree of nodes), and provides a very simple programming interface (client-side API) to interact with this store. (Diagram: logical view and “under the hood” view of the distributed store and its API.)
  • 6. What it is for: This store is meant to be used by distributed apps (e.g. Kafka cluster, Hadoop cluster, etc.) for coordination of tasks and synchronization of configuration. My understanding is that each node of the distributed app: • writes in the ZooKeeper store which task it is going to work on, • may notify others when it has completed its task (e.g. by releasing a “lock” in the store; see the last slide), • reads which tasks the others are working on, • reads and writes its own configuration, and modifies the others’ configuration. But it is up to the distributed app to implement these use cases. All that the ZooKeeper service provides is a distributed store with a simple client-side API to interact with it. The ZooKeeper clients (i.e. distributed apps) do whatever they want with this distributed-store service. The tree is read from memory, so read operations are fast, but the size of the store is limited by the size of the RAM. This makes it well suited to the aforementioned use. (Diagram: a distributed app, e.g. a Kafka or Hadoop cluster, spread over 4 nodes, each node using the API to reach the distributed store; logical view and “under the hood” view of the ZooKeeper service.)
  • 8. Simple API: The API is very simple, and supports only these operations: • create: creates a znode at a location in the tree • delete: deletes a znode • exists: tests if a znode exists at a location • get data: reads the data from a znode • set data: writes data to a znode • get children: retrieves a list of children of a znode • sync: waits for data to be propagated
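The operations listed on slide 8 can be illustrated with a small in-memory model. This is only a sketch under assumptions: `ZNodeTree` and its method names are hypothetical, it runs in a single process, and it is not the real ZooKeeper client library. `sync` is left out because a single-process toy has nothing to propagate.

```python
class ZNodeTree:
    """Toy, single-process model of a znode tree (NOT the real ZooKeeper client)."""

    def __init__(self):
        # Map each absolute path to its data; the root znode always exists.
        self._data = {"/": b""}

    @staticmethod
    def _parent(path):
        # "/znode3/sub1" -> "/znode3", "/znode1" -> "/"
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError(f"znode already exists: {path}")
        if self._parent(path) not in self._data:
            raise KeyError(f"parent znode is missing for: {path}")
        self._data[path] = data

    def delete(self, path):
        # In this toy model a znode with children cannot be deleted.
        if self.get_children(path):
            raise ValueError(f"znode has children: {path}")
        del self._data[path]

    def exists(self, path):
        return path in self._data

    def get_data(self, path):
        return self._data[path]

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError(f"no such znode: {path}")
        self._data[path] = data

    def get_children(self, path):
        # Direct children only: paths one level below `path`.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/znode3", b"some data")
tree.create("/znode3/sub1")
tree.set_data("/znode3/sub1", b"task: pending")
print(tree.get_children("/znode3"))
```

Note that `/znode3` holds data and also has a child, which is the property slide 9 points out.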
  • 9. Znode tree: Each znode holds: • an ACL (Access Control List) • a timestamp • a version number • data • sub-znodes. Example tree: /, /znode1, /znode2, /znode3, /znode3/sub1, /znode3/sub2. Unlike in a typical file system, every znode can hold data as well as children, so a leaf can later become an inner node.
  • 10. What all ZooKeeper servers maintain: a transaction log and snapshots, I think. (Diagram: a ZooKeeper server.)
  • 11. The ZooKeeper Service (cluster): Servers all know about each other’s existence, and a majority of them have to be running for the ZooKeeper service to work. One server is the leader. The messaging layer takes care of: • replacing the leader on failure • syncing followers with the leader
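The majority rule on slide 11 is simple arithmetic, sketched below. The function names are hypothetical and only illustrate the rule; they are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def service_available(ensemble_size, running):
    """True if enough servers are running for the service to work."""
    return running >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    tolerated = n - quorum_size(n)
    print(f"{n} servers: quorum of {quorum_size(n)}, tolerates {tolerated} failure(s)")
```

With 3 servers the service tolerates 1 failure, and with 5 servers it tolerates 2. A 4-server cluster still tolerates only 1 failure, which is why odd cluster sizes are the usual choice.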
  • 12. All client connections go to followers: Every client is connected to only one server at a time. Those connections are TCP connections (is there no application-level protocol above? I think not). When the client connects to the server, a session is established. The client keeps this session open by sending heartbeat messages to the server. If the server has not heard from the client for some amount of time, it closes the session. On the client side, if the connection breaks, the client will establish a session with another follower.
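The heartbeat mechanism on slide 12 can be sketched as follows. This is a toy model under assumptions: the `Session` class, its fields, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all hypothetical.

```python
class Session:
    """Toy model of a client session kept alive by heartbeats (hypothetical names)."""

    def __init__(self, timeout, now):
        self.timeout = timeout   # how long the server tolerates silence
        self.last_heard = now    # last time the server heard from the client
        self.open = True

    def heartbeat(self, now):
        # Client side: a heartbeat on an open session refreshes it.
        if self.open:
            self.last_heard = now

    def check(self, now):
        # Server side: close the session after too much silence.
        if self.open and now - self.last_heard > self.timeout:
            self.open = False
        return self.open


session = Session(timeout=10, now=0)
session.heartbeat(now=8)
print(session.check(now=15))  # 7 time units of silence
print(session.check(now=19))  # 11 time units of silence
```

The first check finds the session still open, the second closes it because the silence exceeds the timeout.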
  • 13. ZooKeeper Service: Read operations are answered directly by the server the client is connected to, using its in-memory replica of the store. (Diagram labels: client, API, read request, direct response, leader.)
  • 14. ZooKeeper Service: All write operations must be negotiated with the leader. (Diagram labels: API, write request, negotiation, acknowledgement?, leader; steps numbered 1 to 5.)
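The difference between the read path (slide 13) and the write path (slide 14) can be sketched with a toy cluster. This is an assumption-heavy illustration: `Server`, `Ensemble` and the "commit once a majority acknowledges" rule are a simplification chosen for this sketch, not ZooKeeper's actual atomic broadcast protocol.

```python
class Server:
    """Toy server holding an in-memory replica of the store."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}

    def read(self, path):
        # Reads are answered locally, without contacting the leader.
        return self.store.get(path)

    def ack(self, path, data):
        # A running server acknowledges the proposed write.
        return self.up

    def commit(self, path, data):
        if self.up:
            self.store[path] = data


class Ensemble:
    """Toy cluster: writes go through the leader and need a majority of acks."""

    def __init__(self, size):
        self.servers = [Server(f"s{i}") for i in range(size)]
        self.leader = self.servers[0]

    def write(self, via, path, data):
        # The client talks to `via`, which forwards the write to the leader.
        if not via.up or not self.leader.up:
            raise ConnectionError("server or leader unavailable")
        acks = sum(1 for s in self.servers if s.ack(path, data))
        if acks < len(self.servers) // 2 + 1:
            return False  # no majority: the write is not applied anywhere
        for s in self.servers:
            s.commit(path, data)
        return True


cluster = Ensemble(5)
follower = cluster.servers[3]
print(cluster.write(follower, "/conf", b"v1"))
print(follower.read("/conf"))
```

In this sketch a write succeeds with 3 of 5 servers running and is refused with only 2, while reads keep being served from the local replica.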
  • 15. ZooKeeper Service, roles of clients: A client has a TCP connection with only one server, and this server can probably only be a follower. Clients: • send requests (for reads, writes, or to set up watch events), • get responses, • get watch events, • send heartbeats (to keep the session alive with the server). • If the TCP connection to the server breaks, the client will connect to a different server.
  • 16. About watch events: Every client can set up an event listener on a specific znode, to be notified when the znode’s state changes. (Diagram: logical view of the ZooKeeper service and its API.)
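The watch mechanism on slide 16 can be sketched as a callback registry. The `WatchedStore` class is hypothetical and runs in a single process. It models a watch as a one-time trigger, which is how ZooKeeper's standard watches behave: after firing, a watch must be set again to receive further notifications.

```python
from collections import defaultdict


class WatchedStore:
    """Toy store where a read can leave a one-time watch on a znode."""

    def __init__(self):
        self._data = {}
        self._watches = defaultdict(list)  # path -> pending callbacks

    def get_data(self, path, watch=None):
        # Reading with a callback registers a watch on that path.
        if watch is not None:
            self._watches[path].append(watch)
        return self._data.get(path)

    def set_data(self, path, data):
        self._data[path] = data
        # pop() removes the callbacks, so each watch fires only once.
        for callback in self._watches.pop(path, []):
            callback(path)


events = []
store = WatchedStore()
store.set_data("/conf", b"v1")
store.get_data("/conf", watch=events.append)
store.set_data("/conf", b"v2")  # fires the watch
store.set_data("/conf", b"v3")  # no watch left, nothing fires
print(events)
```

Only the first change after registration is reported, so `events` ends up holding a single entry.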
  • 17. About ephemeral znodes: An ephemeral znode stays alive as long as the session of the client that created it is active. (Diagram: logical view and “under the hood” view, showing the leader, the ephemeral znode, the API, and the client who created the ephemeral znode.)
  • 18. Leo’s assumptions about some use cases: This section is no longer based on the documentation.
  • 19. Use case: implementation of a lock, with an ephemeral znode and watches. The lock znode is ephemeral, and its data is “locked_znodes:{/znode1, /znode2}”. (Diagram: logical view and “under the hood” view, showing the leader, the client who created the ephemeral lock, and three clients each reaching the service through the API over a TCP connection.)
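The lock on slide 19 combines the two previous ideas: the holder creates an ephemeral znode, and the waiting clients watch it. The sketch below is a toy, single-process model under assumptions; `LockService`, the session ids and the callback style are hypothetical, and it is not a production lock recipe.

```python
class LockService:
    """Toy model of a lock built on an ephemeral znode plus watches."""

    def __init__(self):
        self._owner = {}    # lock path -> session id that created the ephemeral znode
        self._waiters = {}  # lock path -> callbacks waiting for the znode to vanish

    def try_acquire(self, path, session_id):
        # Creating the lock znode fails if it already exists.
        if path in self._owner:
            return False
        self._owner[path] = session_id
        return True

    def watch(self, path, callback):
        # One-time watch, fired when the lock znode is deleted.
        self._waiters.setdefault(path, []).append(callback)

    def close_session(self, session_id):
        # Ephemeral znodes vanish with their session; watchers are notified.
        held = [p for p, owner in self._owner.items() if owner == session_id]
        for path in held:
            del self._owner[path]
            for callback in self._waiters.pop(path, []):
                callback(path)


service = LockService()
print(service.try_acquire("/lock", "client-A"))  # A takes the lock
print(service.try_acquire("/lock", "client-B"))  # B is refused
service.watch("/lock", lambda p: print("B acquires:", service.try_acquire(p, "client-B")))
service.close_session("client-A")                # A disconnects or crashes
```

Because the lock znode is ephemeral, a crashed holder cannot keep the lock forever: once its session ends, the znode disappears and a waiting client can take over.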
  • 20. Example of use: using ZooKeeper as a broker that makes writes transactional on a cluster of a NoSQL DB that doesn’t support transactions. (Diagram: an application wanting to perform a transaction in the NoSQL DB; the ZooKeeper service, in logical and “under the hood” views, with its API and distributed store; the cluster of NoSQL DB instances.) Step 1: the application adds a znode (containing the write instructions for the NoSQL DB), and waits for the acknowledgement of the propagation of this znode in the ZooKeeper cluster. Step 2: acknowledgement of the propagation. Step 4: each NoSQL node applies the write instructions contained in the znode, without making sure, before committing, that the other NoSQL nodes successfully simulated the instructions.

Editor's Notes

  1. One of the design goals of ZooKeeper is to provide a very simple programming interface. As a result, it supports only these operations: create: creates a znode at a location in the tree; delete: deletes a znode; exists: tests if a znode exists at a location; get data: reads the data from a znode; set data: writes data to a znode; get children: retrieves a list of children of a znode; sync: waits for data to be propagated.
  3. The NoSQL DB would read the operations to perform from ZooKeeper’s distributed store. 1) “Transactional aspect” between the ZooKeeper client (the app that wants to write to the NoSQL DB in a transactional fashion) and the ZooKeeper service: the app writes a znode containing the details of the transaction to commit. The app uses ZooKeeper’s sync primitive to wait for the znode to be propagated throughout the ZooKeeper cluster before considering the transaction as committed. 2) “Transactional aspect” when it comes to applying the change across a cluster of NoSQL nodes: within ZooKeeper’s distributed store, write operations are negotiated between the leader and the followers. I guess this negotiation can be considered transactional. Indeed, I suppose that a write is rolled back if a majority of followers disagree with this operation. Thus all the nodes of our NoSQL DB cluster can safely apply the instructions from ZooKeeper’s store, without having to make sure, before committing, that the other NoSQL nodes successfully applied those instructions.