SlideShare a Scribd company logo
1 of 31
Download to read offline
Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Tech Deep Dive
Understanding
Broker Load Balancing
Heesung Sohn
Sr. Platform Engineer • StreamNative
A platform engineer at StreamNative based in
the San Francisco Bay Area.
Currently focusing on Pulsar Broker Load
Balancing.
Previously worked on scaling Aurora Mysql
internals for its Serverless features at AWS.
LinkedIn
https://www.linkedin.com/in/heesung-sohn-aa5b
9561/
Heesung Sohn
Sr. Platform Engineer
StreamNative
Understanding Broker Load Balancing
Agenda
Intro: Pulsar Broker Load Balancing
Topic Bundles
Auto Bundle Load Balance Logic
Operation Tips
Future Work
Q&A
Understanding Broker Load Balancing
Load Balancing in Distributed Messaging/Streaming Systems
In distributed messaging/streaming systems:
➔ Messages are grouped under topics
➔ Message pub-sub for a topic is served by a single broker
➔ Topics(or groups of topics) are considered as a “good” load-balance entity
Load balancing refers to efficiently distributing topic messages across brokers.
Topic1:
{msg, msg, msg}
broker3
broker2
broker1
Topic2:
{msg, msg, msg}
Topic3:
{msg, msg, msg}
Topic4:
{msg, msg, msg}
disk
disk
disk
client
client
Understanding Broker Load Balancing
Broker Load Balancing in Pulsar
In Pulsar:
Pulsar separates serving-persistence layers.
➔ Brokers serve topics’ pub-sub sessions.
➔ Brokers read/write messages in Bookies.
Broker Load balancing refers to efficiently distributing topic serving sessions across brokers.
Q: How does Pulsar make the Broker Load Balance efficient?
Bookie1
(disk)
Bookie2
(disk)
Bookie3
(disk)
Topic1:
{msg, msg, msg}
broker3
broker2
broker1
client
Topic2:
{msg, msg, msg}
Topic3:
{msg, msg, msg}
Topic4:
{msg, msg, msg}
Serving Layer Persistence Layer
client
In Pulsar, unlike monolithic systems,
Topic-broker assignments are “flexible” as brokers do not persist messages locally.
➔ New owner brokers simply establish new sessions with the clients.
➔ Minimal client connection jitters. New owner brokers look up bookie metadata.
Q: Is dynamic unloading possible without harming the performance?
A: Pulsar uses dynamic topic rebalancing.
➔ Assign topics to underloaded brokers.
➔ Unload topics from overloaded brokers(high cpu, memory, I/O ...)
Understanding Broker Load Balancing
Idea: Dynamic Topic Rebalancing
A: Yes, Pulsar can seamlessly transfer topic sessions thanks to the serving-persistence separation.
Q: How does Pulsar make the Broker Load Balancing efficient?
When adding more bookies,
messages will ramp up on new
bookies.
Topic messages are segmented,
replicated and distributed to
multiple bookies.
Pulsar Bookies achieve load
balancing by segmentation.
Understanding Broker Load Balancing
Quick Review: How about Bookie Load Balancing?
Ref: Comparing Pulsar and Kafka
Pulsar does not require data
rebalancing.
Seg 1
topic1 topic2
E.g. Topic1’s segments are
replicated 2x and distributed to 4 BK
Understanding Broker Load Balancing
Agenda
Intro: Pulsar Broker Load Balancing
Topic Bundles
Auto Bundle Load Balance Logic
Operation Tips
Future Work
Q&A
Q: Why need this middle layer group, bundles?
Understanding Broker Load Balancing
Topics are balanced to brokers at the bundle level
bundle1
Bundle Key Range(8bits):
[0x00 ,0x80, 0xFF]
bundle2
hash(“topic1”) => 0x0F(bundle key)
topic1
hash(“topic2”) => 0x4F
topic2 topic3
hash(“topic3”) => 0x8F
With the multi-tenancy nature,
Pulsar needs to scale for millions of topics.
Problem: too much to track millions of the topic-level metadata
Solution: topic sharding/bundling.
defaultNumberOfNamespaceBundles = 4
loadBalancerNamespaceMaximumBundles = 128
Topics are grouped into bundles as Broker Load Balancing Unit Topic-Bundle LooKup by Hashed Sharding
Understanding Broker Load Balancing
Agenda
Intro: Pulsar Broker Load Balancing
Bundles
Auto Bundle Load Balance Logic
Operation Tips
Future Work
Q&A
1. Bundle-Broker Assignment
➔ Assign to a new broker when no owner
Understanding Broker Load Balancing
Auto Bundle Load Balance Logics in Pulsar
2. Bundle Unload
➔ Unload bundles from overloaded
to underloaded brokers
3. Bundle Split
➔ Split overloaded bundles
broker1
hash(topic3)
=> bundle1
broker1
hash(topic3)
=> bundle1
broker2
hash(topic3), hash(topic6)
=> bundle1
x
hash(topic3)
=> bundle1
hash(topic6)
=> bundle2
assign
unload
re-assign
split
Child bundles
Understanding Broker Load Balancing
Bundle-Broker Assignment (Assign a bundle to a new broker when no owner)
broker1
Client
topic3
broker2
broker3
Metadata
Store
hash(topic3)
=> bundle2
No
owner
now
Leader
broker
Ask leader to make
an assignment
broker2 is green.
connect broker2 as
the owner
Leader said
I should be
the owner
bundle2=>
bundle2=>
broker2
Randomly
connect
Own
bundle2
Ready to serve!
Q1: How does leader know broker2 is green?
Q2: What’s the strategy to select “a
broker” among the available brokers?
Q1: How does leader know broker2 is green?
Bundle Load Data :
bundle-level msg in/out rates.
Understanding Broker Load Balancing
Leader collects global load info
broker1
broker2
broker3
Metadata
Store
Leader
broker
Load data
Load data
Load data Load data
Broker Load Data:
CPU, memory, and network
throughput in/out rates.
A: The leader broker collects global load info via Metadata Store.
Currently, the leader makes all load balance decisions.
LeastResourceUsageWithWeight(new)
cur_load = max(cpu, mem, network)
load = w * load + (1 - w) * cur_load
avg_load = avg(load) from all brokers’ load
candidates = brokers with load + α < avg_load
Select a random broker among the candidates
with lower resource load than global average.
LeastLongTermMessageRate(default)
load=
If max(cpu, mem, network) <= threshold(85%)
=>ƒ(longTermMsgIn, longTermMsgOut)
Select a random broker among least
long-term msg rate.
Understanding Broker Load Balancing
Bundle-Broker Assignment Strategy
Q2: What’s the strategy to select “a broker” among the available brokers?
A: Pulsar can configure the following strategies(configurable by ModularLoadManagerStrategy.)
Exponential
Moving
Average(EMA)
w=0.9
Understanding Broker Load Balancing
Bundle Unload (Unload bundles from overloaded to underloaded brokers)
broker1
broker2 broker3
Metadata
Store
Leader
broker
Publisher 1 bombarding!
bundle1=>broker1
bundle2=>broker1
Publisher 2
topic1:{msg}
topic3:{msg}
hash(topic1)
=> bundle1
hash(topic3)
=> bundle2
Close
Connections
broker2 is green
Re-connect
/re-assign
unload bundle2 from broker1 (too much load on broker 1)
Q: What’s the strategy to select overloaded brokers and bundles?
bundle1=>broker1
bundle2=>broker1
disable
remove
bundle1=>broker1
bundle2=>
New owner
assigned
bundle1=>broker1
bundle2=>broker2
own
LoadSheddingStrategy:
ThresholdShedder(default)
OverloaadShedder
UniformLoadShedder
Link:
https://pulsar.apache.org/docs/administratio
n-load-balance/#shed-load-automatically
Understanding Broker Load Balancing
ThresholdShedder
cur_load = max(cpu, mem, network)
load = w * load + (1 - w) * cur_load
avg_load = avg(load) from all brokers’ load
candidates = brokers with load + α < avg_load
Select bundles from the candidate brokers,
starting from the high throughput,
enough to reduce load below the threshold..
Bundle Unloading(Shedding) Strategy
Q: What’s the strategy to select overloaded brokers and bundles?
A: Pulsar can configure the following LoadSheddingStrategy.
w = loadBalancerHistoryResourcePercentage
α = loadBalancerBrokerThresholdShedderPercentage
cpu-w = loadBalancerCPUResourceWeight
mem-w = loadBalancerMemoryResourceWeight
network-w = loadBalancerBundleUnloadMinThroughputThreshold
Understanding Broker Load Balancing
Bundle split (Split overloaded bundles)
broker1
broker2
broker3
Metadata
Store
Leader
broker
Publisher 1
bombarding!
Publisher 2
bundle1=>broker1
Split bundle1 (too much traffic on bundle1)
topic1:{msg}
topic3:{msg}
hash(topic1), hash(topic3)
=> bundle1
(Optionally) Unload the child bundles
bundle1-1=>broker1
bundle1-2=>broker1
Split bundle 1
=>
{bundle1-1,bundle 1-2}
Q: What’s the strategy to split overloaded bundles?
Threshold-based Bundle Split Strategy
(when to split)
Split bundles if any resource is beyond
LoadBalancerNamespaceBundle* thresholds.
Defaults
LoadBalancerNamespaceBundleMaxTopics = 1000
LoadBalancerNamespaceBundleMaxSessions = 1000
LoadBalancerNamespaceBundleMaxMsgRate = 30000
LoadBalancerNamespaceBundleMaxBandwidthMbytes = 100
Understanding Broker Load Balancing
Bundle Split Boundary Compute Strategy
(how to split)
RANGE_EQUALLY_DIVIDE_NAME (default):
split to parts with the same hash range size
TOPIC_COUNT_EQUALLY_DIVIDE:
split to parts with the same topic count.
configurable by
DefaultNamespaceBundleSplitAlgorithm.
Bundle Split Strategy
Q: What’s the strategy to split overloaded bundles?
A: Pulsar can configure when and how to splits bundles.
Understanding Broker Load Balancing
RANGE_EQUALLY_DIVIDE_NAME Split Example
Bundle1-1
0x00-0x40
Bundle_Key_Partitions
[0x00 ,0x40, ,0x80, 0xFF]
Bundle1-2
0x40-0x80
hash(“topic1”) => 0x0F
hash(“topic2”) => 0x4F
hash(“topic3”) => 0x8F
topic1 topic2 topic3
Bundle2
0x80-0xFF
Bundle1
0x00-0x80
Bundle_Key_Partitions
[0x00 ,0x80, 0xFF]
Bundle2
0x80-0xFF
hash(“topic1”) => 0x0F
topic1
hash(“topic2”) => 0x4F
topic2 topic3
hash(“topic3”) => 0x8F
Split Bundle
[0x00-0x80] at
0x40
Understanding Broker Load Balancing
Agenda
Intro: Pulsar Broker Load Balancing
Bundles
Auto Bundle Load Balance Logic
Operation Tips
Future Work
Q&A
Understanding Broker Load Balancing
Load Balance Metrics: Useful to monitor Load Balance Input / Output
Type Name Description
Load Balance Input signal pulsar_lb_bandwidth_in_usage Broker bandwidth in usage % out of 100%
Load Balance Input signal pulsar_lb_bandwidth_out_usage Broker bandwidth out usage % out of 100%
Load Balance Input signal pulsar_lb_memory_usage Broker heap usage % out of 100%
Load Balance Input signal pulsar_lb_directMemory_usage Broker dict_memory usage % out of 100%
Load Balance Input signal pulsar_lb_cpu_usage Broker cpu usage % out of 100%
Load Balance Split Output pulsar_lb_bundles_split_count Bundle split counts
Load Balance Unload Output pulsar_lb_unload_bundle_count Bundle unload counts
Load Balance Unload Output pulsar_lb_unload_broker_count Bundle unload broker counts
Load Balance Assignment Output pulsar_topics_count Serving topic counts
Load Balance Assignment Output owned-bundles Bundle ownership cache size by
caffeine_cache_estimated_size
Understanding Broker Load Balancing
Load Balance Dashboard
How to set default bundle counts for new namespaces
# conf/broker.conf
defaultNumberOfNamespaceBundles=4
Also,
# bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters standalone --bundles 16
Config / Admin CLIs to Init Bundles
Understanding Broker Load Balancing
Q:What’s a good number for bundle counts?
Goal: Distribute topics to available brokers as much as possible. bundles > brokers, a good start
E.g.
For a namespace with 1000 topics and 16 brokers,
=> A: 64 bundles could be a good start across 16 brokers.
# when to split bundles (we reviewed this)
LoadBalancerNamespaceBundleMaxTopics = 1000
LoadBalancerNamespaceBundleMaxSessions = 1000
LoadBalancerNamespaceBundleMaxMsgRate = 30000
LoadBalancerNamespaceBundleMaxBandwidthMbytes
= 100
# how to split bundles (we reviewed this)
DefaultNamespaceBundleSplitAlgorithm=
RANGE_EQUALLY_DIVIDE_NAME
Useful Admin CLIs to Check Bundle States
Understanding Broker Load Balancing
1. How to list the bundles in the namespace
# bin/pulsar-admin namespaces bundles my-tenant/my-namespace
⇒ "boundaries" : [ "0x00000000", "0x10000000", …, "0xffffffff" ], "numBundles" : 16
3. How to look up the bundle by topic
# bin/pulsar-admin topics bundle-range persistent://my-tenant/my-namespace/my-topic
=> 0x00000000_0x10000000
4. How to look up the owner broker by topic
# bin/pulsar-admin topics lookup persistent://my-tenant/my-namespace/topic
⇒ pulsar://my-broker-1:6650
2. How to list the topics in the bundle
# bin/pulsar-admin topics list my-tenant/my-namespace --bundle 0x00000000_0x10000000
⇒ persistent://my-tenant/my-namespace/my-topic
Manual Split and Unload
Understanding Broker Load Balancing
Q: Can Admin manually split and unload bundles?
A: Yes. The unloaded bundles will be reloaded to the next available brokers soon.
Let’s check the CLIs for this operations.
How to check bundle load stats
# pulsar-admin --admin-url http://my-broker-x-url:8080 broker-stats load-report
⇒
...
"bundleStats" : {
"my-tenant/my-namespace/0x80000000_0xc0000000" : {
"msgRateIn" : 2100.99
"msgThroughputIn" : 2367100.92
"msgRateOut" : 2100.99
"msgThroughputOut" : 2367100.92
"consumerCount" : 2200,
"producerCount" : 2200,
"topics" : 2300,
"cacheSize" : 107600
}
How to check bundle load stats
Understanding Broker Load Balancing
1. How to manually split and unload the bundles in the namespace
# pulsar-admin namespaces split-bundle --bundle 0x80000000_0xc0000000 -san
range_equally_divide -u tenant/namespace
2. How to manually split and unload the largest bundles in the namespace
# pulsar-admin namespaces split-bundle -b LARGEST -san topic_count_equally_divide -u
tenant/namespace
3. How to unload a bundle
# pulsar-admin namespaces unload tenant/namespace -b 0x80000000_0xc0000000
4. How to unload every bundle in the namespace.
# pulsar-admin namespaces unload tenant/namespace
Manual Split and Unload
Understanding Broker Load Balancing
Understanding Broker Load Balancing
Agenda
Intro: Pulsar Broker Load Balancing
Bundles
Auto Bundle Load Balance Logic
Operation Tips
Future Work
Q&A
Future Work
Understanding Broker Load Balancing
Visibility Improvement:
- More logging: https://github.com/apache/pulsar/pull/16937
PIP-192: Load Balance Improvement Project with new code
- Goal : better load-balancing with more efficient implementation and more Ops commands/info
- Please join Pulsar Slack Channel, broker-load-balance-improvement-project
Understanding Broker Load Balancing
Questions?
Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Tech Deep Dive
Understanding
Broker Load Balancing
Heesung Sohn
Sr. Platform Engineer • StreamNative
Thank you!

More Related Content

Similar to Understanding Broker Load Balancing - Pulsar Summit SF 2022

Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingPrinceton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingTimothy Spann
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Timothy Spann
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종NAVER D2
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLHyderabad Scalability Meetup
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsBenjamin Darfler
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows AzureDamir Dobric
 
Docker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platformsDocker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platformsFederico Michele Facca
 
10 Multicore 07
10 Multicore 0710 Multicore 07
10 Multicore 07timcrack
 
Actual CCDAK Questions with Practice Tests and braindumps
Actual CCDAK Questions with Practice Tests and braindumpsActual CCDAK Questions with Practice Tests and braindumps
Actual CCDAK Questions with Practice Tests and braindumpskillexamsofficial
 
WiMAX implementation in ns3
WiMAX implementation in ns3WiMAX implementation in ns3
WiMAX implementation in ns3Mustafa Khaleel
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...HostedbyConfluent
 
20090814102834_嵌入式C与C++语言精华文章集锦.docx
20090814102834_嵌入式C与C++语言精华文章集锦.docx20090814102834_嵌入式C与C++语言精华文章集锦.docx
20090814102834_嵌入式C与C++语言精华文章集锦.docxMostafaParvin1
 

Similar to Understanding Broker Load Balancing - Pulsar Summit SF 2022 (20)

Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingPrinceton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
 
1844 1849
1844 18491844 1849
1844 1849
 
1844 1849
1844 18491844 1849
1844 1849
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
 
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
[2C1] 아파치 피그를 위한 테즈 연산 엔진 개발하기 최종
 
MQTT with .NET Core
MQTT with .NET CoreMQTT with .NET Core
MQTT with .NET Core
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
StateKeeper Report
StateKeeper ReportStateKeeper Report
StateKeeper Report
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows Azure
 
CockroachDB
CockroachDBCockroachDB
CockroachDB
 
Docker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platformsDocker Swarm secrets for creating great FIWARE platforms
Docker Swarm secrets for creating great FIWARE platforms
 
10 Multicore 07
10 Multicore 0710 Multicore 07
10 Multicore 07
 
Actual CCDAK Questions with Practice Tests and braindumps
Actual CCDAK Questions with Practice Tests and braindumpsActual CCDAK Questions with Practice Tests and braindumps
Actual CCDAK Questions with Practice Tests and braindumps
 
WiMAX implementation in ns3
WiMAX implementation in ns3WiMAX implementation in ns3
WiMAX implementation in ns3
 
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
Enhancing Apache Kafka for Large Scale Real-Time Data Pipeline at Tencent | K...
 
Inferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on SparkInferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on Spark
 
20090814102834_嵌入式C与C++语言精华文章集锦.docx
20090814102834_嵌入式C与C++语言精华文章集锦.docx20090814102834_嵌入式C与C++语言精华文章集锦.docx
20090814102834_嵌入式C与C++语言精华文章集锦.docx
 

More from StreamNative

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...StreamNative
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...StreamNative
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022StreamNative
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022StreamNative
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...StreamNative
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...StreamNative
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022StreamNative
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022StreamNative
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022StreamNative
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022StreamNative
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022StreamNative
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022StreamNative
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...StreamNative
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021StreamNative
 

More from StreamNative (20)

Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
Is Using KoP (Kafka-on-Pulsar) a Good Idea? - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
 

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Understanding Broker Load Balancing - Pulsar Summit SF 2022

  • 1. Pulsar Summit San Francisco Hotel Nikko August 18 2022 Tech Deep Dive Understanding Broker Load Balancing Heesung Sohn Sr. Platform Engineer • StreamNative
  • 2. A platform engineer at StreamNative based in the San Francisco Bay Area. Currently focusing on Pulsar Broker Load Balancing. Previously worked on scaling Aurora Mysql internals for its Serverless features at AWS. LinkedIn https://www.linkedin.com/in/heesung-sohn-aa5b 9561/ Heesung Sohn Sr. Platform Engineer StreamNative
  • 3. Understanding Broker Load Balancing Agenda Intro: Pulsar Broker Load Balancing Topic Bundles Auto Bundle Load Balance Logic Operation Tips Future Work Q&A
  • 4. Understanding Broker Load Balancing Load Balancing in Distributed Messaging/Streaming Systems In distributed messaging/streaming systems: ➔ Messages are grouped under topics ➔ Message pub-sub for a topic is served by a single broker ➔ Topics(or groups of topics) are considered as a “good” load-balance entity Load balancing refers to efficiently distributing topic messages across brokers. Topic1: {msg, msg, msg} broker3 broker2 broker1 Topic2: {msg, msg, msg} Topic3: {msg, msg, msg} Topic4: {msg, msg, msg} disk disk disk client client
  • 5. Understanding Broker Load Balancing Broker Load Balancing in Pulsar In Pulsar: Pulsar separates serving-persistence layers. ➔ Brokers serve topics’ pub-sub sessions. ➔ Brokers read/write messages in Bookies. Broker Load balancing refers to efficiently distributing topic serving sessions across brokers. Q: How does Pulsar make the Broker Load Balance efficient? Bookie1 (disk) Bookie2 (disk) Bookie3 (disk) Topic1: {msg, msg, msg} broker3 broker2 broker1 client Topic2: {msg, msg, msg} Topic3: {msg, msg, msg} Topic4: {msg, msg, msg} Serving Layer Persistence Layer client
  • 6. In Pulsar, unlike monolithic systems, Topic-broker assignments are “flexible” as brokers do not persist messages locally. ➔ New owner brokers simply establish new sessions with the clients. ➔ Minimal client connection jitters. New owner brokers look up bookie metadata. Q: Is dynamic unloading possible without harming the performance? A: Pulsar uses dynamic topic rebalancing. ➔ Assign topics to underloaded brokers. ➔ Unload topics from overloaded brokers(high cpu, memory, I/O ...) Understanding Broker Load Balancing Idea: Dynamic Topic Rebalancing A: Yes, Pulsar can seamlessly transfer topic sessions thanks to the serving-persistence separation. Q: How does Pulsar make the Broker Load Balancing efficient?
  • 7. When adding more bookies, messages will ramp up on new bookies. Topic messages are segmented, replicated and distributed to multiple bookies. Pulsar Bookies achieve load balancing by segmentation. Understanding Broker Load Balancing Quick Review: How about Bookie Load Balancing? Ref: Comparing Pulsar and Kafka Pulsar does not require data rebalancing. Seg 1 topic1 topic2 E.g. Topic1’s segments are replicated 2x and distributed to 4 BK
  • 8. Understanding Broker Load Balancing Agenda Intro: Pulsar Broker Load Balancing Topic Bundles Auto Bundle Load Balance Logic Operation Tips Future Work Q&A
  • 9. Q: Why need this middle layer group, bundles? Understanding Broker Load Balancing Topics are balanced to brokers at the bundle level bundle1 Bundle Key Range(8bits): [0x00 ,0x80, 0xFF] bundle2 hash(“topic1”) => 0x0F(bundle key) topic1 hash(“topic2”) => 0x4F topic2 topic3 hash(“topic3”) => 0x8F With the multi-tenancy nature, Pulsar needs to scale for millions of topics. Problem: too much to track millions of the topic-level metadata Solution: topic sharding/bundling. defaultNumberOfNamespaceBundles = 4 loadBalancerNamespaceMaximumBundles = 128 Topics are grouped into bundles as Broker Load Balancing Unit Topic-Bundle LooKup by Hashed Sharding
  • 10. Understanding Broker Load Balancing Agenda Intro: Pulsar Broker Load Balancing Bundles Auto Bundle Load Balance Logic Operation Tips Future Work Q&A
  • 11. 1. Bundle-Broker Assignment ➔ Assign to a new broker when no owner Understanding Broker Load Balancing Auto Bundle Load Balance Logics in Pulsar 2. Bundle Unload ➔ Unload bundles from overloaded to underloaded brokers 3. Bundle Split ➔ Split overloaded bundles broker1 hash(topic3) => bundle1 broker1 hash(topic3) => bundle1 broker2 hash(topic3), hash(topic6) => bundle1 x hash(topic3) => bundle1 hash(topic6) => bundle2 assign unload re-assign split Child bundles
  • 12. Understanding Broker Load Balancing Bundle-Broker Assignment (Assign a bundle to a new broker when no owner) broker1 Client topic3 broker2 broker3 Metadata Store hash(topic3) => bundle2 No owner now Leader broker Ask leader to make an assignment broker2 is green. connect broker2 as the owner Leader said I should be the owner bundle2=> bundle2=> broker2 Randomly connect Own bundle2 Ready to serve! Q1: How does leader know broker2 is green? Q2: What’s the strategy to select “a broker” among the available brokers?
  • 13. Q1: How does leader know broker2 is green? Bundle Load Data : bundle-level msg in/out rates. Understanding Broker Load Balancing Leader collects global load info broker1 broker2 broker3 Metadata Store Leader broker Load data Load data Load data Load data Broker Load Data: CPU, memory, and network throughput in/out rates. A: The leader broker collects global load info via Metadata Store. Currently, the leader makes all load balance decisions.
  • 14. LeastResourceUsageWithWeight(new) cur_load = max(cpu, mem, network) load = w * load + (1 - w) * cur_load avg_load = avg(load) from all brokers’ load candidates = brokers with load + α < avg_load Select a random broker among the candidates with lower resource load than global average. LeastLongTermMessageRate(default) load= If max(cpu, mem, network) <= threshold(85%) =>ƒ(longTermMsgIn, longTermMsgOut) Select a random broker among least long-term msg rate. Understanding Broker Load Balancing Bundle-Broker Assignment Strategy Q2: What’s the strategy to select “a broker” among the available brokers? A: Pulsar can configure the following strategies(configurable by ModularLoadManagerStrategy.) Exponential Moving Average(EMA) w=0.9
  • 15. Understanding Broker Load Balancing Bundle Unload (Unload bundles from overloaded to underloaded brokers) broker1 broker2 broker3 Metadata Store Leader broker Publisher 1 bombarding! bundle1=>broker1 bundle2=>broker1 Publisher 2 topic1:{msg} topic3:{msg} hash(topic1) => bundle1 hash(topic3) => bundle2 Close Connections broker2 is green Re-connect /re-assign unload bundle2 from broker1 (too much load on broker 1) Q: What’s the strategy to select overloaded brokers and bundles? bundle1=>broker1 bundle2=>broker1 disable remove bundle1=>broker1 bundle2=> New owner assigned bundle1=>broker1 bundle2=>broker2 own
  • 16. LoadSheddingStrategy: ThresholdShedder(default) OverloaadShedder UniformLoadShedder Link: https://pulsar.apache.org/docs/administratio n-load-balance/#shed-load-automatically Understanding Broker Load Balancing ThresholdShedder cur_load = max(cpu, mem, network) load = w * load + (1 - w) * cur_load avg_load = avg(load) from all brokers’ load candidates = brokers with load + α < avg_load Select bundles from the candidate brokers, starting from the high throughput, enough to reduce load below the threshold.. Bundle Unloading(Shedding) Strategy Q: What’s the strategy to select overloaded brokers and bundles? A: Pulsar can configure the following LoadSheddingStrategy. w = loadBalancerHistoryResourcePercentage α = loadBalancerBrokerThresholdShedderPercentage cpu-w = loadBalancerCPUResourceWeight mem-w = loadBalancerMemoryResourceWeight network-w = loadBalancerBundleUnloadMinThroughputThreshold
  • 17. Understanding Broker Load Balancing Bundle split (Split overloaded bundles) broker1 broker2 broker3 Metadata Store Leader broker Publisher 1 bombarding! Publisher 2 bundle1=>broker1 Split bundle1 (too much traffic on bundle1) topic1:{msg} topic3:{msg} hash(topic1), hash(topic3) => bundle1 (Optionally) Unload the child bundles bundle1-1=>broker1 bundle1-2=>broker1 Split bundle 1 => {bundle1-1,bundle 1-2} Q: What’s the strategy to split overloaded bundles?
  • 18. Threshold-based Bundle Split Strategy (when to split) Split bundles if any resource is beyond LoadBalancerNamespaceBundle* thresholds. Defaults LoadBalancerNamespaceBundleMaxTopics = 1000 LoadBalancerNamespaceBundleMaxSessions = 1000 LoadBalancerNamespaceBundleMaxMsgRate = 30000 LoadBalancerNamespaceBundleMaxBandwidthMbytes = 100 Understanding Broker Load Balancing Bundle Split Boundary Compute Strategy (how to split) RANGE_EQUALLY_DIVIDE_NAME (default): split to parts with the same hash range size TOPIC_COUNT_EQUALLY_DIVIDE: split to parts with the same topic count. configurable by DefaultNamespaceBundleSplitAlgorithm. Bundle Split Strategy Q: What’s the strategy to split overloaded bundles? A: Pulsar can configure when and how to splits bundles.
  • 19. Understanding Broker Load Balancing RANGE_EQUALLY_DIVIDE_NAME Split Example Bundle1-1 0x00-0x40 Bundle_Key_Partitions [0x00 ,0x40, ,0x80, 0xFF] Bundle1-2 0x40-0x80 hash(“topic1”) => 0x0F hash(“topic2”) => 0x4F hash(“topic3”) => 0x8F topic1 topic2 topic3 Bundle2 0x80-0xFF Bundle1 0x00-0x80 Bundle_Key_Partitions [0x00 ,0x80, 0xFF] Bundle2 0x80-0xFF hash(“topic1”) => 0x0F topic1 hash(“topic2”) => 0x4F topic2 topic3 hash(“topic3”) => 0x8F Split Bundle [0x00-0x80] at 0x40
  • 20. Understanding Broker Load Balancing Agenda Intro: Pulsar Broker Load Balancing Bundles Auto Bundle Load Balance Logic Operation Tips Future Work Q&A
  • 21. Understanding Broker Load Balancing Load Balance Metrics: Useful to monitor Load Balance Input / Output Type Name Description Load Balance Input signal pulsar_lb_bandwidth_in_usage Broker bandwidth in usage % out of 100% Load Balance Input signal pulsar_lb_bandwidth_out_usage Broker bandwidth out usage % out of 100% Load Balance Input signal pulsar_lb_memory_usage Broker heap usage % out of 100% Load Balance Input signal pulsar_lb_directMemory_usage Broker dict_memory usage % out of 100% Load Balance Input signal pulsar_lb_cpu_usage Broker cpu usage % out of 100% Load Balance Split Output pulsar_lb_bundles_split_count Bundle split counts Load Balance Unload Output pulsar_lb_unload_bundle_count Bundle unload counts Load Balance Unload Output pulsar_lb_unload_broker_count Bundle unload broker counts Load Balance Assignment Output pulsar_topics_count Serving topic counts Load Balance Assignment Output owned-bundles Bundle ownership cache size by caffeine_cache_estimated_size
  • 22. Understanding Broker Load Balancing Load Balance Dashboard
  • 23. How to set default bundle counts for new namespaces # conf/broker.conf defaultNumberOfNamespaceBundles=4 Also, # bin/pulsar-admin namespaces create my-tenant/my-namespace --clusters standalone --bundles 16 Config / Admin CLIs to Init Bundles Understanding Broker Load Balancing Q:What’s a good number for bundle counts? Goal: Distribute topics to available brokers as much as possible. bundles > brokers, a good start E.g. For a namespace with 1000 topics and 16 brokers, => A: 64 bundles could be a good start across 16 brokers. # when to split bundles (we reviewed this) LoadBalancerNamespaceBundleMaxTopics = 1000 LoadBalancerNamespaceBundleMaxSessions = 1000 LoadBalancerNamespaceBundleMaxMsgRate = 30000 LoadBalancerNamespaceBundleMaxBandwidthMbytes = 100 # how to split bundles (we reviewed this) DefaultNamespaceBundleSplitAlgorithm= RANGE_EQUALLY_DIVIDE_NAME
  • 24. Useful Admin CLIs to Check Bundle States Understanding Broker Load Balancing 1. How to list the bundles in the namespace # bin/pulsar-admin namespaces bundles my-tenant/my-namespace ⇒ "boundaries" : [ "0x00000000", "0x10000000", …, "0xffffffff" ], "numBundles" : 16 3. How to look up the bundle by topic # bin/pulsar-admin topics bundle-range persistent://my-tenant/my-namespace/my-topic => 0x00000000_0x10000000 4. How to look up the owner broker by topic # bin/pulsar-admin topics lookup persistent://my-tenant/my-namespace/topic ⇒ pulsar://my-broker-1:6650 2. How to list the topics in the bundle # bin/pulsar-admin topics list my-tenant/my-namespace --bundle 0x00000000_0x10000000 ⇒ persistent://my-tenant/my-namespace/my-topic
  • 25. Manual Split and Unload Understanding Broker Load Balancing Q: Can Admin manually split and unload bundles? A: Yes. The unloaded bundles will be reloaded to the next available brokers soon. Let’s check the CLIs for this operations.
  • 26. How to check bundle load stats # pulsar-admin --admin-url http://my-broker-x-url:8080 broker-stats load-report ⇒ ... "bundleStats" : { "my-tenant/my-namespace/0x80000000_0xc0000000" : { "msgRateIn" : 2100.99 "msgThroughputIn" : 2367100.92 "msgRateOut" : 2100.99 "msgThroughputOut" : 2367100.92 "consumerCount" : 2200, "producerCount" : 2200, "topics" : 2300, "cacheSize" : 107600 } How to check bundle load stats Understanding Broker Load Balancing
  • 27. 1. How to manually split and unload the bundles in the namespace # pulsar-admin namespaces split-bundle --bundle 0x80000000_0xc0000000 -san range_equally_divide -u tenant/namespace 2. How to manually split and unload the largest bundles in the namespace # pulsar-admin namespaces split-bundle -b LARGEST -san topic_count_equally_divide -u tenant/namespace 3. How to unload a bundle # pulsar-admin namespaces unload tenant/namespace -b 0x80000000_0xc0000000 4. How to unload every bundle in the namespace. # pulsar-admin namespaces unload tenant/namespace Manual Split and Unload Understanding Broker Load Balancing
  • 28. Understanding Broker Load Balancing Agenda Intro: Pulsar Broker Load Balancing Bundles Auto Bundle Load Balance Logic Operation Tips Future Work Q&A
  • 29. Future Work Understanding Broker Load Balancing Visibility Improvement: - More logging: https://github.com/apache/pulsar/pull/16937 PIP-192: Load Balance Improvement Project with new code - Goal : better load-balancing with more efficient implementation and more Ops commands/info - Please join Pulsar Slack Channel, broker-load-balance-improvement-project
  • 30. Understanding Broker Load Balancing Questions?
  • 31. Pulsar Summit San Francisco Hotel Nikko August 18 2022 Tech Deep Dive Understanding Broker Load Balancing Heesung Sohn Sr. Platform Engineer • StreamNative Thank you!