2. Siphon Usage
• 3.9 million events per second ingress (average)
• 800 TB ingested per day
• 1,700 production Kafka brokers
• 10-second 99th-percentile latency
Key customer scenarios:
• Ads Monetization (Fast BI)
• O365 Customer Fabric NRT – Tenant & User insights
• BingNRT Operational Intelligence
• Presto (Fast SML) interactive analysis
• Delve Analytics
[Chart: Siphon Data Volume (Ingress and Egress) – throughput in GBps; series: volume published, volume subscribed, total volume.]
[Chart: Siphon Events per second (Ingress and Egress) – throughput in millions of events per second; series: EPS in, EPS out, total EPS.]
3. Siphon Architecture
[Diagram: Siphon architecture. In each datacenter (Asia DC, Europe DC, US DC), Kafka runs alongside ZooKeeper and a canary. Data enters through a collector and agent via services data push, services data pull (agent), and device proxy services; a consumer API (push/pull) serves streaming, batch, and audit-trail consumers. Components of Siphon are marked as either open source or Microsoft-internal.]
5. The Problem - Unbalanced Disk and Machine Usage
[Diagram: three machines (Machine 1, Machine 2, Machine 3), each with Disk 1 and Disk 2; some partitions have intense IO, a new disk is added to a machine, and a new machine is added to the cluster.]
6. [Diagram: partitions T1-P1, T2-P2, T2-P3, T1-P2, and T2-P1 crowded onto Disk 1 and Disk 2, while newly added Disk 3 sits empty.]
* Disk 3 is a newly added disk, and we could move some partitions there.
* If topic T1 is a topic with higher IO demands, prefer to put T1-P1 and T1-P2 on separate drives.
Closer look: Imbalance inside a Broker
7. Assuming:
1) All disks are similar
2) All partitions of a topic are equally IO demanding
Solution
Just before broker start:
{
Get the alphabetically sorted list of all local Kafka directories
Assign them to the list of disks in a round-robin fashion
}
Ensures:
1) Partitions of the same topic go to different drives
2) All disks get an equal share of directories
* This method doesn't cover all scenarios, but it does a decent job; two heavy-throughput partitions could still end up on the same disk. (A sketch of this placement logic follows this slide.)
Closer look: Imbalance inside a Broker
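Below is a minimal Java sketch of this round-robin placement, assuming the lists of local Kafka partition directories and data disks have already been discovered; the class and method names are illustrative, not the actual broker code.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class RoundRobinPlacement {
    // Assigns each Kafka partition directory to a disk, cycling through the
    // disks. Alphabetical sorting groups a topic's partition directories
    // together (e.g. topic1-0, topic1-1), so the round robin naturally
    // spreads partitions of the same topic across different drives.
    static Map<String, String> assign(List<String> kafkaDirs, List<String> disks) {
        List<String> sorted = new ArrayList<>(kafkaDirs);
        Collections.sort(sorted);
        Map<String, String> placement = new HashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            placement.put(sorted.get(i), disks.get(i % disks.size()));
        }
        return placement;
    }
}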
8. [Diagram: after rebalancing, Disk 1 holds T1-P1 and T2-P2, Disk 2 holds T2-P3 and T1-P2, and Disk 3 holds T2-P1.]
* Disk 3 is the newly added disk, and some partitions have been moved there.
* Since topic T1 may be a topic with higher IO demands, T1-P1 and T1-P2 are kept on separate drives.
Closer look: Imbalance inside a Broker (solved)
10. • The static approach usually rests on the following wrong assumptions:
Heterogeneous topics' partitions will show homogeneous throughput characteristics.
The number of partitions for a topic is under the control of the cluster manager (operations team).
All machines in the cluster have the same configuration.
Closer look: Imbalance across machines
11. [Diagram: partitions T1-P1, T1-P2, T2-P1, T2-P2, and T3-P1 spread across Broker-1, Broker-2, and Broker-3, with roughly equal partition counts per broker.]
* Equal partition counts across machines aren't enough to achieve fair load.
* Perhaps T3-P1 is very IO intensive.
* Perhaps Broker-1 has low-end hardware.
Closer look: Imbalance across machines
12. • The dynamic approach:
Since a statically defined approach didn't work, we need a dynamic one.
Next, we will discuss the dynamic approach, which I call the "Adoption Marketplace".
Closer look: Imbalance across machines
14. An Adoption Marketplace
[Diagram: Broker-1, Broker-2, and Broker-3 share a first-come-first-serve board of adoption ads; a sample ad reads "Item: Topic1-Partition1, Requires: 2 MBPS at peak".]
15. An Adoption Marketplace
• The (POC) logic/tool runs on each broker independently.
• The logic is completely distributed but needs coordination; ZooKeeper is leveraged here.
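A minimal sketch of how an ad might be posted through ZooKeeper, assuming a /adoption-ads parent znode and the JSON format shown on the next slide; the path layout and serialization are assumptions for illustration, not the POC's actual wire format.

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class AdBoard {
    private final ZooKeeper zk;

    public AdBoard(ZooKeeper zk) {
        this.zk = zk;
    }

    // Posts an ad as an ephemeral znode: if the advertising broker dies,
    // the ad disappears with its session, so stale ads never linger.
    public void post(String partitionName, String adJson) throws Exception {
        zk.create("/adoption-ads/" + partitionName,
                adJson.getBytes(StandardCharsets.UTF_8),
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL);
    }
}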
16. An Adoption Marketplace
• Advertisement format:
{
  Version: 1.0,
  Item: TopicA-PartitionB,
  ResourcesRequired: [
    {ResourceName: "X", ResourceQty: X1},
    {ResourceName: "Y", ResourceQty: Y1},
    …
  ]
}
Example (advertise to give away Topic1's partition 1):
{
  Version: 1.0,
  Item: "Topic1-1",
  ResourcesRequired: [
    {ResourceName: "PeakMBPS", ResourceQty: 2}
  ]
}
Versioning allows future enhancements to take place seamlessly.
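For illustration only, the ad could be modeled as a small Java value type whose field names mirror the JSON keys above; the hand-rolled serialization is a sketch (a real implementation would likely use a JSON library).

import java.util.List;

public record AdoptionAd(String version, String item, List<Resource> resourcesRequired) {
    public record Resource(String resourceName, double resourceQty) {}

    // Serializes the ad into the format shown above.
    public String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"Version\":\"").append(version)
          .append("\",\"Item\":\"").append(item)
          .append("\",\"ResourcesRequired\":[");
        for (int i = 0; i < resourcesRequired.size(); i++) {
            Resource r = resourcesRequired.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"ResourceName\":\"").append(r.resourceName())
              .append("\",\"ResourceQty\":").append(r.resourceQty()).append('}');
        }
        return sb.append("]}").toString();
    }
}

// Example: new AdoptionAd("1.0", "Topic1-1",
//         List.of(new AdoptionAd.Resource("PeakMBPS", 2))).toJson()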
17. An Adoption Marketplace
• The Interface
public interface IAdoptionLogic {
    // Individual developers need to implement/perfect this based on their needs
    Partition findLocalPartitionToGiveOut() throws Exception;

    // Individual developers need to implement/perfect this based on their needs
    Partition findRemotePartitionToTakeIn() throws Exception;

    // This method will be a ZooKeeper-backed implementation out of the box
    void advertisePartitionForAdoption(Partition partitionToGiveOut) throws Exception;

    // This method will be a ZooKeeper-backed implementation out of the box
    void adoptRemotePartition(Partition partitionToTakeIn) throws Exception;
}
18. An Adoption Marketplace
• LOOP START:
Find any local partition to be given away
Post the ad and wait for its adoption
Find any ads from others for adoption and check your eligibility
Lock an ad and adopt it if you are eligible
GOTO LOOP START
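A hedged sketch of this loop as a driver over the IAdoptionLogic interface from the previous slide (assumed to live in the same package as Partition); the polling interval, the null-means-nothing-found convention, and the error handling are assumptions, not part of the deck.

public class MarketplaceLoop implements Runnable {
    private final IAdoptionLogic logic;

    public MarketplaceLoop(IAdoptionLogic logic) {
        this.logic = logic;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Find any local partition to be given away and post the ad.
                Partition toGive = logic.findLocalPartitionToGiveOut();
                if (toGive != null) {
                    // (The POC then waits for the ad's adoption; this
                    // sketch simply moves on to the next step.)
                    logic.advertisePartitionForAdoption(toGive);
                }
                // Find any ads from others and adopt one if eligible.
                Partition toTake = logic.findRemotePartitionToTakeIn();
                if (toTake != null) {
                    logic.adoptRemotePartition(toTake);
                }
                Thread.sleep(60_000); // pause before the next round (assumed)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                // Log and keep looping (assumed policy).
            }
        }
    }
}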
19. Proof of Concept Experiment – The logic
The logic to put up / take in partitions for adoption was kept straightforward (see the sketch after this list):
1) findLocalPartitionToGiveOut()
If a broker sees that it has more partitions of a topic than Ceil((Partitions * Replication-factor) / total brokers in the cluster), it puts a partition of that topic up for adoption.
2) adoptRemotePartition()
If a broker has fewer partitions of a topic than Ceil((Partitions * Replication-factor) / total brokers in the cluster) and it sees an advertisement for such a partition, it tries to adopt it.
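The fair-share threshold both methods compare against fits in a few lines; a minimal sketch, assuming the partition, replica, and broker counts are read from cluster metadata elsewhere (the class and method names are illustrative).

public final class FairShare {
    // Ceil((partitions * replicationFactor) / totalBrokers)
    static int perBroker(int partitions, int replicationFactor, int totalBrokers) {
        return (int) Math.ceil((double) (partitions * replicationFactor) / totalBrokers);
    }

    // Give a partition away when holding more than the fair share...
    static boolean shouldGiveOut(int localCount, int fairShare) {
        return localCount > fairShare;
    }

    // ...and adopt an advertised partition when holding fewer.
    static boolean shouldTakeIn(int localCount, int fairShare) {
        return localCount < fairShare;
    }
}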
22. An Adoption Marketplace
Benefits:
1) The Kafka reassignment command effectively runs continuously throughout the cluster.
2) Older partitions can spread to new machines without any manual operations.
3) The advertisements can be monitored; if there are no adopters, it is time to add more machines.
4) The logic that determines which partitions to put up for, or take in for, adoption can be constantly improved, and will depend on one's environment and use case.
23. Proof of Concept Experiment – Code
GitHub:
https://github.com/Microsoft/Cluster-Partition-Rebalancer-For-Kafka