Thoughts on Kafka Capacity Planning
Jamie Alquiza
Sr. Software Engineer

This talk covers Kafka cluster sizing, instance type selections, scaling operations, replication throttling and more. Don’t forget to check out the Kafka-Kit repository.

https://www.youtube.com/watch?time_continue=2613&v=7uN-Vlf7W5E

1. Thoughts on Kafka Capacity Planning – Jamie Alquiza, Sr. Software Engineer
2. A Lot of Kafka – Multi-petabyte footprint – 10s of GB/s in sustained bandwidth – Globally distributed infrastructure – Continuous growth
3. Motivation
4. 1: Poor utilization at scale is $$$🔥
5. 1: Poor utilization at scale is $$$🔥 2: Desire for predictable performance ⚡
6. Kafka Resource Consumption
7. Kafka Resource Consumption – CPU: Message Rate, Compression, Compaction (if used); Memory; Disk; Network
8. Kafka Resource Consumption – CPU: Message Rate, Compression, Compaction (if used); Memory: Efficient, Steady Heap Usage, Page Cache; Disk; Network
9. Kafka Resource Consumption – CPU: Message Rate, Compression, Compaction (if used); Memory: Efficient, Steady Heap Usage, Page Cache; Disk: Bandwidth, Storage Capacity; Network
10. Kafka Resource Consumption – CPU: Message Rate, Compression, Compaction (if used); Memory: Efficient, Steady Heap Usage, Page Cache; Disk: Bandwidth, Storage Capacity; Network: Consumers, Replication
11. Kafka Makes Capacity Planning Easy
12. Kafka Makes Capacity Planning Easy – Through the lens of the Universal Scalability Law: - low contention, crosstalk - no complex queries
13. Kafka Makes Capacity Planning Easy – Through the lens of the Universal Scalability Law: - low contention, crosstalk - no complex queries. Exposes mostly bandwidth problems: - highly sequential, batched ops - primary workload is streaming the reads/writes of bytes
14. Kafka Makes Capacity Planning Hard
15. Kafka Makes Capacity Planning Hard – The default tools weren’t made for scaling: - reassign-partitions focused on simple partition placement
16. Kafka Makes Capacity Planning Hard – The default tools weren’t made for scaling: - reassign-partitions focused on simple partition placement. No administrative API: - no endpoint to inspect or manipulate resources
17. A Scaling Model
18. A Scaling Model – Created Kafka-Kit (open-source): - topicmappr for intelligent partition placement - registry (WIP): a Kafka gRPC/HTTP API
19. A Scaling Model – Created Kafka-Kit (open-source): - topicmappr for intelligent partition placement - registry (WIP): a Kafka gRPC/HTTP API. Defined a simple workload pattern: - topics are bound to specific broker sets (“pools”) - multiple pools/cluster - primary drivers: disk capacity & network bandwidth
20. A Scaling Model – topicmappr builds optimal partition -> broker pool mappings
21. A Scaling Model – topicmappr builds optimal partition -> broker pool mappings – topic/pool sets are scaled individually
22. A Scaling Model – topicmappr builds optimal partition -> broker pool mappings – topic/pool sets are scaled individually – topicmappr handles repairs, storage rebalancing, pool expansion
23. A large cluster is composed of - dozens of pools - hundreds of brokers
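
For reference, the pool/cluster composition above maps to a very small data model. A minimal sketch in Go (Kafka-Kit's own language), using hypothetical type and field names rather than Kafka-Kit's actual API, just to make the "topics bound to broker pools, many pools per cluster" pattern concrete:

    package main

    import "fmt"

    // Pool and Cluster are illustrative types only (not Kafka-Kit's actual API):
    // a set of topics is pinned to its own broker pool, and a cluster is just a
    // collection of pools that are sized and scaled independently of one another.
    type Pool struct {
        Name    string
        Topics  []string
        Brokers []int // broker IDs bound to this pool
    }

    type Cluster struct {
        Pools []Pool
    }

    func main() {
        c := Cluster{Pools: []Pool{
            {Name: "logs-retention", Topics: []string{"app_logs"}, Brokers: []int{1001, 1002, 1003}},
            {Name: "metrics-throughput", Topics: []string{"metrics"}, Brokers: []int{1010, 1011, 1012}},
        }}
        for _, p := range c.Pools {
            fmt.Printf("pool %s: %d topic(s) on %d brokers\n", p.Name, len(p.Topics), len(p.Brokers))
        }
    }
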
24. Sizing Pools
25. Sizing Pools - Storage utilization is targeted at 60-80% depending on topic growth rate
26. Sizing Pools - Storage utilization is targeted at 60-80% depending on topic growth rate - Network capacity depends on several factors
27. Sizing Pools - Storage utilization is targeted at 60-80% depending on topic growth rate - Network capacity depends on several factors: consumer demand + MTTR targets (20-40% headroom for replication)
28. Sizing Pools – Determining broker counts
29. Sizing Pools – Determining broker counts: storage: n = fullRetention / (storagePerNode * 0.8)
30. Sizing Pools – Determining broker counts: storage: n = fullRetention / (storagePerNode * 0.8); network: n = consumerDemand / (bwPerNode * 0.6)
31. Sizing Pools – Determining broker counts: storage: n = fullRetention / (storagePerNode * 0.8); network: n = consumerDemand / (bwPerNode * 0.6); pool size = max(ceil(storage), ceil(network))
32. Sizing Pools (we do a pretty good job at actually hitting this)
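
For reference, the broker-count arithmetic on slides 29-31 translates directly into code. A minimal sketch in Go; the inputs in main are purely illustrative assumptions (200 TB of retention, 12 TB of usable storage and ~1.25 GB/s of network per node, 8 GB/s of consumer demand), while the 0.8 and 0.6 factors are the storage-utilization target and replication headroom from the sizing slides:

    package main

    import (
        "fmt"
        "math"
    )

    // brokerCount applies the sizing formulas from the slides:
    //   storage: n = fullRetention / (storagePerNode * 0.8)
    //   network: n = consumerDemand / (bwPerNode * 0.6)
    //   pool size = max(ceil(storage), ceil(network))
    func brokerCount(fullRetentionTB, storagePerNodeTB, consumerDemandGBps, bwPerNodeGBps float64) int {
        storage := fullRetentionTB / (storagePerNodeTB * 0.8)
        network := consumerDemandGBps / (bwPerNodeGBps * 0.6)
        // A huge gap between the two counts suggests the instance type is
        // probably wrong for the workload (see the instance-type slides).
        return int(math.Max(math.Ceil(storage), math.Ceil(network)))
    }

    func main() {
        // Illustrative inputs only, not figures from the talk.
        fmt.Println("pool size:", brokerCount(200, 12, 8, 1.25)) // max(ceil(20.8), ceil(10.7)) = 21
    }
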
33. Instance Types
34. Instance Types - If there’s a huge delta between counts required for network vs storage: probably the wrong type
35. Instance Types - If there’s a huge delta between counts required for network vs storage: probably the wrong type - Remember: sequential, bandwidth-bound workloads
36. Instance Types - If there’s a huge delta between counts required for network vs storage: probably the wrong type - Remember: sequential, bandwidth-bound workloads - AWS: d2, i3, h1 class
37. Instance Types (AWS) – d2: the spinning rust is actually great. Good for: - Storage/$ - Retention biased workloads. Problems: - Disk bw far exceeds network - Long MTTRs
38. Instance Types (AWS) – h1: a modernized d2? Good for: - Storage/$ - Balanced, lower retention / high throughput workloads. Problems: - ENA network exceeds disk throughput - Recovery times are disk-bound - Disk bw / node < d2
39. Instance Types (AWS) – i3: bandwidth monster. Good for: - Low MTTRs - Concurrent i/o outside the page cache. Problems: - storage/$
40. Instance Types (AWS) – r4, c5, etc + gp2 EBS: It actually works well; EBS perf isn’t a problem
41. Instance Types (AWS) – r4, c5, etc + gp2 EBS: It actually works well; EBS perf isn’t a problem. Problems: - low EBS channel bw in relation to instance size - the burden of running a distributed/replicated store, hinging it on tech that solves 2009 problems - may want to consider Kinesis / etc?
42. Data Placement
43. Data Placement – topicmappr optimizes for: - maximum leadership distribution
44. Data Placement – topicmappr optimizes for: - maximum leadership distribution - replica rack.id isolation
45. Data Placement – topicmappr optimizes for: - maximum leadership distribution - replica rack.id isolation - maximum replica list entropy
46. Data Placement – Maximum replica list entropy(?) “For all partitions that a given broker holds, ensuring that the partition replicas are distributed among as many other unique brokers as possible”
47. Data Placement – Maximum replica list entropy: It’s possible to have maximal partition distribution but a low number of unique broker-to-broker relationships
48. Data Placement – Maximum replica list entropy: It’s possible to have maximal partition distribution but a low number of unique broker-to-broker relationships. Example: broker A holds 20 partitions, all 20 replica sets contain only 3 other brokers
49. Data Placement – Maximum replica list entropy! - topicmappr expresses this as node degree distribution
50. Data Placement – Maximum replica list entropy! - topicmappr expresses this as node degree distribution - broker-to-broker relationships: it’s a graph
51. Data Placement – Maximum replica list entropy! - topicmappr expresses this as node degree distribution - broker-to-broker relationships: it’s a graph - replica sets are partial adjacency lists
52. Data Placement – A graph of replicas. Given the following partition replica sets: p0: [1, 2, 3] p1: [2, 3, 4]
53. Data Placement – A graph of replicas. Given the following partition replica sets: p0: [1, 2, 3] p1: [2, 3, 4] Broker 3’s adjacency list -> [1, 2, 4]
54. Data Placement – A graph of replicas. Given the following partition replica sets: p0: [1, 2, 3] p1: [2, 3, 4] Broker 3’s adjacency list -> [1, 2, 4] (degree = 3)
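
The replica-graph example on slides 52-54 is easy to check mechanically. A small sketch in Go (a hypothetical helper, not topicmappr's code) that builds each broker's adjacency list from the partition replica sets and reports its degree:

    package main

    import (
        "fmt"
        "sort"
    )

    // adjacency builds, for each broker, the set of other brokers it shares at
    // least one replica set with. Each replica set is a partial adjacency list.
    func adjacency(replicaSets [][]int) map[int]map[int]bool {
        adj := make(map[int]map[int]bool)
        for _, rs := range replicaSets {
            for _, b := range rs {
                if adj[b] == nil {
                    adj[b] = make(map[int]bool)
                }
                for _, peer := range rs {
                    if peer != b {
                        adj[b][peer] = true
                    }
                }
            }
        }
        return adj
    }

    func main() {
        // The replica sets from the slides: p0 and p1.
        adj := adjacency([][]int{{1, 2, 3}, {2, 3, 4}})
        var peers []int
        for p := range adj[3] {
            peers = append(peers, p)
        }
        sort.Ints(peers)
        fmt.Println("broker 3 adjacency list:", peers, "degree:", len(peers)) // [1 2 4] degree: 3
    }

A higher degree means more brokers can act as replication sources when broker 3 is replaced, which is what the node degree distribution objective is after.
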
55. Data Placement – Maximizing replica list entropy (is good)
56. Data Placement – Maximizing replica list entropy (is good). In broker failure/replacements: - probabilistically increases replication sources - faster, lower impact recoveries
57. Data Placement – topicmappr optimizes for: - maximum leadership distribution ✅ - replica rack.id isolation ✅ - maximum replica list entropy ✅
58. Maintaining Pools
59. Maintaining Pools – Most common tasks: - ensuring storage balance - simple broker replacements
60. Maintaining Pools – Most common tasks: - ensuring storage balance - simple broker replacements. Both of these are (also) done with topicmappr
61. Maintaining Pools – Broker storage balance
62. Maintaining Pools – Broker storage balance: - finds offload candidates: n distance below harmonic mean storage free
63. Maintaining Pools – Broker storage balance
64. Maintaining Pools – Broker storage balance
65. Maintaining Pools – Broker storage balance: - finds offload candidates: n distance below harmonic mean storage free - plans relocations to least-utilized brokers
66. Maintaining Pools – Broker storage balance: - finds offload candidates: n distance below harmonic mean storage free - plans relocations to least-utilized brokers - fair-share, first-fit descending bin-packing
67. Maintaining Pools – Broker storage balance: - finds offload candidates: n distance below harmonic mean storage free - plans relocations to least-utilized brokers - fair-share, first-fit descending bin-packing - loops until no more relocations can be planned
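
The rebalance steps on slides 62-67 amount to a small planning loop. A rough, simplified sketch in Go under assumed inputs (per-broker free space in GB and per-broker partition sizes; topicmappr's real implementation is more involved): brokers sitting more than a threshold below the harmonic mean of free space offload partitions, largest first, onto the least-utilized brokers, and passes like this repeat until nothing more can be planned:

    package main

    import (
        "fmt"
        "sort"
    )

    type partition struct {
        name string
        size float64 // GB
    }

    // harmonicMean of per-broker free storage.
    func harmonicMean(free map[int]float64) float64 {
        var sum float64
        for _, f := range free {
            sum += 1 / f
        }
        return float64(len(free)) / sum
    }

    // onePass sketches a single planning pass: brokers sitting more than
    // threshold GB below the harmonic mean of free space offload their partitions
    // in descending size order onto whichever broker currently has the most free
    // space. topicmappr wraps passes like this in a loop until no further
    // relocation can be planned.
    func onePass(free map[int]float64, parts map[int][]partition, threshold float64) {
        mean := harmonicMean(free)
        for src, f := range free {
            if f >= mean-threshold {
                continue // not an offload candidate
            }
            sort.Slice(parts[src], func(i, j int) bool { return parts[src][i].size > parts[src][j].size })
            for _, p := range parts[src] {
                // Destination: the least-utilized broker that can absorb p and
                // still be left with more free space than the source currently has.
                dst, best := -1, 0.0
                for b, bf := range free {
                    if b != src && bf > best && bf-p.size > f {
                        dst, best = b, bf
                    }
                }
                if dst < 0 {
                    continue
                }
                fmt.Printf("relocate %s (%.0f GB): broker %d -> broker %d\n", p.name, p.size, src, dst)
                free[src] += p.size
                free[dst] -= p.size
                f = free[src]
            }
        }
    }

    func main() {
        // Illustrative numbers only: free space (GB) and the candidate's partitions.
        free := map[int]float64{1: 200, 2: 1800, 3: 1600}
        parts := map[int][]partition{1: {{"logs-3", 400}, {"logs-7", 250}}}
        onePass(free, parts, 100)
    }
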
68. Maintaining Pools – Broker storage balance (results)
69. Maintaining Pools – Broker replacements
70. Maintaining Pools – Broker replacements: - When a single broker fails, how is a replacement chosen?
71. Maintaining Pools – Broker replacements: - When a single broker fails, how is a replacement chosen? - Goal: retain any previously computed storage balance (via 1:1 replacements)
72. Maintaining Pools – Broker replacements: - When a single broker fails, how is a replacement chosen? - Goal: retain any previously computed storage balance (via 1:1 replacements) - Problem: dead brokers no longer visible in ZK
73. Maintaining Pools – Broker replacements: - topicmappr can be provided several hot spares from varying AZs (rack.id)
74. Maintaining Pools – Broker replacements: - topicmappr can be provided several hot spares from varying AZs (rack.id) - infers a suitable replacement (“substitution affinity” feature)
75. Maintaining Pools – Broker replacements - inferring replacements: - traverse all ISRs, build a set of all rack.ids: G = {1a,1b,1c,1d}
76. Maintaining Pools – Broker replacements - inferring replacements: - traverse all ISRs, build a set of all rack.ids: G = {1a,1b,1c,1d} - traverse affected ISRs, build a set of live rack.ids: L = {1a,1c}
77. Maintaining Pools – Broker replacements - inferring replacements: - Build a set of suitable rack.ids to choose from: S = { x ∈ G | x ∉ L }
78. Maintaining Pools – Broker replacements - inferring replacements: - Build a set of suitable rack.ids to choose from: S = { x ∈ G | x ∉ L } - S = {1b,1d} - automatically chooses a hot spare from 1b or 1d
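
The set arithmetic on slides 75-78 can be expressed directly. A minimal sketch in Go (hypothetical function name) that reproduces the example: G = {1a,1b,1c,1d}, L = {1a,1c}, so S = {1b,1d}:

    package main

    import (
        "fmt"
        "sort"
    )

    // suitableRacks returns S = { x ∈ G | x ∉ L }: rack.ids seen across all
    // ISRs (G) minus the rack.ids still live in the affected ISRs (L).
    func suitableRacks(allRacks, liveRacks []string) []string {
        live := make(map[string]bool)
        for _, r := range liveRacks {
            live[r] = true
        }
        var s []string
        for _, r := range allRacks {
            if !live[r] {
                s = append(s, r)
            }
        }
        sort.Strings(s)
        return s
    }

    func main() {
        G := []string{"1a", "1b", "1c", "1d"} // rack.ids across all ISRs
        L := []string{"1a", "1c"}             // live rack.ids in the affected ISRs
        fmt.Println(suitableRacks(G, L))      // [1b 1d]: pick a hot spare from either
    }
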
79. Maintaining Pools – Broker replacements - inferring replacements. Outcome: - Keeps brokers bound to specific pools - Simple repairs that maintain storage balance, high utilization
80. Scaling Pools
81. Scaling Pools – When: >90% storage utilization in 48h
82. Scaling Pools – How: - add brokers to pool - run a rebalance - autothrottle takes over
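
The ">90% storage utilization in 48h" trigger on slide 81 is a short projection. A rough sketch in Go that assumes a simple linear growth estimate (the deck does not describe the actual alerting logic); the printed steps are the ones from slide 82:

    package main

    import "fmt"

    // willExceed reports whether a pool growing at growthPerHour (expressed as a
    // fraction of total capacity per hour) crosses the threshold within the window.
    // Linear projection is an assumption; the deck only states the trigger itself.
    func willExceed(utilization, growthPerHour, threshold, windowHours float64) bool {
        return utilization+growthPerHour*windowHours > threshold
    }

    func main() {
        // Illustrative: 82% utilized, growing 0.25% of capacity per hour.
        if willExceed(0.82, 0.0025, 0.90, 48) { // 0.82 + 0.12 = 0.94 > 0.90
            fmt.Println("scale the pool: add brokers, run a rebalance, let autothrottle manage replication rates")
        }
    }
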
83. autothrottle is a service that dynamically manages replication rates
84. Scaling Pools – Increasing capacity also improves storage balance
85. What’s Next
86. What’s Next - precursor to fully automated capacity management
87. What’s Next - precursor to fully automated capacity management - continued growth, dozens more clusters
88. What’s Next - precursor to fully automated capacity management - continued growth, dozens more clusters - new infrastructure
89. What’s Next - precursor to fully automated capacity management - continued growth, dozens more clusters - new infrastructure - (we’re hiring)
90. Thank you – Jamie Alquiza, Sr. Software Engineer – twitter.com/jamiealquiza