Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability

Vanessa Vuibert
Staff Production Engineer
@V3_XD
Scale
• 862 Kafka brokers
• 14 Kafka clusters
• 14M messages per sec
• 9 GCP regions
Traffic management use cases
• Maintenance
• Incidents
• Regionalize traffic
Kubernetes (K8s) out of the box
🔓open source
Kafka broker
K8s out of the box
dig +short service.namespace.svc.cluster.local
IP0
IP1
IP2
K8s out of the box
bootstrap.servers=
service.namespace.svc.cluster.local:9092
K8s out of the box
dig +short pod2.service.namespace.svc.cluster.local
IP2
K8s out of the box
advertised.listeners=
pod2.service.namespace.svc.cluster.local:9092
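The two settings above follow a naming pattern that can be sketched in a few lines. This is a minimal illustration, assuming a headless Service in front of a StatefulSet and the default `cluster.local` cluster domain; the helper names are hypothetical. Real broker configs also prefix each advertised listener with a listener name such as `PLAINTEXT://`, which the slide omits.

```python
CLUSTER_DOMAIN = "cluster.local"  # Kubernetes default; a cluster may use another domain


def service_fqdn(service: str, namespace: str) -> str:
    """DNS name of the Service; for a headless Service it resolves to the pod IPs."""
    return f"{service}.{namespace}.svc.{CLUSTER_DOMAIN}"


def pod_fqdn(pod: str, service: str, namespace: str) -> str:
    """Stable per-pod DNS name that a StatefulSet pod gets behind a headless Service."""
    return f"{pod}.{service_fqdn(service, namespace)}"


def bootstrap_servers(service: str, namespace: str, port: int = 9092) -> str:
    """Clients bootstrap through the shared Service name (any broker will do)."""
    return f"{service_fqdn(service, namespace)}:{port}"


def advertised_listener(pod: str, service: str, namespace: str, port: int = 9092) -> str:
    """Each broker advertises its own pod name so clients can reach that exact broker."""
    return f"{pod_fqdn(pod, service, namespace)}:{port}"


if __name__ == "__main__":
    print("bootstrap.servers=" + bootstrap_servers("service", "namespace"))
    print("advertised.listeners=" + advertised_listener("pod2", "service", "namespace"))
```

The split matters because the bootstrap name only has to reach some broker, while the advertised name has to reach one specific broker.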
K8s StatefulSet: probes
• Readiness
• Startup
• Liveness
dig +short service.namespace.svc.cluster.local
IP0
IP2
K8s readiness probe
dig +short service.namespace.svc.cluster.local
IP0
IP2
IP3
K8s readiness probe
not ready
publishNotReadyAddresses: true
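The effect of the readiness probe on the Service's DNS answer can be modeled as a simple filter. This is a toy sketch, not a Kubernetes API: `Pod` and `headless_dns_answer` are hypothetical names, and the only real setting involved is the Service field `publishNotReadyAddresses`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pod:
    ip: str
    ready: bool  # result of the pod's readiness probe


def headless_dns_answer(pods: List[Pod], publish_not_ready_addresses: bool = False) -> List[str]:
    """Return the IPs the Service name would resolve to.

    By default only ready pods are published; with the flag set,
    not-ready pods are published as well.
    """
    return [pod.ip for pod in pods if pod.ready or publish_not_ready_addresses]


if __name__ == "__main__":
    brokers = [Pod("IP0", True), Pod("IP1", False), Pod("IP2", True)]
    print(headless_dns_answer(brokers))
    print(headless_dns_answer(brokers, publish_not_ready_addresses=True))
```

With the flag off, the not-ready broker drops out of the answer, which matches the `dig` output that lists only IP0 and IP2.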
Regional pairs
External traffic: load balancers
External traffic: load balancers
bootstrap.servers
External traffic: load balancers
advertised.listeners
• Issues scaling
• Manual broker DNS records
• Limited traffic control
Built automation with K8s controllers.
Stateful buddy: load balancers
🔒closed source
Name buddy: DNS records
🔒closed source
Kafka access buddy: endpoints
🔒closed source
Kafka access buddy: consumer
Kafka access buddy: producer failover
“Let me failover real quick.”
- Elasticsearch on call
Faster failovers with a DNS traffic manager.
DNS traffic manager
🔒closed source
DNS traffic manager: normal
dig +short us-east1.somedomain.com
US-East1-IP
DNS traffic manager: failover
dig +short us-east1.somedomain.com
US-Central1-IP
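The two `dig` outputs above show the whole idea: the name clients use never changes, only the answer does. The traffic manager itself is closed source, so the following is only a toy in-memory model with hypothetical names (`DnsTrafficManager`, `fail_over`, `fail_back`), and the `us-central1.somedomain.com` record is an assumption added for illustration.

```python
class DnsTrafficManager:
    """Toy model: each regional name normally answers with its own region's IP."""

    def __init__(self, home_ips):
        self._home_ips = dict(home_ips)  # name -> IP served when the region is healthy
        self._redirects = {}             # failed-over name -> name of the region taking over

    def fail_over(self, name, to_name):
        """Make `name` answer with the IP of `to_name` until fail_back is called."""
        if name not in self._home_ips or to_name not in self._home_ips:
            raise KeyError("unknown DNS record")
        self._redirects[name] = to_name

    def fail_back(self, name):
        """Restore the normal answer for `name`."""
        self._redirects.pop(name, None)

    def resolve(self, name):
        target = self._redirects.get(name, name)
        return self._home_ips[target]


if __name__ == "__main__":
    manager = DnsTrafficManager({
        "us-east1.somedomain.com": "US-East1-IP",
        "us-central1.somedomain.com": "US-Central1-IP",  # assumed record
    })
    print(manager.resolve("us-east1.somedomain.com"))  # normal
    manager.fail_over("us-east1.somedomain.com", "us-central1.somedomain.com")
    print(manager.resolve("us-east1.somedomain.com"))  # failover
```

Because clients keep the same `bootstrap.servers` value, a failover in this model needs no client redeploy, only a reconnect that resolves the name again.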
“DNS trickery.”
- A Kafka client
Failover time savings
• Used to take 40 minutes
• Now only takes 1 minute
Incident during flash sale
Failover during flash sale
US Central1 -> US East1
Reduced toil with client wrappers.
Client wrappers
• Failover reconnection
• Everything needed for connection
• Ruby, Go, and Python
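One way a wrapper could handle failover reconnection is to rebuild the client after a connection error, so the bootstrap name gets resolved again. This is a sketch under that assumption, not the wrappers described in the talk; `run_with_reconnect`, `make_client`, and `operation` are hypothetical names, and resolver caching may delay the new DNS answer in practice.

```python
import time


def run_with_reconnect(make_client, operation, attempts=3, backoff_s=1.0, sleep=time.sleep):
    """Run operation(client), rebuilding the client after each connection error.

    make_client: zero-argument callable returning a fresh client
    operation:   callable taking the client and doing the produce/consume work
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        client = make_client()  # fresh client, so the bootstrap name is looked up again
        try:
            return operation(client)
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_s * attempt)  # linear backoff before the next attempt
```

Passing `sleep` as a parameter keeps the retry loop easy to exercise in tests without real delays.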
K8s Deployment template: bootstrap.servers
K8s Deployment template: client ID
K8s Deployment template
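If the Deployment template injects the connection details, the wrapper can read them so that application code never hard-codes a broker address. The sketch below assumes the values arrive as environment variables; the variable names `KAFKA_BOOTSTRAP_SERVERS` and `KAFKA_CLIENT_ID` and the function name are hypothetical, while `bootstrap.servers` and `client.id` are standard Kafka client settings.

```python
import os


def kafka_client_config(env=None, overrides=None):
    """Build a client config dict from values injected by the deployment template."""
    env = os.environ if env is None else env
    config = {
        "bootstrap.servers": env["KAFKA_BOOTSTRAP_SERVERS"],  # assumed variable name
        "client.id": env["KAFKA_CLIENT_ID"],                  # assumed variable name
    }
    config.update(overrides or {})  # let callers add or replace individual settings
    return config


if __name__ == "__main__":
    injected = {
        "KAFKA_BOOTSTRAP_SERVERS": "us-east1.somedomain.com:9092",
        "KAFKA_CLIENT_ID": "checkout-service",
    }
    print(kafka_client_config(injected))
```

A missing variable raises `KeyError` at startup, which is usually preferable to connecting with a silent default.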
Improved availability with local consumers.
Local consumers
• More availability
• Reduced latency
• Reduced storage costs
• Reduced network costs
Aggregate consumer
Local consumers
Local consumers: DNS records
99th percentile latency
• Aggregate: 500 ms
• Regional: 20 ms
Connect directly through private IPs.
Public to private traffic
• More secure
• Reduced network costs
• Fetch from closest replica: KIP-392
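KIP-392 lets a consumer fetch from a replica in its own rack or zone rather than always from the partition leader. The settings involved are `client.rack` on the consumer, and `broker.rack` plus `replica.selector.class` on the brokers. The sketch below only assembles those settings; the helper names and the zone value are illustrative assumptions.

```python
RACK_AWARE_SELECTOR = "org.apache.kafka.common.replica.RackAwareReplicaSelector"


def closest_replica_consumer_config(base_config, zone):
    """Return a copy of the consumer config that declares which zone the client runs in."""
    config = dict(base_config)
    config["client.rack"] = zone
    return config


def closest_replica_broker_settings(zone):
    """Broker-side settings needed for consumers to be matched to a same-zone replica."""
    return {
        "broker.rack": zone,
        "replica.selector.class": RACK_AWARE_SELECTOR,
    }


if __name__ == "__main__":
    base = {"bootstrap.servers": "service.namespace.svc.cluster.local:9092", "group.id": "example"}
    print(closest_replica_consumer_config(base, "us-east1-b"))  # zone name is illustrative
    print(closest_replica_broker_settings("us-east1-b"))
```

The client and broker values have to use the same naming scheme for the match to work, which is why both helpers take the zone as a plain string.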
Traffic manager: pod IPs
Network cost reduction
• Network represents 29% of the bill
• Reduction: -6% of the bill
Failover: pod IPs
• GKE 1.24 -> 1.25 incident
• Apply firewall rules
• LB more secure for public traffic
Single stop shop with Multi-Cluster Services (MCS).
MCS endpoints
🔒closed source
Traffic sources
Regional pairs: uneven distribution
Regionalize traffic: Kafka access buddy
Regionalize traffic: MCS
MCS time savings
• Minutes to regionalize traffic: 40, down to 1 after migration
• Minutes to deploy: 18, down to 13 after migration
Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability
• Resiliency: DNS traffic management
• Toil: client wrappers
• Availability: local consumption
Thanks!
@V3_XD