SlideShare a Scribd company logo
1 of 25
Download to read offline
Adi Muraru
DoK Day Europe 2022 @ KubeCon
Running Kafka on Kubernetes,
across three clouds at Adobe
Who I am ?
• Principal Scientist with Adobe Experience Platform
• Been with Adobe for 10yrs
• Big Data lead engineer for Hadoop, HBase and
Kafka infrastructures
• Led the transition of Pipeline Kafka to
Kubernetes/Ethos
DoK Day Europe 2022 @ KubeCon
§ What is Adobe Experience Platform Pipeline?
§ Pipeline “v2” guiding principles
§ Kafka Operator
§ Mechanical Sympathy – Build for resiliency
§ Lessons & considerations
The road to running Adobe Kafka on Kubernetes
Adobe Experience Platform Pipeline
• Kafka Based Message Bus
• Global deployment
• 14 Regions (Private Cloud, AWS, Azure)
• 25 production Clusters
• Scale
• 300+ Billion events
• 250+ TB / day IN
• 800+ TB / day OUT
• 1.7 PB stored
Adobe Pipeline – 10K foot view
Migration to Kubernetes
A uniform automated deployment
of Pipeline across different clouds
Cloud native, provider neutral guiding principles
Elastic,
Flexible &
Scalable
Cost
Effective
Resilient &
DR Ready
Monitoring
& Alerting
as Code
GitOps and
CI/CD
Operator Framework
• Operators are extensions to Kubernetes that make use of custom resources to manage applications
• Operators implement control loop pattern
KOperator
https://operatorframework.io/operator-capabilities/
Capabilities
Criteria KOperator
Automated provisioning via CustomResources
Patch & minor upgrades support
Metrics, log processing, workload analysis
Horizontal/vertical scaling
Auto configuration
Abnormal detection – failed brokers, slow disks
Backup / failure recovery
Architecture
https://github.com/banzaicloud/kafka-operator
Deployment
Mechanical sympathy is
when you use a tool or system
with an understanding
of how it operates best
-- Jackie Stewart, racing driver
Mechanical Sympathy – Build for Resiliency
Lessons & Considerations
• PersistentVolumeClaims and StorageClasses
• Backed by Cloud managed disks (EBS, AzureDisk, Vsphere disks)
• Be mindful of cloud specifics, e.g. Volumes larger than or equal to 334 GiB deliver 250
MiB/s regardless of burst credits
• Immediate and WaitForFirstConsumer volume binding mode
• WFFC recommended - delays the disk binding to Node until Pod is a scheduled on a node
Lessons & Considerations
• Use Availability Zones to map to Kafka “racks”
• Via topology.kubernetes.io/zone nodeSelector
• Use podAntiAffinity to avoid two Kafka brokers pods on the same Node
• Avoid IO contention
• Or consider dedicated worker pools
Lessons & Considerations
• Consider setting requests==limits to ensure pod gets Guaranteed scheduling class
• Implement clean container shutdown hooks, e.g PreStop and
terminationGracePeriodSeconds
Lessons & Considerations
• If your workload exploits page-cache to I/O make sure you account for that in container
requests/limits
• For Kafka we set JVM heap 70%, the rest goes to page-cache
• Account for other pods on K8s Node contending for IO with yours
• K8s does not (yet?) consider Node/VM IO capacity as a kubelet allocatable resource
• Workaround for us was to overprovision the Kafka pod via CPU/MEM to avoid noisy neighbours
• Monitor slow disks
• Instrument and Monitor latencies
• Act on them
Lessons & Considerations
• Be mindful of cluster auto-scaler
• This includes scale in (use cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation to
alleviate)
• Evaluate Ethos ARC (Automatic Resource Configuration)
• Auto-tunes container requested resources
• Optimizer but may conflict with your own capacity management
Lessons & Considerations
• Tolerate Ethos cluster maintenance
• resources updates
• full cluster upgrades (FCUs)
• PodDisruptionBudget
• Kafka operator sets this dynamically based on number of brokers
• Last but least, have an off-cluster backup and disaster recovery strategy
• Mirror Kafka topics to blob storage
• Replicate topics to multiple regions
Lessons & Considerations
• Metrics, metrics, metrics …
and alerts
• Monitors as code
• Alert rules as code
• Dashboards as code
DoK Day North America 2021 @ KubeCon
Thank you!
Adi Muraru
Running Kafka on Kubernetes, across three clouds at Adobe
Backup
DoK
Day
North
America
2021
@
KubeCon
DedicatedVIPs for Kafka Ingress
• DedicatedVIPs to the rescue!
• Access Kafka from outside of
Ethos cluster
• Kafka talks its native/non-http L7
protocol
DedicatedVIPs | AZ aware
Requests are load-balanced within the AZ
only, to avoid cross-AZ bandwidth costs
Avoid cross AZ traffic costs (i.e. hops to
NodePorts in a different AZ than LB and
Broker)
Kafka seeds:
§ kafka-1-az1-or2.prd.pipeline.adobedc.net:9101
§ kafka-1-az2-or2.prd.pipeline.adobedc.net:9201
§ kafka-1-az3-or2.prd.pipeline.adobedc.net:9301
Wish List
• gp3 configurable IOPS/throughput
• Bring Your Own (BYO) encryption keys for disks
• Cost reports to include disks and network
• Handle disk IOPS/throughput as a proper K8s kubelet resource

More Related Content

Similar to Running Kafka on Kubernetes, across three clouds at Adobe

Best Practices with Azure Kubernetes Services
Best Practices with Azure Kubernetes ServicesBest Practices with Azure Kubernetes Services
Best Practices with Azure Kubernetes ServicesQAware GmbH
 
Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes InternalsShimi Bandiel
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integrationaspyker
 
Elastic Kubernetes Services (EKS)
Elastic Kubernetes Services (EKS)Elastic Kubernetes Services (EKS)
Elastic Kubernetes Services (EKS)sriram_rajan
 
Docker for the enterprise
Docker for the enterpriseDocker for the enterprise
Docker for the enterpriseBert Poller
 
Persistent Storage in Docker Platform
Persistent Storage in Docker PlatformPersistent Storage in Docker Platform
Persistent Storage in Docker PlatformAnusha Ragunathan
 
Managing Docker & ECS Based Applications with AWS Elastic Beanstalk - DevDay ...
Managing Docker & ECS Based Applications with AWS Elastic Beanstalk - DevDay ...Managing Docker & ECS Based Applications with AWS Elastic Beanstalk - DevDay ...
Managing Docker & ECS Based Applications with AWS Elastic Beanstalk - DevDay ...Amazon Web Services
 
DCEU 18: Provisioning and Managing Storage for Docker Containers
DCEU 18: Provisioning and Managing Storage for Docker ContainersDCEU 18: Provisioning and Managing Storage for Docker Containers
DCEU 18: Provisioning and Managing Storage for Docker ContainersDocker, Inc.
 
Implementing FaaS on Kubernetes using Kubeless
Implementing FaaS on Kubernetes using KubelessImplementing FaaS on Kubernetes using Kubeless
Implementing FaaS on Kubernetes using KubelessAhmed Misbah
 
SCaLE 15x - How Container Schedulers and Software-Defined Storage will Change...
SCaLE 15x - How Container Schedulers and Software-Defined Storage will Change...SCaLE 15x - How Container Schedulers and Software-Defined Storage will Change...
SCaLE 15x - How Container Schedulers and Software-Defined Storage will Change...David vonThenen
 
Devoxx France 2015 - The Docker Orchestration Ecosystem on Azure
Devoxx France 2015 - The Docker Orchestration Ecosystem on AzureDevoxx France 2015 - The Docker Orchestration Ecosystem on Azure
Devoxx France 2015 - The Docker Orchestration Ecosystem on AzurePatrick Chanezon
 
SKILup Days Container Orchestration - Kubernetes Operators for Databases
SKILup Days Container Orchestration - Kubernetes Operators for DatabasesSKILup Days Container Orchestration - Kubernetes Operators for Databases
SKILup Days Container Orchestration - Kubernetes Operators for DatabasesJuarez Junior
 
Oracle E-Business Suite on Kubernetes Cluster
Oracle E-Business Suite on Kubernetes ClusterOracle E-Business Suite on Kubernetes Cluster
Oracle E-Business Suite on Kubernetes Clustervasuballa
 
Kubernetes Basics - ICP Workshop Batch II
Kubernetes Basics - ICP Workshop Batch IIKubernetes Basics - ICP Workshop Batch II
Kubernetes Basics - ICP Workshop Batch IIPT Datacomm Diangraha
 
Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...
Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...
Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...HostedbyConfluent
 
How Container Schedulers and Software-based Storage will Change the Cloud
How Container Schedulers and Software-based Storage will Change the CloudHow Container Schedulers and Software-based Storage will Change the Cloud
How Container Schedulers and Software-based Storage will Change the CloudDavid vonThenen
 
Docker and kubernetes_introduction
Docker and kubernetes_introductionDocker and kubernetes_introduction
Docker and kubernetes_introductionJason Hu
 
Storage Integrations for Container Orchestrators
Storage Integrations for Container OrchestratorsStorage Integrations for Container Orchestrators
Storage Integrations for Container Orchestrators{code} by Dell EMC
 

Similar to Running Kafka on Kubernetes, across three clouds at Adobe (20)

eCAP Developer Walkthru
eCAP Developer WalkthrueCAP Developer Walkthru
eCAP Developer Walkthru
 
AKS components
AKS componentsAKS components
AKS components
 
Best Practices with Azure Kubernetes Services
Best Practices with Azure Kubernetes ServicesBest Practices with Azure Kubernetes Services
Best Practices with Azure Kubernetes Services
 
Kubernetes Internals
Kubernetes InternalsKubernetes Internals
Kubernetes Internals
 
Re:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS IntegrationRe:invent 2016 Container Scheduling, Execution and AWS Integration
Re:invent 2016 Container Scheduling, Execution and AWS Integration
 
Elastic Kubernetes Services (EKS)
Elastic Kubernetes Services (EKS)Elastic Kubernetes Services (EKS)
Elastic Kubernetes Services (EKS)
 
Docker for the enterprise
Docker for the enterpriseDocker for the enterprise
Docker for the enterprise
 
Persistent Storage in Docker Platform
Persistent Storage in Docker PlatformPersistent Storage in Docker Platform
Persistent Storage in Docker Platform
 
Managing Docker & ECS Based Applications with AWS Elastic Beanstalk - DevDay ...
Managing Docker & ECS Based Applications with AWS Elastic Beanstalk - DevDay ...Managing Docker & ECS Based Applications with AWS Elastic Beanstalk - DevDay ...
Managing Docker & ECS Based Applications with AWS Elastic Beanstalk - DevDay ...
 
DCEU 18: Provisioning and Managing Storage for Docker Containers
DCEU 18: Provisioning and Managing Storage for Docker ContainersDCEU 18: Provisioning and Managing Storage for Docker Containers
DCEU 18: Provisioning and Managing Storage for Docker Containers
 
Implementing FaaS on Kubernetes using Kubeless
Implementing FaaS on Kubernetes using KubelessImplementing FaaS on Kubernetes using Kubeless
Implementing FaaS on Kubernetes using Kubeless
 
SCaLE 15x - How Container Schedulers and Software-Defined Storage will Change...
SCaLE 15x - How Container Schedulers and Software-Defined Storage will Change...SCaLE 15x - How Container Schedulers and Software-Defined Storage will Change...
SCaLE 15x - How Container Schedulers and Software-Defined Storage will Change...
 
Devoxx France 2015 - The Docker Orchestration Ecosystem on Azure
Devoxx France 2015 - The Docker Orchestration Ecosystem on AzureDevoxx France 2015 - The Docker Orchestration Ecosystem on Azure
Devoxx France 2015 - The Docker Orchestration Ecosystem on Azure
 
SKILup Days Container Orchestration - Kubernetes Operators for Databases
SKILup Days Container Orchestration - Kubernetes Operators for DatabasesSKILup Days Container Orchestration - Kubernetes Operators for Databases
SKILup Days Container Orchestration - Kubernetes Operators for Databases
 
Oracle E-Business Suite on Kubernetes Cluster
Oracle E-Business Suite on Kubernetes ClusterOracle E-Business Suite on Kubernetes Cluster
Oracle E-Business Suite on Kubernetes Cluster
 
Kubernetes Basics - ICP Workshop Batch II
Kubernetes Basics - ICP Workshop Batch IIKubernetes Basics - ICP Workshop Batch II
Kubernetes Basics - ICP Workshop Batch II
 
Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...
Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...
Leveraging Tiered Storage in Strimzi-Operated Kafka for Cost-Effective Stream...
 
How Container Schedulers and Software-based Storage will Change the Cloud
How Container Schedulers and Software-based Storage will Change the CloudHow Container Schedulers and Software-based Storage will Change the Cloud
How Container Schedulers and Software-based Storage will Change the Cloud
 
Docker and kubernetes_introduction
Docker and kubernetes_introductionDocker and kubernetes_introduction
Docker and kubernetes_introduction
 
Storage Integrations for Container Orchestrators
Storage Integrations for Container OrchestratorsStorage Integrations for Container Orchestrators
Storage Integrations for Container Orchestrators
 

More from DoKC

Distributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and HowDistributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and HowDoKC
 
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes OperatorsIs It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes OperatorsDoKC
 
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryStop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryDoKC
 
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...DoKC
 
The State of Stateful on Kubernetes
The State of Stateful on KubernetesThe State of Stateful on Kubernetes
The State of Stateful on KubernetesDoKC
 
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...DoKC
 
Make Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-ReadyMake Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-ReadyDoKC
 
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...DoKC
 
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the CloudRun PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the CloudDoKC
 
The Kubernetes Native Database
The Kubernetes Native DatabaseThe Kubernetes Native Database
The Kubernetes Native DatabaseDoKC
 
ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023DoKC
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentDoKC
 
StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154DoKC
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...DoKC
 
Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151DoKC
 
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...DoKC
 
Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147DoKC
 
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...DoKC
 
We will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8sWe will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8sDoKC
 
Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators DoKC
 

More from DoKC (20)

Distributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and HowDistributed Vector Databases - What, Why, and How
Distributed Vector Databases - What, Why, and How
 
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes OperatorsIs It Safe? Security Hardening for Databases Using Kubernetes Operators
Is It Safe? Security Hardening for Databases Using Kubernetes Operators
 
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster RecoveryStop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
Stop Worrying and Keep Querying, Using Automated Multi-Region Disaster Recovery
 
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
Transforming Data Processing with Kubernetes: Journey Towards a Self-Serve Da...
 
The State of Stateful on Kubernetes
The State of Stateful on KubernetesThe State of Stateful on Kubernetes
The State of Stateful on Kubernetes
 
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
Colocating Data Workloads and Web Services on Kubernetes to Improve Resource ...
 
Make Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-ReadyMake Your Kafka Cluster Production-Ready
Make Your Kafka Cluster Production-Ready
 
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
Dynamic Large Scale Spark on Kubernetes: Empowering the Community with Argo W...
 
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the CloudRun PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
Run PostgreSQL in Warp Speed Using NVMe/TCP in the Cloud
 
The Kubernetes Native Database
The Kubernetes Native DatabaseThe Kubernetes Native Database
The Kubernetes Native Database
 
ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023ING Data Services hosted on ICHP DoK Amsterdam 2023
ING Data Services hosted on ICHP DoK Amsterdam 2023
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch government
 
StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154StatefulSets in K8s - DoK Talks #154
StatefulSets in K8s - DoK Talks #154
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
 
Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151Analytics with Apache Superset and ClickHouse - DoK Talks #151
Analytics with Apache Superset and ClickHouse - DoK Talks #151
 
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
Overcoming challenges with protecting and migrating data in multi-cloud K8s e...
 
Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147Evaluating Cloud Native Storage Vendors - DoK Talks #147
Evaluating Cloud Native Storage Vendors - DoK Talks #147
 
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
Kubernetes Cluster Upgrade Strategies and Data: Best Practices for your State...
 
We will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8sWe will Dok You! - The journey to adopt stateful workloads on k8s
We will Dok You! - The journey to adopt stateful workloads on k8s
 
Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators Mastering MongoDB on Kubernetes, the power of operators
Mastering MongoDB on Kubernetes, the power of operators
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Running Kafka on Kubernetes, across three clouds at Adobe

  • 1. Adi Muraru DoK Day Europe 2022 @ KubeCon Running Kafka on Kubernetes, across three clouds at Adobe
  • 2. Who I am ? • Principal Scientist with Adobe Experience Platform • Been with Adobe for 10yrs • Big Data lead engineer for Hadoop, HBase and Kafka infrastructures • Led the transition of Pipeline Kafka to Kubernetes/Ethos
  • 3. DoK Day Europe 2022 @ KubeCon § What is Adobe Experience Platform Pipeline? § Pipeline “v2” guiding principles § Kafka Operator § Mechanical Sympathy – Build for resiliency § Lessons & considerations The road to running Adobe Kafka on Kubernetes
  • 4. Adobe Experience Platform Pipeline • Kafka Based Message Bus • Global deployment • 14 Regions (Private Cloud, AWS, Azure) • 25 production Clusters • Scale • 300+ Billion events • 250+ TB / day IN • 800+ TB / day OUT • 1.7 PB stored
  • 5. Adobe Pipeline – 10K foot view
  • 6. Migration to Kubernetes A uniform automated deployment of Pipeline across different clouds
  • 7. Cloud native, provider neutral guiding principles Elastic, Flexible & Scalable Cost Effective Resilient & DR Ready Monitoring & Alerting as Code GitOps and CI/CD
  • 8. Operator Framework • Operators are extensions to Kubernetes that make use of custom resources to manage applications • Operators implement control loop pattern
  • 10. Capabilities Criteria KOperator Automated provisioning via CustomResources Patch & minor upgrades support Metrics, log processing, workload analysis Horizontal/vertical scaling Auto configuration Abnormal detection – failed brokers, slow disks Backup / failure recovery
  • 13. Mechanical sympathy is when you use a tool or system with an understanding of how it operates best -- Jackie Stewart, racing driver Mechanical Sympathy – Build for Resiliency
  • 14. Lessons & Considerations • PersistentVolumeClaims and StorageClasses • Backed by Cloud managed disks (EBS, AzureDisk, Vsphere disks) • Be mindful of cloud specifics, e.g. Volumes larger than or equal to 334 GiB deliver 250 MiB/s regardless of burst credits • Immediate and WaitForFirstConsumer volume binding mode • WFFC recommended - delays the disk binding to Node until Pod is a scheduled on a node
  • 15. Lessons & Considerations • Use Availability Zones to map to Kafka “racks” • Via topology.kubernetes.io/zone nodeSelector • Use podAntiAffinity to avoid two Kafka brokers pods on the same Node • Avoid IO contention • Or consider dedicated worker pools
  • 16. Lessons & Considerations • Consider setting requests==limits to ensure pod gets Guaranteed scheduling class • Implement clean container shutdown hooks, e.g PreStop and terminationGracePeriodSeconds
  • 17. Lessons & Considerations • If your workload exploits page-cache to I/O make sure you account for that in container requests/limits • For Kafka we set JVM heap 70%, the rest goes to page-cache • Account for other pods on K8s Node contending for IO with yours • K8s does not (yet?) consider Node/VM IO capacity as a kubelet allocatable resource • Workaround for us was to overprovision the Kafka pod via CPU/MEM to avoid noisy neighbours • Monitor slow disks • Instrument and Monitor latencies • Act on them
  • 18. Lessons & Considerations • Be mindful of cluster auto-scaler • This includes scale in (use cluster-autoscaler.kubernetes.io/safe-to-evict=false annotation to alleviate) • Evaluate Ethos ARC (Automatic Resource Configuration) • Auto-tunes container requested resources • Optimizer but may conflict with your own capacity management
  • 19. Lessons & Considerations • Tolerate Ethos cluster maintenance • resources updates • full cluster upgrades (FCUs) • PodDisruptionBudget • Kafka operator sets this dynamically based on number of brokers • Last but least, have an off-cluster backup and disaster recovery strategy • Mirror Kafka topics to blob storage • Replicate topics to multiple regions
  • 20. Lessons & Considerations • Metrics, metrics, metrics … and alerts • Monitors as code • Alert rules as code • Dashboards as code
  • 21. DoK Day North America 2021 @ KubeCon Thank you! Adi Muraru Running Kafka on Kubernetes, across three clouds at Adobe
  • 23. DedicatedVIPs for Kafka Ingress • DedicatedVIPs to the rescue! • Access Kafka from outside of Ethos cluster • Kafka talks its native/non-http L7 protocol
  • 24. DedicatedVIPs | AZ aware Requests are load-balanced within the AZ only, to avoid cross-AZ bandwidth costs Avoid cross AZ traffic costs (i.e. hops to NodePorts in a different AZ than LB and Broker) Kafka seeds: § kafka-1-az1-or2.prd.pipeline.adobedc.net:9101 § kafka-1-az2-or2.prd.pipeline.adobedc.net:9201 § kafka-1-az3-or2.prd.pipeline.adobedc.net:9301
  • 25. Wish List • gp3 configurable IOPS/throughput • Bring Your Own (BYO) encryption keys for disks • Cost reports to include disks and network • Handle disk IOPS/throughput as a proper K8s kubelet resource