SlideShare a Scribd company logo
Tales from the
four-comma club
Managing Kafka as a Service at
Salesforce
Paul Davidson, Software Architect
Lei Ye, Principal Software Engineer
Paul Davidson
Software Engineering Architect Logo
Lei Ye
Principal Software Engineer
What we will talk about
Operating Kafka at Global Scale
Many brokers, Many clusters, Many workloads, Many substrates, Many tenants
Connectivity
Global Data Pipelines
Testing
Automating for Availability
Salesforce Kafka as a Service
Built on open-source Apache Kafka
Managed Solution for
Pub/Sub and Data Pipelines at Salesforce
Salesforce Kafka
Operational and Customer-Facing
Pub / Sub
Including multi-tenant (100s of thousands)
Salesforce Platform Events
Change Data Capture
Capture changes to Salesforce Data Records as events
Use Cases
Salesforce Kafka
Stream Processing
Spark, Flink, etc.
Cross-site Data Pipelines
Storage and Aggregation (e.g. Logs, Metrics, Traces, other)
Use Cases
Salesforce Kafka
100+ Kafka clusters
Kafka 2.6
50,000+ topics, 300K+ partitions
2500+ brokers, 40+ PB storage
Largest clusters 250+ brokers
3+ Trillion Events/Day (average)
Doubling each year
Current Scale
Management and Automation Tooling
● Investing in orchestration and self-healing
○ Kubernetes in public cloud
○ Choria for on-prem deployments and self-healing
○ Continuous automated patching
○ Further growth requires more automation!
● Tooling
○ Cruise Control, Xinfra Monitor, Argus (internal OpenTSDB based service), Splunk
○ Topic Management and visibility: in-house tools (Wrangler - cross-cluster UI)
Kafka API Gateway
● Native Kafka Endpoint for cross-site connectivity
● Standard load balancer + Envoy as a Layer 4
SNI-routing Proxy
● Envoy routes TLS connection to correct broker
using SNI (Server Name Indication)
○ SNI = TLS Extension supported by Kafka Clients
● Allows secure cross-site connectivity with
standard Kafka Clients
Data Pipelines
● Mirus: Reliable, high performance replication tool
○ Mirus Kafka Connector
○ Open Source: https://github.com/salesforce/mirus
○ Low-latency replication
● Object Store Connector
○ Support S3, GCS, etc.
○ High-throughput replication
● Provides global cross-site data transport
Replicating Events between Data Centers
Kafka Operation Experience
Case Study
● Multiple dedicated regional Kafka clusters on Public Cloud to stream security log data to
Defensive Security data lakes
● Requirements
○ High availability
○ Durability
○ Scalability
○ Low operation cost
● We learnt a lot of Kafka operation experience
○ Offline/Under-replicated partition
○ Frozen broker
○ Disk failure
○ Patching noise
● We built useful tools to improve Kafka testing and operation
○ Kafka fault injection testing infrastructure
○ Kafka auto-fix ops tool to enable automated partition and broker recovery without data loss
Kafka Testing Infrastructure
● Reliable and easy-to-configure to run multiple types of testings
○ functional/performance/load/scale/upgrade-and-downgrade/patching tests
● Deterministic, reproducible infrastructure to repeat all types of testing
● Precise fault-injection capability to trigger fault events to simulate production incident
● Orchestrating all components to provide a high quality of QA for Kafka
Kafka Testing Infrastructure Architecture
● Test Spec -- a JSON format test file
● Test Coordinator -- the control plane of
testing infrastructure. It’s a web service
parsing the test spec and submitting a
test to the Kafka cluster
● Load Generator -- simulates producer and
consumer traffic over Kafka topics
● Ops Tool -- auxiliary tools to manage and
configure the Kafka cluster
● Fault Injection Framework -- based on
Trogdor and Kibosh to simulate the
network/process/filesystem faults
● Result Analysis -- tracks log and metrics
from Kafka cluster to analyze testing
results
Test Spec Format
Running Test Spec
Default Kafka Cluster Configuration: Min-ISR = 2; Produce Ack = ALL; Unclean leader election = false;
Inject network fault [7], [1, 3, 5]
00:00 00:05
Generate producing and consuming load to test1_large
Inject network fault [7], [2, 4, 6]
Inject disk fault on the partition storage of [7]
00:10 00:15 00:20
00:00 Broker 7 starts with a healthy state:
Topic: test1_large Partition: 0 Leader: 7 Replicas: 7,1,2 Isr: 7,1,2
Topic: test1_large Partition: 1 Leader: 7 Replicas: 7,3,4 Isr: 7,4,3
Topic: test1_large Partition: 2 Leader: 7 Replicas: 7,3,4 Isr: 7,4,3
Topic: test1_large Partition: 3 Leader: 7 Replicas: 7,5,6 Isr: 7,6,5
Topic: test1_large Partition: 4 Leader: 7 Replicas: 7,5,6 Isr: 7,6,5
00:05 Broker 7 starts degrading, broker 1,3,5 fall out of ISR:
Topic: test1_large Partition: 0 Leader: 7 Replicas: 7,1,2 Isr: 7,2
Topic: test1_large Partition: 1 Leader: 7 Replicas: 7,3,4 Isr: 7,4
Topic: test1_large Partition: 2 Leader: 7 Replicas: 7,3,4 Isr: 7,4
Topic: test1_large Partition: 3 Leader: 7 Replicas: 7,5,6 Isr: 7,6
Topic: test1_large Partition: 4 Leader: 7 Replicas: 7,5,6 Isr: 7,6
00:10 Broker 7 continues to degrade, broker 2,4,6 fall out of ISR:
Topic: test1_large Partition: 0 Leader: 7 Replicas: 7,1,2 Isr: 7
Topic: test1_large Partition: 1 Leader: 7 Replicas: 7,3,4 Isr: 7
Topic: test1_large Partition: 2 Leader: 7 Replicas: 7,3,4 Isr: 7
Topic: test1_large Partition: 3 Leader: 7 Replicas: 7,5,6 Isr: 7
Topic: test1_large Partition: 4 Leader: 7 Replicas: 7,5,6 Isr: 7
00:15 Broker 7 fails due to disk failure, all five partitions go offline:
Topic: test1_large Partition: 0 Leader: -1 Replicas: 7,1,2 Isr: 7
Topic: test1_large Partition: 1 Leader: -1 Replicas: 7,3,4 Isr: 7
Topic: test1_large Partition: 2 Leader: -1 Replicas: 7,3,4 Isr: 7
Topic: test1_large Partition: 3 Leader: -1 Replicas: 7,5,6 Isr: 7
Topic: test1_large Partition: 4 Leader: -1 Replicas: 7,5,6 Isr: 7
How to Recover?
● For offline partitions, it should perform the following steps
○ Identify the most advanced replica that has the highest LogEndOffset using jmxtool
./bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi --object-name
kafka.log:type=Log,name=LogEndOffset,topic=test1_large,partition=0 --one-time true
○ Elect it as a new leader by using partition reassignment
■ If new leader is the first in partition replica assignment, it can skip reassignment step
■ Execute Kafka partition reassignment tool
./bin/kafka-reassign-partitions.sh --reassignment-json-file partitions-to-move.json --execute
■ Make sure the new leader is the first live replica in the partition replica assignment
○ Enable the unclean leader election for the topic
○ Disable the unclean leader election for the topic after recovery
● For bad brokers, it needs to decommission the hosts, re-provision and add the new hosts
back to the fleet
Kafka Auto-fix Ops Tool
● One-stop tool to automate the entire partition and broker recovery process
● Short MTTR (~2 mins)
● No data loss
Conclusion
Summary
Salesforce relies on Kafka at global scale
Global data transport
Widely varied workloads
Massive multi-tenancy
Availability at Scale requires investing in
Automated Testing
Automated Operations and Self-healing
Visibility, Monitoring, and Alerting
More growth requires more automation!
Thank You

More Related Content

What's hot

Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
confluent
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
confluent
 
Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer
confluent
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
Paolo Castagna
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
HostedbyConfluent
 
Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...
Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...
Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...
HostedbyConfluent
 
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
HostedbyConfluent
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
HostedbyConfluent
 
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails? Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
confluent
 
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
HostedbyConfluent
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
HostedbyConfluent
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
confluent
 
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
HostedbyConfluent
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
confluent
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
confluent
 
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
HostedbyConfluent
 
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
HostedbyConfluent
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafka
confluent
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
Akash Vacher
 

What's hot (20)

Building High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in KafkaBuilding High-Throughput, Low-Latency Pipelines in Kafka
Building High-Throughput, Low-Latency Pipelines in Kafka
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer Building an Event-oriented Data Platform with Kafka, Eric Sammer
Building an Event-oriented Data Platform with Kafka, Eric Sammer
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
Kafka for Microservices – You absolutely need Avro Schemas! | Gerardo Gutierr...
 
Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...
Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...
Give Your Confluent Platform Superpowers! (Sandeep Togrika, Intel and Bert Ha...
 
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
Everything you ever needed to know about Kafka on Kubernetes but were afraid ...
 
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...
 
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails? Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
Kafka Summit NYC 2017 - Apache Kafka in the Enterprise: What if it Fails?
 
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
Tradeoffs in Distributed Systems Design: Is Kafka The Best? (Ben Stopford and...
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
 
Deploying Confluent Platform for Production
Deploying Confluent Platform for ProductionDeploying Confluent Platform for Production
Deploying Confluent Platform for Production
 
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
Securing the Message Bus with Kafka Streams | Paul Otto and Ryan Salcido, Raf...
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
 
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
Should you read Kafka as a stream or in batch? Should you even care? | Ido Na...
 
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
Camel Kafka Connectors: Tune Kafka to “Speak” with (Almost) Everything (Andre...
 
Common Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache KafkaCommon Patterns of Multi Data-Center Architectures with Apache Kafka
Common Patterns of Multi Data-Center Architectures with Apache Kafka
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 

Similar to Tales from the four-comma club: Managing Kafka as a service at Salesforce | Lei Ye and Paul Davidson, Salesforce

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale
NetApp
 
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
Mark Swarbrick
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
OpenStack Korea Community
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems Baruch Osoveskiy
 
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxBuilt-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
nadirpervez2
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
LarryZaman
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
thelabdude
 
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload DiagnosticsTracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Databricks
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
Samuel Kerrien
 
Managing Oracle Enterprise Manager Cloud Control 12c with Oracle Clusterware
Managing Oracle Enterprise Manager Cloud Control 12c with Oracle ClusterwareManaging Oracle Enterprise Manager Cloud Control 12c with Oracle Clusterware
Managing Oracle Enterprise Manager Cloud Control 12c with Oracle ClusterwareLeighton Nelson
 
Performance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle CoherencePerformance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle Coherence
aragozin
 
Ceph
CephCeph
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Odinot Stanislas
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryoguest40fc7cd
 
Distributed tracing in OpenStack
Distributed tracing in OpenStackDistributed tracing in OpenStack
Distributed tracing in OpenStack
Ilya Shakhat
 
Percona XtraDB 集群文档
Percona XtraDB 集群文档Percona XtraDB 集群文档
Percona XtraDB 集群文档
YUCHENG HU
 
Tips and Tricks for Operating Apache Kafka
Tips and Tricks for Operating Apache KafkaTips and Tricks for Operating Apache Kafka
Tips and Tricks for Operating Apache Kafka
All Things Open
 
Resume_CQ_Edward
Resume_CQ_EdwardResume_CQ_Edward
Resume_CQ_Edwardcaiqi wang
 

Similar to Tales from the four-comma club: Managing Kafka as a service at Salesforce | Lei Ye and Paul Davidson, Salesforce (20)

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale Cost Effectively Run Multiple Oracle Database Copies at Scale
Cost Effectively Run Multiple Oracle Database Copies at Scale
 
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisationMySQL Webinar 2/4 Performance tuning, hardware, optimisation
MySQL Webinar 2/4 Performance tuning, hardware, optimisation
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
11g R2
11g R211g R2
11g R2
 
Oracle Performance On Linux X86 systems
Oracle  Performance On Linux  X86 systems Oracle  Performance On Linux  X86 systems
Oracle Performance On Linux X86 systems
 
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxBuilt-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload DiagnosticsTracing the Breadcrumbs: Apache Spark Workload Diagnostics
Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Managing Oracle Enterprise Manager Cloud Control 12c with Oracle Clusterware
Managing Oracle Enterprise Manager Cloud Control 12c with Oracle ClusterwareManaging Oracle Enterprise Manager Cloud Control 12c with Oracle Clusterware
Managing Oracle Enterprise Manager Cloud Control 12c with Oracle Clusterware
 
Performance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle CoherencePerformance Test Driven Development with Oracle Coherence
Performance Test Driven Development with Oracle Coherence
 
Ceph
CephCeph
Ceph
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryo
 
Distributed tracing in OpenStack
Distributed tracing in OpenStackDistributed tracing in OpenStack
Distributed tracing in OpenStack
 
Percona XtraDB 集群文档
Percona XtraDB 集群文档Percona XtraDB 集群文档
Percona XtraDB 集群文档
 
Tips and Tricks for Operating Apache Kafka
Tips and Tricks for Operating Apache KafkaTips and Tricks for Operating Apache Kafka
Tips and Tricks for Operating Apache Kafka
 
Resume_CQ_Edward
Resume_CQ_EdwardResume_CQ_Edward
Resume_CQ_Edward
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
HostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
HostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
HostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
HostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
HostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
HostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
HostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
HostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
HostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
HostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
HostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
HostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
HostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
HostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 

Recently uploaded (20)

Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 

Tales from the four-comma club: Managing Kafka as a service at Salesforce | Lei Ye and Paul Davidson, Salesforce

  • 1. Tales from the four-comma club Managing Kafka as a Service at Salesforce Paul Davidson, Software Architect Lei Ye, Principal Software Engineer
  • 4. What we will talk about Operating Kafka at Global Scale Many brokers, Many clusters, Many workloads, Many substrates, Many tenants Connectivity Global Data Pipelines Testing Automating for Availability
  • 5. Salesforce Kafka as a Service Built on open-source Apache Kafka Managed Solution for Pub/Sub and Data Pipelines at Salesforce
  • 6. Salesforce Kafka Operational and Customer-Facing Pub / Sub Including multi-tenant (100s of thousands) Salesforce Platform Events Change Data Capture Capture changes to Salesforce Data Records as events Use Cases
  • 7. Salesforce Kafka Stream Processing Spark, Flink, etc. Cross-site Data Pipelines Storage and Aggregation (e.g. Logs, Metrics, Traces, other) Use Cases
  • 8. Salesforce Kafka 100+ Kafka clusters Kafka 2.6 50,000+ topics, 300K+ partitions 2500+ brokers, 40+ PB storage Largest clusters 250+ brokers 3+ Trillion Events/Day (average) Doubling each year Current Scale
  • 9. Management and Automation Tooling ● Investing in orchestration and self-healing ○ Kubernetes in public cloud ○ Choria for on-prem deployments and self-healing ○ Continuous automated patching ○ Further growth requires more automation! ● Tooling ○ Cruise Control, Xinfra Monitor, Argus (internal OpenTSDB based service), Splunk ○ Topic Management and visibility: in-house tools (Wrangler - cross-cluster UI)
  • 10.
  • 11. Kafka API Gateway ● Native Kafka Endpoint for cross-site connectivity ● Standard load balancer + Envoy as a Layer 4 SNI-routing Proxy ● Envoy routes TLS connection to correct broker using SNI (Server Name Indication) ○ SNI = TLS Extension supported by Kafka Clients ● Allows secure cross-site connectivity with standard Kafka Clients
  • 12. Data Pipelines ● Mirus: Reliable, high performance replication tool ○ Mirus Kafka Connector ○ Open Source: https://github.com/salesforce/mirus ○ Low-latency replication ● Object Store Connector ○ Support S3, GCS, etc. ○ High-throughput replication ● Provides global cross-site data transport Replicating Events between Data Centers
  • 14. Case Study ● Multiple dedicated regional Kafka clusters on Public Cloud to stream security log data to Defensive Security data lakes ● Requirements ○ High availability ○ Durability ○ Scalability ○ Low operation cost ● We learnt a lot of Kafka operation experience ○ Offline/Under-replicated partition ○ Frozen broker ○ Disk failure ○ Patching noise ● We built useful tools to improve Kafka testing and operation ○ Kafka fault injection testing infrastructure ○ Kafka auto-fix ops tool to enable automated partition and broker recovery without data loss
  • 15. Kafka Testing Infrastructure ● Reliable and easy-to-configure to run multiple types of testings ○ functional/performance/load/scale/upgrade-and-downgrade/patching tests ● Deterministic, reproducible infrastructure to repeat all types of testing ● Precise fault-injection capability to trigger fault events to simulate production incident ● Orchestrating all components to provide a high quality of QA for Kafka
  • 16. Kafka Testing Infrastructure Architecture ● Test Spec -- a JSON format test file ● Test Coordinator -- the control plane of testing infrastructure. It’s a web service parsing the test spec and submitting a test to the Kafka cluster ● Load Generator -- simulates producer and consumer traffic over Kafka topics ● Ops Tool -- auxiliary tools to manage and configure the Kafka cluster ● Fault Injection Framework -- based on Trogdor and Kibosh to simulate the network/process/filesystem faults ● Result Analysis -- tracks log and metrics from Kafka cluster to analyze testing results
  • 18. Running Test Spec Default Kafka Cluster Configuration: Min-ISR = 2; Produce Ack = ALL; Unclean leader election = false; Inject network fault [7], [1, 3, 5] 00:00 00:05 Generate producing and consuming load to test1_large Inject network fault [7], [2, 4, 6] Inject disk fault on the partition storage of [7] 00:10 00:15 00:20 00:00 Broker 7 starts with a healthy state: Topic: test1_large Partition: 0 Leader: 7 Replicas: 7,1,2 Isr: 7,1,2 Topic: test1_large Partition: 1 Leader: 7 Replicas: 7,3,4 Isr: 7,4,3 Topic: test1_large Partition: 2 Leader: 7 Replicas: 7,3,4 Isr: 7,4,3 Topic: test1_large Partition: 3 Leader: 7 Replicas: 7,5,6 Isr: 7,6,5 Topic: test1_large Partition: 4 Leader: 7 Replicas: 7,5,6 Isr: 7,6,5 00:05 Broker 7 starts degrading, broker 1,3,5 fall out of ISR: Topic: test1_large Partition: 0 Leader: 7 Replicas: 7,1,2 Isr: 7,2 Topic: test1_large Partition: 1 Leader: 7 Replicas: 7,3,4 Isr: 7,4 Topic: test1_large Partition: 2 Leader: 7 Replicas: 7,3,4 Isr: 7,4 Topic: test1_large Partition: 3 Leader: 7 Replicas: 7,5,6 Isr: 7,6 Topic: test1_large Partition: 4 Leader: 7 Replicas: 7,5,6 Isr: 7,6 00:10 Broker 7 continues to degrade, broker 2,4,6 fall out of ISR: Topic: test1_large Partition: 0 Leader: 7 Replicas: 7,1,2 Isr: 7 Topic: test1_large Partition: 1 Leader: 7 Replicas: 7,3,4 Isr: 7 Topic: test1_large Partition: 2 Leader: 7 Replicas: 7,3,4 Isr: 7 Topic: test1_large Partition: 3 Leader: 7 Replicas: 7,5,6 Isr: 7 Topic: test1_large Partition: 4 Leader: 7 Replicas: 7,5,6 Isr: 7 00:15 Broker 7 fails due to disk failure, all five partitions go offline: Topic: test1_large Partition: 0 Leader: -1 Replicas: 7,1,2 Isr: 7 Topic: test1_large Partition: 1 Leader: -1 Replicas: 7,3,4 Isr: 7 Topic: test1_large Partition: 2 Leader: -1 Replicas: 7,3,4 Isr: 7 Topic: test1_large Partition: 3 Leader: -1 Replicas: 7,5,6 Isr: 7 Topic: test1_large Partition: 4 Leader: -1 Replicas: 7,5,6 Isr: 7
  • 19. How to Recover? ● For offline partitions, it should perform the following steps ○ Identify the most advanced replica that has the highest LogEndOffset using jmxtool ./bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:9010/jmxrmi --object-name kafka.log:type=Log,name=LogEndOffset,topic=test1_large,partition=0 --one-time true ○ Elect it as a new leader by using partition reassignment ■ If new leader is the first in partition replica assignment, it can skip reassignment step ■ Execute Kafka partition reassignment tool ./bin/kafka-reassign-partitions.sh --reassignment-json-file partitions-to-move.json --execute ■ Make sure the new leader is the first live replica in the partition replica assignment ○ Enable the unclean leader election for the topic ○ Disable the unclean leader election for the topic after recovery ● For bad brokers, it needs to decommission the hosts, re-provision and add the new hosts back to the fleet
  • 20. Kafka Auto-fix Ops Tool ● One-stop tool to automate the entire partition and broker recovery process ● Short MTTR (~2 mins) ● No data loss
  • 22. Summary Salesforce relies on Kafka at global scale Global data transport Widely varied workloads Massive multi-tenancy Availability at Scale requires investing in Automated Testing Automated Operations and Self-healing Visibility, Monitoring, and Alerting More growth requires more automation!