SlideShare a Scribd company logo
1 of 45
1© Cloudera, Inc. All rights reserved.
Kafka Reliability Guarantees
2© Cloudera, Inc. All rights reserved.
But First…What’s NEW???
• Released 0.9.0 in late November
• 87 Contributors, 523 JIRAs, Bunch o’ new Features.
• Security!
• Kerberos/SASL Authentication
• Authorization Plugin
• SSL
• Kafka Connect
• Quotas
• New Consumer****
3© Cloudera, Inc. All rights reserved.
Kafka
• High Throughput
• Low Latency
• Scalable
• Centralized
• Real-time
4© Cloudera, Inc. All rights reserved.
“If data is the lifeblood of high technology, Apache
Kafka is the circulatory system”
--Todd Palino
Kafka SRE @ LinkedIn
5© Cloudera, Inc. All rights reserved.
If Kafka is a critical piece of our pipeline
 Can we be 100% sure that our data will get there?
 Can we lose messages?
 How do we verify?
 Who’s fault is it?
6© Cloudera, Inc. All rights reserved.
Distributed Systems
 Things Fail
 Systems are designed to
tolerate failure
 We must expect failures
and design our code and
configure our systems to
handle them
7© Cloudera, Inc. All rights reserved.
Network
Broker MachineClient Machine
Data Flow
Kafka Client
Broker
O/S Socket Buffer
NIC
NIC
Page Cache
Disk
Application Thread
O/S Socket Buffer
async
callback
✗
✗
✗
✗
✗
✗
✗✗ data
ack / exception
8© Cloudera, Inc. All rights reserved.
Client Machine
Kafka Client
O/S Socket Buffer
NIC
Application Thread
✗
✗
✗Broker Machine
Broker
NIC
Page Cache
Disk
O/S Socket Buffer
miss
✗
✗
✗
✗
Network
Data Flow
✗
data
offsets
ZK
Kafka
✗
9© Cloudera, Inc. All rights reserved.
Replication is your friend
 Kafka protects against failures by replicating data
 The unit of replication is the partition
 One replica is designated as the Leader
 Follower replicas fetch data from the leader
 The leader holds the list of “in-sync” replicas
10© Cloudera, Inc. All rights reserved.
Replication and ISRs
0
1
2
0
1
2
0
1
2
Producer
Broker
100
Broker
101
Broker
102
Topic:
Partitions
:
Replicas:
my_topic
3
3
Partition
:
Leader:
ISR:
1
101
100,102
Partition
:
Leader:
ISR:
2
102
101,100
Partition
:
Leader:
ISR:
0
100
101,102
11© Cloudera, Inc. All rights reserved.
ISR
• 2 things make a replica in-sync
• Lag behind leader
• replica.lag.time.max.ms – replica that didn’t fetch or is behind
• replica.lag.max.messages – will go away in 0.9
• Connection to Zookeeper
12© Cloudera, Inc. All rights reserved.
Terminology
• Acked
• Producers will not retry sending.
• Depends on producer setting
• Committed
• Consumers can read.
• Only when message got to all ISR.
• replica.lag.time.max.ms
• how long can a dead replica prevent
consumers from reading?
13© Cloudera, Inc. All rights reserved.
Replication
• Acks = all
• only waits for in-sync replicas to reply.
Replica 3
100
Replica 2
100
Replica 1
100
Time
14© Cloudera, Inc. All rights reserved.
Replication
Replica 3
100
Replica 2
100
101
Replica 1
100
101
Time
• Replica 3 stopped replicating for some reason
Acked in acks = all
“committed”
Acked in acks = 1
but not
“committed”
15© Cloudera, Inc. All rights reserved.
Replication
Replica 3
100
Replica 2
100
101
Replica 1
100
101
Time
• One replica drops out of ISR, or goes offline
• All messages are now acked and committed
16© Cloudera, Inc. All rights reserved.
Replication
Replica 3
100
Replica 2
100
101
Replica 1
100
101
102
103
104Time
• 2nd Replica drops out, or is offline
17© Cloudera, Inc. All rights reserved.
Replication
Replica 3
100
Replica 2
100
101
Replica 1
100
101
102
103
104Time
• Now we’re in trouble
✗
18© Cloudera, Inc. All rights reserved.
Replication
• If Replica 2 or 3 come back online before the leader, you can will lose data.
Replica 3
100
Replica 2
100
101
Replica 1
100
101
102
103
104Time
All those are
“acked” and
“committed”
19© Cloudera, Inc. All rights reserved.
So what to do
• Disable Unclean Leader Election
• unclean.leader.election.enable = false
• Set replication factor
• default.replication.factor = 3
• Set minimum ISRs
• min.insync.replicas = 2
20© Cloudera, Inc. All rights reserved.
Warning
• min.insync.replicas is applied at the topic-level.
• Must alter the topic configuration manually if created before the server level
change
• Must manually alter the topic < 0.9.0 (KAFKA-2114)
21© Cloudera, Inc. All rights reserved.
Replication
• Replication = 3
• Min ISR = 2
Replica 3
100
Replica 2
100
Replica 1
100
Time
22© Cloudera, Inc. All rights reserved.
Replication
Replica 3
100
Replica 2
100
101
Replica 1
100
101
Time
• One replica drops out of ISR, or goes offline
23© Cloudera, Inc. All rights reserved.
Replication
Replica 3
100
Replica 2
100
101
Replica 1
100
101102
103
104
Time
• 2nd Replica fails out, or is out of sync
Buffers
in
Produce
r
24© Cloudera, Inc. All rights reserved.
25© Cloudera, Inc. All rights reserved.
Producer Internals
• Producer sends batches of messages to a buffer
M3
Application
Thread
Application
Thread
Application
Thread
send()
M2 M1 M0
Batch 3
Batch 2
Batch 1
Fail
? response
retry
Update Future
callback
drain
Metadata or
Exception
26© Cloudera, Inc. All rights reserved.
Basics
• Durability can be configured with the producer configuration
request.required.acks
• 0 The message is written to the network (buffer)
• 1 The message is written to the leader
• all The producer gets an ack after all ISRs receive the data; the message
is committed
• Make sure producer doesn’t just throws messages away!
• block.on.buffer.full = true
27© Cloudera, Inc. All rights reserved.
“New” Producer
• All calls are non-blocking async
• 2 Options for checking for failures:
• Immediately block for response: send().get()
• Do followup work in Callback, close producer after error threshold
• Be careful about buffering these failures. Future work? KAFKA-1955
• Don’t forget to close the producer! producer.close() will block until in-
flight txns complete
• retries (producer config) defaults to 0
• message.send.max.retries (server config) defaults to 3
• In flight requests could lead to message re-ordering
28© Cloudera, Inc. All rights reserved.
29© Cloudera, Inc. All rights reserved.
Consumer
• Three choices for Consumer API
• Simple Consumer
• High Level Consumer
• “New Consumer”
30© Cloudera, Inc. All rights reserved.
New Consumer
• Available in Kafka 0.9.0
• Provides better control over offset management
• Enhanced server-side group management
31© Cloudera, Inc. All rights reserved.
Consumer Offsets
P0 P2 P3 P4 P5 P6
Consumer Group
Consumer1 Consumer2 Consumer
3
Consumer
4
32© Cloudera, Inc. All rights reserved.
Consumer Offsets
P0 P2 P3 P4 P5 P6
Consumer Group
Consumer
1
Consumer
2
Consumer 3 Consumer
4
Commit?
33© Cloudera, Inc. All rights reserved.
Consumer Offsets
P0 P2 P3 P4 P5 P6
Consumer Group
Consumer
1
Consumer 2 Consumer
3
Consumer
4
Commit?
34© Cloudera, Inc. All rights reserved.
Consumer Offsets
P0 P2 P3 P4 P5 P6
Consumer
Consumer
1
Consumer
2
Consumer
3
Consumer
4
✗Commit
35© Cloudera, Inc. All rights reserved.
Consumer Offsets
P0 P2 P3 P4 P5 P6
Consumer
Thread 1 Thread 2 Thread 3 Thread 4
✗
36© Cloudera, Inc. All rights reserved.
Consumer Offsets
P0 P2 P3 P4 P5 P6
Consumer
Thread 1 Thread 2 Thread 3 Thread 4
Consumer
Picks up here
37© Cloudera, Inc. All rights reserved.
Consumer Offsets
P0 P2 P3 P4 P5 P6
Consumer
Thread 1 Thread 2 Thread 3 Thread 4
Commit
38© Cloudera, Inc. All rights reserved.
Consumer Offsets
P0 P2 P3 P4 P5 P6
Consumer
Thread 1 Thread 2 Thread 3 Thread 4
Commit
Offset
commits for
all threads
39© Cloudera, Inc. All rights reserved.
P0 P2 P3 P4 P5 P6
Consumer 1 Consumer
2
Consumer
3
Consumer 4
Consumer Offsets
Commit
40© Cloudera, Inc. All rights reserved.
Consumer Recommendations
• Set autocommit.enable = false
• Manually commit offsets after the message data is processed / persisted
consumer.commitOffsets();
• Run each consumer in it’s own thread
41© Cloudera, Inc. All rights reserved.
New Consumer!
• No Zookeeper! At all!
• Rebalance listener
• Commit:
• Commit
• Commit async
• Commit( offset)
• Seek(offset)
42© Cloudera, Inc. All rights reserved.
Exactly Once Semantics
• At most once is easy
• At least once is not bad either – commit after 100% sure data is safe
• Exactly once is tricky
• Commit data and offsets in one transaction
• Idempotent producer
43© Cloudera, Inc. All rights reserved.
Monitoring for Data Loss
• Monitor for producer errors – watch the retry numbers
• Monitor consumer lag – MaxLag or via offsets
• Standard schema:
• Each message should contain timestamp and originating service and host
• Each producer can report message counts and offsets to a special topic
• “Monitoring consumer” reports message counts to another special topic
• “Important consumers” also report message counts
• Reconcile the results
44© Cloudera, Inc. All rights reserved.
Be Safe, Not Sorry
• Acks = all
• Block.on.buffer.full = true
• Retries = MAX_INT
• ( Max.inflight.requests.per.connect = 1 )
• Producer.close()
• Replication-factor >= 3
• Min.insync.replicas = 2
• Unclean.leader.election = false
• Auto.offset.commit = false
• Commit after processing
• Monitor!
45© Cloudera, Inc. All rights reserved.
Thank you

More Related Content

What's hot

Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafkaconfluent
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafkaJiangjie Qin
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...confluent
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaGrant Henke
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...Lucas Jellema
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to KafkaDucas Francis
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in KafkaJoel Koshy
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaGuozhang Wang
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Patternconfluent
 

What's hot (20)

Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
intro-kafka
intro-kafkaintro-kafka
intro-kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 

Viewers also liked

Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Tuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTodd Palino
 
Optimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big DataOptimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big DataCloudera, Inc.
 
IBM MQ: Using Publish/Subscribe in an MQ Network
IBM MQ: Using Publish/Subscribe in an MQ NetworkIBM MQ: Using Publish/Subscribe in an MQ Network
IBM MQ: Using Publish/Subscribe in an MQ NetworkDavid Ware
 
Effective Application Development with WebSphere Message Broker
Effective Application Development with WebSphere Message BrokerEffective Application Development with WebSphere Message Broker
Effective Application Development with WebSphere Message BrokerAnt Phillips
 
Advanced Pattern Authoring with WebSphere Message Broker
Advanced Pattern Authoring with WebSphere Message BrokerAdvanced Pattern Authoring with WebSphere Message Broker
Advanced Pattern Authoring with WebSphere Message BrokerAnt Phillips
 
Introduction to Patterns in WebSphere Message Broker
Introduction to Patterns in WebSphere Message BrokerIntroduction to Patterns in WebSphere Message Broker
Introduction to Patterns in WebSphere Message BrokerAnt Phillips
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingGuozhang Wang
 
Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016
Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016
Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016Leif Davidsen
 
InterConnect 2016: What's new in IBM MQ
InterConnect 2016: What's new in IBM MQInterConnect 2016: What's new in IBM MQ
InterConnect 2016: What's new in IBM MQDavid Ware
 
InterConnect 2016: IBM MQ self-service and as-a-service
InterConnect 2016: IBM MQ self-service and as-a-serviceInterConnect 2016: IBM MQ self-service and as-a-service
InterConnect 2016: IBM MQ self-service and as-a-serviceDavid Ware
 
IBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ: Managing Workloads, Scaling and Availability with MQ ClustersIBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ: Managing Workloads, Scaling and Availability with MQ ClustersDavid Ware
 
IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...Leif Davidsen
 
Docker Overview - Rise of the Containers
Docker Overview - Rise of the ContainersDocker Overview - Rise of the Containers
Docker Overview - Rise of the ContainersRyan Hodgin
 
Data Power Architectural Patterns - Jagadish Vemugunta
Data Power Architectural Patterns - Jagadish VemuguntaData Power Architectural Patterns - Jagadish Vemugunta
Data Power Architectural Patterns - Jagadish Vemuguntafloridawusergroup
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster RecoveryMarkTaylorIBM
 
Netflix security monkey overview
Netflix security monkey overviewNetflix security monkey overview
Netflix security monkey overviewRyan Hodgin
 
IBM Integration Bus High Availability Overview
IBM Integration Bus High Availability OverviewIBM Integration Bus High Availability Overview
IBM Integration Bus High Availability OverviewPeter Broadhurst
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hoodAdarsh Pannu
 

Viewers also liked (20)

Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Tuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTuning Kafka for Fun and Profit
Tuning Kafka for Fun and Profit
 
Optimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big DataOptimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big Data
 
IBM MQ: Using Publish/Subscribe in an MQ Network
IBM MQ: Using Publish/Subscribe in an MQ NetworkIBM MQ: Using Publish/Subscribe in an MQ Network
IBM MQ: Using Publish/Subscribe in an MQ Network
 
Effective Application Development with WebSphere Message Broker
Effective Application Development with WebSphere Message BrokerEffective Application Development with WebSphere Message Broker
Effective Application Development with WebSphere Message Broker
 
Advanced Pattern Authoring with WebSphere Message Broker
Advanced Pattern Authoring with WebSphere Message BrokerAdvanced Pattern Authoring with WebSphere Message Broker
Advanced Pattern Authoring with WebSphere Message Broker
 
Introduction to Patterns in WebSphere Message Broker
Introduction to Patterns in WebSphere Message BrokerIntroduction to Patterns in WebSphere Message Broker
Introduction to Patterns in WebSphere Message Broker
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016
Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016
Expanding your options with the IBM MQ Appliance - IBM InterConnect 2016
 
InterConnect 2016: What's new in IBM MQ
InterConnect 2016: What's new in IBM MQInterConnect 2016: What's new in IBM MQ
InterConnect 2016: What's new in IBM MQ
 
InterConnect 2016: IBM MQ self-service and as-a-service
InterConnect 2016: IBM MQ self-service and as-a-serviceInterConnect 2016: IBM MQ self-service and as-a-service
InterConnect 2016: IBM MQ self-service and as-a-service
 
IBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ: Managing Workloads, Scaling and Availability with MQ ClustersIBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
IBM MQ: Managing Workloads, Scaling and Availability with MQ Clusters
 
IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...
 
Docker Overview - Rise of the Containers
Docker Overview - Rise of the ContainersDocker Overview - Rise of the Containers
Docker Overview - Rise of the Containers
 
Data Power Architectural Patterns - Jagadish Vemugunta
Data Power Architectural Patterns - Jagadish VemuguntaData Power Architectural Patterns - Jagadish Vemugunta
Data Power Architectural Patterns - Jagadish Vemugunta
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster Recovery
 
Netflix security monkey overview
Netflix security monkey overviewNetflix security monkey overview
Netflix security monkey overview
 
IBM Integration Bus High Availability Overview
IBM Integration Bus High Availability OverviewIBM Integration Bus High Availability Overview
IBM Integration Bus High Availability Overview
 
Mini-Training: Netflix Simian Army
Mini-Training: Netflix Simian ArmyMini-Training: Netflix Simian Army
Mini-Training: Netflix Simian Army
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hood
 

Similar to Kafka Reliability Guarantees Explained

Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Data Con LA
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafkaconfluent
 
3 cucm database
3 cucm database3 cucm database
3 cucm databasepasabakac
 
Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003lee tracie
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoopWei-Chiu Chuang
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
New Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceNew Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceAnil Nair
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBMAvailability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBMHostedbyConfluent
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...HostedbyConfluent
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionCloudera, Inc.
 
Discovering exoplanets with Deep Leaning
Discovering exoplanets with Deep LeaningDiscovering exoplanets with Deep Leaning
Discovering exoplanets with Deep LeaningRafael Arana
 
Kafka practical experience
Kafka practical experienceKafka practical experience
Kafka practical experienceRico Chen
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionUpgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionCloudera, Inc.
 

Similar to Kafka Reliability Guarantees Explained (20)

Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
3 cucm database
3 cucm database3 cucm database
3 cucm database
 
Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
New Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceNew Generation Oracle RAC Performance
New Generation Oracle RAC Performance
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBMAvailability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
 
Kafka talk
Kafka talkKafka talk
Kafka talk
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to Production
 
Discovering exoplanets with Deep Leaning
Discovering exoplanets with Deep LeaningDiscovering exoplanets with Deep Leaning
Discovering exoplanets with Deep Leaning
 
Kafka practical experience
Kafka practical experienceKafka practical experience
Kafka practical experience
 
Chicago spark meetup-april2017-public
Chicago spark meetup-april2017-publicChicago spark meetup-april2017-public
Chicago spark meetup-april2017-public
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in ProductionUpgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
 

Recently uploaded

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Kafka Reliability Guarantees Explained

  • 1. 1© Cloudera, Inc. All rights reserved. Kafka Reliability Guarantees
  • 2. 2© Cloudera, Inc. All rights reserved. But First…What’s NEW??? • Released 0.9.0 in late November • 87 Contributors, 523 JIRAs, Bunch o’ new Features. • Security! • Kerberos/SASL Authentication • Authorization Plugin • SSL • Kafka Connect • Quotas • New Consumer****
  • 3. 3© Cloudera, Inc. All rights reserved. Kafka • High Throughput • Low Latency • Scalable • Centralized • Real-time
  • 4. 4© Cloudera, Inc. All rights reserved. “If data is the lifeblood of high technology, Apache Kafka is the circulatory system” --Todd Palino Kafka SRE @ LinkedIn
  • 5. 5© Cloudera, Inc. All rights reserved. If Kafka is a critical piece of our pipeline  Can we be 100% sure that our data will get there?  Can we lose messages?  How do we verify?  Who’s fault is it?
  • 6. 6© Cloudera, Inc. All rights reserved. Distributed Systems  Things Fail  Systems are designed to tolerate failure  We must expect failures and design our code and configure our systems to handle them
  • 7. 7© Cloudera, Inc. All rights reserved. Network Broker MachineClient Machine Data Flow Kafka Client Broker O/S Socket Buffer NIC NIC Page Cache Disk Application Thread O/S Socket Buffer async callback ✗ ✗ ✗ ✗ ✗ ✗ ✗✗ data ack / exception
  • 8. 8© Cloudera, Inc. All rights reserved. Client Machine Kafka Client O/S Socket Buffer NIC Application Thread ✗ ✗ ✗Broker Machine Broker NIC Page Cache Disk O/S Socket Buffer miss ✗ ✗ ✗ ✗ Network Data Flow ✗ data offsets ZK Kafka ✗
  • 9. 9© Cloudera, Inc. All rights reserved. Replication is your friend  Kafka protects against failures by replicating data  The unit of replication is the partition  One replica is designated as the Leader  Follower replicas fetch data from the leader  The leader holds the list of “in-sync” replicas
  • 10. 10© Cloudera, Inc. All rights reserved. Replication and ISRs 0 1 2 0 1 2 0 1 2 Producer Broker 100 Broker 101 Broker 102 Topic: Partitions : Replicas: my_topic 3 3 Partition : Leader: ISR: 1 101 100,102 Partition : Leader: ISR: 2 102 101,100 Partition : Leader: ISR: 0 100 101,102
  • 11. 11© Cloudera, Inc. All rights reserved. ISR • 2 things make a replica in-sync • Lag behind leader • replica.lag.time.max.ms – replica that didn’t fetch or is behind • replica.lag.max.messages – will go away in 0.9 • Connection to Zookeeper
  • 12. 12© Cloudera, Inc. All rights reserved. Terminology • Acked • Producers will not retry sending. • Depends on producer setting • Committed • Consumers can read. • Only when message got to all ISR. • replica.lag.time.max.ms • how long can a dead replica prevent consumers from reading?
  • 13. 13© Cloudera, Inc. All rights reserved. Replication • Acks = all • only waits for in-sync replicas to reply. Replica 3 100 Replica 2 100 Replica 1 100 Time
  • 14. 14© Cloudera, Inc. All rights reserved. Replication Replica 3 100 Replica 2 100 101 Replica 1 100 101 Time • Replica 3 stopped replicating for some reason Acked in acks = all “committed” Acked in acks = 1 but not “committed”
  • 15. 15© Cloudera, Inc. All rights reserved. Replication Replica 3 100 Replica 2 100 101 Replica 1 100 101 Time • One replica drops out of ISR, or goes offline • All messages are now acked and committed
  • 16. 16© Cloudera, Inc. All rights reserved. Replication Replica 3 100 Replica 2 100 101 Replica 1 100 101 102 103 104Time • 2nd Replica drops out, or is offline
  • 17. 17© Cloudera, Inc. All rights reserved. Replication Replica 3 100 Replica 2 100 101 Replica 1 100 101 102 103 104Time • Now we’re in trouble ✗
  • 18. 18© Cloudera, Inc. All rights reserved. Replication • If Replica 2 or 3 come back online before the leader, you can will lose data. Replica 3 100 Replica 2 100 101 Replica 1 100 101 102 103 104Time All those are “acked” and “committed”
  • 19. 19© Cloudera, Inc. All rights reserved. So what to do • Disable Unclean Leader Election • unclean.leader.election.enable = false • Set replication factor • default.replication.factor = 3 • Set minimum ISRs • min.insync.replicas = 2
  • 20. 20© Cloudera, Inc. All rights reserved. Warning • min.insync.replicas is applied at the topic-level. • Must alter the topic configuration manually if created before the server level change • Must manually alter the topic < 0.9.0 (KAFKA-2114)
  • 21. 21© Cloudera, Inc. All rights reserved. Replication • Replication = 3 • Min ISR = 2 Replica 3 100 Replica 2 100 Replica 1 100 Time
  • 22. 22© Cloudera, Inc. All rights reserved. Replication Replica 3 100 Replica 2 100 101 Replica 1 100 101 Time • One replica drops out of ISR, or goes offline
  • 23. 23© Cloudera, Inc. All rights reserved. Replication Replica 3 100 Replica 2 100 101 Replica 1 100 101102 103 104 Time • 2nd Replica fails out, or is out of sync Buffers in Produce r
  • 24. 24© Cloudera, Inc. All rights reserved.
  • 25. 25© Cloudera, Inc. All rights reserved. Producer Internals • Producer sends batches of messages to a buffer M3 Application Thread Application Thread Application Thread send() M2 M1 M0 Batch 3 Batch 2 Batch 1 Fail ? response retry Update Future callback drain Metadata or Exception
  • 26. 26© Cloudera, Inc. All rights reserved. Basics • Durability can be configured with the producer configuration request.required.acks • 0 The message is written to the network (buffer) • 1 The message is written to the leader • all The producer gets an ack after all ISRs receive the data; the message is committed • Make sure producer doesn’t just throws messages away! • block.on.buffer.full = true
  • 27. 27© Cloudera, Inc. All rights reserved. “New” Producer • All calls are non-blocking async • 2 Options for checking for failures: • Immediately block for response: send().get() • Do followup work in Callback, close producer after error threshold • Be careful about buffering these failures. Future work? KAFKA-1955 • Don’t forget to close the producer! producer.close() will block until in- flight txns complete • retries (producer config) defaults to 0 • message.send.max.retries (server config) defaults to 3 • In flight requests could lead to message re-ordering
  • 28. 28© Cloudera, Inc. All rights reserved.
  • 29. 29© Cloudera, Inc. All rights reserved. Consumer • Three choices for Consumer API • Simple Consumer • High Level Consumer • “New Consumer”
  • 30. 30© Cloudera, Inc. All rights reserved. New Consumer • Available in Kafka 0.9.0 • Provides better control over offset management • Enhanced server-side group management
  • 31. 31© Cloudera, Inc. All rights reserved. Consumer Offsets P0 P2 P3 P4 P5 P6 Consumer Group Consumer1 Consumer2 Consumer 3 Consumer 4
  • 32. 32© Cloudera, Inc. All rights reserved. Consumer Offsets P0 P2 P3 P4 P5 P6 Consumer Group Consumer 1 Consumer 2 Consumer 3 Consumer 4 Commit?
  • 33. 33© Cloudera, Inc. All rights reserved. Consumer Offsets P0 P2 P3 P4 P5 P6 Consumer Group Consumer 1 Consumer 2 Consumer 3 Consumer 4 Commit?
  • 34. 34© Cloudera, Inc. All rights reserved. Consumer Offsets P0 P2 P3 P4 P5 P6 Consumer Consumer 1 Consumer 2 Consumer 3 Consumer 4 ✗Commit
  • 35. 35© Cloudera, Inc. All rights reserved. Consumer Offsets P0 P2 P3 P4 P5 P6 Consumer Thread 1 Thread 2 Thread 3 Thread 4 ✗
  • 36. 36© Cloudera, Inc. All rights reserved. Consumer Offsets P0 P2 P3 P4 P5 P6 Consumer Thread 1 Thread 2 Thread 3 Thread 4 Consumer Picks up here
  • 37. 37© Cloudera, Inc. All rights reserved. Consumer Offsets P0 P2 P3 P4 P5 P6 Consumer Thread 1 Thread 2 Thread 3 Thread 4 Commit
  • 38. 38© Cloudera, Inc. All rights reserved. Consumer Offsets P0 P2 P3 P4 P5 P6 Consumer Thread 1 Thread 2 Thread 3 Thread 4 Commit Offset commits for all threads
  • 39. 39© Cloudera, Inc. All rights reserved. P0 P2 P3 P4 P5 P6 Consumer 1 Consumer 2 Consumer 3 Consumer 4 Consumer Offsets Commit
  • 40. 40© Cloudera, Inc. All rights reserved. Consumer Recommendations • Set autocommit.enable = false • Manually commit offsets after the message data is processed / persisted consumer.commitOffsets(); • Run each consumer in it’s own thread
  • 41. 41© Cloudera, Inc. All rights reserved. New Consumer! • No Zookeeper! At all! • Rebalance listener • Commit: • Commit • Commit async • Commit( offset) • Seek(offset)
  • 42. 42© Cloudera, Inc. All rights reserved. Exactly Once Semantics • At most once is easy • At least once is not bad either – commit after 100% sure data is safe • Exactly once is tricky • Commit data and offsets in one transaction • Idempotent producer
  • 43. 43© Cloudera, Inc. All rights reserved. Monitoring for Data Loss • Monitor for producer errors – watch the retry numbers • Monitor consumer lag – MaxLag or via offsets • Standard schema: • Each message should contain timestamp and originating service and host • Each producer can report message counts and offsets to a special topic • “Monitoring consumer” reports message counts to another special topic • “Important consumers” also report message counts • Reconcile the results
  • 44. 44© Cloudera, Inc. All rights reserved. Be Safe, Not Sorry • Acks = all • Block.on.buffer.full = true • Retries = MAX_INT • ( Max.inflight.requests.per.connect = 1 ) • Producer.close() • Replication-factor >= 3 • Min.insync.replicas = 2 • Unclean.leader.election = false • Auto.offset.commit = false • Commit after processing • Monitor!
  • 45. 45© Cloudera, Inc. All rights reserved. Thank you

Editor's Notes

  1. This conceptually is our high-level consumer. In this diagram we have a topic with 6 partitions, and an application running 4 threads.
  2. Kafka provides two different paradigms for commiting offsets. The first is “auto-committing”, more on this later. The second is to manually commit offsets in your application. But what’s the right time? If we commit offsets as soon as we actually receive a message, we expose our selves to data loss as we could have process, machine or thread failure before we persist or otherwise process our data.
  3. So what we’d really like to do is only commit offsets after we’ve done some amount of processing and / or persistence on the data. Typical situations would be, after producing a new message to kafka, or after writing a record to HDFS.
  4. So lets so we have auto-commit enabled, and we are chugging along, and counting on the consumer to commit our offsets for us. This is great because we don’t have to code anything, and don’t have think about the frequency of commits and the impact that might have on our throughput. Life is good. But now we’ve lost a thread or a process. And we don’t really know where we are in the processing, Because the last auto-commit committed stuff that we hadn’t actually written to disk.
  5. So now we’re in a situation where we think we’ve read all of our data but we will have gaps in data. Note the same risk applies if we lose a partition or broker and get a new leader. OR
  6. If we add more consumers in the same group and we rebalance the partition assignment. Imagine a scenario where you are hanging in your processing, or there’s some other reason that you have to exit before persisting to disk, the new consumer added will just pick up from the last committed offset.
  7. Ok so don’t use autocommit if you care about this sort of thing.
  8. One other thing to note, is that if you are running some code akin to the ConsumerGroup Example that’s on the wiki, and you are running one consumer with multiple threads, when you issue a commit from one thread, it will commit across all threads. So this isn’t great for all of the reasons that we mentioned a few moments ago.
  9. So disable auto commit. Commit after your processing, and run the high level consumer in it’s own thread.
  10. To cement this: Note a lot this changes in the next release with the new Consumer, but maybe we will revisit that once that is released!