SlideShare a Scribd company logo
Guozhang Wang
Kafka Meetup Shanghai, Oct. 21, 2018
Apache Kafka from 0.7 to 1.0
History and Lesson Learned
A Short History of Kafka
2
LI @ 2010: Point-to-Point Data Pipeline
KV-Store Doc-Store RDBMS
Tracking Logs / Metrics
Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity …
3
KV-Store Doc-Store RDBMS
Tracking Logs / Metrics
Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity …
LI @ 2010: Point-to-Point Data Pipeline
What we want: a centralized data pipeline
4
A Short History
• 2010.10: First commit of Kafka
5
Kafka Concepts: the Log
4 5 5 7 8 9 10 11 12...
Producer Write
Consumer1 Reads
(offset 7)
Consumer2 Reads
(offset 10)
Messages
3
6
Topic 1
Topic 2
Partitions
Producers
Producers
Consumers
Consumers
Brokers
Kafka Concepts: the Log
7
A Short History
• 2010.10: First commit of Kafka
• 2011.07: Enters Apache Incubator
• Release 0.7.0: compression, mirror-maker
8
Kafka 0.7 Message Format
…
Offset = 0 Offset = M Offset = M+N
M Bytes N Bytes
Message offsets are Physical
9
Kafka 0.7 Message Format
…
Offset = 0 Offset = M Offset = M+N
M Bytes N Bytes
Internal messages of a compressed message are not offset-addressable
10
Kafka 0.7 Message Format
…
Offset = 0 Offset = M Offset = M+N
M Bytes N Bytes
Internal messages of a compressed message are not offset-addressable
Consumer can only checkpoint
offset M for this message
11
Drawbacks of Kafka 0.7
• Hard to checkpoint within compressed message set
• At-least-once: could consume twice
• Hard to rewind consumption by #.messages
• Similarly, tricky to monitor consumption lag
• Unsuitable for features like log compaction (will talk later)
12
BUT!
• Very dumb efficient (high-throughput)
• Just bytes-in-bytes-out for brokers
• Hence the bottleneck was predominantly network
• 1Gbps NICs are saturated most of the time
• CPU usages usually < 10%
• IO Flops are low thanks to “zero-copy”
13
Example: Pub-Sub Messaging
Tracking Logs / Metrics
Hadoop / DW
Apache Kafka
…
14
A Short History
• 2010.10: First commit of Kafka
• 2011.07: Enters Apache Incubator
• Release 0.7.0: compression, mirror-maker
• 2012.10: Graduated to top-level project
• Release 0.8.0: intra-cluster replication
15
Topic 1
Topic 2
Partitions
Producers
Producers
Consumers
Consumers
Brokers
High-Availability: Must-have
16
Kafka 0.8: Replicas and Layout
Logs
Broker-1
topic1-part1
topic1-part3
topic1-part2
Logs
topic1-part2
topic1-part1
topic1-part3
Logs
topic1-part3
topic1-part2
topic1-part1
Broker-2 Broker-3
17
Consensus for Log Replication
Logs
Broker-1
Logs Logs
Broker-2 Broker-3
Write
Consensus
Protocol
Consensus
Protocol
18
Kafka 0.8 Message Format
…
Offset = 0 Offset = 1 Offset = 2
M Bytes N Bytes
Message offsets are Logical and Continuous
19
Kafka 0.8 Message Format
…
Offset = 2 Offset = 4 Offset = 7
3 Messages 2 Messages
Brokers need to assign internal
message offsets on receiving
Compressed message offset is the largest offset among internal messages
0 1 2 3 4 5 6 7
20
Kafka 0.8 Message Format
Offset = 2 Offset = 4
3 Messages 2 Messages
Next offset is 5
21
Kafka 0.8 Message Format
Offset = 2 Offset = 4
3 Messages 2 Messages
Decompress
Next offset is 5
22
Kafka 0.8 Message Format
Offset = 2 Offset = 4
3 Messages 2 Messages
Assign offsets
5 6 7Next offset is 5
23
Kafka 0.8 Message Format
Offset = 2 Offset = 4
3 Messages 2 Messages
Offset = 7
Re-compress
Next offset is 5
24
Kafka 0.8 Message Format
Offset = 2 Offset = 4
3 Messages 2 Messages
Offset = 7
Append
25
Example: Centralized Data Pipeline
KV-Store Doc-Store RDBMS
Tracking Logs / Metrics
Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity
Apache Kafka
…
26
Shifting Bottleneck
• Predominantly network in 0.7
• Just bytes-in-bytes-out for brokers
• Still network in 0.8, but tilting towards CPU / Storage
• More CPU cost due to decompress / re-compress
• Data replication, consensus protocol: 

• One message now copied X times for replication
27
A Short History
• 2010.10: First commit of Kafka
• 2011.07: Enters Apache Incubator
• Release 0.7.0: compression, mirror-maker
• 2012.10: Graduated to top-level project
• Release 0.8.0: intra-cluster replication
• 2014.11: Confluent founded
• Release 0.8.2: new producer
• Release 0.9.0: new consumer, quota, security
28
One naughty client can bother everyone ..
29
One naughty client can bother everyone ..
30
One naughty client can bother everyone ..
31
One naughty client can bother everyone ..
32
Quota
• Who to limit: client-id
• What to limit: Mbps
• Defined on per-broker basis (bytes-in for producer, bytes-out for consumer)
• Throttle on violation
// Default bytes-out per client.
quota.consumer.default=2M
quota.producer.default=2M
// Overrides
quota.producer.override="clientA:4M,clientB:10M” 33
Security
• Authentication (SSL) and authorization
• Who (client-id) can do what (create, read, write, etc)
• Minor impact on throughput
• Forego “zero-copy” optimization
• CPU overhead to decrypt / encrypt
34
Key-based Log Compaction
...
Partition Messages
Segment-3 Segment-4 Segment-5 *
35
Key-based Log Compaction
d: 3 f: 8 b: 0 c: null...
Partition Messages
c: 3 a: 5 a: 6 a: 5 f: 9 ...
Segment-3 Segment-4
b: 2 d: 4a: 1
36
Key-based Log Compaction
New Segment
Partition Messages
d: 3 f: 8 b: 0 c: null... c: 3 a: 5 a: 6 a: 5 f: 9 ...
Segment-3 Segment-4
b: 2 d: 4a: 1
37
Key-based Log Compaction
New Segment
Partition Messages
d: 3 f: 8 b: 0 c: null... c: 3 a: 5 a: 6 a: 5 f: 9 ...
Segment-3 Segment-4
b: 2 d: 4a: 1 c: 3 a: 5 a: 6b: 2 d: 4a: 1
38
Key-based Log Compaction
... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ...
Segment-3 Segment-4
c: 3 a: 5 a: 6b: 2 d: 4a: 1 c: null a: 5 f: 9
New Segment
Partition Messages
c: 3b: 2 d: 4a: 1
a: 5
39
Key-based Log Compaction
... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ...
Segment-3 Segment-4
c: 3 a: 5 a: 6b: 2 d: 4a: 1 c: null a: 5 f: 9
New Segment
Partition Messages
c: 3b: 2 d: 4
a: 5
a: 5
40
Key-based Log Compaction
... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ...
Segment-3 Segment-4
c: 3 a: 5 a: 6b: 2 d: 4a: 1 c: null a: 5 f: 9
New Segment
Partition Messages
a: 6 d: 3 f: 8 b: 0c: 3b: 2 d: 4
41
Key-based Log Compaction
... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ...
Segment-3 Segment-4
c: 3 a: 5 a: 6b: 2 d: 4a: 1
New Segment
Partition Messages
d: 3 b: 0 a: 5 f: 9
42
Key-based Log Compaction
... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ...
Segment-3 Segment-4
c: 3 a: 5 a: 6b: 2 d: 4a: 1
New Segment
Partition Messages
d: 3 b: 0 a: 5 f: 9
43
Example: Data Store Geo-Replication
Apache
Local Stores
User Apps User Apps
Local Stores
Apache
Region 2Region 1
write read
append log
mirroring
apply log
44
Shifting Bottleneck
• Predominantly network in 0.7
• Just bytes-in-bytes-out for brokers
• Storage and network in 0.8
• 1Gbps NICs
• Increasing CPU in 0.9
• New hardware since 2015: bigger disks, XFS, 10Gbps NICs..
• De(re)compress, de(en)crypt, compaction, coordination, etc..
45
A Short History
• 2010.10: First commit of Kafka
• 2011.07: Enters Apache Incubator
• Release 0.7.0: compression, mirror-maker
• 2012.10: Graduated to top-level project
• Release 0.8.0: intra-cluster replication
• 2014.11: Confluent founded
• Release 0.8.2: new producer
• Release 0.9.0: new consumer, quota, security
• Release 0.10.0: timestamps, rack awareness 46
Coarsened “Time” under Kafka 0.9
...
Partition Messages
Segment-3 Segment-4 Segment-5
Message time is the mtime of the
segment file, so one stamp per-segment
47
Kafka 0.10 Message Format
…
Offset = 0 Offset = 1 Offset = 2
M Bytes N Bytes
Timestamp = 25 Timestamp = 55 Timestamp = 60
48
Finer “Time” in Kafka 0.10
...
Partition Messages
Segment-3 Segment-4 Segment-5
Timestamp per message (create time)
or per-message-set (append time)
time offset time offset time offset
49
Finer “Time” in Kafka 0.10
...
Partition Messages
Segment-3 Segment-4 Segment-5
time offset time offset time offset
Timestamp per message (create time)
or per-message-set (append time)
Finer-grained time-based lookup (fetch offset request)
50
Finer “Time” in Kafka 0.10
...
Partition Messages
Segment-3 Segment-4 Segment-5
time offset time offset time offset
Timestamp per message (create time)
or per-message-set (append time)
More accurate time-based log rolling / log retention
51
A Short History
• 2016.04: First Kafka Summit @San Francisco
• Release 0.9.0: Kafka Connect
• Release 0.10.0: Kafka Streams
52
Kafka Streams (0.10+)
• New client library besides producer and consumer
• Powerful yet easy-to-use
• Event-at-a-time, Stateful
• Windowing with out-of-order handling
• Highly scalable, distributed, fault tolerant
• and more..
[BIRTE 2015]
53
Anywhere, anytime
Ok. Ok. Ok. Ok.
54
Anywhere, anytime
War File
Rsync
Puppet/Chef
YARN
M
esos
Docker
Kubernetes
55
Processor Topology
56Kafka Streams Kafka
Stream Partitions and Tasks
57
Kafka Topic B Kafka Topic A
P1
P2
P1
P2
Stream Partitions and Tasks
58
Kafka Topic B Kafka Topic A
Processor Topology
P1
P2
P1
P2
Stream Partitions and Tasks
59
Kafka Topic AKafka Topic B
Kafka Topic B
Stream Threads
60
Kafka Topic A
MyApp.1 MyApp.2
Task2Task1
But how to get data in / out Kafka?
61
62
Connectors (0.9+)
• 45+ since first
release
• 30 from
& partners
63
A Short History
• 2016.04: First Kafka Summit @San Francisco
• Release 0.9.0: Kafka Connect
• Release 0.10.0: Kafka Streams
• 2017.08: Apache Kafka Goes 1.0
• Release 0.11.0: Exactly-Once
64
Stream Processing with Kafka
Process
State
Ads Clicks
Ads Displays
Billing Updates
Fraud Suspects
Your App
65
Stream Processing with Kafka
Process
State
Ads Clicks
Ads Displays
Billing Updates
Fraud Suspects
ack
ack
commit
Your App
66
Error Scenario #1: Duplicate Writes
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
Streams App
67
Error Scenario #1: Duplicate Writes
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
producer config: retries = N
Streams App
68
Error Scenario #2: Re-process
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
commit
ack
ack
State
Process
Streams App
69
Error Scenario #2: Re-process
State
Process
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
State
Streams App
70
Exactly-Once
An application property for stream processing,
.. that for each received record,
.. its process results will be reflected exactly once,
.. even under failures
71
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
Life before 0.11: At-least-once + Dedup
72
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
ack
Life before 0.11: At-least-once + Dedup
73
Process
State
Kafka Topic A
Kafka Topic B
Kafka Topic C
Kafka Topic D
ack
ack
commit
Life before 0.11: At-least-once + Dedup
74
2
2
3
3
4
4
Life before 0.11: At-least-once + Dedup
75
Exactly-once, the Kafka Way!(0.11+)
76
• Building blocks to achieve exactly-once

• Idempotence: de-duped sends in order per partition

• Transactions: atomic multiple-sends across topic partitions

• Kafka Streams: enable exactly-once in a single knob
Exactly-once, the Kafka Way!(0.11+)
77
78
Connect
End-to-End Exactly-Once
Connect
Connect
Connect
Connect
Connect
Connect
Streams
Streams
Streams
Streams
Streams
Shifting Bottleneck, Continued..
• Storage and network in 0.7 / 0.8
• Increasing CPU in 0.9 - 0.11
• TLS, CRC, Quota, protocol down-conversion
• Txn coordination, etc..
• Operation efficiency on 0.11+
• Leader balancing, partition migration, cluster expansion..
• ZK-dependency, multi-DC support ..
79
A Short History
• 2016.04: First Kafka Summit @San Francisco
• Release 0.9.0: Kafka Connect
• Release 0.10.0: Kafka Streams
• 2017.08: Apache Kafka Goes 1.0
• Release 0.11.0: Exactly-Once


• Release 1.0+: More security and operability (controller re-design, Java9 withTLS / CRC, etc)
• Release 1.0+: Better scalability (JBOD, etc)
• Release 1.0+: Online-evolvability (down-conversion optimization, etc)
80
• One broker in a cluster acts as controller
• Monitor the liveness of brokers (via ZK)
• Select new leaders on broker failures
• Communicate new leaders to brokers
• Controller re-elected on failures (via ZK)
Example: Controller Re-design
81
Logs
Broker-1 *
Logs Logs
Broker-2 Broker-3
Example: Controlled Shutdown
SIG_TERM Zookeeper
ISR {1, 2, 3}
82
Logs
Broker-1 *
Logs Logs
Broker-2 Broker-3
Example: Controlled Shutdown
Zookeeper
ISR {1, 2, 3}
83
Logs
Broker-1 *
Logs Logs
Broker-2 Broker-3
Example: Controlled Shutdown
Zookeeper
ISR {1, 2, 3}
Zookeeper
ISR {2, 3}
84
Logs
Broker-1 *
Logs Logs
Broker-2 Broker-3
Example: Controlled Shutdown
Zookeeper
ISR {2, 3}ISR {2, 3}ISR {2, 3}
85
Logs
Broker-1
Logs Logs
Broker-2 * Broker-3
Example: Controlled Shutdown
Zookeeper
ISR {2, 3}
86
Logs
Broker-1
Logs Logs
Broker-2 * Broker-3
Example: Controlled Shutdown
Zookeeper
ISR {2, 3}
87
Logs
Broker-1
Logs Logs
Broker-2 * Broker-3
Issues with Controlled Shutdown (pre 1.1)
Zookeeper
ISR {2, 3}
ISR {2, 3} ISR {2, 3}
Writes to ZK
are serial



Impact: longer
shutdown time
Comm. of
new leaders
not batched



Impact: client
timeout
88
Results for Controlled Shutdown (post 1.1)
• 5 Zookeeper nodes, 5 brokers on different racks
• 25k topics, 1 partition per topic, 2 replicas
• 10k partitions per brokers
Kafka 1.0.0 Kafka 1.1.0
Controlled shutdown time 6.5 minutes 3 seconds
89
Results for Controller failover (post 1.1)
• 5 Zookeeper nodes, 5 brokers on different racks
• 2k topics, 50 partitions per topic, 1 replicas
• Controller failover: reload 100k partitions from ZK
Kafka 1.0.0 Kafka 1.1.0
Controller state reload time 28 seconds 14 seconds
90
A Short History
• 2016.04: First Kafka Summit @San Francisco
• Release 0.9.0: Kafka Connect
• Release 0.10.0: Kafka Streams
• 2017.08: Apache Kafka Goes 1.0
• Release 0.11.0: Exactly-Once

• Release 1.0+: More security and operability (controller re-design, Java9 withTLS / CRC, etc)
• Release 1.0+: Better scalability (JBOD, etc)
• Release 1.0+: Online-evolvability (down-conversion optimization, etc)
• Future:
• Global, Infinite and Cloud-Native Kafka 91
What we have learned?
92
Lesson 1: Build evolvable systems
93
Upgrade your Kafka cluster is like ..
94
Kafka: Evolvable System
• Zero down-time
• Maintenance outage? No such thing.
• All protocols versioned
• Brokers can talk to older versioned clients
• And vice versa since 0.10.2!
• One should do no more than rolling bounces
• Staging before production
95
Lesson 2: What gets measured gets fixed
96
97
Lesson 3: APIs stay forever
98
The Story of KAFKA-1481
99
The Story of KAFKA-1481
100
101
102
103
Lesson 4: Service needs gatekeepers
104
Remember This?
105
Multi-tenancy Services (in the Cloud)
• Security from ground up
• End-to-end encryption

• ACL / RBAC definitions
• Resources under control
• Quota on bytes rate / request rate, CPU resources

• Capacity preservation / allocation to quotas

• Limit on num.connections, etc
106
So What is Kafka, Really?
107
What is Kafka, Really?
[NetDB 2011]a scalable pub-sub messaging system..
108
Example: Pub-Sub Messaging
Tracking Logs / Metrics
Hadoop / DW
Apache Kafka
…
109
What is Kafka, Really?
[NetDB 2011]
[Hadoop Summit 2013]
a scalable pub-sub messaging system..
a real-time data pipeline..
110
Example: Centralized Data Pipeline
KV-Store Doc-Store RDBMS
Tracking Logs / Metrics
Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity
Apache Kafka
…
111
What is Kafka, Really?
[NetDB 2011]
[Hadoop Summit 2013]
[VLDB 2015]
a scalable pub-sub messaging system..
a real-time data pipeline..
a distributed and replicated log..
112
Example: Data Store Geo-Replication
Apache
Local Stores
User Apps User Apps
Local Stores
Apache
Region 2Region 1
write read
append log
mirroring
apply log
113
What is Kafka, Really?
a scalable pub-sub messaging system.. [NetDB 2011]
a real-time data pipeline.. [Hadoop Summit 2013]
a distributed and replicated log.. [VLDB 2015]
a unified data integration stack.. [CIDR 2015]
114
Example: Async. Micro-Services
115
What is Kafka, Really?
a scalable pub-sub messaging system.. [NetDB 2011]
a real-time data pipeline.. [Hadoop Summit 2013]
a distributed and replicated log.. [VLDB 2015]
a unified data integration stack.. [CIDR 2015]
All of them!
116
Kafka: Streaming Platform
• Publish / Subscribe
• Move data around as online streams
• Store
• “Source-of-truth” continuous data
• Process
• React / process data in real-time
117
Kafka adoption World-Wide
6 of the top 10 travel companies
8 of the top 10 insurance companies
7 of the top 10 global banks
9 of the top 10 telecom companies
118
Welcome Contributors!
119
THANKS!
Guozhang Wang | guozhang@confluent.io | @guozhangwang
120

More Related Content

What's hot

How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
confluent
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
Mohammed Fazuluddin
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Clement Demonchy
 
kafka
kafkakafka
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
Amita Mirajkar
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
confluent
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
confluent
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
confluent
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Srikrishna k
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Kumar Shivam
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
confluent
 
Kafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedInKafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedIn
Aditya Auradkar
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
confluent
 

What's hot (20)

How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
kafka
kafkakafka
kafka
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Kafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedInKafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedIn
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 

Similar to Apache Kafka from 0.7 to 1.0, History and Lesson Learned

Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8s
VMware Tanzu
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Guido Schmutz
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
Snehal Nagmote
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
NguyenChiHoangMinh
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
Chester Chen
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
F_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxF_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptx
NIMITJAIN71
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
Joe Stein
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Ricardo Bravo
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
confluent
 
SignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer OptimizationSignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer Optimization
SignalFx
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Scaling big with Apache Kafka
Scaling big with Apache KafkaScaling big with Apache Kafka
Scaling big with Apache Kafka
Nikolay Stoitsev
 
Cloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical OverviewCloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical Overview
Messaging Meetup
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
sKaushikNarayanan
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
sKaushikNarayanan
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
MvkZ
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
sKaushikNarayanan
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
sKaushikNarayanan
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
MvkZ
 

Similar to Apache Kafka from 0.7 to 1.0, History and Lesson Learned (20)

Getting Started with Kafka on k8s
Getting Started with Kafka on k8sGetting Started with Kafka on k8s
Getting Started with Kafka on k8s
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Apache Kafka Women Who Code Meetup
Apache Kafka Women Who Code MeetupApache Kafka Women Who Code Meetup
Apache Kafka Women Who Code Meetup
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
F_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxF_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptx
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LMESet your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME
 
SignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer OptimizationSignalFx Kafka Consumer Optimization
SignalFx Kafka Consumer Optimization
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Scaling big with Apache Kafka
Scaling big with Apache KafkaScaling big with Apache Kafka
Scaling big with Apache Kafka
 
Cloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical OverviewCloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical Overview
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 

More from Guozhang Wang

Consensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdfConsensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdf
Guozhang Wang
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Guozhang Wang
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Guozhang Wang
 
Introduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of KafkaIntroduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of Kafka
Guozhang Wang
 
Performance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams ApplicationsPerformance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams Applications
Guozhang Wang
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
Guozhang Wang
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
Guozhang Wang
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Guozhang Wang
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Guozhang Wang
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache Kafka
Guozhang Wang
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
Guozhang Wang
 
Behavioral Simulations in MapReduce
Behavioral Simulations in MapReduceBehavioral Simulations in MapReduce
Behavioral Simulations in MapReduce
Guozhang Wang
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative Computations
Guozhang Wang
 

More from Guozhang Wang (14)

Consensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdfConsensus in Apache Kafka: From Theory to Production.pdf
Consensus in Apache Kafka: From Theory to Production.pdf
 
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
 
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
Exactly-Once Made Easy: Transactional Messaging Improvement for Usability and...
 
Introduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of KafkaIntroduction to the Incremental Cooperative Protocol of Kafka
Introduction to the Incremental Cooperative Protocol of Kafka
 
Performance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams ApplicationsPerformance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams Applications
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache KafkaBuilding Stream Infrastructure across Multiple Data Centers with Apache Kafka
Building Stream Infrastructure across Multiple Data Centers with Apache Kafka
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Building a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache KafkaBuilding a Replicated Logging System with Apache Kafka
Building a Replicated Logging System with Apache Kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Behavioral Simulations in MapReduce
Behavioral Simulations in MapReduceBehavioral Simulations in MapReduce
Behavioral Simulations in MapReduce
 
Automatic Scaling Iterative Computations
Automatic Scaling Iterative ComputationsAutomatic Scaling Iterative Computations
Automatic Scaling Iterative Computations
 

Recently uploaded

block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
JoytuBarua2
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
MLILAB
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
SupreethSP4
 

Recently uploaded (20)

block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Planning Of Procurement o different goods and services
Planning Of Procurement o different goods and servicesPlanning Of Procurement o different goods and services
Planning Of Procurement o different goods and services
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
H.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdfH.Seo,  ICLR 2024, MLILAB,  KAIST AI.pdf
H.Seo, ICLR 2024, MLILAB, KAIST AI.pdf
 
Runway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptxRunway Orientation Based on the Wind Rose Diagram.pptx
Runway Orientation Based on the Wind Rose Diagram.pptx
 

Apache Kafka from 0.7 to 1.0, History and Lesson Learned

  • 1. Guozhang Wang Kafka Meetup Shanghai, Oct. 21, 2018 Apache Kafka from 0.7 to 1.0 History and Lesson Learned
  • 2. A Short History of Kafka 2
  • 3. LI @ 2010: Point-to-Point Data Pipeline KV-Store Doc-Store RDBMS Tracking Logs / Metrics Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity … 3
  • 4. KV-Store Doc-Store RDBMS Tracking Logs / Metrics Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity … LI @ 2010: Point-to-Point Data Pipeline What we want: a centralized data pipeline 4
  • 5. A Short History • 2010.10: First commit of Kafka 5
  • 6. Kafka Concepts: the Log 4 5 5 7 8 9 10 11 12... Producer Write Consumer1 Reads (offset 7) Consumer2 Reads (offset 10) Messages 3 6
  • 8. A Short History • 2010.10: First commit of Kafka • 2011.07: Enters Apache Incubator • Release 0.7.0: compression, mirror-maker 8
  • 9. Kafka 0.7 Message Format … Offset = 0 Offset = M Offset = M+N M Bytes N Bytes Message offsets are Physical 9
  • 10. Kafka 0.7 Message Format … Offset = 0 Offset = M Offset = M+N M Bytes N Bytes Internal messages of a compressed message are not offset-addressable 10
  • 11. Kafka 0.7 Message Format … Offset = 0 Offset = M Offset = M+N M Bytes N Bytes Internal messages of a compressed message are not offset-addressable Consumer can only checkpoint offset M for this message 11
  • 12. Drawbacks of Kafka 0.7 • Hard to checkpoint within compressed message set • At-least-once: could consume twice • Hard to rewind consumption by #.messages • Similarly, tricky to monitor consumption lag • Unsuitable for features like log compaction (will talk later) 12
  • 13. BUT! • Very dumb efficient (high-throughput) • Just bytes-in-bytes-out for brokers • Hence the bottleneck was predominantly network • 1Gbps NICs are saturated most of the time • CPU usages usually < 10% • IO Flops are low thanks to “zero-copy” 13
  • 14. Example: Pub-Sub Messaging Tracking Logs / Metrics Hadoop / DW Apache Kafka … 14
  • 15. A Short History • 2010.10: First commit of Kafka • 2011.07: Enters Apache Incubator • Release 0.7.0: compression, mirror-maker • 2012.10: Graduated to top-level project • Release 0.8.0: intra-cluster replication 15
  • 17. Kafka 0.8: Replicas and Layout Logs Broker-1 topic1-part1 topic1-part3 topic1-part2 Logs topic1-part2 topic1-part1 topic1-part3 Logs topic1-part3 topic1-part2 topic1-part1 Broker-2 Broker-3 17
  • 18. Consensus for Log Replication Logs Broker-1 Logs Logs Broker-2 Broker-3 Write Consensus Protocol Consensus Protocol 18
  • 19. Kafka 0.8 Message Format … Offset = 0 Offset = 1 Offset = 2 M Bytes N Bytes Message offsets are Logical and Continuous 19
  • 20. Kafka 0.8 Message Format … Offset = 2 Offset = 4 Offset = 7 3 Messages 2 Messages Brokers need to assign internal message offsets on receiving Compressed message offset is the largest offset among internal messages 0 1 2 3 4 5 6 7 20
  • 21. Kafka 0.8 Message Format Offset = 2 Offset = 4 3 Messages 2 Messages Next offset is 5 21
  • 22. Kafka 0.8 Message Format Offset = 2 Offset = 4 3 Messages 2 Messages Decompress Next offset is 5 22
  • 23. Kafka 0.8 Message Format Offset = 2 Offset = 4 3 Messages 2 Messages Assign offsets 5 6 7Next offset is 5 23
  • 24. Kafka 0.8 Message Format Offset = 2 Offset = 4 3 Messages 2 Messages Offset = 7 Re-compress Next offset is 5 24
  • 25. Kafka 0.8 Message Format Offset = 2 Offset = 4 3 Messages 2 Messages Offset = 7 Append 25
  • 26. Example: Centralized Data Pipeline KV-Store Doc-Store RDBMS Tracking Logs / Metrics Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity Apache Kafka … 26
  • 27. Shifting Bottleneck • Predominantly network in 0.7 • Just bytes-in-bytes-out for brokers • Still network in 0.8, but tilting towards CPU / Storage • More CPU cost due to decompress / re-compress • Data replication, consensus protocol: 
 • One message now copied X times for replication 27
  • 28. A Short History • 2010.10: First commit of Kafka • 2011.07: Enters Apache Incubator • Release 0.7.0: compression, mirror-maker • 2012.10: Graduated to top-level project • Release 0.8.0: intra-cluster replication • 2014.11: Confluent founded • Release 0.8.2: new producer • Release 0.9.0: new consumer, quota, security 28
  • 29. One naughty client can bother everyone .. 29
  • 30. One naughty client can bother everyone .. 30
  • 31. One naughty client can bother everyone .. 31
  • 32. One naughty client can bother everyone .. 32
  • 33. Quota • Who to limit: client-id • What to limit: Mbps • Defined on per-broker basis (bytes-in for producer, bytes-out for consumer) • Throttle on violation // Default bytes-out per client. quota.consumer.default=2M quota.producer.default=2M // Overrides quota.producer.override="clientA:4M,clientB:10M” 33
  • 34. Security • Authentication (SSL) and authorization • Who (client-id) can do what (create, read, write, etc) • Minor impact on throughput • Forego “zero-copy” optimization • CPU overhead to decrypt / encrypt 34
  • 35. Key-based Log Compaction ... Partition Messages Segment-3 Segment-4 Segment-5 * 35
  • 36. Key-based Log Compaction d: 3 f: 8 b: 0 c: null... Partition Messages c: 3 a: 5 a: 6 a: 5 f: 9 ... Segment-3 Segment-4 b: 2 d: 4a: 1 36
  • 37. Key-based Log Compaction New Segment Partition Messages d: 3 f: 8 b: 0 c: null... c: 3 a: 5 a: 6 a: 5 f: 9 ... Segment-3 Segment-4 b: 2 d: 4a: 1 37
  • 38. Key-based Log Compaction New Segment Partition Messages d: 3 f: 8 b: 0 c: null... c: 3 a: 5 a: 6 a: 5 f: 9 ... Segment-3 Segment-4 b: 2 d: 4a: 1 c: 3 a: 5 a: 6b: 2 d: 4a: 1 38
  • 39. Key-based Log Compaction ... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ... Segment-3 Segment-4 c: 3 a: 5 a: 6b: 2 d: 4a: 1 c: null a: 5 f: 9 New Segment Partition Messages c: 3b: 2 d: 4a: 1 a: 5 39
  • 40. Key-based Log Compaction ... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ... Segment-3 Segment-4 c: 3 a: 5 a: 6b: 2 d: 4a: 1 c: null a: 5 f: 9 New Segment Partition Messages c: 3b: 2 d: 4 a: 5 a: 5 40
  • 41. Key-based Log Compaction ... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ... Segment-3 Segment-4 c: 3 a: 5 a: 6b: 2 d: 4a: 1 c: null a: 5 f: 9 New Segment Partition Messages a: 6 d: 3 f: 8 b: 0c: 3b: 2 d: 4 41
  • 42. Key-based Log Compaction ... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ... Segment-3 Segment-4 c: 3 a: 5 a: 6b: 2 d: 4a: 1 New Segment Partition Messages d: 3 b: 0 a: 5 f: 9 42
  • 43. Key-based Log Compaction ... d: 3 f: 8 b: 0 c: null a: 5 f: 9 ... Segment-3 Segment-4 c: 3 a: 5 a: 6b: 2 d: 4a: 1 New Segment Partition Messages d: 3 b: 0 a: 5 f: 9 43
  • 44. Example: Data Store Geo-Replication Apache Local Stores User Apps User Apps Local Stores Apache Region 2Region 1 write read append log mirroring apply log 44
  • 45. Shifting Bottleneck • Predominantly network in 0.7 • Just bytes-in-bytes-out for brokers • Storage and network in 0.8 • 1Gbps NICs • Increasing CPU in 0.9 • New hardware since 2015: bigger disks, XFS, 10Gbps NICs.. • De(re)compress, de(en)crypt, compaction, coordination, etc.. 45
  • 46. A Short History • 2010.10: First commit of Kafka • 2011.07: Enters Apache Incubator • Release 0.7.0: compression, mirror-maker • 2012.10: Graduated to top-level project • Release 0.8.0: intra-cluster replication • 2014.11: Confluent founded • Release 0.8.2: new producer • Release 0.9.0: new consumer, quota, security • Release 0.10.0: timestamps, rack awareness 46
  • 47. Coarsened “Time” under Kafka 0.9 ... Partition Messages Segment-3 Segment-4 Segment-5 Message time is the mtime of the segment file, so one stamp per-segment 47
  • 48. Kafka 0.10 Message Format … Offset = 0 Offset = 1 Offset = 2 M Bytes N Bytes Timestamp = 25 Timestamp = 55 Timestamp = 60 48
  • 49. Finer “Time” in Kafka 0.10 ... Partition Messages Segment-3 Segment-4 Segment-5 Timestamp per message (create time) or per-message-set (append time) time offset time offset time offset 49
  • 50. Finer “Time” in Kafka 0.10 ... Partition Messages Segment-3 Segment-4 Segment-5 time offset time offset time offset Timestamp per message (create time) or per-message-set (append time) Finer-grained time-based lookup (fetch offset request) 50
  • 51. Finer “Time” in Kafka 0.10 ... Partition Messages Segment-3 Segment-4 Segment-5 time offset time offset time offset Timestamp per message (create time) or per-message-set (append time) More accurate time-based log rolling / log retention 51
  • 52. A Short History • 2016.04: First Kafka Summit @San Francisco • Release 0.9.0: Kafka Connect • Release 0.10.0: Kafka Streams 52
  • 53. Kafka Streams (0.10+) • New client library besides producer and consumer • Powerful yet easy-to-use • Event-at-a-time, Stateful • Windowing with out-of-order handling • Highly scalable, distributed, fault tolerant • and more.. [BIRTE 2015] 53
  • 57. Stream Partitions and Tasks 57 Kafka Topic B Kafka Topic A P1 P2 P1 P2
  • 58. Stream Partitions and Tasks 58 Kafka Topic B Kafka Topic A Processor Topology P1 P2 P1 P2
  • 59. Stream Partitions and Tasks 59 Kafka Topic AKafka Topic B
  • 60. Kafka Topic B Stream Threads 60 Kafka Topic A MyApp.1 MyApp.2 Task2Task1
  • 61. But how to get data in / out Kafka? 61
  • 62. 62
  • 63. Connectors (0.9+) • 45+ since first release • 30 from & partners 63
  • 64. A Short History • 2016.04: First Kafka Summit @San Francisco • Release 0.9.0: Kafka Connect • Release 0.10.0: Kafka Streams • 2017.08: Apache Kafka Goes 1.0 • Release 0.11.0: Exactly-Once 64
  • 65. Stream Processing with Kafka Process State Ads Clicks Ads Displays Billing Updates Fraud Suspects Your App 65
  • 66. Stream Processing with Kafka Process State Ads Clicks Ads Displays Billing Updates Fraud Suspects ack ack commit Your App 66
  • 67. Error Scenario #1: Duplicate Writes Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D ack Streams App 67
  • 68. Error Scenario #1: Duplicate Writes Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D ack producer config: retries = N Streams App 68
  • 69. Error Scenario #2: Re-process Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D commit ack ack State Process Streams App 69
  • 70. Error Scenario #2: Re-process State Process Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D State Streams App 70
  • 71. Exactly-Once An application property for stream processing, .. that for each received record, .. its process results will be reflected exactly once, .. even under failures 71
  • 72. Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Life before 0.11: At-least-once + Dedup 72
  • 73. Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D ack ack Life before 0.11: At-least-once + Dedup 73
  • 74. Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D ack ack commit Life before 0.11: At-least-once + Dedup 74
  • 75. 2 2 3 3 4 4 Life before 0.11: At-least-once + Dedup 75
  • 76. Exactly-once, the Kafka Way!(0.11+) 76
  • 77. • Building blocks to achieve exactly-once
 • Idempotence: de-duped sends in order per partition
 • Transactions: atomic multiple-sends across topic partitions
 • Kafka Streams: enable exactly-once in a single knob Exactly-once, the Kafka Way!(0.11+) 77
  • 79. Shifting Bottleneck, Continued.. • Storage and network in 0.7 / 0.8 • Increasing CPU in 0.9 - 0.11 • TLS, CRC, Quota, protocol down-conversion • Txn coordination, etc.. • Operation efficiency on 0.11+ • Leader balancing, partition migration, cluster expansion.. • ZK-dependency, multi-DC support .. 79
  • 80. A Short History • 2016.04: First Kafka Summit @San Francisco • Release 0.9.0: Kafka Connect • Release 0.10.0: Kafka Streams • 2017.08: Apache Kafka Goes 1.0 • Release 0.11.0: Exactly-Once

 • Release 1.0+: More security and operability (controller re-design, Java9 withTLS / CRC, etc) • Release 1.0+: Better scalability (JBOD, etc) • Release 1.0+: Online-evolvability (down-conversion optimization, etc) 80
  • 81. • One broker in a cluster acts as controller • Monitor the liveness of brokers (via ZK) • Select new leaders on broker failures • Communicate new leaders to brokers • Controller re-elected on failures (via ZK) Example: Controller Re-design 81
  • 82. Logs Broker-1 * Logs Logs Broker-2 Broker-3 Example: Controlled Shutdown SIG_TERM Zookeeper ISR {1, 2, 3} 82
  • 83. Logs Broker-1 * Logs Logs Broker-2 Broker-3 Example: Controlled Shutdown Zookeeper ISR {1, 2, 3} 83
  • 84. Logs Broker-1 * Logs Logs Broker-2 Broker-3 Example: Controlled Shutdown Zookeeper ISR {1, 2, 3} Zookeeper ISR {2, 3} 84
  • 85. Logs Broker-1 * Logs Logs Broker-2 Broker-3 Example: Controlled Shutdown Zookeeper ISR {2, 3}ISR {2, 3}ISR {2, 3} 85
  • 86. Logs Broker-1 Logs Logs Broker-2 * Broker-3 Example: Controlled Shutdown Zookeeper ISR {2, 3} 86
  • 87. Logs Broker-1 Logs Logs Broker-2 * Broker-3 Example: Controlled Shutdown Zookeeper ISR {2, 3} 87
  • 88. Logs Broker-1 Logs Logs Broker-2 * Broker-3 Issues with Controlled Shutdown (pre 1.1) Zookeeper ISR {2, 3} ISR {2, 3} ISR {2, 3} Writes to ZK are serial
 
 Impact: longer shutdown time Comm. of new leaders not batched
 
 Impact: client timeout 88
  • 89. Results for Controlled Shutdown (post 1.1) • 5 Zookeeper nodes, 5 brokers on different racks • 25k topics, 1 partition per topic, 2 replicas • 10k partitions per brokers Kafka 1.0.0 Kafka 1.1.0 Controlled shutdown time 6.5 minutes 3 seconds 89
  • 90. Results for Controller failover (post 1.1) • 5 Zookeeper nodes, 5 brokers on different racks • 2k topics, 50 partitions per topic, 1 replicas • Controller failover: reload 100k partitions from ZK Kafka 1.0.0 Kafka 1.1.0 Controller state reload time 28 seconds 14 seconds 90
  • 91. A Short History • 2016.04: First Kafka Summit @San Francisco • Release 0.9.0: Kafka Connect • Release 0.10.0: Kafka Streams • 2017.08: Apache Kafka Goes 1.0 • Release 0.11.0: Exactly-Once
 • Release 1.0+: More security and operability (controller re-design, Java9 withTLS / CRC, etc) • Release 1.0+: Better scalability (JBOD, etc) • Release 1.0+: Online-evolvability (down-conversion optimization, etc) • Future: • Global, Infinite and Cloud-Native Kafka 91
  • 92. What we have learned? 92
  • 93. Lesson 1: Build evolvable systems 93
  • 94. Upgrade your Kafka cluster is like .. 94
  • 95. Kafka: Evolvable System • Zero down-time • Maintenance outage? No such thing. • All protocols versioned • Brokers can talk to older versioned clients • And vice versa since 0.10.2! • One should do no more than rolling bounces • Staging before production 95
  • 96. Lesson 2: What gets measured gets fixed 96
  • 97. 97
  • 98. Lesson 3: APIs stay forever 98
  • 99. The Story of KAFKA-1481 99
  • 100. The Story of KAFKA-1481 100
  • 101. 101
  • 102. 102
  • 103. 103
  • 104. Lesson 4: Service needs gatekeepers 104
  • 106. Multi-tenancy Services (in the Cloud) • Security from ground up • End-to-end encryption
 • ACL / RBAC definitions • Resources under control • Quota on bytes rate / request rate, CPU resources
 • Capacity preservation / allocation to quotas
 • Limit on num.connections, etc 106
  • 107. So What is Kafka, Really? 107
  • 108. What is Kafka, Really? [NetDB 2011]a scalable pub-sub messaging system.. 108
  • 109. Example: Pub-Sub Messaging Tracking Logs / Metrics Hadoop / DW Apache Kafka … 109
  • 110. What is Kafka, Really? [NetDB 2011] [Hadoop Summit 2013] a scalable pub-sub messaging system.. a real-time data pipeline.. 110
  • 111. Example: Centralized Data Pipeline KV-Store Doc-Store RDBMS Tracking Logs / Metrics Hadoop / DW Monitoring Rec. Engine Social GraphSearchingSecurity Apache Kafka … 111
  • 112. What is Kafka, Really? [NetDB 2011] [Hadoop Summit 2013] [VLDB 2015] a scalable pub-sub messaging system.. a real-time data pipeline.. a distributed and replicated log.. 112
  • 113. Example: Data Store Geo-Replication Apache Local Stores User Apps User Apps Local Stores Apache Region 2Region 1 write read append log mirroring apply log 113
  • 114. What is Kafka, Really? a scalable pub-sub messaging system.. [NetDB 2011] a real-time data pipeline.. [Hadoop Summit 2013] a distributed and replicated log.. [VLDB 2015] a unified data integration stack.. [CIDR 2015] 114
  • 116. What is Kafka, Really? a scalable pub-sub messaging system.. [NetDB 2011] a real-time data pipeline.. [Hadoop Summit 2013] a distributed and replicated log.. [VLDB 2015] a unified data integration stack.. [CIDR 2015] All of them! 116
  • 117. Kafka: Streaming Platform • Publish / Subscribe • Move data around as online streams • Store • “Source-of-truth” continuous data • Process • React / process data in real-time 117
  • 118. Kafka adoption World-Wide 6 of the top 10 travel companies 8 of the top 10 insurance companies 7 of the top 10 global banks 9 of the top 10 telecom companies 118
  • 120. THANKS! Guozhang Wang | guozhang@confluent.io | @guozhangwang 120