Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream Processing
1. Data Models and Consumer
Idioms Using Apache Kafka for
Continuous Data Stream
Processing
Surge’12
September 27, 2012
Erik Onnen
@eonnen
2. About Me
• Director of Architecture and Development at Urban
Airship
• Formerly Jive Software, Liberty Mutual, Opsware,
Progress
• Java, C++, Python
• Background in messaging systems
• Contributor to ActiveMQ
• Global Tibco deployments
• ESB Commercial Products
3. About Urban Airship
• Engagement platform using location and push
notifications
• Analytics for delivery, conversion and influence
• High precision targeting capabilities
4. This Talk
• How UA uses Kafka
• Kafka architecture digest
• Data structures and stream processing w/ Kafka
• Operational considerations
7. Kafka at Urban Airship
“The use for activity stream processing makes Kafka comparable to Facebook's
Scribe or Apache Flume... though the architecture and primitives are very different
for these systems and make Kafka more comparable to a traditional messaging
system.”
- http://incubator.apache.org/kafka/ Sep 27, 2012
“Let’s use it for all the things”
- me, 2010
17. Kafka at Urban Airship
• On the critical path for many of our core capabilities
• Device metadata
• Message delivery analytics
• Device connectivity state
• Feeds our operational data warehouse
• Three Kafka clusters doing in aggregate > 7B msg/day
• Peak capacity observed single consumer 750K msg/sec
• All bare metal hardware hosted with an MSP
• Factoring prominently in our multi-facility architecture
21. Kafka Core Concepts
• Publish subscribe system (not a queue)
• One producer, zero or more consumers
• Consumers aren’t contending with each other for
messages
• Messages retained for a configured window of time
• Messages grouped by topics
• Consumers partition a topic as a group:
• 1 consumer thread - all topic messages
• 2 consumer threads - each gets 1/2 of the messages
• 3 consumer threads - each gets 1/3 of the messages
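The group behaviour above can be sketched as a plain round-robin split of partitions across consumer threads. This is an illustrative simplification, not Kafka's actual rebalance algorithm; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: divide the partitions of one topic among the
// consumer threads of a single group. Not Kafka's real rebalance code.
class GroupAssignment {
    // Returns, for each thread index, the list of partition ids it owns.
    static List<List<Integer>> assign(int partitions, int threads) {
        List<List<Integer>> owned = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            owned.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            // Round-robin: partition p goes to thread p mod threads.
            owned.get(p % threads).add(p);
        }
        return owned;
    }
}
```

With 6 partitions, one thread owns all six, two threads own three each, and three threads own two each. If there are more threads than partitions, the extra threads own nothing and sit idle.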
23. Kafka Core Concepts - Producers
• Producers have no idea who will consume a message or
when
• Deliver messages to one and only one topic
• Deliver messages to one and only one broker*
• Deliver a message to one and only one partition on a
broker
• Messages are not ack’d in any way (not when received,
not when on disk, not on a boat, not in a plane...)
• Messages largely opaque to producers
• Send messages at or below a configured size†
25. Kafka Core Concepts - Brokers
• Dumb by design
• No shared state
• Publish small bits of metadata to ZooKeeper
• Messages are pulled by consumers (no push state
management)
• Manage sets of segment files, one per topic + partition
combination
• All delivery done through sendfile calls on mmap’d files
- very fast, avoids system -> user -> system copy for
every send
26. Kafka Core Concepts - Brokers
• Nearly invisible in the grand scheme of operations if
they have enough disk and RAM
27. Kafka Core Concepts - Brokers
• Don’t fear the JVM (just put it in a corner)
• Most of the heavy lifting is done in system calls
• Minimal on-heap buffering keeps most garbage in
ParNew
• 20 minute sample has approximately 100 ParNew
collections for a total of .42 seconds in GC
(0.0003247526)
29. Kafka Core Concepts - Consumers
• Consumer configured for one and only one group
• Messages are consumed in KafkaMessageStream
iterators that never stop but may block
• A message stream is a combination of:
• Topic (SPORTS)
• Group (SPORTS EVENT LOGGER | SCORE UPDATER)
• Broker(s) - 1 or more brokers feed a logical stream
• Partition(s) - 1 or more partitions from a broker + topic
38. Kafka Is Excellent for...
• Small, expressive messages - BYOD
• Throughput
• Decimates any JMS or AMQP servers for PubSub
throughput
• >70x better throughput than beanstalkd
• Scales well with number of consumers, topics
• Re-balance after consumer failures
• Rewind in time scenarios
• Allowing transient “taps” into streams of data for roughly
the cost of transport
45. But, Kafka Makes Critical Concessions - Brokers
• Data not redundant - if a broker dies, you have to
restore it to recover that data
• Shore up hardware
• Consume as fast as possible
• Persist to shared storage or use DRBD
• Upcoming replication
• Segment corruption can be fatal for that topic + partition
54. Kafka Critical Concessions - Consumers
• No once and only once semantics
• Consumers must correctly handle the same message
multiple times
• Rebalance after fail can result in redelivery
• Consumer failure or unclean shutdown can result in
redelivery
• The possibility of out-of-order delivery and redelivery
requires idempotent, commutative consumers when dealing
with systems of record
65. Storage Patterns and Data Structures
• Urban Airship uses Kafka for
• Analytics
• Producers write device data to Kafka
• Consumers create dimensional indexes in HBase
• Operational Data
• Producers are services writing to Kafka
• Consumers write to ODW (HBase as JSON)
• Presence Data
• Producers are connectivity nodes writing to Kafka
• Consumers write to LevelDB
74. Storage Patterns - Device Metadata
• Primitive incarnation - blast an update into a row, keyed
on deviceID
• RDBMS
• INSERT OR UPDATE DEVICE_METADATA (ID, VALUE)
VALUES (DEVICE_ID, BLOB) WHERE ID = deviceID;
• Denormalize - forget joining to read tags, way too
expensive
81. Storage Patterns - Device Metadata
• Column Store
• Write k=deviceId -> c=NULL -> v= BLOB
• Both
• Idempotent
• FAIL - mutations can arrive out of order, can be
replayed
• Commutative
87. Storage Patterns - Device Metadata
• Improved approach - leverage the timestamp of the
mutation
• RDBMS
• INSERT OR UPDATE DEVICE_METADATA (KEY, VALUE,
TS) VALUES (DEVICE_ID, BLOB, TS) WHERE ID =
deviceID AND TS = TS;
• Heavy-handed approach
• Massive I/O on TS index or risk reading an entire
block per version with no adjacent blocks
102. Storage Patterns - Device Metadata
• Column Store
• Write k=deviceId -> c=INV(ts) -> v=BLOB
• Reads are simple slices of one column, easy for LSM
(pop the top column in the row)
• No transactions required, much smaller lock footprint
• Both
• Idempotent
• Commutative
• Old versions not removed automatically
• Secondary indexes very difficult
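The column layout above can be modelled in memory to show why it is both idempotent and commutative. This is a minimal sketch, assuming INV(ts) means Long.MAX_VALUE - ts and that timestamps are non-negative; the class is a hypothetical stand-in for a real column store, not an HBase client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the row layout:
// row key = deviceId, column = inverted timestamp, value = blob.
class VersionedDeviceStore {
    private final Map<String, TreeMap<Long, String>> rows = new HashMap<>();

    // Writing the same (deviceId, ts, blob) twice lands on the same column,
    // so replays are harmless, and arrival order does not affect the result.
    void write(String deviceId, long ts, String blob) {
        rows.computeIfAbsent(deviceId, k -> new TreeMap<>())
            .put(Long.MAX_VALUE - ts, blob); // assumed definition of INV(ts)
    }

    // The newest version sorts first because the timestamp is inverted.
    String latest(String deviceId) {
        TreeMap<Long, String> row = rows.get(deviceId);
        return (row == null || row.isEmpty()) ? null : row.firstEntry().getValue();
    }

    // Old versions accumulate until something removes them.
    int versions(String deviceId) {
        TreeMap<Long, String> row = rows.get(deviceId);
        return row == null ? 0 : row.size();
    }
}
```

Applying the mutations (ts=100, ts=200) in either order, with or without a replayed duplicate, leaves the same latest value. The version count also shows the trade-off noted above: nothing prunes old versions automatically.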
122. Operational Considerations - Buffering
• A message in a broker is not immediately visible to a
consumer
• Kafka buffers data until one of two conditions is true
• log.flush.interval reached
• log.default.flush.interval.ms elapsed
• False latency for low throughput workloads
• The smaller of the two represents the potential for message loss
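The two flush conditions can be modelled as a small policy object. This is a sketch of the idea only, not the broker's implementation: the class is hypothetical, and its two fields merely mirror the roles of log.flush.interval (a message count) and log.default.flush.interval.ms (a time limit).

```java
// Hypothetical model of a "flush on count or on elapsed time" policy.
class FlushPolicy {
    private final int flushInterval;     // messages, cf. log.flush.interval
    private final long flushIntervalMs;  // cf. log.default.flush.interval.ms
    private int unflushed = 0;
    private long lastFlushMs;

    FlushPolicy(int flushInterval, long flushIntervalMs, long nowMs) {
        this.flushInterval = flushInterval;
        this.flushIntervalMs = flushIntervalMs;
        this.lastFlushMs = nowMs;
    }

    // Record one appended message; returns true if this append caused a flush.
    boolean append(long nowMs) {
        unflushed++;
        return maybeFlush(nowMs);
    }

    // Also meant to be called from a periodic timer, so that a lone message
    // is flushed once the time limit passes.
    boolean maybeFlush(long nowMs) {
        boolean due = unflushed >= flushInterval
                || (unflushed > 0 && nowMs - lastFlushMs >= flushIntervalMs);
        if (due) {
            unflushed = 0;
            lastFlushMs = nowMs;
        }
        return due;
    }

    // Messages that are buffered and not yet visible to consumers.
    int unflushed() {
        return unflushed;
    }
}
```

On a low-throughput topic the count threshold is rarely reached, so a message waits for the time limit before consumers can see it. That wait is the "false latency" described above.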
127. Operational Considerations - The FetcherRunnable
• Consumer spawns a number of FetcherRunnable threads
to read from brokers
• FetcherRunnable feeds messages into queues that back
the KafkaMessageStream API
• FetcherRunnable must remain healthy for consumers to see
messages
// consume the messages in the threads
for (final KafkaStream<Message> stream : streams) {
    executor.submit(new Runnable() {
        public void run() {
            for (MessageAndMetadata msgAndMetadata : stream) {
                // process message (msgAndMetadata.message())
            }
        }
    });
}
132. Operational Considerations - The FetcherRunnable
• A given FetcherRunnable is the lone source of data for
its streams
• When a FetcherRunnable dies, the streams block
indefinitely
2012-06-15 00:31:39,422 - ERROR [FetchRunnable-0:kafka.consumer.FetcherRunnable] - error in
FetcherRunnable
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
at sun.nio.ch.IOUtil.read(IOUtil.java:175)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at kafka.utils.Utils$.read(Utils.scala:483)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:53)
at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:28)
at kafka.consumer.SimpleConsumer.getResponse(SimpleConsumer.scala:181)
at kafka.consumer.SimpleConsumer.liftedTree2$1(SimpleConsumer.scala:129)
at kafka.consumer.SimpleConsumer.multifetch(SimpleConsumer.scala:119)
at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:63)
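Because a stream that has lost its fetcher simply blocks, one possible way to notice is to track when the last message arrived. The watchdog below is a hypothetical sketch, not part of the Kafka consumer API. It assumes consumer threads call markMessage() for every message and that a separate monitoring thread polls isStalled().

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical watchdog for detecting a stream that has gone silent.
class StreamWatchdog {
    private final long maxSilenceMs;
    private final AtomicLong lastMessageMs;

    StreamWatchdog(long maxSilenceMs, long nowMs) {
        this.maxSilenceMs = maxSilenceMs;
        this.lastMessageMs = new AtomicLong(nowMs);
    }

    // Called by the consumer thread each time the stream yields a message.
    void markMessage(long nowMs) {
        lastMessageMs.set(nowMs);
    }

    // Called by a monitoring thread; true when no message has been seen
    // for longer than the allowed silence.
    boolean isStalled(long nowMs) {
        return nowMs - lastMessageMs.get() > maxSilenceMs;
    }
}
```

A genuinely quiet topic looks the same as a dead fetcher, so the silence threshold has to be chosen from the traffic the topic normally carries.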
138. Operational Considerations - Rate is King
• MONITOR YOUR CONSUMPTION RATES
• Kafka JMX Beans
• Application metrics for specific consumption behaviors
(use Yammer Timer metrics)
• Understand what “normal” is, alert when you are out of
that band by some tolerance
• Not overcommitting consumers helps - nobody is idle
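The "normal band plus tolerance" idea can be sketched as a simple check. The class below is hypothetical and uses plain Java. In practice the observed rate would come from a metrics library, and the baseline and tolerance values are assumptions to be tuned per consumer.

```java
// Hypothetical alert check: is the observed consumption rate outside
// the expected band around a known-normal rate?
class RateBand {
    private final double normalPerSec;
    private final double tolerance; // fraction of normal, e.g. 0.25 = +/-25%

    RateBand(double normalPerSec, double tolerance) {
        this.normalPerSec = normalPerSec;
        this.tolerance = tolerance;
    }

    // messages: count consumed during the window; windowMs: window length.
    boolean outOfBand(long messages, long windowMs) {
        double observedPerSec = messages * 1000.0 / windowMs;
        return Math.abs(observedPerSec - normalPerSec) > normalPerSec * tolerance;
    }
}
```

The check fires in both directions. A rate far below normal suggests a stalled consumer, while a rate far above normal may mean the consumer is replaying or catching up.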
144. Operational Considerations - The Retention Window
• Data written to a segment file on a broker (topic +
partition)
• Every consumer group has a relative offset within a
segment
• Individual consumers move the offset and store to
ZooKeeper on a regular interval
• Segments are retained for log.retention.hours
• Segments deleted when outside retention window
153. Operational Considerations - The Retention Window
• Consumers update offsets in ZooKeeper
• Monitor them and make sure they’re progressing
• Look for skew in rate of change between partition offsets
• Monitoring consumption rate can also help
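Offset monitoring can be sketched as a comparison of two offset snapshots taken some time apart. The code below assumes the snapshots have already been read from ZooKeeper by other means. The class, the "broker-partition" key format and the skew measure are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical progress check over two snapshots of consumer offsets,
// keyed by a partition label such as "brokerId-partitionId".
class OffsetProgress {
    // Partitions whose offset did not advance between the snapshots.
    static List<String> stalled(Map<String, Long> before, Map<String, Long> after) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long prev = before.get(e.getKey());
            if (prev != null && e.getValue() - prev <= 0) {
                out.add(e.getKey());
            }
        }
        Collections.sort(out);
        return out;
    }

    // Ratio of the fastest to the slowest offset advance; 1.0 means even.
    static double skew(Map<String, Long> before, Map<String, Long> after) {
        long min = Long.MAX_VALUE;
        long max = 0L;
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long prev = before.get(e.getKey());
            if (prev == null) {
                continue; // partition not present in the earlier snapshot
            }
            long delta = Math.max(0L, e.getValue() - prev);
            min = Math.min(min, delta);
            max = Math.max(max, delta);
        }
        if (min == Long.MAX_VALUE) {
            return 1.0; // nothing comparable
        }
        if (min == 0L) {
            return max == 0L ? 1.0 : Double.POSITIVE_INFINITY;
        }
        return (double) max / min;
    }
}
```

A stalled partition is the urgent case, because its unread segments will eventually fall outside the retention window. A large skew is an earlier warning that one partition is being consumed much more slowly than its peers.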
155. Operational Considerations - Scala
2012-07-04 11:49:08,469 - WARN [ZkClient-EventThread-132-
zookeeper-0:2181,zookeeper-1:2181,zookeeper-2:2181:org.I0Itec.zkclient.ZkEventThread] - Error handling event ZkEvent[Children of /
brokers/topics/SEND_EVENTS changed sent to kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener@43d248b4]
java.lang.NullPointerException
at scala.util.parsing.combinator.Parsers$NoSuccess.<init>(Parsers.scala:131)
at scala.util.parsing.combinator.Parsers$Failure.<init>(Parsers.scala:158)
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:489)
at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:487)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
... (~50 lines elided)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
at scala.util.parsing.combinator.Parsers$Success.flatMapWithNext(Parsers.scala:113)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:200)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:200)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:200)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:200)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:208)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:208)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:742)
at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)
at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
163. Operational Considerations - Brokers
• Monitor IOPS and IOUtil
• Under no circumstances allow a broker to run out of disk
space (don’t even get close)
• fetch.size - amount of data a consumer will pull
• max.message.size - largest message a producer can
submit to a broker
• Broker enforces neither of these prior to v0.8 :(
• KAFKA-490
• KAFKA-247
165. Operational Considerations - Brokers
2012-06-15 04:47:35,632 - ERROR [FetchRunnable-2:kafka.consumer.FetcherRunnable] - error in
FetcherRunnable for RN-OL:3-22
kafka.common.InvalidMessageSizeException: invalid message size:152173251 only received bytes:307196
at 0 possible causes (1) a single message larger than the fetch size; (2) log corruption
at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:75)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:61)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:58)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:50)
at kafka.message.ByteBufferMessageSet.validBytes(ByteBufferMessageSet.scala:49)
at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:70)
at kafka.consumer.FetcherRunnable$$anonfun$run$3.apply(FetcherRunnable.scala:80)
at kafka.consumer.FetcherRunnable$$anonfun$run$3.apply(FetcherRunnable.scala:66)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
at scala.collection.immutable.List.foreach(List.scala:45)
at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:66)
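One way to avoid the failure in the trace above is to reject oversized payloads in the producing application before they reach a broker. The guard below is a hypothetical sketch, not a Kafka API. It compares raw payload length only and ignores any per-message framing overhead, so real limits should leave some headroom.

```java
// Hypothetical producer-side guard: refuse payloads that a consumer
// configured with a given fetch size could never read.
class MessageSizeGuard {
    private final int maxMessageBytes;

    MessageSizeGuard(int maxMessageBytes, int consumerFetchBytes) {
        // A message larger than the consumer's fetch size cannot be fetched
        // whole, so the producer limit must not exceed it.
        if (maxMessageBytes > consumerFetchBytes) {
            throw new IllegalArgumentException(
                "max message size exceeds consumer fetch size");
        }
        this.maxMessageBytes = maxMessageBytes;
    }

    boolean accepts(byte[] payload) {
        return payload.length <= maxMessageBytes;
    }
}
```

For example, `new MessageSizeGuard(1024, 300 * 1024)` accepts a 1024-byte payload and rejects a 1025-byte one; the values are illustrative only.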
175. Operational Considerations - Consumers
• Consumer tuning is an art
• Overcommit - more threads than partitions
• Idling (often entire consumer processes)
• Excessive rebalancing
• Undercommit - fewer threads than partitions
• Serial fetchers won’t keep up depending on workload
• Big GCs can cause rebalancing
• Just right - a ratio of 2 partitions per consumer thread
• Mostly pivots on consumer workload (i.e. latency)
180. Operational Considerations - Incubators Gonna
Incubate
• Deployed in some large installations
• Largely learning in production
• Hasn’t lived through a long lineage of people being
mean to it or using it in anger
2012-06-15 04:25:00,774 - ERROR [kafka-processor-3:Processor@215] - java.lang.RuntimeException:
OOME with size 1195725856
java.lang.RuntimeException: OOME with size 1195725856
at kafka.network.BoundedByteBufferReceive.byteBufferAllocate(BoundedByteBufferReceive.scala:81)
at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:60)
at kafka.network.Processor.read(SocketServer.scala:283)
at kafka.network.Processor.run(SocketServer.scala:202)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
at kafka.network.BoundedByteBufferReceive.byteBufferAllocate(BoundedByteBufferReceive.scala:77)
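The size in the trace above is what results when the first four bytes of an HTTP request are read as a big-endian length prefix. The short sketch below (class name hypothetical) decodes the integer back into the ASCII characters it came from.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: show which four ASCII characters were misread
// as a 32-bit message size.
class SizePrefixDecoder {
    static String asAscii(int size) {
        // ByteBuffer writes ints in big-endian order by default.
        byte[] bytes = ByteBuffer.allocate(4).putInt(size).array();
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

`SizePrefixDecoder.asAscii(1195725856)` returns `"GET "` (0x47 0x45 0x54 0x20). The broker therefore tried to allocate a buffer of roughly 1.1 GB for what was really the start of an HTTP GET sent to its service port.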
186. Operational Considerations - Incubators Gonna
Incubate
• With any incubator project, assume it will be rough
around the edges
• Assume that if you point your monitoring agent at the
service port, things will break
• As a general practice, measure the intended outcome of
production changes
187. Acknowledgements
The storage models proposed were inspired by and adapted
from:
http://engineering.twitter.com/2010/05/introducing-flockdb.html
https://github.com/mochi/statebox
188. Q&A
We’re hiring!
• Infrastructure
• Django
• Operations
Contact:
erik@urbanairship.com (that I put my email in slides is not
an invitation to sell me software so don’t do that)
@eonnen - twitter
Editor's Notes
At any time, a stream may be reading from 4 partitions on 3 brokers
Bring your own data
could use ttl for removing old versions but set to what?
Next OPS
Interval == number of messages
Message loss potential in the event of hard process fail
When the runnable dies, the consumers will idle
145MB vs. 300K
Balancing feedback loop herd and consumer GC loses ZK lease
1195725856 was the beginning of a GET /all request to what should have been our monitoring port