Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream Processing
Slides from Surge 2012

Speaker Notes

  • At any time, a stream may be reading from 4 partitions on 3 brokers
  • Bring your own data
  • Could use a TTL for removing old versions, but set it to what?
  • Next: OPS
  • Interval == number of messages; message loss potential in the event of a hard process failure
  • When the runnable dies, the consumers will idle
  • 145MB vs. 300K
  • Balancing feedback loop / herd effect; a consumer GC pause loses the ZooKeeper lease
  • 1195725856 was the beginning of a GET /all request to what should have been our monitoring port

Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream Processing Presentation Transcript

  • 1. Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream Processing. Surge '12, September 27, 2012. Erik Onnen, @eonnen
  • 2. About Me• Director of Architecture and Development at Urban Airship• Formerly Jive Software, Liberty Mutual, Opsware, Progress• Java, C++, Python• Background in messaging systems • Contributor to ActiveMQ • Global Tibco deployments • ESB Commercial Products
  • 3. About Urban Airship• Engagement platform using location and push notifications• Analytics for delivery, conversion and influence• High precision targeting capabilities
  • 4. This Talk• How UA uses Kafka• Kafka architecture digest• Data structures and stream processing w/ Kafka• Operational considerations
  • 5-8. Kafka at Urban Airship "The use for activity stream processing makes Kafka comparable to Facebook's Scribe or Apache Flume... though the architecture and primitives are very different for these systems and make Kafka more comparable to a traditional messaging system." - http://incubator.apache.org/kafka/, Sep 27, 2012. "Let's use it for all the things" - me, 2010
  • 9-17. Kafka at Urban Airship• On the critical path for many of our core capabilities • Device metadata • Message delivery analytics • Device connectivity state • Feeds our operational data warehouse • Three Kafka clusters doing in aggregate > 7B msg/day • Peak observed throughput for a single consumer: 750K msg/sec • All bare metal hardware hosted with an MSP • Factoring prominently in our multi-facility architecture
  • 18-19. Kafka Core Concepts - The Big Picture
  • 20-21. Kafka Core Concepts• Publish-subscribe system (not a queue)• One producer, zero or more consumers• Consumers aren't contending with each other for messages• Messages retained for a configured window of time• Messages grouped by topics• Consumers partition a topic as a group: • 1 consumer thread - all topic messages • 2 consumer threads - each gets 1/2 of the messages • 3 consumer threads - each gets 1/3 of the messages
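A minimal sketch (not from the deck) of what that split looks like: given a topic's partitions and some number of consumer threads in the same group, each thread ends up owning a disjoint subset. The round-robin assignment below is purely illustrative, not Kafka's actual rebalancing algorithm.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative only: how 1, 2, or 3 consumer threads split a topic's partitions.
    public class PartitionSplit {
        static List<List<String>> assign(List<String> partitions, int threads) {
            List<List<String>> owned = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                owned.add(new ArrayList<>());
            }
            for (int i = 0; i < partitions.size(); i++) {
                owned.get(i % threads).add(partitions.get(i)); // round-robin, for illustration
            }
            return owned;
        }

        public static void main(String[] args) {
            List<String> partitions = new ArrayList<>();
            for (int broker = 0; broker < 3; broker++) {
                for (int p = 0; p < 2; p++) {
                    partitions.add(broker + "-" + p); // "broker-partition" labels
                }
            }
            for (int threads = 1; threads <= 3; threads++) {
                System.out.println(threads + " thread(s): " + assign(partitions, threads));
            }
        }
    }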
  • 22-23. Kafka Core Concepts - Producers• Producers have no idea who will consume a message or when• Deliver messages to one and only one topic• Deliver messages to one and only one broker*• Deliver a message to one and only one partition on a broker• Messages are not ack'd in any way (not when received, not when on disk, not on a boat, not in a plane...)• Messages largely opaque to producers• Send messages at or below a configured size†
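"One and only one partition" in practice: Kafka's default behavior in this era was, roughly, to hash the message key modulo the partition count (or pick an arbitrary partition when there is no key). The sketch below shows the idea; it is not the actual producer code.

    import java.util.Random;

    // A keyed message always maps to the same partition; an unkeyed one goes anywhere.
    public class PartitionChoice {
        private static final Random RANDOM = new Random();

        static int choosePartition(String key, int numPartitions) {
            if (key == null) {
                return RANDOM.nextInt(numPartitions);              // no key: arbitrary partition
            }
            return (key.hashCode() & 0x7fffffff) % numPartitions;  // keyed: stable partition
        }

        public static void main(String[] args) {
            System.out.println(choosePartition("device-PONIES", 8)); // same key -> same partition
            System.out.println(choosePartition(null, 8));            // unkeyed -> arbitrary
        }
    }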
  • 24-25. Kafka Core Concepts - Brokers• Dumb by design • No shared state • Publish small bits of metadata to ZooKeeper • Messages are pulled by consumers (no push state management)• Manage sets of segment files, one per topic + partition combination• All delivery done through sendfile calls on mmap'd files - very fast, avoids the system -> user -> system copy for every send
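The "sendfile" point is zero-copy transfer from the page cache to the destination channel; in the JVM that capability is exposed as FileChannel.transferTo. A standalone sketch of the same call (file-to-file here so it runs anywhere; the broker's use of it is the real thing, this is just an illustration):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.channels.FileChannel;

    // Bytes move channel-to-channel without being copied through a user-space buffer here.
    public class ZeroCopyDemo {
        public static void main(String[] args) throws IOException {
            File src = File.createTempFile("segment", ".log");
            File dst = File.createTempFile("copy", ".log");
            try (FileOutputStream seed = new FileOutputStream(src)) {
                seed.write("some log segment bytes".getBytes("UTF-8"));
            }
            try (FileChannel in = new FileInputStream(src).getChannel();
                 FileChannel out = new FileOutputStream(dst).getChannel()) {
                long transferred = in.transferTo(0, in.size(), out);
                System.out.println("transferred " + transferred + " bytes without a user-space copy");
            }
        }
    }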
  • 26. Kafka Core Concepts - Brokers• Nearly invisible in the grand scheme of operations if they have enough disk and RAM
  • 27. Kafka Core Concepts - Brokers• Don’t fear the JVM (just put it in a corner) • Most of the heavy lifting is done in system calls • Minimal on-heap buffering keeps most garbage in ParNew • 20 minute sample has approximately 100 ParNew collections for a total of .42 seconds in GC (0.0003247526)
  • 28-29. Kafka Core Concepts - Consumers• Consumer configured for one and only one group• Messages are consumed in KafkaMessageStream iterators that never stop but may block• A message stream is a combination of: • Topic (SPORTS) • Group (SPORTS EVENT LOGGER | SCORE UPDATER) • Broker(s) - 1 or more brokers feed a logical stream • Partition(s) - 1 or more partitions from a broker + topic
  • 30-38. Kafka Is Excellent for...• Small, expressive messages - BYOD• Throughput • Decimates any JMS or AMQP server for pub-sub throughput • >70x better throughput than beanstalkd • Scales well with number of consumers, topics • Re-balances after consumer failures• Rewind-in-time scenarios• Allowing transient “taps” into streams of data for roughly the cost of transport
  • 39-45. But, Kafka Makes Critical Concessions - Brokers• Data not redundant - if a broker dies, you have to restore it to recover that data • Shore up hardware • Consume as fast as possible • Persist to shared storage or use DRBD • Upcoming replication• Segment corruption can be fatal for that topic + partition
  • 46-48. Kafka Critical Concessions - Consumers• Messages can be delivered out of order
  • 49-54. Kafka Critical Concessions - Consumers• No once-and-only-once semantics• Consumers must correctly handle the same message multiple times • Rebalance after a failure can result in redelivery • Consumer failure or unclean shutdown can result in redelivery• The possibility of out-of-order delivery and redelivery requires idempotent, commutative consumers when dealing with systems of record
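A sketch (my own, not from the talk) of what "idempotent and commutative" means in practice: the same mutation can be applied twice, and two mutations can be applied in either order, yet the stored state converges because the event timestamp decides the winner.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Last-writer-wins by event timestamp: replays are no-ops (idempotent) and
    // out-of-order application converges to the same result (commutative).
    public class DeviceMetadataStore {
        static final class Versioned {
            final long eventTimestamp;
            final String blob;
            Versioned(long ts, String blob) { this.eventTimestamp = ts; this.blob = blob; }
        }

        private final Map<String, Versioned> byDeviceId = new ConcurrentHashMap<>();

        void apply(String deviceId, long eventTimestamp, String blob) {
            byDeviceId.merge(deviceId, new Versioned(eventTimestamp, blob),
                (current, incoming) ->
                    incoming.eventTimestamp > current.eventTimestamp ? incoming : current);
        }

        public static void main(String[] args) {
            DeviceMetadataStore store = new DeviceMetadataStore();
            store.apply("PONIES", 2, "[BEYONCE, JAY-Z]");
            store.apply("PONIES", 1, "[BEYONCE]");        // late, out-of-order update is ignored
            store.apply("PONIES", 2, "[BEYONCE, JAY-Z]"); // redelivery is a no-op
            System.out.println(store.byDeviceId.get("PONIES").blob); // [BEYONCE, JAY-Z]
        }
    }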
  • 55-65. Storage Patterns and Data Structures• Urban Airship uses Kafka for • Analytics • Producers write device data to Kafka • Consumers create dimensional indexes in HBase • Operational Data • Producers are services writing to Kafka • Consumers write to ODW (HBase as JSON) • Presence Data • Producers are connectivity nodes writing to Kafka • Consumers write to LevelDB
  • 66-69. Storage Patterns - Device Metadata { deviceId:"PONIES", tags:["BEYONCE"], timestamp:1 } { deviceId:"PONIES", tags:["BEYONCE", "JAY-Z", "NICKLEBACK"], timestamp:2 } { deviceId:"PONIES", tags:["BEYONCE", "JAY-Z", "NICKLEBACK"], timestamp:3 }
  • 70-74. Storage Patterns - Device Metadata• Primitive incarnation - blast an update into a row, keyed on deviceID • RDBMS • INSERT OR UPDATE DEVICE_METADATA (ID, VALUE) VALUES (DEVICE_ID, BLOB) WHERE ID = deviceID; • Denormalize - forget joining to read tags, way too expensive
  • 75-81. Storage Patterns - Device Metadata • Column Store • Write k=deviceId -> c=NULL -> v=BLOB • Both • Idempotent • FAIL - mutations can arrive out of order, can be replayed • Commutative
  • 82-87. Storage Patterns - Device Metadata• Improved approach - leverage the timestamp of the mutation • RDBMS • INSERT OR UPDATE DEVICE_METADATA (KEY, VALUE, TS) VALUES (DEVICE_ID, BLOB, TS) WHERE ID = deviceID AND TS = TS; • Heavy-handed approach • Massive I/O on the TS index, or risk reading an entire block per version with no adjacent blocks
  • 88-92. Storage Patterns - Device Metadata • Column Store
  • 93-102. Storage Patterns - Device Metadata • Column Store • Write k=deviceId -> c=INV(ts) -> v=BLOB • Reads are simple slices of one column, easy for LSM (pop the top column in the row) • No transactions required, much smaller lock footprint • Both • Idempotent • Commutative • Old versions not removed automatically • Secondary indexes very difficult
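A toy model of the INV(ts) trick (plain Java standing in for a column store): columns in a row sort ascending, so writing Long.MAX_VALUE - ts as the column key makes the newest version sort first, and a read becomes "take the first column".

    import java.util.TreeMap;

    // Row for one deviceId, modeled as a sorted map of column -> value.
    // The column key is the inverted timestamp, so firstEntry() is always the newest version.
    public class InvertedTimestampRow {
        private final TreeMap<Long, String> columns = new TreeMap<>();

        static long inv(long ts) {
            return Long.MAX_VALUE - ts;
        }

        void write(long eventTimestamp, String blob) {
            columns.put(inv(eventTimestamp), blob); // idempotent: same ts overwrites the same cell
        }

        String readLatest() {
            return columns.isEmpty() ? null : columns.firstEntry().getValue();
        }

        public static void main(String[] args) {
            InvertedTimestampRow row = new InvertedTimestampRow();
            row.write(1, "[BEYONCE]");
            row.write(3, "[BEYONCE, JAY-Z, NICKLEBACK]");
            row.write(2, "[BEYONCE, JAY-Z]"); // out-of-order write does not change the answer
            System.out.println(row.readLatest()); // newest version, regardless of arrival order
        }
    }

Note that, as the slide says, old versions pile up: nothing in this model removes them automatically.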
  • 103-111. Storage Patterns - Device Metadata• Gangnam Style - tag per column, deletions tombstoned • RDBMS - select for update and/or big txns? • Column Store • Addition k=deviceId -> c=TAG -> v=TS • Deletion k=deviceId -> c=TAG -> v=-(TS) • Cell timestamp set to event timestamp in both cases (old updates ignored) • Easy to (re)build secondary indexes, tag counts • Commutative, Idempotent and Fast
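A sketch of the tag-per-column idea (again a plain-Java stand-in, not UA's HBase code): each tag gets its own cell, the cell value is the event timestamp (negated for a deletion), and the current tag set is whatever still resolves to a positive value after applying mutations in any order, any number of times.

    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;
    import java.util.TreeSet;

    // One row per deviceId; one column per tag; value is +eventTs for an addition,
    // -eventTs for a deletion. Older mutations never overwrite newer ones.
    public class TagRow {
        private final Map<String, Long> cells = new TreeMap<>();

        void addTag(String tag, long eventTs)    { apply(tag, eventTs); }
        void removeTag(String tag, long eventTs) { apply(tag, -eventTs); }

        private void apply(String tag, long signedTs) {
            Long current = cells.get(tag);
            // Keep the mutation with the larger absolute (i.e. newer) timestamp.
            if (current == null || Math.abs(signedTs) > Math.abs(current)) {
                cells.put(tag, signedTs);
            }
        }

        Set<String> currentTags() {
            Set<String> tags = new TreeSet<>();
            for (Map.Entry<String, Long> e : cells.entrySet()) {
                if (e.getValue() > 0) {
                    tags.add(e.getKey()); // positive value: the latest mutation was an add
                }
            }
            return tags;
        }

        public static void main(String[] args) {
            TagRow row = new TagRow();
            row.addTag("NICKLEBACK", 2);
            row.removeTag("NICKLEBACK", 3);
            row.addTag("NICKLEBACK", 2); // redelivered, older mutation: ignored
            row.addTag("BEYONCE", 1);
            System.out.println(row.currentTags()); // [BEYONCE]
        }
    }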
  • 112-115. Storage Patterns - Device Metadata
  • 116-122. Operational Considerations - Buffering• A message in a broker is not immediately visible to a consumer• Kafka buffers data until one of two conditions is true • log.flush.interval reached • log.default.flush.interval.ms elapsed• False latency for low-throughput workloads• The smaller of the two represents the message loss potential
  • 123-127. Operational Considerations - The FetcherRunnable• Consumer spawns a number of FetcherRunnable threads to read from brokers• FetcherRunnable feeds messages into queues that back the KafkaMessageStream API• FetcherRunnable must remain healthy for consumers to see messages

    // consume the messages in the threads
    for (final KafkaStream<Message> stream : streams) {
        executor.submit(new Runnable() {
            public void run() {
                for (MessageAndMetadata msgAndMetadata : stream) {
                    // process message (msgAndMetadata.message())
                }
            }
        });
    }
  • 128-132. Operational Considerations - The FetcherRunnable• A given FetcherRunnable is the lone source of data for its streams• When a FetcherRunnable dies, the streams block indefinitely

    2012-06-15 00:31:39,422 - ERROR [FetchRunnable-0:kafka.consumer.FetcherRunnable] - error in FetcherRunnable
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
        at sun.nio.ch.IOUtil.read(IOUtil.java:175)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        at kafka.utils.Utils$.read(Utils.scala:483)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:53)
        at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
        at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:28)
        at kafka.consumer.SimpleConsumer.getResponse(SimpleConsumer.scala:181)
        at kafka.consumer.SimpleConsumer.liftedTree2$1(SimpleConsumer.scala:129)
        at kafka.consumer.SimpleConsumer.multifetch(SimpleConsumer.scala:119)
        at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:63)
  • 133-138. Operational Considerations - Rate is King• MONITOR YOUR CONSUMPTION RATES • Kafka JMX beans • Application metrics for specific consumption behaviors (use Yammer Timer metrics)• Understand what “normal” is, alert when you are out of that band by some tolerance• Not overcommitting consumers helps - nobody is idle
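One way to make "understand what normal is" concrete (a sketch, not UA's code): count messages as they are processed, sample the rate on a timer, and flag when it drops outside the expected band. This also catches the dead-FetcherRunnable case above, where streams silently idle and the rate collapses to zero.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    // Count every processed message, sample the rate periodically, and complain loudly
    // when consumption falls below an expected floor (tune the floor to "normal").
    public class ConsumptionRateWatchdog {
        private final AtomicLong counter = new AtomicLong();
        private final long minExpectedPerInterval;

        ConsumptionRateWatchdog(long minExpectedPerInterval) {
            this.minExpectedPerInterval = minExpectedPerInterval;
        }

        void onMessage() {
            counter.incrementAndGet(); // call from the consumer loop for every message
        }

        void start(long intervalSeconds) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "consumption-watchdog");
                t.setDaemon(true); // don't keep the JVM alive just for monitoring
                return t;
            });
            scheduler.scheduleAtFixedRate(() -> {
                long consumed = counter.getAndSet(0);
                if (consumed < minExpectedPerInterval) {
                    // Hook an alert/pager here; a stuck stream shows up as a rate of ~0.
                    System.err.println("ALERT: consumed " + consumed
                            + " messages in the last " + intervalSeconds + "s");
                }
            }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        }

        public static void main(String[] args) throws InterruptedException {
            ConsumptionRateWatchdog watchdog = new ConsumptionRateWatchdog(1000);
            watchdog.start(10);
            watchdog.onMessage(); // in real use, wired into the message-processing loop
            Thread.sleep(11_000); // let one sampling interval elapse for the demo
        }
    }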
  • 139-144. Operational Considerations - The Retention Window• Data written to a segment file on a broker (topic + partition)• Every consumer group has a relative offset within a segment• Individual consumers move the offset and store it to ZooKeeper on a regular interval• Segments are retained for log.retention.hours• Segments deleted when outside the retention window
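A back-of-the-envelope sketch of why the retention window matters operationally (all numbers invented for illustration): if a group's offset advances more slowly than producers write, you can estimate roughly how long until unread data ages past log.retention.hours and is deleted before it was ever consumed.

    import java.util.concurrent.TimeUnit;

    // Rough lag math: how long until a slow consumer's unread backlog falls off the
    // end of the retention window? It ignores segment granularity, so treat it as an estimate.
    public class RetentionRisk {
        public static void main(String[] args) {
            long retentionHours = 168;          // log.retention.hours (one week)
            long produceRatePerSec = 80_000;    // messages/sec written to the topic
            long consumeRatePerSec = 60_000;    // messages/sec this group actually processes
            long currentLag = 500_000_000L;     // messages between log end and group offset

            long lagGrowthPerSec = produceRatePerSec - consumeRatePerSec;
            long retentionCapacity = produceRatePerSec * TimeUnit.HOURS.toSeconds(retentionHours);

            if (lagGrowthPerSec <= 0) {
                System.out.println("Consumer is keeping up; lag will shrink.");
            } else {
                long secondsUntilDataLoss = (retentionCapacity - currentLag) / lagGrowthPerSec;
                System.out.println("At this rate, unread messages start expiring in ~"
                        + TimeUnit.SECONDS.toHours(secondsUntilDataLoss) + " hours.");
            }
        }
    }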
  • 145-149. Operational Considerations - The Retention Window
  • 150-153. Operational Considerations - The Retention Window• Consumers update offsets in ZooKeeper• Monitor them and make sure they’re progressing• Look for skew in the rate of change between partition offsets• Monitoring consumption rate can also help
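A sketch of checking that offsets are progressing, using the plain ZooKeeper client. The /consumers/<group>/offsets/<topic>/<partition> layout is what the high-level consumer of that era used; the group and topic names here are hypothetical, and both the path layout and the connect string should be verified against your own deployment.

    import java.util.List;
    import org.apache.zookeeper.ZooKeeper;

    // Read the offsets a consumer group has committed to ZooKeeper and print them.
    // Poll this periodically: an offset that stops moving, or moves much more slowly
    // than its siblings, is exactly the skew the slide says to look for.
    public class OffsetCheck {
        public static void main(String[] args) throws Exception {
            String group = args.length > 0 ? args[0] : "SPORTS_EVENT_LOGGER"; // hypothetical group
            String topic = args.length > 1 ? args[1] : "SPORTS";              // hypothetical topic
            ZooKeeper zk = new ZooKeeper("zookeeper-0:2181", 30000, event -> { });
            try {
                String offsetsPath = "/consumers/" + group + "/offsets/" + topic;
                List<String> partitions = zk.getChildren(offsetsPath, false);
                for (String partition : partitions) {
                    byte[] data = zk.getData(offsetsPath + "/" + partition, false, null);
                    System.out.println(partition + " -> " + new String(data, "UTF-8"));
                }
            } finally {
                zk.close();
            }
        }
    }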
  • 154. Operational Considerations - Scala “Reading that Scala stack trace sure was easy” - Nobody Ever
  • 155. Operational Considerations - Scala2012-07-04 11:49:08,469 - WARN [ZkClient-EventThread-132-zookeeper-0:2181,zookeeper-1:2181,zookeeper-2:2181:org.I0Itec.zkclient.ZkEventThread] - Error handling event ZkEvent[Children of /brokers/topics/SEND_EVENTS changed sent to kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener@43d248b4]java.lang.NullPointerException    at scala.util.parsing.combinator.Parsers$NoSuccess.<init>(Parsers.scala:131)    at scala.util.parsing.combinator.Parsers$Failure.<init>(Parsers.scala:158)    at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:489)    at scala.util.parsing.combinator.Parsers$$anonfun$acceptIf$1.apply(Parsers.scala:487)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)... (~50 lines elided)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)    at scala.util.parsing.combinator.Parsers$Success.flatMapWithNext(Parsers.scala:113)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:200)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:200)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:200)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$flatMap$1.apply(Parsers.scala:200)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:203)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:208)    at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:208)    at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:182)    at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:742)    at scala.util.parsing.json.JSON$.parseRaw(JSON.scala:71)    at scala.util.parsing.json.JSON$.parseFull(JSON.scala:85)
  • 156-163. Operational Considerations - Brokers• Monitor IOPS and IOUtil• Under no circumstances allow a broker to run out of disk space (don’t even get close)• fetch.size - amount of data a consumer will pull• max.message.size - largest message a producer can submit to a broker• Broker enforces neither of these prior to v0.8 :( • KAFKA-490 • KAFKA-247
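Since the broker enforced neither limit before 0.8, one pragmatic guard (a sketch, not part of the Kafka API) is to validate payload sizes at the producing edge, keeping every message comfortably under the smallest consumer fetch.size so an oversized message never poisons a partition.

    // Pre-0.8, the broker would happily accept a message bigger than a consumer's
    // fetch.size, which that consumer could then never read past. Guard at the edge.
    public class MessageSizeGuard {
        private final int maxMessageSize;

        MessageSizeGuard(int maxMessageSize) {
            this.maxMessageSize = maxMessageSize; // keep this <= the smallest consumer fetch.size
        }

        byte[] checked(byte[] payload) {
            if (payload.length > maxMessageSize) {
                throw new IllegalArgumentException(
                    "message of " + payload.length + " bytes exceeds limit of " + maxMessageSize);
            }
            return payload;
        }

        public static void main(String[] args) {
            MessageSizeGuard guard = new MessageSizeGuard(1_000_000); // 1 MB, for example
            System.out.println(guard.checked(new byte[512]).length + " bytes OK to send");
        }
    }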
  • 164-165. Operational Considerations - Brokers

    2012-06-15 04:47:35,632 - ERROR [FetchRunnable-2:kafka.consumer.FetcherRunnable] - error in FetcherRunnable for RN-OL:3-22
    kafka.common.InvalidMessageSizeException: invalid message size: 152173251 only received bytes: 307196 at 0 possible causes (1) a single message larger than the fetch size; (2) log corruption
        at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:75)
        at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:61)
        at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:58)
        at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:50)
        at kafka.message.ByteBufferMessageSet.validBytes(ByteBufferMessageSet.scala:49)
        at kafka.consumer.PartitionTopicInfo.enqueue(PartitionTopicInfo.scala:70)
        at kafka.consumer.FetcherRunnable$$anonfun$run$3.apply(FetcherRunnable.scala:80)
        at kafka.consumer.FetcherRunnable$$anonfun$run$3.apply(FetcherRunnable.scala:66)
        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
        at scala.collection.immutable.List.foreach(List.scala:45)
        at kafka.consumer.FetcherRunnable.run(FetcherRunnable.scala:66)
  • 166-175. Operational Considerations - Consumers• Consumer tuning is an art • Overcommit - more threads than partitions • Idling (often entire consumer processes) • Excessive rebalancing • Undercommit - fewer threads than partitions • Serial fetchers won’t keep up, depending on workload • Big GCs can cause rebalancing • Just right - 2 partitions per consumer thread • Mostly pivots on consumer workload (i.e. latency)
  • 176-180. Operational Considerations - Incubators Gonna Incubate• Deployed in some large installations• Largely learning in production• Hasn’t lived through a long lineage of people being mean to it or using it in anger

    2012-06-15 04:25:00,774 - ERROR [kafka-processor-3:Processor@215] - java.lang.RuntimeException: OOME with size 1195725856
    java.lang.RuntimeException: OOME with size 1195725856
        at kafka.network.BoundedByteBufferReceive.byteBufferAllocate(BoundedByteBufferReceive.scala:81)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:60)
        at kafka.network.Processor.read(SocketServer.scala:283)
        at kafka.network.Processor.run(SocketServer.scala:202)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
        at kafka.network.BoundedByteBufferReceive.byteBufferAllocate(BoundedByteBufferReceive.scala:77)
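The 1195725856 in that OOME is not random: the broker read the first four bytes of a plain HTTP request off its data port and treated them as a frame length. Decoding the integer back to ASCII shows exactly that, which matches the speaker note about a GET /all probe hitting the wrong port.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    // 1195725856 == 0x47455420 == the bytes 'G','E','T',' ' read as a big-endian int,
    // i.e. the start of an HTTP "GET /all" request interpreted as a request size.
    public class DecodeOome {
        public static void main(String[] args) {
            byte[] bytes = ByteBuffer.allocate(4).putInt(1195725856).array();
            System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints "GET "
        }
    }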
  • 181-186. Operational Considerations - Incubators Gonna Incubate• With any incubator project, assume it will be rough around the edges• Assume that if you point your monitoring agent at the service port, things will break• As a general practice, measure the intended outcome of production changes
  • 187. Acknowledgements: The storage models proposed were inspired by and adapted from http://engineering.twitter.com/2010/05/introducing-flockdb.html and https://github.com/mochi/statebox
  • 188. Q&A. We’re hiring!• Infrastructure• Django• Operations. Contact: erik@urbanairship.com (that I put my email in slides is not an invitation to sell me software, so don’t do that). @eonnen - twitter