
Introduction to Kafka

This is a presentation introducing Kafka and some of its core concepts as presented at the Sydney ALT.NET meet-up on 26 April 2016.



  1. 1. Introduction to Kafka BY DUCAS FRANCIS
  2. 2. The problem (diagram: Web, Mobile, API, Job, Security System, Real-time Monitoring, Logging System and other services connected directly to one another). It’s simple enough at first… Then it gets a little busy… And ends up a mess.
  3. 3. The solution – decouple data pipelines using a pub/sub system (diagram: the same systems now connected through a Pub/Sub layer as producers, brokers and consumers).
  4. 4. Apache Kafka A UNIFIED, HIGH-THROUGHPUT, LOW-LATENCY PLATFORM FOR HANDLING REAL-TIME DATA FEEDS
  5. 5. A brief history lesson  Originally developed at LinkedIn in 2011  Graduated Apache Incubator in 2012  Engineers from LinkedIn formed Confluent in 2014  Up to version 0.9.4 with 0.10 on the horizon
  6. 6. Motivation  Unified platform for all real-time data feeds  High throughput for high volume streams  Support periodic data loads from offline systems  Low latency for traditional messaging  Support partitioned, distributed, real-time processing  Guarantee fault-tolerance
  7. 7. Common use cases  Messaging  Website activity tracking  Metrics  Log aggregation  Stream processing  Event sourcing  Commit log
  8. 8. Benefits of Kafka  High throughput  Low latency  Load balancing  Fault tolerant  Guaranteed delivery  Secure
  9. 9. Performance comparison
  10. 10. Batch performance comparison
  11. 11. Some terminology  Topic – feed of messages  Producer – publishes messages to a topic  Consumer – subscribes to topics and processes the feed of messages  Broker – server instance that acts as part of a cluster
  12. 12. @apachekafka powers @microsoft…
  13. 13. Libraries  Python – kafka-python / pykafka  Go – sarama / go_kafka_client / …  C/C++ - librdkafka / libkafka / …  .NET – kafka-net (x2) / rdkafka-dotnet / CSharpClient-for-Kafka  Node.js – kafka-node / sutoiku/node-kafka / ...  HTTP – kafka-pixy / kafka-rest  etc.
  14. 14. Architecture (diagram: producers publish to a cluster of three brokers, coordinated by a Zookeeper cluster (x3); consumers read from the brokers)
  15. 15. Show me the Kafka!!! VAGRANT TO THE RESCUE
  16. 16. Anatomy of a topic  Topics are broken into partitions  Messages are assigned a sequential ID called an offset  Data is retained for a configurable period of time  Number of partitions can be increased after creation, but not decreased  Partitions are assigned to brokers. Each partition is an ordered, immutable sequence of messages that is continually appended to… a commit log.
  17. 17. Broker  Kafka service running as part of a cluster  Receives messages from producers and serves them to consumers  Coordinated using Zookeeper (which needs an odd number of nodes for a quorum)  Stores messages on the file system  Replicates messages to/from other brokers  Answers metadata requests about brokers and topics/partitions  As of 0.9.0 – coordinates consumers
  18. 18. Replication  Partitions on a topic should be replicated  Each partition has 1 leader and 0 or more followers  An In-Sync Replica (ISR) is one that’s communicating with Zookeeper and not too far behind the leader  Replication factor can be increased after creation, not decreased
  19. 19. ./kafka-topics --CREATE --REPLICATION-FACTOR --PARTITIONS --DESCRIBE
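     The demo used the kafka-topics CLI shown above. As a rough Python equivalent, here is a minimal sketch using the kafka-python admin client (one of the libraries listed earlier); the topic name, partition and replica counts, and broker address are illustrative assumptions, and describe_topics needs a recent kafka-python release.

        # Sketch: create and inspect a topic with kafka-python's admin client
        # (illustrative values; assumes a broker reachable at localhost:9092).
        from kafka.admin import KafkaAdminClient, NewTopic

        admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

        # Roughly: kafka-topics --create --partitions 4 --replication-factor 3 --topic events
        admin.create_topics([NewTopic(name="events", num_partitions=4, replication_factor=3)])

        # Roughly: kafka-topics --describe --topic events (recent kafka-python versions)
        print(admin.describe_topics(["events"]))

        admin.close()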
  20. 20. Producers  Publishes messages to a topic  Distributes messages across partitions  Round-robin  Key hashing  Sends synchronously or asynchronously to the broker that is the leader for the partition  ACKS = 0 (none), 1 (leader), -1 (all ISRs)  Synchronous is obviously slower, but more durable
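     A minimal producer sketch with kafka-python (broker address, topic and key are illustrative assumptions, not from the deck):

        # Sketch: publishing with kafka-python; keyed messages hash to a partition.
        from kafka import KafkaProducer

        producer = KafkaProducer(
            bootstrap_servers="localhost:9092",
            acks="all",  # 0 = no ack, 1 = leader only, "all"/-1 = all in-sync replicas
        )

        # With a key the partition is chosen by hashing it; without a key,
        # messages are spread across the topic's partitions.
        future = producer.send("page-views", key=b"user-42", value=b"viewed /home")

        # send() is asynchronous; blocking on the future makes it effectively
        # synchronous - slower, but you know the write was acknowledged.
        metadata = future.get(timeout=10)
        print(metadata.topic, metadata.partition, metadata.offset)

        producer.flush()
        producer.close()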
  21. 21. Testing... Testing… 1 2 3 LET’S SEE HOW FAST WE CAN PUSH
  22. 22. Consumers  Read messages from a topic  Multiple consumers can read from the same topic  Manage their offsets  Messages stay on Kafka after they are consumed
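     A matching consumer sketch with kafka-python (topic and group id are illustrative); note that reading a message does not remove it from the log:

        # Sketch: consuming with kafka-python; the consumer tracks its own offsets.
        from kafka import KafkaConsumer

        consumer = KafkaConsumer(
            "page-views",
            bootstrap_servers="localhost:9092",
            group_id="analytics",          # consumers sharing a group id share the partitions
            auto_offset_reset="earliest",  # start from the beginning if no committed offset exists
        )

        for message in consumer:
            # Messages stay in Kafka until retention expires, regardless of consumption.
            print(message.partition, message.offset, message.value)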
  23. 23. Testing... Testing… 1 2 3 LET’S SEE HOW FAST WE CAN RECEIVE
  24. 24. It’s fast! But why…?  Efficient protocol based on message set  Batching messages to reduce network latency and small I/O operations  Append/chunk messages to increase consumer throughput  Optimised OS operations  pagecache  sendfile()  Broker services consumers from cache where possible  End-to-end batch compression
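     Some of these optimisations surface as ordinary producer settings; a hedged kafka-python example of batching and end-to-end compression (the values are illustrative, not tuned):

        # Sketch: producer-side batching and compression knobs in kafka-python.
        from kafka import KafkaProducer

        producer = KafkaProducer(
            bootstrap_servers="localhost:9092",
            compression_type="gzip",  # whole batches are compressed end to end
            batch_size=32 * 1024,     # bytes to accumulate per partition before sending
            linger_ms=20,             # wait briefly so batches can fill up
        )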
  25. 25. Load balanced consumers  Distribute load across instances in a group by allocating partitions  Handle failure by rebalancing partitions to other instances  Commit their offsets to Kafka (diagram: a cluster of two brokers holding partitions P0–P3, shared between Consumer Group 1 (C0, C1) and Consumer Group 2 (C2, C3, C4, C6))
  26. 26. Consumer groups and offsets (diagram: partition P3 as a log of offsets 0–10; in Consumer Group 1, consumers C0 and C1 each have their own read position and a separately committed offset)
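     A sketch of committing offsets back to Kafka explicitly with kafka-python (names are illustrative), mirroring the separate read and commit positions in the diagram:

        # Sketch: manual offset commits; the committed offset trails the read position.
        from kafka import KafkaConsumer

        consumer = KafkaConsumer(
            "page-views",
            bootstrap_servers="localhost:9092",
            group_id="analytics",
            enable_auto_commit=False,  # commit explicitly instead of on a timer
        )

        def process(message):
            # Placeholder for real work.
            print(message.partition, message.offset)

        for message in consumer:
            process(message)
            consumer.commit()  # commit only after the message has been processed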
  27. 27. Guarantees  Messages sent by a producer to a particular topic’s partition will be appended in the order they are sent  A consumer instance sees messages in the order they are stored in the log  For a topic with replication factor N, we will tolerate up to N-1 server failures without losing any messages committed to the log
  28. 28. Ordered delivery  Messages are guaranteed to be delivered in order by partition, NOT topic (P0 holds M1, M3, M5; P1 holds M2, M4, M6)  M1 before M3 before M5 – YES  M1 before M2 – NO  M2 before M4 before M6 – YES  M2 before M3 – NO
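     Because ordering is only per partition, giving related messages the same key keeps them in order relative to each other; a short kafka-python illustration (topic and keys are assumptions):

        # Sketch: messages sharing a key land on the same partition, preserving their order.
        from kafka import KafkaProducer

        producer = KafkaProducer(bootstrap_servers="localhost:9092")

        for i in range(1, 7):
            key = b"even" if i % 2 == 0 else b"odd"  # two keys map to (at most) two partitions
            md = producer.send("ordered-events", key=key, value=str(i).encode()).get(timeout=10)
            print("M%d -> partition %d, offset %d" % (i, md.partition, md.offset))

        producer.flush()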
  29. 29. Enough ALT… now .NET USING RDKAFKA-DOTNET
  30. 30. FIN. THANK YOU
  31. 31. Resources  http://kafka.apache.org/documentation.html  http://www.confluent.io/  https://kafka.apache.org/090/configuration.html  https://github.com/edenhill/librdkafka  https://github.com/ah-/rdkafka-dotnet
  32. 32. Log compaction  Keep the most recent payload for a key  Use cases  Database change subscription  Event sourcing  Journaling for HA
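     A hedged sketch of creating a compacted topic with kafka-python's admin client (the topic name and sizing are illustrative):

        # Sketch: a log-compacted topic keeps only the latest value for each key.
        from kafka.admin import KafkaAdminClient, NewTopic

        admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
        admin.create_topics([
            NewTopic(
                name="customer-profiles",
                num_partitions=4,
                replication_factor=3,
                topic_configs={"cleanup.policy": "compact"},  # compaction instead of time-based retention
            )
        ])
        admin.close()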
  33. 33. Log compaction
