
JDD2015: Make your world event driven - Krzysztof Dębski

MAKE YOUR WORLD EVENT DRIVEN

Just after you set up your first microservice, you realize that the game has only started. You need to improve latency in your application and reduce unnecessary communication.
To make your architecture fully decoupled, you need to embrace asynchronous communication. A good way to achieve that is to switch to an Event Driven Architecture.
We will see how to use Kafka in your microservices. We will also cover some pitfalls you might face when using Kafka and how to deal with them.
After the talk you will know the toolset needed to improve your microservice ecosystem.

JDD2015: Make your world event driven - Krzysztof Dębski

  1. 1. Make your world event driven Krzysztof Debski @DebskiChris
  2. 2. 15 years as an IT professional @DebskiChris http://hermes.allegro.tech Who am I
  3. 3. Allegro 500+ people in IT 50+ independent teams 16 years on market 2 years after technical revolution
  4. 4. Events
  5. 5. Events are everywhere Log data Database replication Warehouse dump Search engines Messaging systems
  6. 6. How to handle events? Stream data platform Data Integration Stream processing
  7. 7. Events and microservices Service Service Service Service Service Service Service Service Service Service Service
  8. 8. Events and microservices Service Service Service Service Service Service Service Service Service Service Service Domain Domain Domain
  9. 9. Kafka
  10. 10. Kafka as a backbone Service Producer Service Consumer Kafka Broker Zookeeper
  11. 11. Kafka Data 10 9 7 8 6 5 3 4 2 1 8 5 4 2 1 10 9 7 6 3 Data Topic
  12. 12. Topic Producer_1 … Producer_n Remove old events Publish event Topic
  13. 13. Partitioning 10 9 7 8 6 5 3 4 2 1 8 5 4 2 1 10 9 7 6 3 5 4 2 8 1 Data Topic Partition 5 4 2 8 1 Replicas
  14. 14. Partitioning Producer_1 … Producer_n Publish event Partition 0 Partition 1 Partition 2
  15. 15. Partitioning Service Producer Service Consumer Broker Zookeeper Broker Broker P1 P0 P2 P1 P0 P2
  16. 16. Topics operations Create auto.create.topics.enable=true Change Replication factor Partition count – only increasing Delete >= 0.8.2 delete.topic.enable=true
  17. 17. Demo Initial list of brokers is static New Producer API from 0.8.2 Async producer by default Key partitioning is tricky ACK is set by producer
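
To make the demo notes above concrete, here is a minimal sketch of a publisher using the new producer API (kafka-clients 0.8.2+). The broker list, topic name and key are placeholder values; send() is asynchronous by default, and the acknowledgement level is chosen by the producer.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.Callback;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class EventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // static initial list of brokers; the client discovers the rest of the cluster from them
            props.put("bootstrap.servers", "broker1:9092,broker2:9092");
            // ACK is set by the producer: "1" waits for the partition leader only
            props.put("acks", "1");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // the key selects the partition, so a skewed key distribution gives skewed partitions
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("test", "order-42", "{\"event\":\"test\"}");

            // send() returns immediately; the callback fires when the broker responds
            producer.send(record, new Callback() {
                public void onCompletion(RecordMetadata metadata, Exception e) {
                    if (e != null) {
                        e.printStackTrace();
                    } else {
                        System.out.println("partition " + metadata.partition() + ", offset " + metadata.offset());
                    }
                }
            });
            producer.close();
        }
    }
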
  18. 18. Subscriber Consumer 1 Broker P0 Broker P1 Broker P2 Broker P3 Consumer 2 Consumer 3 Consumer 4 Consumer 5 Consumer 6 Consumer group 1 Consumer group 2
  19. 19. Subscriber Consumer 1 Broker P0 Broker P1 Broker P2 Broker P3 Consumer 2 Consumer 3 Consumer 4 Consumer 5 Consumer 6 Consumer group 1 Consumer group 2
  20. 20. Subscriber Producer_1 … Producer_n Consumer_group_1 Consumer_group_2 Remove old events Publish event Read event Remove old messages
  21. 21. Offset management <=0.8.1 - Zookeeper >=0.8.2 - Zookeeper or Kafka >=0.9(?) - Kafka
  22. 22. Demo Simple consumer vs. High level consumer Offset storage Dual commits Scaling consumers
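
A sketch of the high-level consumer side of that demo, assuming Kafka 0.8.2: the group id drives partition assignment, offsets.storage switches offset storage from ZooKeeper to Kafka, and dual.commit.enabled keeps writing to both stores while migrating. Host names and the topic are placeholders.

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.ConsumerIterator;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class EventConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zookeeper:2181");
            props.put("group.id", "consumer-group-1");    // consumers in one group share the partitions
            props.put("offsets.storage", "kafka");        // keep offsets in Kafka instead of ZooKeeper
            props.put("dual.commit.enabled", "true");     // also commit to ZooKeeper during migration

            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
            // one stream (thread) for the "test" topic; more streams scale up to the partition count
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                    connector.createMessageStreams(Collections.singletonMap("test", 1));

            ConsumerIterator<byte[], byte[]> it = streams.get("test").get(0).iterator();
            while (it.hasNext()) {
                System.out.println(new String(it.next().message()));
            }
        }
    }
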
  23. 23. KAFKA-1682 Kafka <= 0.8.2 No security Kafka > 0.8.2 unix-like users, permissions, ACL
  24. 24. Performance issues
  25. 25. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
  26. 26. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3 Brokers that should have partition copies
  27. 27. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3 In Sync Replicas
  28. 28. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3 Leader broker ID
  29. 29. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 3 Replicas: 3, 1 ISR: 3, 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
  30. 30. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 1 Replicas: 3, 1 ISR: 1 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2
  31. 31. Rebalancing leaders Broker 1 P1 P0 Broker 2 P2 P1 Broker 3 P0 P2 Topic: test Partition count: 3 Replication factor: 1 Configs: retention.ms=86400000 Topic: test Partition: 0 Leader: 1 Replicas: 3, 1 ISR: 1, 3 Topic: test Partition: 1 Leader: 1 Replicas: 1, 2 ISR: 1, 2 Topic: test Partition: 2 Leader: 2 Replicas: 2, 3 ISR: 2, 3
  32. 32. Lost events
  33. 33. ACK levels 0 - don't wait for a response from the leader 1 - only the leader has to respond -1 - all replicas must be in sync (trade-off: speed vs. safety)
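
With the new producer API this trade-off is a single configuration property (the old producer used request.required.acks instead); a quick sketch:

    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9092");
    // "0"  - fire and forget: fastest, events can be lost without any error
    // "1"  - the partition leader acknowledges: lost if the leader dies before replication
    // "-1" - all in-sync replicas acknowledge: slowest but safest
    props.put("acks", "-1");
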
  34. 34. Lost Events ERROR [Replica Manager on Broker 2]: Error when processing fetch request for partition [test,1] offset 10000 from consumer with correlation id 0. Possible cause: Request for offset 10000 but we only have log segments in the range 8000 to 9000. (kafka.server.ReplicaManager)
  35. 35. Lost events Broker 1 Broker 2 Producer ACK = 1 Replication factor = 1 replica.lag.max.messages = 2000 committed offset = 10000 committed offset = 9000 Zookeeper
  36. 36. Lost events Broker 1 Broker 2 Producer ACK = 1 Replication factor = 1 replica.lag.max.messages = 2000 committed offset = 10000 committed offset = 9000 Zookeeper
  37. 37. Lost events Broker 1 Broker 2 Producer ACK = 1 Replication factor = 1 replica.lag.max.messages = 2000 committed offset = 10000 committed offset = 9000 Zookeeper committed offset = 9000
  38. 38. Monitoring
  39. 39. Kafka Offset Monitor
  40. 40. Graphite
  41. 41. Slow responses
  42. 42. Slow responses (chart: response time at the 75th, 99th and 99.9th percentile)
  43. 43. Slow responses vs. message size (chart: the same percentiles plotted against message size)
  44. 44. Fixed message size (chart: response time percentiles with a constant message size)
  45. 45. Kafka kernel 3.2.x
  46. 46. Kafka kernel 3.2.x
  47. 47. Kafka kernel 3.2.x kernel >= 3.8.x
  48. 48. Optimize throughput
  49. 49. Message size (chart: 99.9th percentile for all topics vs. the biggest topic)
  50. 50. Optimize message size JSON human readable big memory and network footprint poor support for Hadoop
  51. 51. Optimize message size JSON Snappy ERROR Error when sending message to topic t3 with key: 4 bytes, value: 100 bytes with error: The server experienced an unexpected error when processing the request (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) java: target/snappy-1.1.1/snappy.cc:423: char* snappy::internal::CompressFragment(const char*, size_t, char*, snappy::uint16*, int): Assertion `0 == memcmp(base, candidate, matched)' failed. errors when publishing a large number of messages
  52. 52. Optimize message size JSON Snappy Lz4 failed on distributed data (chart: compression ratio, single topic vs. multiple topics)
  53. 53. Optimize message size JSON Snappy Lz4 Avro small network footprint Hadoop friendly easy schema verification
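
As an illustration of the Avro option, a minimal sketch that serializes the demo event to a compact binary payload before publishing it as the Kafka message value; the schema here is invented for the example:

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    public class AvroEventSerializer {
        // a tiny record schema with a single string field called "event"
        private static final Schema SCHEMA = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
              + "{\"name\":\"event\",\"type\":\"string\"}]}");

        // serialize one event to Avro binary form, ready to be used as a message value
        static byte[] serialize(String event) throws Exception {
            GenericRecord record = new GenericData.Record(SCHEMA);
            record.put("event", event);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
            encoder.flush();
            return out.toByteArray();
        }
    }
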
  54. 54. Allegro QR contest
  55. 55. Hermes
  56. 56. Hermes
  57. 57. Hermes Hermes Frontend Hermes Frontend Hermes Frontend Hermes Consumer Hermes Consumer REST REST, JMS
  58. 58. Topic management pl.allegro.JDD2015.demo.basic Group Topic
  59. 59. Delivery model Exactly once At most once At least once
  60. 60. Delivery model Exactly once - almost impossible At most once - risky At least once
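
With at-least-once delivery the same event can arrive more than once, so subscribers are expected to deduplicate. A minimal sketch, assuming the unique message id assigned on publication (next slide) is used as the key and a bounded in-memory cache is acceptable:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    public class Deduplicator {
        // remember the last 100000 message ids; the oldest ids are evicted first
        private final Set<String> seen = Collections.newSetFromMap(
                new LinkedHashMap<String, Boolean>() {
                    protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                        return size() > 100000;
                    }
                });

        // returns true when the event was already processed and should be skipped
        public boolean isDuplicate(String messageId) {
            return !seen.add(messageId);
        }
    }
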
  61. 61. Event identification Hermes Frontend Kafka Broker POST {"event": "test"} { "id": "58d7ff07-dd0e-4103-9b1f-55706f3049e6", "timestamp": 1430443071995, "data": {"event": "test"} } HTTP 201 Created Message-id: 58d7ff07-dd0e-4103-9b1f-55706f3049e6
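
A sketch of a producer talking to the frontend shown above, using plain HttpURLConnection; the frontend address is a placeholder, and the topic path and the Message-id header simply follow the slide, so treat them as assumptions rather than the definitive Hermes API:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HermesPublisher {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://hermes-frontend:8080/topics/pl.allegro.JDD2015.demo.basic");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write("{\"event\":\"test\"}".getBytes("UTF-8"));
            }
            // 201 Created: the event reached Kafka; 202 Accepted: buffered because Kafka was slow
            int status = conn.getResponseCode();
            String messageId = conn.getHeaderField("Message-id");
            System.out.println(status + " " + messageId);
        }
    }
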
  62. 62. Lost events Hermes Frontend Producer Hermes Consumer Consumer Kafka Broker Zookeeper Tracker Publication data Delivery attempts
  63. 63. Multi data center Hermes Frontend Hermes Manager Hermes Frontend Hermes Consumer Hermes Consumer
  64. 64. Slow responses - normal Hermes Frontend Producer Hermes Consumer Consumer Kafka Broker Zookeeper POST HTTP 201 Created
  65. 65. Slow responses - fail Hermes Frontend Producer Hermes Consumer Consumer Kafka Broker Zookeeper POST HTTP 202 Accepted
  66. 66. Improved security Authentication and authorization interfaces provided By Default: You can create any topic in your group You can publish everywhere (in progress) Group owner defines subscriptions
  67. 67. Improved offset management Hermes Producer Hermes consumer Remove old messages Publish event Committed Local unsent events Read event
  68. 68. Improved offset management Hermes consumer Remove old messages Local unsent events New event Service instance
  69. 69. Improved offset management Hermes consumer Remove old messages Local unsent events New event Service instance HTTP 503 Unavailable
  70. 70. Improved offset management Remove old messages Local unsent events New event Service instance HTTP 503 Unavailable Check TTL & Add to queue Hermes consumer
  71. 71. Consumer backoff 100% adapt 1/s 1/min
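
One way to implement such a backoff, as a sketch: start at roughly one attempt per second, slow down towards one per minute while the subscriber keeps failing, and drop the event once it is older than its TTL. The Message interface and the one-hour TTL are assumptions made for the example:

    import java.util.concurrent.TimeUnit;

    public class RetryingSender {
        private static final long MAX_BACKOFF_MS = TimeUnit.MINUTES.toMillis(1);
        private static final long TTL_MS = TimeUnit.HOURS.toMillis(1); // assumed per-topic TTL

        void deliver(Message message) throws InterruptedException {
            long backoffMs = 1000; // start at roughly one attempt per second
            while (System.currentTimeMillis() - message.timestamp() < TTL_MS) {
                if (message.trySend()) {      // subscriber answered with a success status
                    return;
                }
                Thread.sleep(backoffMs);      // subscriber answered 503 or timed out
                backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS); // adapt towards 1/min
            }
            // TTL exceeded: drop the event (optionally record the attempts in the tracker)
        }

        interface Message {
            long timestamp();   // publication time in milliseconds
            boolean trySend();  // one delivery attempt to the subscriber
        }
    }
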
  72. 72. Turn back the time PUT /groups/{group}/topics/{topic}/subscriptions/{subscription}/retransmission -8h
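
Invoked from Java this could look roughly like the sketch below; the manager host, group, topic and subscription names are placeholders, and sending "-8h" as the request body is an assumption based only on the slide:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class Retransmission {
        public static void main(String[] args) throws Exception {
            // path taken from the slide; names and the body format are assumptions
            URL url = new URL("http://hermes-manager:8080/groups/pl.allegro.JDD2015"
                    + "/topics/demo.basic/subscriptions/my-subscription/retransmission");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write("-8h".getBytes("UTF-8")); // replay the last 8 hours of events
            }
            System.out.println(conn.getResponseCode());
        }
    }
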
  73. 73. Find us: Blog: allegrotech.io Twitter: @allegrotechblog Work with us: kariera.allegro.pl
