
Polyglot, Fault Tolerant Event-Driven Programming with Kafka, Kubernetes and gRPC


At Wix, we have created a universal event-driven programming infrastructure on top of the Kafka message broker.
This infrastructure ensures that messages are eventually produced and consumed successfully, whatever failures occur along the way.

In this talk, you will learn about the features we introduced to make sure our distributed system can safely handle an ever-growing message throughput in a fault-tolerant manner.
You will be introduced to such techniques as retry topics, local persistent queues, and cooperative fibers that help make your flows more resilient and performant.

You will also learn how to make this infrastructure work across programming languages and tech stacks, with optimal resource management, using the power of Kubernetes and gRPC.
You will see when to use a client library, when to deploy an external pod (DaemonSet, StatefulSet), and when to deploy a sidecar.


  1. 1. natansil.com twitter@NSilnitsky linkedin/natansilnitsky github.com/natansil Polyglot, Fault-Tolerant, and Performant Event-Driven Programming with Kafka, Kubernetes and gRPC Natan Silnitsky Backend Infra Developer, Wix.com
  2. 2. About Wix: 180M registered users from 190 countries; 5% of all internet websites run on Wix @NSilnitsky
  3. 3. Wix Editor Service Metasite Service Restaurant app Service 1,500 Microservices Publish Site @NSilnitsky
  4. 4. Scala Python NodeJS 1,500 Microservices @NSilnitsky
  5. 5. Wix 1510M Kafka messages a day @NSilnitsky
  6. 6. Wix 1,075M Kafka messages a day So, we need our message flows to be performant, fault-tolerant, and polyglot.
  7. 7. Agenda Event-driven programming with Kafka + Performance with Greyhound + Fault-tolerance with Greyhound & Kubernetes + Polyglot with Kubernetes & gRPC
  8. 8. @NSilnitsky HTTP * coupled Request-Reply Communication New App installed Site Apps Service ECom Catalog Service Classic.
  9. 9. @NSilnitsky HTTP New App installed What if Network is Unreliable Request-Reply Communication Site Apps Service ECom Catalog Service
  10. 10. @NSilnitsky * 1500 hard HTTP New App installed Request-Reply Communication Cascading Failures can happen.
  11. 11. @NSilnitsky Message Consumer Message Producer Broker Event-driven Communication Introduce a Broker. * clusters and replications
  12. 12. @NSilnitsky Broker Event-driven Communication Message Consumer Message Producer Introduce a Broker.
  13. 13. @NSilnitsky Kafka Broker Event-driven Communication Site Apps Topic [diagram: topic partitions with message offsets] Where you store messages (events!) A Kafka Broker Looks Like This: * vs message queues
  14. 14. @NSilnitsky Kafka Broker Event-driven Communication Site Apps Topic [diagram: topic partitions with message offsets] Kafka Producer Site Apps Service There’s the Event Producer,
  15. 15. @NSilnitsky Kafka Broker Event-driven Communication Site Apps Topic [diagram: topic partitions with message offsets] Kafka Consumer Ecom Catalog Service and Event Consumer. * scale
  16. 16. Site Apps Service Kafka Consumer Kafka Producer Greyhound wraps Kafka Ecom Catalog Service Kafka Broker Greyhound Producer Greyhound Consumer
  17. 17. @NSilnitsky Simplify APIs, with additional features Greyhound wraps Kafka Site Apps Service Kafka Consumer Kafka Producer Ecom Catalog Service Kafka Broker
  18. 18. @NSilnitsky Abstract so that it is easy to change for everyone Kafka Consumer Kafka Producer Kafka Broker Greyhound wraps Kafka
  19. 19. Performant Event-driven Programming with Kafka and Greyhound (Wix OSS) Wix 1,075M Kafka messages a day
  20. 20. @NSilnitsky Kafka Broker Topic Greyhound Consumer Kafka Consumer [diagram: topic partitions with message offsets] Performant message handling A Service Message Handler
  21. 21. @NSilnitsky Kafka Broker Topic Greyhound Consumer Kafka Consumer SCALA ZIO FIBERS + QUEUES [diagram: topic partitions with message offsets] (Thread-safe) Parallel Message Consumption Performant message handling A Service Message Handler
  22. 22. @NSilnitsky 80 partitions Kafka Broker Topic [diagram: topic partitions with message offsets] Performant message handling Maximum Throughput: Messages Processing - Non-blocking IO (Netty gRPC client) with response latency of 100ms
  23. 23. @NSilnitsky 80 partitions Kafka Broker Topic [diagram: topic partitions with message offsets] Performant message handling Maximum Throughput: 800 Messages per second Messages Processing - Non-blocking IO (Netty gRPC client) with response latency of 100ms
  24. 24. @NSilnitsky [diagram: a row of Kafka Consumers, one per partition] Performant message handling 80 Kafka Consumers with 80 Java threads 80 partitions Kafka Broker Topic [diagram: topic partitions with message offsets] Messages Processing - Non-blocking IO (Netty gRPC client) with latency of 100ms
  25. 25. @NSilnitsky Greyhound Performant message handling or 1 Greyhound Consumer with 80 fibers running on a small thread pool Kafka Consumer 80 partitions Kafka Broker Topic [diagram: topic partitions with message offsets] Messages Processing - Non-blocking IO (Netty gRPC client) with latency of 100ms
  26. 26. Fault-tolerant Event-driven Programming with Kafka, Greyhound & Kubernetes Wix 1,075M Kafka messages a day
  27. 27. @NSilnitsky Kafka Broker renew-sub-topic [diagram: topic partitions with message offsets] Greyhound Consumer Kafka Consumer Fails To Read Fault-tolerant message handling
  28. 28. @NSilnitsky renew-sub-topic [diagram: topic partitions with message offsets] renew-sub-topic-retry-0 [diagram: topic partitions with message offsets] renew-sub-topic-retry-1 [diagram: topic partitions with message offsets] Greyhound Consumer Kafka Consumer RETRY PRODUCER Fault-tolerant message handling Inspired by Uber RETRY! Kafka Broker
  29. 29. @NSilnitsky Producer Wix Payments Service Subscription renewal Job Scheduler Fault-tolerant message handling Use Case: Guarantee Completion Kafka Broker
  30. 30. @NSilnitsky + Retry on failure Kafka Broker Producer Consumer Wix Payments Service Subscription renewal Job Scheduler Fault-tolerant message handling Use Case: Guarantee Completion
  31. 31. @NSilnitsky Producer Consumer Subscription renewal Job Scheduler Fault-tolerant message handling Wix Payments Service Use Case: Guarantee Completion Kafka Broker + Retry on failure
  32. 32. @NSilnitsky Producer Fault-tolerant message handling The Resilient Producer Kafka Broker
  33. 33. @NSilnitsky Save message to disk Kafka Broker Producer Fault-tolerant message handling The Resilient Producer: When failed to produce will save the message
  34. 34. @NSilnitsky Kafka Broker Producer Fault-tolerant message handling The Resilient Producer: When failed to produce will save the message and retry on failure. + Retry on failure
  35. 35. @NSilnitsky Kube Fault-tolerant message handling What if pod is killed? pod Kube Node
  36. 36. @NSilnitsky Kube Fault-tolerant message handling What if pod is killed? pod Kube Node
  37. 37. @NSilnitsky Kube Fault-tolerant message handling What if pod is killed? pod Kube Node
  38. 38. Fault-tolerant message handling 38 @NSilnitsky Kubernetes Node 1 DaemonSet pod pod 1 pod 2 Kubernetes Node 2 DaemonSet pod pod 1 pod 2 DaemonSet
  39. 39. pod Fault-tolerant message handling Scavenger DaemonSet Kube Node @NSilnitsky
  40. 40. @NSilnitsky Kafka Broker pod Scavenger Kube Node DaemonSet What if pod is killed? Fault-tolerant message handling Flush out messages * small footprint
  41. 41. Polyglot Event-driven Programming with Kubernetes & gRPC Wix 1,075M Kafka messages a day
  42. 42. @NSilnitsky Polyglot message handling Motivation: code reuse Kafka Broker Greyhound Scala/Java services Greynode NodeJS services Producer Consumer Producer Consumer
  43. 43. @NSilnitsky Polyglot message handling Partial implementation is a problem Kafka Broker Greyhound Greynode NodeJS services Producer Consumer Producer Consumer
  44. 44. @NSilnitsky Polyglot message handling Experiment #1 Greyhound on GraalVM Kafka Broker Greyhound NodeJS services GraalVM Producer Consumer
  45. 45. @NSilnitsky Greyhound Polyglot message handling Experiment #2 Greyhound Sidecar Kafka Broker NodeJS services Consumer gRPC Producer
  46. 46. @NSilnitsky Kafka Broker Greyhound Polyglot message handling Experiment #2 Greyhound Sidecar gRPC .proto Sidecar written with Scala Service written with JS & TS
  47. 47. @NSilnitsky Kafka Broker Greyhound Polyglot message handling Experiment #2 Greyhound Sidecar gRPC Sidecar written with Scala .proto Service written with Python gRPC Service written with JS & TS
  48. 48. 48 @NSilnitsky Polyglot message handling pod Kube Node Containerized app Volume The Sidecar resides with the app in the same pod.
  49. 49. @NSilnitsky Kubernetes pod with main and sidecar containers Polyglot message handling pod Kube Node pod pod Kafka Broker @NSilnitsky
  50. 50. @NSilnitsky There’s a memory issue. pod Kube Node pod pod Polyglot message handling Kafka Broker @NSilnitsky
  51. 51. @NSilnitsky Kafka Broker DaemonSet Optimization: Greyhound in Daemonset Polyglot message handling @NSilnitsky
  52. 52. Design Dilemmas @NSilnitsky
      Sidecar | Daemonset
      ✔ Simple design (simple state) | ✘ Complex design (multi-tenant state)
      ✔ App and sidecar lifecycles are in sync | ✘ Daemonset GA means downtime
      ✔ Failure footprint is small | ✘ Failure affects more consumers
      ✘ Memory overhead/footprint | ✔ Memory usage per Kube Node
  53. 53. Standalone Producer Service Mitigates Daemonset Greyhound downtime @NSilnitsky gRPC gRPC * network hop Kafka Broker Greyhound Producer Greyhound Producer
  54. 54. So, we use Kubernetes’ flexibility to deploy Greyhound producers and consumers in different patterns, in order to comply with different requirements. @NSilnitsky
  55. 55. Wix harnesses Kafka, Kubernetes and gRPC to achieve a polyglot, fault-tolerant, scalable event-driven distributed system.
  56. 56. @NSilnitsky A Java/Scala high-level SDK for Apache Kafka. 0.1 is out! github.com/wix/greyhound
  57. 57. @NSilnitsky Thank You natansil.com twitter@NSilnitsky linkedin/natansilnitsky github.com/natansil
  58. 58. @NSilnitsky Slides & More slideshare.net/NatanSilnitsky medium.com/@natansil twitter.com/NSilnitsky natansil.com
  59. 59. @NSilnitsky Q&A natansil.com twitter@NSilnitsky linkedin/natansilnitsky github.com/natansil
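
Slides 22 and 23 give a maximum throughput of 800 messages per second for 80 partitions and a handler latency of 100ms. The arithmetic behind that figure can be sketched as follows, assuming each partition is handled one message at a time so that ordering within a partition is kept. The function name is hypothetical and Python is used only for illustration (it is one of the service languages listed on slide 4).

```python
def max_throughput_per_second(partitions: int, latency_ms: float) -> float:
    """Upper bound when every partition processes one message at a time."""
    # One partition finishes 1000 / latency_ms messages each second.
    per_partition = 1000 / latency_ms
    # Partitions are independent, so their rates add up.
    return partitions * per_partition


print(max_throughput_per_second(partitions=80, latency_ms=100))  # 800.0
```

Under this assumption the bound depends only on the partition count and the handler latency, which is why slides 24 and 25 compare ways of running those 80 handlers concurrently.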
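
Slides 21 and 25 describe one consumer that fans messages out to per-partition queues, each drained by its own fiber on a small thread pool. A rough analogue, not Greyhound's actual implementation, can be sketched with Python's `asyncio`: tasks stand in for ZIO fibers, `asyncio.Queue` for the per-partition queues, and a single event-loop thread for the small pool. The handler latency is shortened from the 100ms on the slides, and all names are hypothetical.

```python
import asyncio
from collections import defaultdict

PARTITIONS = 80
HANDLER_LATENCY_S = 0.001  # shortened stand-in for the 100ms call on the slides


async def handle(partition, offset, processed):
    # Non-blocking wait: other partitions keep making progress meanwhile.
    await asyncio.sleep(HANDLER_LATENCY_S)
    processed[partition].append(offset)


async def worker(partition, queue, processed):
    # One cooperative task per partition keeps per-partition ordering.
    while True:
        offset = await queue.get()
        if offset is None:  # sentinel: no more messages for this partition
            break
        await handle(partition, offset, processed)


async def consume(records):
    queues = {p: asyncio.Queue() for p in range(PARTITIONS)}
    processed = defaultdict(list)
    workers = [
        asyncio.create_task(worker(p, q, processed)) for p, q in queues.items()
    ]
    # The single "consumer" only dispatches; it never runs the handler itself.
    for partition, offset in records:
        queues[partition].put_nowait(offset)
    for q in queues.values():
        q.put_nowait(None)
    await asyncio.gather(*workers)
    return processed


def run_demo(messages_per_partition=5):
    records = [
        (p, o) for o in range(messages_per_partition) for p in range(PARTITIONS)
    ]
    return asyncio.run(consume(records))


if __name__ == "__main__":
    result = run_demo()
    print(len(result), "partitions processed")
```

All 80 handlers wait concurrently on one thread, which is the contrast slide 24 draws with 80 Kafka Consumers on 80 Java threads.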
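
Slide 28 shows a failed message from `renew-sub-topic` being re-produced to `renew-sub-topic-retry-0` and then `renew-sub-topic-retry-1`. The routing idea can be sketched with in-memory queues standing in for Kafka topics. This is an illustration under assumptions: the function names and the two-retry limit are hypothetical, any delay between attempts is omitted, and what happens after the last retry topic is not stated on the slides (here the message is simply reported as dropped).

```python
from collections import defaultdict, deque

MAX_RETRIES = 2  # assumption: two retry topics, as drawn on slide 28


def next_topic(topic, base):
    """Topic a failed message is re-produced to, or None when retries are used up."""
    if topic == base:
        return f"{base}-retry-0"
    attempt = int(topic.rsplit("-", 1)[1])
    if attempt + 1 < MAX_RETRIES:
        return f"{base}-retry-{attempt + 1}"
    return None


def run(base, messages, handler):
    """Consume the base topic, then each retry topic in turn."""
    topics = defaultdict(deque)  # in-memory stand-in for the broker
    topics[base].extend(messages)
    order = [base] + [f"{base}-retry-{i}" for i in range(MAX_RETRIES)]
    done, dropped = [], []
    for topic in order:
        while topics[topic]:
            msg = topics[topic].popleft()
            try:
                handler(msg)
                done.append((msg, topic))
            except Exception:
                target = next_topic(topic, base)
                if target is None:
                    dropped.append(msg)
                else:
                    topics[target].append(msg)  # the "retry producer" step
    return done, dropped


if __name__ == "__main__":
    attempts = defaultdict(int)

    def flaky(msg):
        attempts[msg] += 1
        if msg == "b" and attempts[msg] == 1:
            raise RuntimeError("downstream unavailable")

    print(run("renew-sub-topic", ["a", "b"], flaky))
```

Because the failed message moves to a separate topic, the messages behind it on the original topic are not blocked while it waits for another attempt.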
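
Slides 32 to 40 describe a resilient producer that saves a message to disk when producing fails and retries later, plus a scavenger DaemonSet that flushes out messages left behind by a killed pod. A minimal sketch of both ideas follows. It assumes a hypothetical `send(topic, payload)` callable that raises on failure, a JSON-lines spool file per pod, and a spool directory that the scavenger can also read; none of these details come from the slides.

```python
import json
import os


class ResilientProducer:
    """Spools messages to a local file when the broker cannot be reached."""

    def __init__(self, send, spool_path):
        self._send = send              # hypothetical callable; raises on failure
        self._spool_path = spool_path  # local persistent queue (one JSON per line)

    def produce(self, topic, payload):
        try:
            self._send(topic, payload)
            return True
        except Exception:
            # Producing failed: keep the message on disk for a later retry.
            with open(self._spool_path, "a", encoding="utf-8") as spool:
                spool.write(json.dumps({"topic": topic, "payload": payload}) + "\n")
            return False

    def retry_spooled(self):
        """Try to send every spooled message; return how many were sent."""
        if not os.path.exists(self._spool_path):
            return 0
        with open(self._spool_path, encoding="utf-8") as spool:
            pending = [json.loads(line) for line in spool if line.strip()]
        still_failing, sent = [], 0
        for record in pending:
            try:
                self._send(record["topic"], record["payload"])
                sent += 1
            except Exception:
                still_failing.append(record)
        # Rewrite the spool so it holds only what is still undelivered.
        with open(self._spool_path, "w", encoding="utf-8") as spool:
            for record in still_failing:
                spool.write(json.dumps(record) + "\n")
        return sent


def scavenge(spool_dir, send):
    """Flush spool files left in spool_dir, e.g. by pods that were killed."""
    flushed = 0
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        flushed += ResilientProducer(send, path).retry_spooled()
    return flushed
```

In this sketch the scavenger reuses the producer's retry logic, so a message survives as long as its spool file does, whether or not the pod that wrote it is still running.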
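
Slides 50 to 52 contrast the memory overhead of one sidecar per pod with memory usage per Kube Node for a DaemonSet. The difference can be sketched with simple arithmetic. Every number below is hypothetical, and using the same footprint for both deployments is a simplification, since a multi-tenant DaemonSet instance may well need more memory than a single sidecar.

```python
def sidecar_memory_mb(app_pods: int, footprint_mb: int) -> int:
    # One sidecar container runs inside every application pod.
    return app_pods * footprint_mb


def daemonset_memory_mb(nodes: int, footprint_mb: int) -> int:
    # One DaemonSet pod runs on every node, shared by the pods on that node.
    return nodes * footprint_mb


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, 30 application pods per node.
    nodes, pods_per_node, footprint_mb = 10, 30, 200
    print(sidecar_memory_mb(nodes * pods_per_node, footprint_mb))  # 60000
    print(daemonset_memory_mb(nodes, footprint_mb))                # 2000
```

The sidecar total grows with the number of pods while the DaemonSet total grows with the number of nodes, which is the trade-off the Design Dilemmas slide weighs against the DaemonSet's larger failure footprint.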