Real time stream processing with Kafka, Python and Faust

South Gate Tech was invited to hold a webinar for Cheltenham Geek Nights, a London-based engineering community, and to present a case study on the implementation of Kafka, Faust and Python, hosted in AWS. The case study describes the solution that South Gate Tech’s engineering team developed and deployed for Tide as part of a customer project delivered by a dedicated team. Presentation by Georgi Tenev - Senior Engineer at South Gate Tech.

South Gate Tech is a software development, outsourcing and staff augmentation service provider, specialising in big data, cloud, digital transformation and software engineering. Go to https://southgate.tech for more case studies and to get in touch about your project needs.

  1. Real Time Stream Processing with Kafka & Python - Georgi Tenev, May 2020
  2. About me ● Georgi ● UK + BE + BG; Dev + DevOps ● Nike Running Blue Level - 2000 km milestone by end of 2020 :) Into cars, motorbikes & mountain bikes. ● Member of the family of a boutique software company based in Sofia - South Gate Tech.
  3. Agenda for today ● Tide use-case ● Kafka ● Faust
  4. Tide ● Provides bank accounts for businesses, fully online ● Approaching 200,000 members ● Data-driven
  5. What’s the pain ● Don’t onboard fraudulent members ● Automate the decision process where possible ○ faster and less error-prone than manual approval ● Decide quickly ○ auto-approve ○ send for manual approval
  6. Kafka
  7. Kafka ● 10,000 ft overview - a message broker that durably persists messages, designed for massive scale ○ consists of a cluster of brokers; a topic is the main entity ● Why so popular ○ scale ○ HA, durability & replication ○ disk I/O optimization ○ lightweight consumers ○ turn your db “inside out”
  8. Kafka Topic ● Producers write to a topic ● Consumers within a consumer group read from a topic ● Retention, replication, durability
  9. Partitions ● A topic consists of >= 1 partitions ● Append-only write-ahead log ○ OS optimization ● Offset ● Ordering
  10. Consumer group ● Pub-sub semantics for the whole consumer group ● Queue semantics for consumers within a consumer group
  11. Consumer group ● Pub-sub semantics for the whole consumer group ● Queue semantics for consumers within a consumer group ● Each consumer gets a subset of the partitions
  12. What is Faust ● Asynchronous stream processing framework for Python ● Developed by Robinhood ○ “a pioneer of commission-free investing” ○ “build scalable and reliable distributed systems much faster”
  13. Anatomy of a Faust app
  14. Anatomy of a Faust app
  15. Anatomy of a Faust app
  16. Faust Agent ● Main processing actor in a Faust app ● A unary async function - receives a stream as its argument
  17. Faust Stream ● ~ async Python generator ● Abstraction over a Kafka topic ● Can apply operations on the stream (e.g. orders.filter(), orders.take(5))
  18. Faust Record ● DTO; represents events as fully fledged Python class instances ● Serialization & deserialization
  19. Apache Avro schema ● “data serialization system” ● cf. JSON Schema ● Why use it?
  20. Faust + Avro ● Avro schema from the previous slide
  21. Real-time stream processing - Tide use case ● Tide - Data Science department ● Detect fraud risk of new members
  22. Overview of the solution
  23. Decisioning Pipeline Stages
  24. Architecture ● “Event Collaboration” pattern ○ choreographed components ○ cf. event orchestration ● Benefits? ● Why use Python? ● Why use Faust?
  25. Challenges ● Shadowing of different engine versions (~ canary deployment) ● Joining streams ● Using a single Kafka topic
  26. Interesting resources ● Book - Designing Event-Driven Systems by Confluent ● Article - Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines) ● Faust's documentation ● Skeleton Faust project ● AvroSchema documentation
  27. Questions
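
The Kafka slides (topic, partitions, offset, consumer group) can be pictured with a small in-memory model. This is only a sketch in plain Python, not Kafka client code: the `Topic` class, the `assign` helper, the `crc32`-based partitioner and the round-robin assignment are all assumptions made for illustration, and real Kafka uses its own partitioner and rebalancing protocol.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy in-memory topic: a list of append-only partition logs (hypothetical)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # The same key always maps to the same partition, so per-key order is kept.
        # crc32 is a stand-in here, not Kafka's actual partitioner.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1  # position in the log
        return partition, offset


def assign(num_partitions, consumers):
    """Spread partitions round-robin over the consumers of ONE group."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        assignment[consumers[partition % len(consumers)]].append(partition)
    return dict(assignment)


topic = Topic("members", num_partitions=4)
for i in range(6):
    topic.produce(f"member-{i % 3}", f"event-{i}")

# Queue semantics inside a group: each partition goes to exactly one consumer.
group_a = assign(4, ["a1", "a2"])
# Pub-sub semantics across groups: another group sees every partition again.
group_b = assign(4, ["b1"])
```

In this model `group_a` splits the four partitions between its two consumers, while `group_b`, with a single consumer, reads all of them independently.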

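The Faust slides describe an agent as a unary async function that receives a stream, a stream as roughly an async Python generator, and a record as a DTO with serialization and deserialization. The sketch below mimics those three ideas with the standard library only; it deliberately does not use Faust's API. The `MemberApplication` record, its `risk_score` field, the `RISK_THRESHOLD` value and the decision rule are hypothetical and are not taken from Tide's actual pipeline.

```python
import asyncio
import json
from dataclasses import asdict, dataclass

RISK_THRESHOLD = 0.5  # assumed cut-off, for illustration only


@dataclass
class MemberApplication:
    """Hypothetical event record (plays the role of a Faust Record)."""

    member_id: str
    risk_score: float

    def dumps(self) -> bytes:
        # Serialization: record -> bytes, as they would sit on a topic.
        return json.dumps(asdict(self)).encode()

    @classmethod
    def loads(cls, raw: bytes) -> "MemberApplication":
        # Deserialization: bytes -> fully fledged Python object.
        return cls(**json.loads(raw))


async def stream(raw_messages):
    """Async generator standing in for a stream over a Kafka topic."""
    for raw in raw_messages:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield MemberApplication.loads(raw)


async def decide(applications):
    """Agent-like unary async function: its only argument is the stream."""
    decisions = []
    async for application in applications:
        if application.risk_score < RISK_THRESHOLD:
            outcome = "auto-approve"
        else:
            outcome = "manual-approval"
        decisions.append((application.member_id, outcome))
    return decisions


raw = [
    MemberApplication("m1", 0.1).dumps(),
    MemberApplication("m2", 0.9).dumps(),
]
decisions = asyncio.run(decide(stream(raw)))
```

Here the finite list `raw` replaces a topic, so the function can return its decisions. A real agent consumes an unbounded stream and would typically forward each decision to another topic rather than collect them.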