
A Tour of Apache Kafka


Speaker: Matt Howlett, Software Engineer, Confluent

This presentation provides a technical overview of Apache Kafka® and covers some of its popular use cases.


  1. A Tour of Apache Kafka. Matt Howlett, Software Engineer, Confluent Inc.
  2. Agenda: 1. Technical Overview of Apache Kafka; 2. Use Cases.
  3. What is Apache Kafka? Kafka is a streaming platform: a distinct tool in your toolbox, like a relational database or a traditional messaging system. A streaming platform encourages architectures that emphasize events and changes to data (not data at rest). It is widely applicable; consider Walmart, for example.
  4. Who Uses Kafka Today?
     ● 35% of the Fortune 500, plus thousands of companies worldwide, use Kafka
     ● Across all industries
     ● High growth of usage within companies
  5. Core Kafka Pt. 1. Traditional messaging: move data. Kafka: make data available.
  6. Core Kafka Pt. 2: Why Logs?
     ● Simple:
       ○ High performance
       ○ Robust horizontal scalability
     ● Suitable for real-time, streaming and batch operations
     ● Ad-hoc consumption & reprocessing
     ● Immutable:
       ○ Easier to debug and reason about vs. ephemeral data
       ○ Auditable by default
  7. Core Kafka Pt. 3: Scaling. Kafka topics are partitioned logs; each message is a key/value pair, and the key determines which partition it is written to. Notes: ordering is guaranteed per partition only; re-partitioning.
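
A minimal keyed-producer sketch (Java, using the standard kafka-clients API) to make the key/value and partitioning point concrete. The broker address and the page-views topic are assumptions for illustration; the point is that records with the same key are hashed to the same partition, which is why ordering holds per partition.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("user-42") is hashed to choose a partition, so all events
            // for the same user land in the same partition, in order.
            producer.send(new ProducerRecord<>("page-views", "user-42", "viewed /pricing"));
            producer.flush();
        }
    }
}
```
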
  8. Core Kafka Pt. 4: Durability. Kafka topics are replicated partitioned logs. Notes: all reads and writes go to the leader replica.
  9. Core Kafka Pt. 5: How Scalable is Kafka?
     ● No bottleneck!
       ○ Many brokers
       ○ Many producers
       ○ Many consumers
     ● Limits?
       ○ Internet giants are driving the limits higher; you won't need to worry.
       ○ e.g. LinkedIn pushes more than 1 trillion messages/day through its Kafka clusters.
       ○ 100 brokers / 2 billion messages a day is "straightforward" to operate.
       ○ Don't over-partition (keep to roughly < 100k partitions).
  10. Components of Apache Kafka.
  11. Kafka Clients.
      Use cases:
      ● Integration with custom applications
        ○ Log application events
        ○ Invoke REST APIs
      ● Stateless stream processing (filter, transform)
      Confluent-supported clients:
      ● Java
      ● C (librdkafka)
        ○ C++
        ○ Python
        ○ C# / .NET
        ○ Go
      ● REST Proxy
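
For illustration, a minimal Java consumer sketch against the same hypothetical page-views topic; the group id and offset-reset setting are assumptions, not anything prescribed by the slide.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
        props.put("group.id", "log-event-readers");          // hypothetical consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");          // start from the beginning of the log

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("page-views"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```
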
  12. Kafka Connect.
      ● Off-the-shelf connectors
        ○ Confluent Hub
      ● Standardized framework
        ○ Scalable
        ○ Fault tolerant
      ● Stateless workers
        ○ Un-opinionated deployment
      ● REST API
      ● Transforms
      ● Exactly once
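
As a sketch of the Connect REST API mentioned above: a connector is registered by POSTing a JSON config to a Connect worker. The worker URL, connector name, file path and topic below are illustrative assumptions; the FileStreamSource connector itself ships with Apache Kafka.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnectorSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector config: tail a file into a topic using the
        // FileStreamSource connector bundled with Apache Kafka.
        String config = "{"
                + "\"name\": \"file-source-sketch\","
                + "\"config\": {"
                + "  \"connector.class\": \"org.apache.kafka.connect.file.FileStreamSourceConnector\","
                + "  \"tasks.max\": \"1\","
                + "  \"file\": \"/tmp/input.txt\","
                + "  \"topic\": \"file-lines\""
                + "}}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))  // assumed Connect worker address
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The same config could equally be submitted with curl; in distributed mode the workers share out the connector's tasks, which is where the scalability and fault tolerance come from.
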
  13. Kafka Streams.
      ● Just a library! A library that makes it easy to do stateful operations (joins, aggregations, windowing).
      ● Elastically scalable
        ○ Distributed
      ● Fault tolerant
      ● Un-opinionated deployment
      ● State backed by Kafka, used as a changelog
      ● Exactly-once processing
      ● Record-at-a-time processing
      ● Complex topologies (but keep it simple)
      ● JVM only (Java, Scala, etc.)
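
A minimal Kafka Streams sketch, assuming hypothetical text-input and word-counts topics and a local broker: a stateless transform followed by a stateful aggregation (a word count) whose state store is backed by a changelog topic in Kafka, as the slide notes.

```java
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "word-count-sketch");  // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> lines = builder.stream("text-input");

        // Stateless transform (split into words), then a stateful aggregation (count).
        // The count's local state store is backed by a Kafka changelog topic.
        KTable<String, Long> counts = lines
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                .groupBy((key, word) -> word)
                .count();

        counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Starting the same application on several machines with the same application id rebalances the input partitions across the instances, which is the "elastically scalable" point above.
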
  14. Confluent: A More Complete Streaming Platform.
  15. Use Cases.
  16. When Should You Use Kafka?
      ● Scalability
        ○ Quantity of data
        ○ Simple applications (or not)
      ● Complexity
        ○ Architectural
        ○ Organizational
  17. Buffering Pt. 1. Kafka is a very good buffer:
      ● Write optimized
      ● Highly reliable
      ● Tolerates data spikes
      ● Tolerates downstream outages
      ● Used by Kafka Streams (no back-pressure problems)
  18. Buffering Pt. 2: Move data to multiple locations.
  19. Data Integration: Explosion of data sources and processing frameworks.
  20. Point-to-Point vs. ...
  21. ... Hub-and-Spoke.
      ● Number of connections: O(N) for hub-and-spoke vs. potentially O(N^2) for point-to-point
      ● Standardized, reliable data transport
      ● Standardized data format
  22. The Dual Writes Problem. Without a log, writing the same data to two systems independently leaves a potential consistency problem (one write can succeed while the other fails).
  23. Data Integration: Eventual Consistency.
  24. Change Data Capture.
  25. You can think of Kafka as a commit log for your entire organization ("turning the database inside out").
  26. Advanced ETL #1: PII Data Filter, implemented as a Kafka Streams app over (secured) topics.
      ● Kafka security: TLS encryption, flexible authentication + authorization
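
A sketch of what such a PII filter might look like as a Kafka Streams app. The topic names and the idea of masking email addresses with a regex are illustrative assumptions, not the filter shown on the slide.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class PiiFilterSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pii-filter-sketch");  // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read from a raw topic, mask anything that looks like an email address,
        // and write the scrubbed records to a downstream topic.
        builder.<String, String>stream("raw-events")
               .mapValues(value -> value.replaceAll(
                       "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}", "<redacted>"))
               .to("events-no-pii");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```
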
  27. Advanced ETL #2: Enriching Stream Data.
  28. Advanced ETL #2: Stream / Table Join in KSQL.
      CREATE STREAM enriched_weblog AS
        SELECT ip, text, g.location AS location
        FROM weblog w LEFT JOIN geo g ON w.ip = g.ip;
      ● The query is long-running on a KSQL cluster.
      ● Create the weblog stream and geo table (backed by Kafka topics) first.
      ● Currently, KSQL can interpret AVRO, JSON and CSV.
  29. Stream Processing App #1: Anomaly Detection / Alerting.
      CREATE TABLE possible_fraud AS
        SELECT card_number, count(*)
        FROM authorization_attempts
        WINDOW TUMBLING (SIZE 10 SECONDS)
        GROUP BY card_number
        HAVING count(*) > 3;
      ● Use Kafka Streams and/or additional input streams for a more sophisticated algorithm.
      ● possible_fraud is a changelog stream whose key is [card_number, window_start].
      (Pipeline: authorization_attempts → possible_fraud → SMS gateway)
  30. Microservices. What are microservices?
      ● Independently deployable, small units of functionality
        ○ (not a formal definition)
        ○ Primary motivation: decouple teams (scale in people terms)
        ○ Usually REST endpoints + commands/queries
      Microservices can also be built on a backbone of events:
        ○ PII filter
        ○ Weblog enricher
        ○ SMS fraud alert notifier
        ○ ... just the start
  31. Microservices: Commands Pt. 1.
  32. Microservices: Commands Pt. 2. Adding new functionality that depends on placing orders requires changes to the Orders Service.
  33. Microservices: Receiver-Driven Flow Control.
      ● The Pricing Service team does not need to talk to the Orders Service team.
      ● Trade-off: no statement of overall behavior.
  34. Microservices: Queries. Note: eventual consistency.
  35. Microservices.
      ● Decreased coupling: the Orders Service materializes the view it requires (Kafka can even act as the system of record).
      ● Often appropriate at larger scales.
  36. Microservices: The Data Dichotomy. "Data systems are about exposing data. Services are about hiding it."
  37. Thank You! @matt_howlett https://www.confluent.io/blog We're hiring!
