Can Apache Kafka Replace a Database?

Can and should Apache Kafka replace a database? How long can and should I store data in Kafka? How can I query and process data in Kafka? These are common questions that come up more and more. This session explains the idea behind databases and different features like storage, queries, transactions, and processing to evaluate when Kafka is a good fit and when it is not.
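One way to make the question "How can I query data in Kafka?" concrete is the time-range replay the session describes ("give me all events from time A to time B"). The sketch below is a toy in-memory model in Python, not Kafka client code: `PartitionLog`, `offset_for_time`, and `read_range` are hypothetical names, and it assumes timestamps never decrease within a partition. With the real Java consumer, `offsetsForTimes` followed by `seek` would play the role of `offset_for_time`.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    offset: int
    timestamp_ms: int
    value: str


class PartitionLog:
    """Toy append-only log for one partition (hypothetical, not a Kafka API)."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, timestamp_ms: int, value: str) -> int:
        # Assumption: timestamps never decrease, so binary search is valid.
        if self._events and timestamp_ms < self._events[-1].timestamp_ms:
            raise ValueError("timestamps must be non-decreasing in this sketch")
        offset = len(self._events)
        self._events.append(Event(offset, timestamp_ms, value))
        return offset

    def offset_for_time(self, timestamp_ms: int) -> int:
        # First offset whose timestamp is >= the requested time.
        timestamps = [e.timestamp_ms for e in self._events]
        return bisect_left(timestamps, timestamp_ms)

    def read_range(self, start_ms: int, end_ms: int) -> list[Event]:
        # Events with start_ms <= timestamp < end_ms, in offset order.
        start = self.offset_for_time(start_ms)
        end = self.offset_for_time(end_ms)
        return self._events[start:end]


log = PartitionLog()
for ts, value in [(100, "order-created"), (200, "order-paid"),
                  (300, "order-shipped"), (400, "order-delivered")]:
    log.append(ts, value)

replayed = [e.value for e in log.read_range(200, 400)]
print(replayed)  # ['order-paid', 'order-shipped']
```

Because the log is never mutated by reads, a new consumer can run the same range query again later and get the same events, which is what makes reprocessing possible.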

The discussion covers Kafka-native add-ons such as Tiered Storage for long-term, cost-efficient storage and ksqlDB as an event streaming database. It also explores the relationship and trade-offs between Kafka and other databases, which complement each other rather than one replacing the other, including the options for pull- and push-based bi-directional integration.
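The difference between pull and push access can be illustrated with a small model. This is a minimal sketch in plain Python, not ksqlDB or Kafka code: `MaterializedTable`, `apply`, `pull`, and `push` are hypothetical names chosen for illustration. A pull query answers once from the current state and ends; a push query keeps receiving every later change.

```python
from typing import Callable, Dict, List, Optional, Tuple

Change = Tuple[str, int]


class MaterializedTable:
    """Toy table kept up to date from a stream of (key, value) events."""

    def __init__(self) -> None:
        self._state: Dict[str, int] = {}
        self._subscribers: List[Callable[[Change], None]] = []

    def apply(self, key: str, value: int) -> None:
        # Each incoming event updates the latest value for its key ...
        self._state[key] = value
        # ... and is forwarded to every running subscription.
        for notify in self._subscribers:
            notify((key, value))

    def pull(self, key: str) -> Optional[int]:
        # Pull query: answer once from the current state, then finish.
        return self._state.get(key)

    def push(self, on_change: Callable[[Change], None]) -> None:
        # Push query: deliver every change that arrives from now on.
        self._subscribers.append(on_change)


table = MaterializedTable()
table.apply("sensor-1", 20)

seen: List[Change] = []
table.push(seen.append)  # subscription starts here

table.apply("sensor-1", 21)
table.apply("sensor-2", 35)

print(table.pull("sensor-1"))  # 21
print(seen)                    # [('sensor-1', 21), ('sensor-2', 35)]
```

The subscriber never sees the value 20 because it subscribed after that event; a real system would typically let the client choose where in the log to start.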

Key takeaways:
- Kafka can store data forever in a durable and highly available manner
- Kafka has different options to query historical data
- Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more powerful than ever before to store and process data
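- Kafka does not provide database-style transactions (such as two-phase commit), but offers exactly-once semantics through its idempotent producer and Transactions API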
- Kafka is not a replacement for existing databases like MySQL, MongoDB or Elasticsearch
- Kafka and other databases complement each other; the right solution has to be selected for a problem
- Different options are available for bi-directional pull and push-based integration between Kafka and databases to complement each other
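The exactly-once takeaway rests partly on the idempotent producer: a retried send must not create a second copy of a record. The following is a simplified model in Python, not the Kafka protocol itself. `Partition` and its `append` method are hypothetical, and the sketch assumes one sequence counter per producer id; real brokers track more state than this.

```python
from typing import Dict, List, Tuple


class Partition:
    """Toy partition that deduplicates retries (hypothetical, not a Kafka API)."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self._last_seq: Dict[int, int] = {}  # producer id -> last accepted sequence

    def append(self, producer_id: int, seq: int, value: str) -> bool:
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            # Duplicate of an already-written record: acknowledge, do not append.
            return False
        if seq != last + 1:
            raise ValueError(f"out-of-order sequence {seq}, expected {last + 1}")
        self.log.append(value)
        self._last_seq[producer_id] = seq
        return True


partition = Partition()
sends: List[Tuple[int, str]] = [(0, "debit A"), (1, "credit B")]

for seq, value in sends:
    partition.append(producer_id=7, seq=seq, value=value)

# Simulate a lost acknowledgement: the producer retries the last record.
appended = partition.append(producer_id=7, seq=1, value="credit B")

print(appended)       # False
print(partition.log)  # ['debit A', 'credit B']
```

Deduplication alone covers retries to a single partition; atomic writes across several partitions are what the Transactions API adds on top.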

Video Recording:
https://youtu.be/7KEkWbwefqQ

Blog post:
https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/


  1. @KaiWaehner - www.kai-waehner.de – Can Apache Kafka Replace a Database? Kafka’s Capabilities and Trade-Offs for Storage, Queries, Processing, Transactions, Connectivity. Kai Waehner, Field CTO, contact@kai-waehner.de, @KaiWaehner, www.confluent.io, www.kai-waehner.de, linkedin.com/in/kaiwaehner
  2. This was answered [with ‘yes’] a long time ago… https://www.confluent.io/kafka-summit-SF18/is-kafka-a-database/ … and many things have changed [= improved] since then!
  3. Can Kafka replace a database? Yes. Kafka is a database!
  4. Agenda: 1. What is a Database? 2. What is Apache Kafka? 3. Storage in Kafka 4. Queries and Processing in Kafka 5. Transactions in Kafka 6. Connectivity
  5. Agenda, section 1: What is a Database?
  6. What is a Database? Database Concepts: 1960s: Navigational DBMS; 1970s: Relational DBMS; Late 1970s: SQL DBMS; 1980s: On the desktop; 1990s: Object-oriented; 2000s: NoSQL / NewSQL; 2010s: DBaaS. Database Features: Storage, Queries (CRUD), Processing, Transactions, Backup, Replication, …
  7. Database Theorems. ACID: Atomicity, Consistency, Isolation, Durability. CAP: Consistency, Availability, Partition tolerance.
  8. Database Examples
  9. Database Examples: I thought Kafka is for data in motion?
  10. Agenda, section 2: What is Apache Kafka?
  11. Apache Kafka is a Platform for Data in Motion. Diagram labels: MES, ERP, Sensors, Mobile; Customer 360, Real-time Alerting System, Data warehouse; Producers, Consumers; Streams and storage of real-time events; Stream processing apps; Connectors; Supplier, Alert, Forecast, Inventory, Customer, Order.
  12. The Rise of Data in Motion. 2010: Apache Kafka created at LinkedIn by Confluent founders. 2014. 2020: 80% of Fortune 100 companies trust and use Apache Kafka.
  13. Slide labels: ETL/Data Integration; Messaging; Highly Scalable, Durable, Persistent, Ordered, Real-time; Difficult to Scale, No Persistence After Consumption, No Replay; Batch, Expensive, Time Consuming.
  14. Slide labels: ETL/Data Integration; Messaging; Batch, Expensive, Time Consuming; Difficult to Scale, No Persistence After Consumption, No Replay; Real-time; Highly Scalable, Durable, Persistent, Ordered, Real-time; Event Streaming.
  15. Slide labels: Improve Customer Experience (CX); Increase Revenue (make money); Business Value; Decrease Costs (save money); Core Business Platform; Increase Operational Efficiency; Migrate to Cloud; Mitigate Risk (protect money); Key Drivers; Strategic Objectives (sample); Fraud Detection; IoT sensor ingestion; Digital replatforming / Mainframe Offload; Connected Car: Navigation & improved in-car experience: Audi; Customer 360; Simplifying Omni-channel Retail at Scale: Target; Faster transactional processing / analysis incl. Machine Learning / AI; Mainframe Offload: RBC; Microservices Architecture; Online Fraud Detection; Online Security (syslog, log aggregation, Splunk replacement); Middleware replacement; Regulatory; Digital Transformation; Application Modernization: Multiple Examples; Website / Core Operations (Central Nervous System); The [Silicon Valley] Digital Natives: LinkedIn, Netflix, Uber, Yelp...; Predictive Maintenance: Audi; Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix; Real-time app updates; Real Time Streaming Platform for Communications and Beyond: Capital One; Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle; Detect Fraud & Prevent Fraud in Real Time: PayPal; Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple; Example Use Cases; $↑ $↓ $↔; Example Case Studies (of many).
  16. Agenda, section 3: Storage in Kafka
  17. Kafka’s Distributed Commit Log is the Storage (and enables real decoupling and domain-driven design): https://www.confluent.io/blog/microservices-apache-kafka-domain-driven-design/
  18. Kafka Stores Your Data Durably. https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/ Kafka is the source of truth: it powers NYTimes.com and stores all articles ever published since 1851 (September 30, 1851, Page 1). Kafka is the leading system: Account Activity Replay API to recover events that weren’t delivered for various reasons. https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/kafka-as-a-storage-system.html
  19. Confluent Tiered Storage for Kafka (only available in Confluent Platform): store data forever; hot and cold storage; cheap object store; easy scale up/down; no changes in clients.
  20. Tiered Storage for Apache Kafka: KIP-405 – Add Tiered Storage Support to Kafka. Confluent is actively working on this with the open source community; Uber is leading this initiative. Confluent Tiered Storage is available today in Confluent Platform and used under the hood in Confluent Cloud. https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage
  21. Log Compaction with Compacted Topics: retain the last known value for each message key; no retention time.
  22. Stateful Client Applications: Kafka Streams and ksqlDB embed RocksDB. Do I really need another database for my microservice?
  23. Kafka as Single Source of Truth: The Leading System is Real-Time and Scalable; Real Decoupling; Handling Slow Consumers.
  24. Agenda, section 4: Queries and Processing in Kafka
  25. Kafka Decouples Storage and Compute. Diagram labels: Kafka Java Client; Kafka Cluster; Monolith; Cloud DWH; Snowflake Connector; Schema Registry; KSQL Apps; Storage; Compute; Compute (+ non-Kafka Storage).
  26. Query and Event Processing in Kafka. PUSH → Continuously process and forward events. PULL → Client requests events (like you know it from your favourite database).
  27. ksqlDB - The Event Streaming Database
      -- Continuously look up data in a table; query keeps running
      SELECT * FROM myTable WHERE ... EMIT CHANGES;
      -- Continuously look up data in a stream; query keeps running
      SELECT * FROM myStream WHERE ... EMIT CHANGES;
      -- Look up data in a table once; query then terminates
      SELECT * FROM myTable WHERE ...;
  28. ksqlDB - The Event Streaming Database
      • Project created by Confluent, source-available license: https://ksqldb.io/
      • A ksqlDB cluster runs in a distributed manner across many server nodes
      • Tightly integrates with Apache Kafka® as its persistent storage layer
      • Has projections, transformations, aggregations, windowing, joins, etc.
      • Distinguishes between event-time and processing-time
      • Handles out-of-order and late data
      • Streaming import-export for external data systems
      • DDL and DML via SQL-like statements
      • Security features like role-based access control
      • Run it yourself or use SaaS offering in Confluent Cloud
  29. Continuous Queries and Processing
  30. Queries through the Kafka Consumer
      • Continuous consumption of the latest events (in real time or batch)
      • Just specific time frames or partitions
      • All data from the beginning
      Diagram labels: Connect, Cluster Linking, REST Proxy.
  31. Queries for Reprocessing Historical Events: “Give me all events from time A to time B.”
      • New consumer application
      • Error-handling
      • Compliance / regulatory processing
      • Query and analyze existing events
      • Schema changes in analytics platform
      • Model training
      Diagram labels: Real-time Producer, Real-time Consumer, Consumer of Historical Data, Time.
  32. Interactive Queries: query values from the client applications’ state store. Optional proxy (e.g. HTTP or WebSockets). Limitation: only key/value, no complex queries.
  33. ANSI SQL Queries against the Kafka Log: 3rd-party add-ons help. Integration with any Business Intelligence tool. https://www.confluent.io/blog/analytics-with-apache-kafka-and-rockset/
  34. Agenda, section 5: Transactions in Kafka
  35. Exactly-Once Semantics (EOS) in Kafka: no two-phase commit (because that does not scale); idempotent producer and Transactions API; supported by the whole Kafka ecosystem (not just messaging).
  36. Transaction API in Apache Kafka: https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging https://www.confluent.io/kafka-summit-london18/dont-repeat-yourself-introducing-exactly-once-semantics-in-apache-kafka/
  37. From the Mainframe to ksqlDB in the Cloud: Bi-Directional End-to-End Referential Integrity. Diagram labels: ksqlDB App; CICS; Mainframe Transactions; Bi-Directional Integration; Secured Referential Integrity; End-to-End “Transactions”; Low Latency; Database change, Microservices events, SaaS data, Customer experiences; Streams of real-time events; Kafka Exactly-Once Semantics using librdkafka; IMS DB; Cobol App; Kafka Exactly-Once Semantics using ksqlDB.
  38. Agenda, section 6: Connectivity
  39. Kafka Connect: integration between databases, applications, APIs, SaaS; Kafka-native (no other middleware required); sources and sinks; legacy and modern; real-time and batch.
  40. Turn the Database Inside Out! Materialized Views. Integration with any Database. https://www.confluent.io/blog/turning-the-database-inside-out-with-apache-samza/
  41. Global Event Streaming: Streaming Replication between Kafka Clusters; Bridge to Databases, Data Lakes, Apps, APIs, SaaS; Aggregate Small Footprint Edge Deployments with Replication (Aggregation); Simplify Disaster Recovery Operations with Multi-Region Clusters with RPO=0 and RTO=0; Stream Data Globally with Replication and Cluster Linking.
  42. Can Kafka replace a database? Yes. But it does not replace other databases!
  43. TL;DR
      • Kafka can store data forever in a durable and highly available manner providing ACID guarantees
      • Different options to query historical data are available in Kafka
      • Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more powerful than ever before for processing data in motion and event-based long-term storage
      • Stateful applications can be built leveraging Kafka clients (microservices, business applications) without the need for another external database
      • Not a replacement for existing databases like MySQL, MongoDB, Elasticsearch or Hadoop
      • Other databases and Kafka complement each other; the right solution has to be selected for a problem; often purpose-built materialized views are created and updated in real time from the central event-based infrastructure
      • Different options are available for bi-directional pull and push based integration between Kafka and databases to complement each other
  44. Questions? Feedback? Let’s connect! Kai Waehner, Field CTO, contact@kai-waehner.de, @KaiWaehner, www.kai-waehner.de, www.confluent.io, linkedin.com/in/kaiwaehner
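Log compaction from the storage section ("retain last known value for each message key") can be illustrated with a few lines of Python. This is a minimal sketch of the outcome only, assuming a single partition held in memory; the `compact` function and the sample records are hypothetical. Real compaction runs in the background on the broker for topics configured with `cleanup.policy=compact`, so a live topic may temporarily still hold older values.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the newest record per key, preserving offset order."""
    latest: Dict[str, Record] = {}
    for record in log:
        _, key, _ = record
        latest[key] = record  # later offsets overwrite earlier ones
    return sorted(latest.values(), key=lambda r: r[0])


log: List[Record] = [
    (0, "user-1", "alice@old.example"),
    (1, "user-2", "bob@example.com"),
    (2, "user-1", "alice@new.example"),
    (3, "user-3", "carol@example.com"),
    (4, "user-2", "bob@work.example"),
]

compacted = compact(log)
for record in compacted:
    print(record)
# (2, 'user-1', 'alice@new.example')
# (3, 'user-3', 'carol@example.com')
# (4, 'user-2', 'bob@work.example')
```

Offsets are kept rather than renumbered, so a consumer reading the compacted log still sees records in their original relative order, with gaps where older values were dropped.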
