
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)

Published on

Speaker: Vish Srinivasan (https://www.linkedin.com/in/vishwanathsrinivasan/)

Video: https://www.youtube.com/channel/UC2698J-retd2cw1VZZUnLHw

Talk presented during Bangalore Kafka group's meetup at Near
(https://www.meetup.com/Bangalore-Apache-Kafka-Group/events/267874122/)


Published in: Technology
  1. Stream Processing fundamentals and Introduction to ksqlDB. Vish Srinivasan, Systems Engineer
  2. Agenda ● Motivation for Event Streaming ~5 ● Kafka 101 ~10 ● Stream Processing ~20 (1. Kafka Streams 2. KSQL) ● ksqlDB - An introduction ~10 ● Q&A ~10
  3. Event Streaming Motivation
  4. The Central Challenge: Connecting applications with data. ETL: What happened in the world. Messaging: What is happening in the world.
  5. Motivation
  6.
  7. EVENT: I changed my job from Snaplogic to Confluent in April 2019. STATE: I work at Confluent.
  8. JOB CHANGE, RECOMMENDATION ENGINE, SEARCH INDEX, EMAIL SERVICE
  9. THE USER OF THE SOFTWARE IS MORE SOFTWARE
  10. Customers Expect Rich Digital Experiences ● Real-Time combined with historical data ● Only Event Streaming Platforms can do this. “When will my driver get here?”
  11. Why Combine Real-time With Historical Context? Event-Driven App (Location Tracking): Only Real-Time Events. Messaging Queues and Event Streaming Platforms can do this. VS. Contextual Event-Driven App (ETA): Real-Time combined with stored data. Only Event Streaming Platforms can do this. Questions shown: “Where is my driver?” and “When will my driver get here?” (2 min)
  12. Contextual, Event-Driven Apps in the Enterprise. “We look at events as running our business. Business people within our organization want to be able to react to events - and oftentimes it's a combination of events.” - Chris D’Agostino, VP of Streaming Data. 01 Real-Time Fraud Notifications 02 Real-Time “Second Look” 03 Automated Transaction Analysis
  13. Take Away #1: Event Streaming Platforms let you build Contextual Event Driven Applications combining real time and historical data.
  14. An Event Streaming Platform gives you three key functionalities: Publish & Subscribe to Events; Store Events; Process & Analyze Events
  15. … But first, Kafka Basics
  16. Kafka is a Foundation for Event Streams. LOG: 0 1 2 3 4 5 6 7 8 (WRITES, READS). DESTINATION SYSTEM A, DESTINATION SYSTEM B
  17. Storage: Distributed and Replicated. BROKER 1, BROKER 2, BROKER 3, BROKER 4, each holding replicas of TOPIC 1-PART 1, TOPIC 1-PART 2, TOPIC 2-PART 1 and TOPIC 2-PART 2. 2 topics, 2 partitions each, 3 replicas each. PRODUCER, CONSUMER
  18. Producing to Kafka
  19. Producing to Kafka: Messages will be produced in a round robin fashion. Written to the leader of a partition. (Time: 1 2 3 4 5)
  20. Producing to Kafka with a Key: hash(key) % numPartitions = N (keys A B C D over Time)
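
The rule on slide 20, hash(key) % numPartitions = N, can be sketched in a few lines of Python. This is a minimal illustration, not Kafka's partitioner: `partition_for` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen only because it gives the same result on every run.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with hash(key) % numPartitions.

    zlib.crc32 is an assumed stand-in hash, not the one Kafka uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = ["A", "B", "C", "D", "A", "C"]
assignments = [(key, partition_for(key, 3)) for key in keys]
for key, partition in assignments:
    print(f"key={key} -> partition {partition}")
```

The point of the sketch is the property the slide relies on: every message with the same key lands on the same partition, so per-key ordering is preserved as long as the partition count does not change.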
  21. Consuming from Kafka
  22. Consuming with Single Client (C)
  23. Consuming with Consumer Groups (C C C C): Logical Name. Load balanced across all consumers in the group
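
A rough way to picture "load balanced across all consumers in the group" is to deal partitions out to the group members in turn. The sketch below assumes a simple round-robin split; `assign_partitions` and the consumer names are hypothetical, and Kafka's real assignors handle more cases than this.

```python
def assign_partitions(partitions, consumers):
    """Deal partitions out to consumers in turn (simplified, hypothetical)."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

# Four consumers share six partitions.
four_members = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])

# If one consumer leaves, the same partitions are spread over the rest.
three_members = assign_partitions(partitions, ["c1", "c2", "c3"])

print(four_members)
print(three_members)
```

Each partition has exactly one owner within the group, which is why adding consumers beyond the partition count does not add parallelism.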
  24. Consuming with Consumer Groups (diagram: consumers C C C C in CG1, C C in CG2)
  25. Delivery Guarantees ● Producer Guarantees ○ Acks = 0 ○ Acks = 1 ○ Acks = all ● Consumer Guarantees ○ At least once ○ At most once ○ Exactly once
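
The difference between the at-most-once and at-least-once consumer guarantees comes down to whether the offset is committed before or after a message is processed. The simulation below is a sketch under that assumption; `consume` is a hypothetical function that injects a single crash between the two steps, and it does not use any Kafka client API.

```python
def consume(log, commit_first, crash_at):
    """Replay `log` with one simulated crash while handling offset `crash_at`.

    commit_first=True  -> commit the offset, then process (at most once)
    commit_first=False -> process, then commit the offset (at least once)
    After the crash the consumer restarts from the last committed offset.
    """
    processed = []
    committed = 0   # next offset to read after a restart
    crashed = False
    offset = 0
    while offset < len(log):
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True
                offset = committed  # restart: this message is never processed
                continue
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at and not crashed:
                crashed = True
                offset = committed  # restart: this message is processed again
                continue
            committed = offset + 1
        offset += 1
    return processed


log = ["a", "b", "c"]
at_most_once = consume(log, commit_first=True, crash_at=1)
at_least_once = consume(log, commit_first=False, crash_at=1)
print(at_most_once)
print(at_least_once)
```

With commit-first the message at the crashed offset is lost; with process-first it is seen twice, which is why at-least-once consumers usually need idempotent processing.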
  26. Take Away #2: Kafka lets you publish/subscribe to events as well as store events.
  27. An Event Streaming Platform gives you three key functionalities: Publish & Subscribe to Events; Store Events; Process & Analyze Events
  28. Stream Processing by Analogy: Kafka Cluster, Connect API, Stream Processing, Connect API. $ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
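
The shell pipeline on slide 28 can be mirrored with Python generators, where each stage consumes events from the previous one. This is only an illustration of the analogy; the sample lines and the helper names `grep` and `to_upper` are made up for the sketch.

```python
def grep(pattern, events):
    """Keep only events containing `pattern`, like `grep "ksql"`."""
    return (event for event in events if pattern in event)


def to_upper(events):
    """Uppercase each event, like `tr a-z A-Z` for ASCII text."""
    return (event.upper() for event in events)


# Stands in for in.txt (the source); the list built below stands in for out.txt.
in_txt = ["ksql is streaming sql", "kafka is a log", "try ksqldb"]
out_txt = list(to_upper(grep("ksql", in_txt)))
print(out_txt)
```

As in the shell version, the stages are lazy and composable: nothing is read until the sink asks for it, and each stage only knows about the events flowing through it.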
  29. Event Transformation with Stream Processing. Kafka Streams: Apache Kafka® library to write real-time applications and microservices in Java and Scala. Confluent KSQL: The streaming SQL engine for Apache Kafka®. CREATE STREAM fraudulent_payments AS SELECT * FROM payments WHERE fraudProbability > 0.8; You write only SQL. No Java, Python, or other boilerplate to wrap around it!
  30. Shoulders of Streaming Giants. subscribe(), poll(), send(), flush(), beginTransaction(), …; KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), transform(), …; CREATE STREAM, CREATE TABLE, SELECT, JOIN, GROUP BY, SUM, …; KSQL UDFs
  31. Topics vs. Streams and Tables. Storage Layer (Brokers): Topic 00100 11101 11000 00011 00100 00110. Processing Layer (KSQL, KStreams): plus schema (serdes) gives a Stream: alice Paris, bob Sydney, alice Rome; plus aggregation gives a Table: alice Rome, bob Sydney.
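
The step from stream to table on slide 31 can be sketched as a fold that keeps the latest value per key. The helper name `materialize` is hypothetical, and "latest value wins" is just one possible aggregation.

```python
def materialize(events):
    """Fold a stream of (key, value) events into a table of latest values."""
    table = {}
    for key, value in events:
        table[key] = value  # a later event for the same key overwrites
    return table


# The stream records every event, in order.
stream = [("alice", "Paris"), ("bob", "Sydney"), ("alice", "Rome")]

# The table holds only the current state per key.
table = materialize(stream)
print(table)
```

The stream still contains alice's earlier location, while the table only answers "where is alice now?".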
  32. Streams record history: “The ledger of Vish’s sales.” Tables represent state: “Vish’s sales totals.” “California sales totals.”
  33. Streams record history: “The sequence of moves.” (1. e4 e5 2. Nf3 Nc6 3. Bc4 Bc5 4. d3 Nf6 5. Nbd2) Tables represent state: “The state of the board.”
  34. Another analogy: Behavioral psychology
  35. ● Processing is partitioned ● Unit of parallelism is stream-task. Streams: topic with schema. Tables: underlying topic (usually) compacted ● Materialized view, cannot be mutated ● Implemented on top of a state-store (mutable)
  36. Take Away #3: 2 tools to process data: Kafka Streams and KSQL. 2 concepts in both: Streams and Tables.
  37. … Now, ksqlDB
  38. KSQL for Real-Time Monitoring ● Log data monitoring ● Tracking and alerting ● Syslog data ● Sensor / IoT data ● Application metrics. CREATE STREAM syslog_invalid_users AS SELECT host, message FROM syslog WHERE message LIKE '%Invalid user%'; http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
  39. KSQL for Anomaly Detection ● Identify patterns or anomalies in real-time data, surfaced in milliseconds. CREATE TABLE possible_fraud AS SELECT card_number, COUNT(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING COUNT(*) > 3;
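
To make the tumbling-window query on slide 39 concrete, the sketch below counts attempts per card in fixed 5-second buckets and keeps the buckets with more than 3 attempts. It is a batch approximation under stated assumptions: timestamps are whole seconds, the sample data is invented, and the `possible_fraud` function here is a hypothetical stand-in for what the engine computes continuously.

```python
from collections import Counter


def possible_fraud(attempts, window_size=5, threshold=3):
    """Count attempts per (window start, card); keep counts above threshold."""
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one.
        window_start = (timestamp // window_size) * window_size
        counts[(window_start, card_number)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# (timestamp in seconds, card number); invented sample data
attempts = [
    (0, "1111"), (1, "1111"), (2, "1111"), (4, "1111"),
    (3, "2222"), (6, "1111"), (7, "2222"),
]
flagged = possible_fraud(attempts)
print(flagged)
```

Card 1111 is flagged only for the window starting at second 0, where it has four attempts; its single attempt in the next window does not count toward that total.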
  40. KSQL for Streaming ETL ● Joining, filtering, and aggregating streams of event data. CREATE STREAM vip_actions AS SELECT user_id, page, action FROM clickstream c LEFT JOIN users u ON c.user_id = u.user_id WHERE u.level = 'Platinum';
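
The stream-table join on slide 40 can be pictured as a per-event lookup into the users table followed by a filter. The sketch below uses invented sample rows and a hypothetical `vip_actions` generator; it illustrates the logic of the query, not how ksqlDB executes it.

```python
# Table side: current state per user_id (invented sample data).
users = {"u1": {"level": "Platinum"}, "u2": {"level": "Gold"}}

# Stream side: one event per click (invented sample data).
clickstream = [
    {"user_id": "u1", "page": "/home", "action": "view"},
    {"user_id": "u2", "page": "/cart", "action": "click"},
    {"user_id": "u3", "page": "/home", "action": "view"},
    {"user_id": "u1", "page": "/checkout", "action": "buy"},
]


def vip_actions(clicks, user_table):
    """Enrich each click with its user row, then keep Platinum users."""
    for click in clicks:
        user = user_table.get(click["user_id"])  # None if no matching user
        if user is not None and user["level"] == "Platinum":
            yield (click["user_id"], click["page"], click["action"])


result = list(vip_actions(clickstream, users))
print(result)
```

Note that the WHERE clause on `u.level` discards clicks with no matching user, so for this query the LEFT JOIN yields the same rows an inner join would.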
  41. KSQL is a stream processing technology. As such it is not yet a great fit for: Ad-hoc queries ● No indexes yet in KSQL ● Kafka often configured to retain data for only a limited span of time. BI reports (Tableau etc.) ● No indexes yet in KSQL ● No JDBC ● Most BI tools don’t understand continuous, streaming results
  42. PUSH: the APP receives “Jay’s credit score is 670”, “Jay’s credit score is 710”, “Jay’s credit score is 695”. PULL: the APP asks “What is Jay’s credit score now?” and gets “695”.
  43. PUSH: SELECT user, credit_score FROM credit_history WHERE ROWKEY = 'jay' EMIT CHANGES; PULL: SELECT user, credit_score FROM credit_history WHERE ROWKEY = 'jay';
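
The contrast between the push and pull queries on slides 42 and 43 can be sketched with a small in-memory table: a push subscription is notified of every change, while a pull lookup returns only the current value. `CreditHistory` and its methods are hypothetical names for this sketch, not ksqlDB APIs.

```python
class CreditHistory:
    """In-memory stand-in for a materialized table keyed by user."""

    def __init__(self):
        self._table = {}
        self._subscribers = {}

    def subscribe(self, key, callback):
        """Push query: call `callback` on every future change to `key`."""
        self._subscribers.setdefault(key, []).append(callback)

    def update(self, key, score):
        """Apply a new event and notify push subscribers for that key."""
        self._table[key] = score
        for callback in self._subscribers.get(key, []):
            callback(key, score)

    def pull(self, key):
        """Pull query: return the current value and finish."""
        return self._table.get(key)


history = CreditHistory()
pushed = []
history.subscribe("jay", lambda user, score: pushed.append(score))

for score in (670, 710, 695):
    history.update("jay", score)

current = history.pull("jay")
print(pushed)
print(current)
```

The push side sees all three scores as they arrive; the pull side answers "what is Jay's credit score now?" with the latest one only.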
  44. ksqlDB adds two key features to augment KSQL. PULL QUERIES ● Point-in-time lookup of information ● Comparable to a SELECT statement in a relational database. EMBEDDED CONNECTORS ● Move event data to and from external data systems ● Available for all supported connectors. (diagram: APP PULL “How much does Jay’s ride cost?” “$25”; ksqlDB with CONNECTORs)
  45. So, what use cases is ksqlDB a good fit for? It does not replace traditional databases: ● What is a database? ● Materialize events into an opinionated structure (table) so you get the power of SQL ● When we query, we are querying the state produced by the processor executing the commit log; we have just recreated materialized views.
  46. So, what use cases is ksqlDB a good fit for? ksqlDB is primarily useful for three broad categories of applications: ● Building and serving materialized views that power apps ● Creating real-time streaming apps that react to event streams and trigger side effects ● Creating real-time streaming pipelines that continuously transform event streams
  47. Summary Takeaways ● Event Streaming Platforms let you build Contextual Event Driven Applications combining real time and historical data. ● Kafka lets you publish/subscribe to events and also store them ● Process data with Kafka Streams or KSQL using Streams and Tables ● ksqlDB makes it easy to build and serve materialized views that power apps
  48. Thank You! Reach out if you have any questions: ● Vish Srinivasan - vish@confluent.io. Community Slack: https://launchpass.com/confluentcommunity Learn Kafka - https://kafka-tutorials.confluent.io/ ksqlDB - https://ksqldb.io/
