
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain 2018)


KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Talk at Big Data Spain 2018 in Madrid).

- KSQL includes access to the rich Apache Kafka ecosystem and is suitable for various use cases, including Streaming ETL, Real Time Stream Monitoring and Anomaly Detection

- KSQL enables stream processing without writing code and without an additional analytics cluster

Description:

The rapidly expanding world of stream processing can be daunting, with new concepts such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks to master.
KSQL is an open-source, Apache 2.0 licensed streaming SQL engine on top of Apache Kafka which aims to simplify all this and make stream processing available to everyone. Even though it is simple to use, KSQL is built for mission-critical and scalable production deployments (using Kafka Streams under the hood).
Benefits of using KSQL include: no coding required; no additional analytics cluster needed; streams and tables as first-class constructs; access to the rich Kafka ecosystem. This session introduces the concepts and architecture of KSQL. Use cases such as Streaming ETL, Real Time Stream Monitoring and Anomaly Detection are discussed. A live demo shows how to set up and use KSQL quickly and easily on top of your Kafka ecosystem.

Published in: Technology


  1. KSQL: The Open Source Streaming SQL Engine for Apache Kafka. Kai Waehner, Technology Evangelist, kontakt@kai-waehner.de, LinkedIn, @KaiWaehner, www.confluent.io, www.kai-waehner.de
  2. A Brief History of Apache Kafka and Confluent (timeline, 2012 to 2018): 0.7 Cluster mirroring; 0.8 Intra-cluster replication; 0.9 Data integration (Connect API); 0.10 Data processing (Streams API); 0.11 Exactly-once semantics; 1.0 Enterprise ready; CP 4.1 KSQL GA.
  3. KSQL – The Streaming SQL Engine for Apache Kafka
  4. Why KSQL? [Chart: population vs. coding sophistication. Regions: Realm of Stream Processing (Kafka Streams); New, Expanded Realm (KSQL). Audiences shown: core developers, core developers who don’t like Java, data engineers, BI analysts.]
  5. Shoulders of Streaming Giants. Each layer builds on the one before it:
     - Kafka consumer and producer clients: subscribe(), poll(), send(), flush(), beginTransaction(), …
     - Kafka Streams: KStream, KTable, filter(), map(), flatMap(), join(), aggregate(), transform(), …
     - KSQL: CREATE STREAM, CREATE TABLE, SELECT, JOIN, GROUP BY, SUM, …
     - KSQL UDFs
  6. KSQL for Data Exploration and Debugging: an easy way to inspect your data in Kafka.
     SHOW TOPICS;
     SELECT page, user_id, status, bytes FROM clickstream WHERE user_agent LIKE 'Mozilla/5.0%';
     PRINT 'my-topic' FROM BEGINNING;
  7. KSQL for Data Transformation: quickly make derivations of existing data in Kafka.
     CREATE STREAM clicks_by_user_id WITH (PARTITIONS=6, TIMESTAMP='view_time', VALUE_FORMAT='JSON') AS SELECT * FROM clickstream PARTITION BY user_id;
     Callouts: (1) change the number of partitions; (2) convert data to JSON; (3) repartition the data.
  8. KSQL for Real-Time, Streaming ETL: filter, cleanse, and process data while it is in motion.
     CREATE STREAM clicks_from_vip_users AS SELECT c.user_id, u.country, page, action FROM clickstream c LEFT JOIN users u ON c.user_id = u.user_id WHERE u.level = 'Platinum';
     Callout: (1) pick only VIP users.
  9. Example: CDC from DB via Kafka to Elastic
  10. KSQL for Real-time Data Enrichment: join data from a variety of sources to see the full picture.
      CREATE STREAM enriched_payments AS SELECT payment_id, c.country, total FROM payments_stream p LEFT JOIN customers_table c ON p.user_id = c.user_id;
      Callouts: (1) Stream-Table Join; (2) Stream-Stream Join.
  11. Example: Retail
  12. KSQL for Real-Time Monitoring: derive insights from events (IoT, sensors, etc.) and turn them into actions.
      CREATE TABLE failing_vehicles AS SELECT vehicle, COUNT(*) FROM vehicle_monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE event_type = 'ERROR' GROUP BY vehicle HAVING COUNT(*) >= 5;
      Callout: (1) now we know to alert, and whom.
  13. Example: IoT, Automotive, Connected Cars streams
  14. KSQL for Anomaly Detection: aggregate data to identify patterns and anomalies in real-time.
      CREATE TABLE possible_fraud AS SELECT card_number, COUNT(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 30 SECONDS) GROUP BY card_number HAVING COUNT(*) > 3;
      Callouts: (1) aggregate data; (2) … per 30-sec windows.
  15. Example: Anomaly Detection with Deep Learning (Autoencoder). Here detectAnomaly is a User Defined Function (UDF).
      CREATE STREAM AnomalyDetection AS SELECT sensor_id, detectAnomaly(sensor_values) FROM car_engine;
      https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot
  16. Independent Dev / Test / Prod of different Apps and Microservices
  17. No Matter Where it Runs
  18. KSQL Concepts
      - No need for source code
        - Zero, none at all, not even one line.
        - No SerDes, no generics, no lambdas, ...
      - All the Kafka and Kafka Streams “magic” out-of-the-box
        - Exactly Once Semantics
        - Windowing
        - Event-time aggregation
        - Late-arriving data
        - Distributed, fault-tolerant, scalable, ...
  19. KSQL is equally viable for S / M / L / XL / XXL use cases, and KSQL is ready for production, including 24/7 support!
  20. Fault-Tolerance, powered by Kafka
  21. STREAM and TABLE as first-class citizens
  22. WINDOWing
      - Not ANSI SQL! → Continuous Queries
      - TUMBLING: SELECT appname, ip, COUNT(appname) AS problem_count FROM logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR' GROUP BY appname, ip;
      - HOPPING: SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING (size 20 second, advance by 5 second) GROUP BY itemid;
      - SESSION: SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20 second) GROUP BY itemid;
  23. KSQL - Components. KSQL has 3 main components:
      1. The Engine, which actually runs the Kafka Streams topologies
      2. The REST server interface, which enables an Engine to receive instructions from the CLI or any other client
      3. The CLI, designed to be familiar to users of MySQL, Postgres etc.
      (Note that you also need a Kafka Cluster… KSQL is deployed independently)
  24. KSQL can be used interactively + programmatically: (1) UI; (2) CLI (ksql> prompt); (3) REST (POST /query); (4) Headless.
  25. Architecture (Client – Server Mode). [Diagram: the KSQL CLI or any REST client connects to several KSQL Servers, each running in its own JVM, which connect to the Kafka Cluster.]
  26. Architecture (Headless Mode). [Diagram: several KSQL Servers, each running in its own JVM, connect to the Kafka Cluster; no CLI or REST client is involved.]
  27. Dedicating resources: join Engines to the same ‘service pool’ by means of the ksql.service.id property
  28. User Defined Functions (UDF, UDAF)
      - Write UDF code in Java, mark with annotations @UdfDescription, @Udf.
      - Make UDF available to KSQL (next slides), then use it like any other KSQL function in your queries: SELECT address, STRINGLENGTH(address->street) FROM orders;
      - The UDF name in KSQL queries is whatever you define in the `name` field in the annotation (here: “stringLength”).
  29. Live Demo: KSQL in Action
  30. KSQL Quick Start – Getting Started in Minutes! https://docs.confluent.io/current/quickstart/index.html (local runtime or Docker container)
  31. Demo - Clickstream Analysis
      - https://docs.confluent.io/current/ksql/docs/tutorials/clickstream-docker.html#ksql-clickstream-docker
      - Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana
      - 5min screencast: https://www.youtube.com/watch?v=A45uRzJiv7I
      - Setup in 5 minutes (with or without Docker)

      SELECT STREAM CEIL(timestamp TO HOUR) AS timeWindow, productId, COUNT(*) AS hourlyOrders, SUM(units) AS units FROM Orders GROUP BY CEIL(timestamp TO HOUR), productId;

       timeWindow | productId | hourlyOrders | units
      ------------+-----------+--------------+-------
       08:00:00   |        10 |            2 |     5
       08:00:00   |        20 |            1 |     8
       09:00:00   |        10 |            4 |    22
       09:00:00   |        40 |            1 |    45
       ...        |       ... |          ... |   ...
  32. KSQL Recipes: https://www.confluent.io/stream-processing-cookbook
  33. Resources and Next Steps. KSQL is GA… You can already use it for production deployments!
      - Get involved: try the Quickstart on GitHub, check out the code, play with the examples
      - https://github.com/confluentinc/ksql
      - http://confluent.io/ksql
      - https://slackpass.io/confluentcommunity #ksql
  34. KSQL is the Streaming SQL Engine for Apache Kafka
  35. Questions? Kai Waehner, Technology Evangelist, kontakt@kai-waehner.de, LinkedIn, @KaiWaehner, www.confluent.io, www.kai-waehner.de
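
Slide 7 repartitions the clickstream with PARTITION BY user_id into 6 partitions. The idea can be sketched in plain Python: every record is routed by a hash of its key, so all records for one user land in the same partition and keep their relative order. This is only an illustration, not KSQL or Kafka code. The function names are hypothetical, and CRC32 is a stand-in hash (Kafka's default partitioner uses murmur2).

```python
import zlib

NUM_PARTITIONS = 6  # mirrors PARTITIONS=6 on slide 7


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition number (stand-in hash, not Kafka's murmur2)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def repartition_by_user_id(clicks):
    """Route each click record to the partition chosen by its user_id."""
    partitions = {p: [] for p in range(NUM_PARTITIONS)}
    for click in clicks:
        partitions[partition_for(click["user_id"])].append(click)
    return partitions


# Hypothetical sample records; field names follow the slide's clickstream.
clicks = [
    {"user_id": "u1", "page": "/a"},
    {"user_id": "u2", "page": "/b"},
    {"user_id": "u1", "page": "/c"},
    {"user_id": "u3", "page": "/d"},
    {"user_id": "u2", "page": "/e"},
]
partitions = repartition_by_user_id(clicks)
```

Co-locating records by key is what makes later per-key operations (joins, GROUP BY) possible without further shuffling.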
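
The stream-table join on slide 10 (payments_stream LEFT JOIN customers_table) can be pictured as a lookup of each incoming event against the table's current state. The sketch below is a simplified, in-memory illustration in Python and not how KSQL is implemented: the table is a plain dict, it does not change over time, and the sample data is made up.

```python
def enrich_payments(payments, customers_table):
    """Left-join each payment event against the customers table by user_id."""
    enriched = []
    for payment in payments:
        customer = customers_table.get(payment["user_id"])  # None when no match
        enriched.append({
            "payment_id": payment["payment_id"],
            # LEFT JOIN semantics: keep the payment even without a customer row.
            "country": customer["country"] if customer else None,
            "total": payment["total"],
        })
    return enriched


# Hypothetical data; column names follow the query on slide 10.
customers_table = {"u1": {"country": "ES"}, "u2": {"country": "DE"}}
payments = [
    {"payment_id": 1, "user_id": "u1", "total": 20.0},
    {"payment_id": 2, "user_id": "u9", "total": 5.5},
]
result = enrich_payments(payments, customers_table)
```

In a real deployment the table is continuously updated from its own topic, so the same payment could be enriched differently depending on when it arrives.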
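
The fraud query on slide 14 counts authorization attempts per card in 30-second tumbling windows and keeps counts above 3. A minimal Python sketch of that logic follows, assuming windows are aligned to timestamp 0 and that events arrive as (timestamp in milliseconds, card number) pairs. The function name and sample data are hypothetical.

```python
from collections import Counter

WINDOW_SIZE_MS = 30_000  # WINDOW TUMBLING (SIZE 30 SECONDS)
THRESHOLD = 3            # HAVING COUNT(*) > 3


def possible_fraud(attempts):
    """Count attempts per (card_number, window_start); keep counts above the threshold."""
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows are fixed-size and non-overlapping, so each event
        # belongs to exactly one window, identified here by its start time.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_SIZE_MS)
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


attempts = [
    (1_000, "card-A"), (5_000, "card-A"), (9_000, "card-A"), (12_000, "card-A"),
    (2_000, "card-B"), (31_000, "card-B"), (61_000, "card-B"),
    (29_000, "card-C"), (29_500, "card-C"), (30_000, "card-C"), (30_500, "card-C"),
]
flagged = possible_fraud(attempts)
```

Only card-A is flagged. card-C also has four attempts within two seconds, but they straddle a window boundary (two in each window), which is a known blind spot of tumbling windows and one reason hopping windows exist.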
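
Slide 22 lists hopping and session windows alongside tumbling ones. The following Python sketch shows how events might be assigned to each, using the slide's parameters (size 20 seconds, advance by 5 seconds, session gap 20 seconds). It is an illustration for a single key only. Two details are assumptions: window starts are clamped at 0, and two events exactly one gap apart are treated as the same session.

```python
def hopping_window_starts(timestamp_s, size_s=20, advance_s=5):
    """Return the start times of every hopping window that contains the event."""
    # Latest window start at or before the event time.
    start = timestamp_s - (timestamp_s % advance_s)
    starts = []
    # Walk backwards while the window [start, start + size_s) still covers the event.
    while start > timestamp_s - size_s and start >= 0:
        starts.append(start)
        start -= advance_s
    return sorted(starts)


def session_windows(timestamps_s, gap_s=20):
    """Group event times into sessions separated by more than gap_s of inactivity."""
    sessions = []
    for t in sorted(timestamps_s):
        if sessions and t - sessions[-1][-1] <= gap_s:
            sessions[-1].append(t)   # still within the inactivity gap
        else:
            sessions.append([t])     # gap exceeded: a new session begins
    return [(s[0], s[-1]) for s in sessions]


windows_for_event = hopping_window_starts(22)
sessions = session_windows([0, 10, 25, 60, 70])
```

An event at second 22 falls into four overlapping hopping windows (starting at 5, 10, 15 and 20), whereas a tumbling window would count it once. The session example yields two sessions, (0, 25) and (60, 70), because of the 35-second pause between them.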
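
Slide 24 mentions POST /query as the programmatic (REST) way to use KSQL. Below is a hedged Python sketch that only builds such a request with the standard library and does not send it. The JSON field names (`ksql`, `streamsProperties`), the content type, and the server address with port 8088 are assumptions recalled from the KSQL REST API documentation and should be checked against the version in use.

```python
import json
import urllib.request


def build_query_request(server_url, ksql_statement):
    """Build (but do not send) a POST request for the server's /query endpoint."""
    body = json.dumps({
        "ksql": ksql_statement,      # assumed field name for the statement
        "streamsProperties": {},     # assumed field for optional overrides
    }).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/query",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_query_request(
    "http://localhost:8088",  # assumed address of a local KSQL server
    "SELECT page, user_id FROM clickstream;",
)
```

Passing the request to `urllib.request.urlopen` would send it. A streaming query keeps the response open, so a client would normally read the body incrementally.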
