
Real-Time Stream Processing with KSQL and Apache Kafka

Real Time Stream Processing with KSQL and Kafka
David Peterson, Confluent APAC
APIdays Melbourne 2018

Unordered, unbounded and massive datasets are increasingly common in day-to-day business. Using this data to your advantage is incredibly difficult with current system designs. We are stuck in a model where we can only act on events *after* they have happened. Often, that is too late to be useful in the enterprise.

KSQL is a streaming SQL engine for Apache Kafka. KSQL lowers the entry bar to the world of stream processing, providing a simple and completely interactive SQL interface for processing data in Kafka. KSQL (like Kafka) is open-source, distributed, scalable, and reliable.
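
To make the idea of a streaming SQL query concrete, here is a minimal Python sketch of what a continuous `SELECT ... WHERE` does conceptually: it never "finishes", it simply emits each matching event as it arrives. This illustrates the semantics only and is not how KSQL is implemented; the `continuous_select` helper and the sample `clickstream` events are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]


def continuous_select(stream: Iterable[Event],
                      columns: List[str],
                      where: Callable[[Event], bool]) -> Iterator[Event]:
    """Lazily filter and project events one at a time, as they arrive."""
    for event in stream:
        if where(event):
            # Keep only the requested columns, like a SELECT list.
            yield {col: event.get(col) for col in columns}


# Hypothetical sample events standing in for messages on a Kafka topic.
clickstream = [
    {"page": "/home", "user_id": 1, "user_agent": "Mozilla/5.0 (X11)"},
    {"page": "/api", "user_id": 2, "user_agent": "curl/7.58"},
    {"page": "/cart", "user_id": 1, "user_agent": "Mozilla/5.0 (Mac)"},
]

results = list(continuous_select(
    clickstream,
    columns=["page", "user_id"],
    where=lambda e: str(e["user_agent"]).startswith("Mozilla/5.0"),
))
```

Because the helper is a generator, it would work the same way on an unbounded source; the list here is only for demonstration.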

A real-time Kafka platform moves your data up the stack, closer to the heart of your business, allowing you to build scalable, mission-critical services by quickly deploying SQL-like queries in a serverless pattern.

This talk will highlight key use cases for real time data, and stream processing with KSQL: Real time analytics, security and anomaly detection, real time ETL / data integration, Internet of Things, application development, and deploying Machine Learning models with KSQL.
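
As an illustration of the anomaly detection use case, the following Python sketch counts events per key in fixed, non-overlapping (tumbling) time windows and flags keys that exceed a threshold. It is a simplified, in-memory approximation of the idea, not KSQL itself; the function names, the sample data, and the threshold are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

WindowKey = Tuple[int, str]  # (window start in seconds, key)


def tumbling_counts(events: Iterable[Tuple[int, str]],
                    window_size_s: int) -> Dict[WindowKey, int]:
    """Count events per (window_start, key) in fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp_s, key in events:
        # Align the timestamp down to the start of its window.
        window_start = timestamp_s - (timestamp_s % window_size_s)
        counts[(window_start, key)] += 1
    return dict(counts)


def flag_suspicious(counts: Dict[WindowKey, int], threshold: int) -> List[WindowKey]:
    """Return the (window_start, key) pairs whose count exceeds the threshold."""
    return sorted(k for k, n in counts.items() if n > threshold)


# Hypothetical (epoch seconds, card number) authorization attempts.
attempts = [
    (0, "card-A"), (30, "card-A"), (90, "card-A"), (200, "card-A"),
    (100, "card-B"),
    (310, "card-A"),
]

counts = tumbling_counts(attempts, window_size_s=300)  # 5-minute windows
suspicious = flag_suspicious(counts, threshold=3)
```

Here `card-A` has four attempts in the first 5-minute window, so only that window is flagged; its single attempt in the next window is counted separately.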

Real time data and stream processing means that Kafka is just as important to the disrupted as it is to the disruptors.

Published in: Technology


  1. 1. REAL TIME STREAM PROCESSING WITH KSQL AND KAFKA
  2. 2. DAVID PETERSON Systems Engineer - Confluent APAC @davidseth
  3. 3. Changing Architectures; Kafka?; Stream Processing; KSQL; KSQL in Production
  4. 4. QUICK INTRO TO CONFLUENT: Founded September 2014 by the creators of Apache Kafka. Technology developed while at LinkedIn. 69% of active Kafka Committers.
  5. 5. 76% of Kafka code created by Confluent team
  6. 6. Changing Architectures
  7. 7. Events: A Sale, An Invoice, A Trade, A Customer Experience
  8. 8. DATA FLOW
  9. 9. CHANGING ARCHITECTURES: WE ARE CHALLENGING OLD ASSUMPTIONS... Big Data was "The More the Better" (chart: Value of Data vs. Volume of Data). Stream Data is "The Faster the Better" (chart: Value of Data vs. Age of Data).
  10. 10. CHANGING ARCHITECTURES: WE ARE CHALLENGING OLD ARCHITECTURES… Lambda: Big OR Fast. Kappa: Big AND Fast. (Diagram labels: Speed Table, Batch Table, DB, Streams, Hadoop, KSQL, Stream, Kafka, HDFS, Cassandra, Elastic, Topic A, Microservice)
  11. 11. A CHANGE OF MINDSET... KAFKA: EVENT CENTRIC THINKING
  12. 12. A CHANGE OF MINDSET... AN EVENT-DRIVEN ENTERPRISE ● Everything is an event ● Available instantly to all applications 
 in a company ● Ability to query data as it arrives vs 
 when it is too late ● Simplifying the data architecture by 
 deploying a single platform What are the possibilities?
  13. 13. It’s a massively scalable distributed, fault tolerant, publish & subscribe key/value datastore with infinite data retention computing unbounded, streaming data in real time.
  14. 14. It’s a massively scalable distributed, fault tolerant, publish & subscribe key/value datastore with infinite data retention computing unbounded, streaming data in real time.
  15. 15. So, what is Kafka really?
  16. 16. It’s made up of 3 key primitives
  17. 17. Store, Process, Publish & Subscribe
  18. 18. So, what is Kafka really?
  19. 19. Producer & Consumer API: Open-source client libraries for numerous languages. Direct integration with your systems. Connect API: Reliable and scalable integration of Kafka with other systems – no coding required. Streams API: Low-level and DSL, create applications & microservices to process your data in real-time.
  20. 20. A Brief History of Apache Kafka and Confluent (timeline 2012 to 2018): 0.7; 0.8 Intra-cluster replication; 0.9 Data integration; 0.10 Stream processing; 0.11 Exactly-once semantics; 1.0 One<dot>Oh release!; CP 4.1 KSQL GA; 2.0 ☺
  21. 21. Producers, Kafka cluster, Consumers
  22. 22. So, what exactly is a stream?
  23. 23. 1. TOPIC
  24. 24. {"actor":"bear", "x":410, "y":20} {"actor":"racoon", "x":380, "y":20}
  25. 25. {"actor":"bear", "x":380, "y":22} {"actor":"racoon", "x":350, "y":22}
  26. 26. {"actor":"bear", "x":350, "y":25} {"actor":"racoon", "x":330, "y":25}
  27. 27. {"actor":"racoon", "x":280, "y":32} {"actor":"bear", "x":310, "y":32}
  28. 28. 2.STREAM
  29. 29. 3.TABLE
  30. 30. Exposure Sheet
  31. 31. Real Time stream processing with KSQL and Kafka SEP / API DAYS 46 Changelog stream – immutable events
  32. 32. Rebuild original table
  33. 33. Stream Processing
  34. 34. KSQL- Streaming SQL for Apache Kafka Confluent – Looking Forward J U L Y
 50 Standard App No need to create a separate cluster Highly scalable, elastic, fault tolerant
  35. 35. Lives inside your application Stream processing
  36. 36. Same data, but different use cases. Use case 1: Frequent traveler status? KStream: “Alice has been to SFO, NYC, Rio, Sydney, Beijing, Paris, and finally Berlin.” Use case 2: Current location? KTable: “Alice is in Berlin right now.” (SFO, NYC, Rio, Sydney, Beijing and Paris are superseded by the latest value.)
  37. 37. KSQL
  38. 38. KSQL: get started fast with Stream Processing. Kafka (data) and KSQL (processing) read and write over the network. All you need is Kafka – no complex deployments of bespoke systems for stream processing! CREATE STREAM, CREATE TABLE, SELECT …and more…
  39. 39. KSQL Concepts ● No need for source code deployment ○ Zero, none at all, not even one tiny file ● All the Kafka Streams capabilities out-of-the-box ○ Exactly Once Semantics ○ Windowing ○ Event-time aggregation ○ Late-arriving data ○ Distributed, fault-tolerant, scalable, ...
  40. 40. KSQL: SELECT statement syntax
 SELECT `select_expr` [, ...]
 FROM `from_item` [, ...]
 [ WINDOW `window_expression` ]
 [ WHERE `condition` ]
 [ GROUP BY `grouping_expression` ]
 [ HAVING `having_expression` ]
 [ LIMIT n ]
 where from_item is one of the following:
 stream_or_table_name [ [ AS ] alias ]
 from_item LEFT JOIN from_item ON join_condition
  41. 41. What are some KSQL use cases? 10+5
  42. 42. KSQL: Data exploration. An easy way to inspect data in Kafka
 SELECT page, user_id, status, bytes FROM clickstream WHERE user_agent LIKE 'Mozilla/5.0%';
 SHOW TOPICS;
 PRINT 'my-topic' FROM BEGINNING;
  43. 43. KSQL: Data enrichment. Join data from a variety of sources to see the full picture (stream-table join)
 CREATE STREAM enriched_payments AS
 SELECT payment_id, u.country, total FROM payments_stream p LEFT JOIN users_table u ON p.user_id = u.user_id;
  44. 44. KSQL: Streaming ETL. Filter, cleanse, process data while it is moving
 CREATE STREAM clicks_from_vip_users AS
 SELECT user_id, u.country, page, action FROM clickstream c LEFT JOIN users u ON c.user_id = u.user_id
 WHERE u.level = 'Platinum';
  45. 45. KSQL: Anomaly Detection. Aggregate data to identify patterns or anomalies in real-time
 CREATE TABLE possible_fraud AS
 SELECT card_number, COUNT(*)
 FROM authorization_attempts
 WINDOW TUMBLING (SIZE 5 MINUTE)
 GROUP BY card_number
 HAVING COUNT(*) > 3;
 Callouts: Aggregate data … per 5 min windows
  46. 46. TIME BUCKETS STREAMING
  47. 47. TUMBLING, HOPPING, SESSION
  48. 48. KSQL: Real time monitoring. Derive insights from events (IoT, sensors, etc.) and turn them into actions
 CREATE TABLE failing_vehicles AS
 SELECT vehicle, COUNT(*) FROM vehicle_monitoring_stream
 WINDOW TUMBLING (SIZE 1 MINUTE)
 WHERE event_type = 'ERROR'
 GROUP BY vehicle HAVING COUNT(*) >= 3;
  49. 49. KSQL: Data transformation. Quickly make derivations of existing data in Kafka
 CREATE STREAM clicks_by_user_id WITH (PARTITIONS=6, TIMESTAMP='view_time', VALUE_FORMAT='JSON') AS
 SELECT * FROM clickstream PARTITION BY user_id;
 Callouts: Re-key the data (PARTITION BY); Convert data to JSON (VALUE_FORMAT)
  50. 50. KSQL: Stream to Stream JOINs. Example: Detect late orders by matching every SHIPMENTS row with ORDERS rows that are within a 2-hour window.
 CREATE STREAM late_orders AS
 SELECT o.orderid, o.itemid FROM orders o FULL OUTER JOIN shipments s WITHIN 2 HOURS ON s.orderid = o.orderid WHERE s.orderid IS NULL;
  51. 51. INSERT INTO statement for Streams. Example: Compute daily sales per item across online and offline stores
 CREATE STREAM sales_online (itemId BIGINT, price INTEGER, shipmentId BIGINT) WITH (...);
 CREATE STREAM sales_offline (itemId BIGINT, price INTEGER, storeId BIGINT) WITH (...);
 CREATE STREAM all_sales (itemId BIGINT, price INTEGER) WITH (...);
 -- Merge the streams into `all_sales`
 INSERT INTO all_sales SELECT itemId, price FROM sales_online;
 INSERT INTO all_sales SELECT itemId, price FROM sales_offline;
 CREATE TABLE daily_sales_per_item AS
 SELECT itemId, SUM(price) FROM all_sales
 WINDOW TUMBLING (SIZE 1 DAY) GROUP BY itemId;
  52. 52. KSQL: Demo. (Diagram labels: Producer; customers; Kafka Connect streams data in; KSQL processes table changes in real-time; Kafka Connect streams data out)
  53. 53. KSQL: Deep Learning for IoT Sensor Analytics. KSQL UDF using an analytic model under the hood → Write once, use in any KSQL statement
 SELECT event_id, anomaly(SENSORINPUT)
 FROM health_sensor;
 Callout: User Defined Function (anomaly)
  54. 54. KSQL: User Defined Function (UDF)
  55. 55. Putting KSQL into Production
  56. 56. DEPLOYING KSQL
  57. 57. CLI, REST, CODE
  58. 58. Fault-Tolerance, powered by Kafka. A key challenge of distributed stream processing is fault-tolerant state. State is automatically migrated in case of server failure. Server A: “I do stateful stream processing, like tables, joins, aggregations.” Changelog Topic (Kafka): “streaming backup” of A’s local state, then “streaming restore” of A’s local state to B. Server B: “I restore the state and continue processing where server A stopped.”
  59. 59. Fault-Tolerance, powered by Kafka. Processing fails over automatically, without data loss or miscomputation. #3 died so #1 and #2 take over: 1 Kafka consumer group rebalance is triggered. 2 Processing and state of #3 is migrated via Kafka to remaining servers #1 + #2. #3 is back so the work is split again: 3 Kafka consumer group rebalance is triggered. 4 Part of processing incl. state is migrated via Kafka from #1 + #2 to server #3.
  60. 60. Elasticity and Scalability, powered by Kafka. You can add, remove, restart servers in KSQL clusters during live operations. “We need more processing power!”: 1 Kafka consumer group rebalance is triggered. 2 Part of processing incl. state is migrated via Kafka to additional server processes. “Ok, we can scale down again.”: 3 Kafka consumer group rebalance is triggered. 4 Processing incl. state of stopped servers is migrated via Kafka to remaining servers.
  61. 61. PARALLELISATION
  62. 62. PARALLELISATION
  63. 63. KSQL is the Streaming SQL Engine for Apache Kafka
  64. 64. Resources and Next Steps • Try the demo on GitHub :) https://github.com/confluentinc/demo-scene • Check out the code • Play with the examples. Download Confluent Open Source: https://www.confluent.io/download/ Chat with us: https://slackpass.io/confluentcommunity #ksql
  65. 65. The World’s Best Streaming Platform: Everywhere. DAVID PETERSON, Systems Engineer - Confluent APAC, @davidseth
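
The stream/table relationship from slides 31, 32 and 36 (a changelog of immutable events that can be replayed to rebuild a table) can be sketched in a few lines of Python. This is a simplified illustration, not Kafka Streams or KSQL code; the helper names and sample data are hypothetical, and treating a `None` value as a delete is an assumption borrowed from Kafka's tombstone convention rather than something shown on the slides.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Change = Tuple[str, Optional[str]]  # (key, new value); None marks a delete


def rebuild_table(changelog: Iterable[Change]) -> Dict[str, str]:
    """Replay a changelog in order; the table keeps only the latest value per key."""
    table: Dict[str, str] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # delete the key if present
        else:
            table[key] = value    # upsert: later events overwrite earlier ones
    return table


def history_for(changelog: Iterable[Change], key: str) -> List[str]:
    """Stream view: every value ever recorded for a key, in arrival order."""
    return [v for k, v in changelog if k == key and v is not None]


# Hypothetical location events: the stream keeps all of them.
changelog: List[Change] = [
    ("alice", "SFO"),
    ("bob", "Sydney"),
    ("alice", "NYC"),
    ("alice", "Berlin"),
    ("bob", None),
]

table = rebuild_table(changelog)             # table view: current state
alice_trips = history_for(changelog, "alice")  # stream view: full history
```

The same events answer both questions from slide 36: the stream view gives Alice's travel history, while the table view gives only her current location.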
