
Rethinking Stream Processing with Apache Kafka, Kafka Streams and KSQL

Stream processing lets you act on data in real time as it arrives. This session shows and demos how teams in different industries leverage the innovative Streams API from Apache Kafka to build and deploy mission-critical real-time streaming applications and microservices.
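For illustration, here is a minimal sketch of what such a Kafka Streams microservice looks like (the topic names "orders" and "confirmed-orders", the broker address and the JSON status check are assumptions, not part of the talk): a plain Java application that reads a topic, filters it and writes the result back to Kafka, with no separate processing cluster.

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class OrderFilterApp {

        public static void main(final String[] args) {
            // App configuration: application id and broker address are assumptions.
            final Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // Processing topology: read the assumed "orders" topic, keep confirmed orders,
            // write them to an assumed output topic. The string check stands in for real
            // JSON deserialization and is only a placeholder.
            final StreamsBuilder builder = new StreamsBuilder();
            final KStream<String, String> orders = builder.stream("orders");
            orders.filter((key, value) -> value != null && value.contains("\"status\":\"CONFIRMED\""))
                  .to("confirmed-orders");

            // Start processing: the app is an ordinary Java process you can deploy anywhere.
            final KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }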

The session discusses important streaming concepts such as local and distributed state management, exactly-once semantics, embedding streaming into any application, and deployment to any infrastructure. It then explains key advantages of Kafka's Streams API: distributed processing and fault tolerance with fast failover, no-downtime rolling deployments, and the ability to reprocess events so you can recalculate output when your code changes.
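A hedged sketch of how those properties look in code, continuing the configuration and builder from the sketch above (store and topic names are again assumptions): exactly-once processing and standby replicas are single configuration entries, and local state is a named store that Kafka Streams keeps on the instance and replicates to a Kafka changelog topic.

    // Continuing props, builder and the "orders" stream from the sketch above.
    // Exactly-once processing and fast failover are configuration, not code
    // (newer Kafka versions use StreamsConfig.EXACTLY_ONCE_V2 instead):
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
    props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1); // hot standby of local state

    // Local state: counts per order key are kept in a named local store and
    // replicated to a changelog topic, so another instance can take over on failure.
    final KTable<String, Long> ordersPerKey =
            orders.groupByKey()
                  .count(Materialized.as("orders-per-key-store"));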

The session also introduces KSQL, the streaming SQL engine for Apache Kafka, which lets you write streaming SQL queries with the scalability, throughput and failover of Kafka Streams under the hood.
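To give a flavour of the syntax (stream, column and topic names here are assumptions; the slides below show further examples), a KSQL stream is declared over an existing Kafka topic and a continuous query then derives a new, filtered stream that is itself backed by a Kafka topic:

    -- Declare a stream over an existing Kafka topic (names and columns are assumptions).
    CREATE STREAM clickstream (userid VARCHAR, page VARCHAR, action VARCHAR)
      WITH (KAFKA_TOPIC='clickstream', VALUE_FORMAT='JSON');

    -- A continuous query: the result keeps updating as new events arrive
    -- and is written to its own Kafka topic.
    CREATE STREAM checkout_clicks AS
      SELECT userid, page
      FROM clickstream
      WHERE action = 'checkout';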

The session ends with a demo of how to combine custom code with your streams application (either Kafka Streams or KSQL), using as an example an analytic model built with a machine learning framework such as Apache Spark ML or TensorFlow.
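As a rough sketch of that pattern (the FlightDelayModel wrapper, its load() and predict() methods and the topic names are hypothetical placeholders, not an API from the demo; it reuses a StreamsBuilder like the one in the first sketch): the trained model is loaded once at start-up and then applied to every event inside an ordinary Kafka Streams operator.

    // Hypothetical wrapper around a trained model (e.g. an H2O POJO or a
    // TensorFlow SavedModel); load() and predict() are placeholders.
    final FlightDelayModel model = FlightDelayModel.load("/models/flight-delay");

    final KStream<String, String> flights = builder.stream("flight-events"); // assumed topic
    flights.mapValues(flight -> flight + ",\"delayPrediction\":" + model.predict(flight))
           .to("flight-delay-predictions");                                  // assumed output topic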

Rethinking Stream Processing with Apache Kafka, Kafka Streams and KSQL

  1. 1. RETHINKING Stream Processing With Apache Kafka, Kafka Streams and KSQL Kai Waehner Technology Evangelist kontakt@kai-waehner.de LinkedIn @KaiWaehner www.confluent.io www.kai-waehner.de
  2. 2. 2
  3. 3. 3
  4. 4. 4 Apache Kafka A Distributed, Scalable Commit Log
  5. 5. 5 Apache Kafka A Distributed, Scalable Commit Log
  6. 6. 6 1.0 Enterprise Ready 0.10 Data Processing (Streams API) 0.11 Exactly-once Semantics 2013 2014 2015 2016 2017 2018 0.8 Intra-cluster replication 0.9 Data Integration (Connect API) Apache Kafka The Rise of a Streaming Platform KSQL
  7. 7. 7 Apache Kafka The Rise of a Streaming Platform
  8. 8. 8 Key concepts
  9. 9. 9 Key concepts
  10. 10. 10 Moving away from the Monolith
  11. 11. 11 Microservices (aka Apps) Orders Service Basket Service Payment Service Fulfillment Service Stock Service
  12. 12. 12 Microservices are Independently Deployable Orders Service Basket Service Payment Service Fulfillment Service Stock Service
  13. 13. 13 Scale in Infrastructure Terms
  14. 14. 14 Scale in People Terms
  15. 15. 15 As developers, we want to build APPS not (distributed) INFRASTRUCTURE
  16. 16. 16 Independent Dev / Test / Prod
  17. 17. 17 We want our apps to be: Scalable Elastic Fault-tolerant Stateful Distributed
  18. 18. 18 Where do I put my compute?
  19. 19. 19 No Matter Where it Runs
  20. 20. 20 Where do I put my state?
  21. 21. 21 State Management
  22. 22. 22 The actual question is Where is my code?
  23. 23. 23 The KAFKA STREAMS API is a JAVA API to BUILD REAL-TIME APPLICATIONS to POWER THE BUSINESS
  24. 24. 24 App Streams API Not running inside brokers!
  25. 25. 25 Brokers? Nope! App + Kafka Streams API, App + Kafka Streams API, App + Kafka Streams API: same app, many instances
  26. 26. 26 Before: Dashboard, Processing Cluster (Your Job), Shared Database
  27. 27. 27 After: Dashboard, App (Kafka Streams API)
  28. 28. 28 this means you can DEPLOY your app ANYWHERE using WHATEVER TECHNOLOGY YOU WANT
  29. 29. 29 Kafka Streams is Equally viable for S / M / L / XL / XXL use cases Ok. Ok. Ok.
  30. 30. 30 Things Kafka Streams Does Runs everywhere Scale and fail-over Exactly-once processing Event-time processing Integrated database Joins, windowing, aggregation S/M/L/XL/XXL/XXXL sizes
  31. 31. 31 Some API CONCEPTS
  32. 32. 32 STREAMS are EVERYWHERE
  33. 33. 33 TABLES are EVERYWHERE
  34. 34. 34 STREAMS <-> TABLES
  35. 35. 35
  36. 36. 36 // Example: reading data from Kafka KStream<byte[], String> textLines = builder.stream("textlines-topic", Consumed.with(Serdes.ByteArray(), Serdes.String())); // Example: transforming data KStream<byte[], String> upperCasedLines = textLines.mapValues(String::toUpperCase); KStream
  37. 37. 37 // Example: aggregating data KTable<String, Long> wordCounts = textLines .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+"))) .groupBy((key, word) -> word) .count(); KTable
  38. 38. 38 State Management Pluggable State Store. Default Strategy: - In-memory (fast access) - Local disk (for fast recovery) - Replicated to Kafka (for resilience) https://www.infoq.com/presentations/kafka-streams-spring-cloud
  39. 39. 39 Kafka Streams A complete streaming microservice, ready for production at large scale: App configuration / Define processing (here: WordCount) / Start processing (a sketch of such an app follows after the transcript)
  40. 40. 40 DEMO
  41. 41. 41 Use Case: Airline Flight Delay Prediction Machine Learning Algorithm: Neural Network built with H2O and TensorFlow Streaming Platform: Apache Kafka and Kafka Streams Live Demo
  42. 42. 42 H2O.ai Model + Kafka Streams Filter Map 1) Create H2O Deep Learning model 2) Configure Kafka Streams Application 3) Apply H2O model to Streaming Data 4) Start Kafka Streams App
  43. 43. 43
  44. 44. 44 What if you are NOT a Java Coder? (Chart of Coding Sophistication vs. Population: Java/Kafka Streams targets core developers, while KSQL expands the realm of stream processing to data engineers, BI analysts and core developers who don't like Java)
  45. 45. 45 KSQL is the Streaming SQL Engine for Apache Kafka
  46. 46. 46 KSQL – The Streaming SQL Engine for Apache Kafka
  47. 47. 47 Trade-Offs: Flexibility vs. Simplicity. Consumer/Producer: subscribe(), poll(), send(), flush(). Kafka Streams: mapValues(), filter(), punctuate(). KSQL: SELECT…FROM…, JOIN…WHERE…, GROUP BY…
  48. 48. 48 What is it for ? Streaming ETL • Kafka is popular for data pipelines • KSQL enables easy transformations of data within the pipe CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
  49. 49. 49 What is it for ? Analytics, e.g. Anomaly Detection • Identifying patterns or anomalies in real-time data, surfaced in milliseconds CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTES) GROUP BY card_number HAVING count(*) > 3;
  50. 50. 50 What is it for ? Real Time Monitoring • Log data monitoring, tracking and alerting • Sensor / IoT data CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
  51. 51. 51 KSQL is Equally viable for S / M / L / XL / XXL use cases Ok. Ok. Ok.
  52. 52. 52 Live Demo – KSQL Hello World
  53. 53. 53 https://www.confluent.io/press-release/confluent-makes-ksql-available-confluent-platform-announces-general-availability/
  54. 54. 54 Remember, we want to build APPS not INFRASTRUCTURE
  55. 55. 55 Leverage KAFKA STREAMS API or KSQL to BUILD REAL-TIME APPLICATIONS to POWER THE BUSINESS
  56. 56. Questions? Kai Waehner Technology Evangelist kontakt@kai-waehner.de LinkedIn @KaiWaehner www.confluent.io www.kai-waehner.de
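As referenced on slide 39, here is a sketch of a complete WordCount microservice in that spirit (topic and store names are assumptions): app configuration, processing definition and start-up in one small Java class.

    import java.util.Arrays;
    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.kstream.Produced;

    public class WordCountApp {

        public static void main(final String[] args) {
            // 1) App configuration (application id, brokers and topic names are assumptions).
            final Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // 2) Define processing: split lines into words, count per word in a local,
            //    changelog-backed state store, and write the counts to an output topic.
            final StreamsBuilder builder = new StreamsBuilder();
            final KStream<String, String> textLines = builder.stream("textlines-topic");
            final KTable<String, Long> wordCounts = textLines
                    .flatMapValues(textLine -> Arrays.asList(textLine.toLowerCase().split("\\W+")))
                    .groupBy((key, word) -> word)
                    .count(Materialized.as("word-counts-store"));
            wordCounts.toStream().to("wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));

            // 3) Start processing; scale out by simply starting more instances of this app.
            final KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }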
