Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Paris jug ksql - 2018-06-28

311 views

Published on

KSQL, Kafka streams

Published in: Software
  • Be the first to comment

Paris jug ksql - 2018-06-28

  1. 1. KSQLOpen-source streaming for Apache Kafka
  2. 2. let's look at Kafka Streams First! But ...is a
  3. 3. Trade-Offs • subscribe() • poll() • send() • flush() Consumer, Producer Flexibility Simplicity
  4. 4. Trade-Offs • subscribe() • poll() • send() • flush() Consumer, Producer • filter() • join() • aggregate() Kafka Streams Flexibility Simplicity
  5. 5. Before DashboardProcessing Cluster Your Job Shared Database
  6. 6. After Dashboard APP Streams API
  7. 7. Things Kafka Streams Does Runs everywhere Clustering done for you Exactly-once processing Event-time processing Integrated database Joins, windowing, aggregation S/M/L/XL/XXL/XXXL sizes
  8. 8. A Familiar DSL
  9. 9. A Kafka story! https://github.com/framiere/a-kafka-story
  10. 10. No Panic, it's a walktrough! https://github.com/framiere/a-kafka-story/tree/master/step6
  11. 11. Expressive ... but verbose
  12. 12. Trade-Offs • subscribe() • poll() • send() • flush() Consumer, Producer • filter() • join() • aggregate() Kafka Streams • Select…from… • Join…where… • Group by.. KSQL Flexibility Simplicity
  13. 13. Declarative Stream Language Processing KSQLis a
  14. 14. KSQLis the Streaming SQL Engine for Apache Kafka
  15. 15. Stream Processing by Analogy Kafka Cluster Connect API Stream Processing Connect API $ cat < in.txt | grep “ksql” | tr a-z A-Z > out.txt
  16. 16. KSQL are some what use cases?
  17. 17. KSQL for Data Exploration SELECT status, bytes FROM clickstream WHERE user_agent = ‘Mozilla/5.0 (compatible; MSIE 6.0)’; An easy way to inspect data in a running cluster
  18. 18. KSQL for Streaming ETL • Kafka is popular for data pipelines. • KSQL enables easy transformations of data within the pipe. • Transforming data while moving from Kafka to another system. CREATE STREAM vip_actions AS 
 SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id 
 WHERE u.level = 'Platinum';
  19. 19. KSQL for Anomaly Detection CREATE TABLE possible_fraud AS
 SELECT card_number, count(*)
 FROM authorization_attempts 
 WINDOW TUMBLING (SIZE 5 SECONDS)
 GROUP BY card_number
 HAVING count(*) > 3; Identifying patterns or anomalies in real-time data, surfaced in milliseconds
  20. 20. KSQL implements for you the Kafka Stream application you would have implemented if you had • ... the time • ... the experience • ... the KSQL as a spec • ... the willingness to do boring code
  21. 21. KSQL for Data Transformation CREATE STREAM views_by_userid WITH (PARTITIONS=6, VALUE_FORMAT=‘JSON’, TIMESTAMP=‘view_time’) AS 
 SELECT * FROM clickstream PARTITION BY user_id; Make simple derivations of existing topics from the command line
  22. 22. Demo fun https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/
  23. 23. Where is KSQL not such a great fit? BI reports (Tableau etc.) • No indexes • No JDBC (most BI tools are not good with continuous results!) Ad-hoc queries • Limited span of time usually retained in Kafka • No indexes
  24. 24. Do you think that’s a table you are querying ?
  25. 25. Stream/Table Duality
  26. 26. alice 1 alice 1 charlie 1 alice 2 charlie 1 alice 2 charlie 1 bob 1 TABLE STREAM TABLE (“alice”, 1) (“charlie”, 1) (“alice”, 2) (“bob”, 1) alice 1 alice 1 charlie 1 alice 2 charlie 1 alice 2 charlie 1 bob 1
  27. 27. CREATE STREAM clickstream ( time BIGINT, url VARCHAR, status INTEGER, bytes INTEGER, userid VARCHAR, agent VARCHAR) WITH ( value_format = ‘JSON’, kafka_topic=‘my_clickstream_topic’ ); Creating a Stream
  28. 28. CREATE TABLE users ( user_id INTEGER, registered_at LONG, username VARCHAR, name VARCHAR, city VARCHAR, level VARCHAR) WITH ( key = ‘user_id', kafka_topic=‘clickstream_users’, value_format=‘JSON'); Creating a Table
  29. 29. CREATE STREAM vip_actions AS SELECT userid, fullname, url, status 
 FROM clickstream c 
 LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum'; Joins for Enrichment
  30. 30. Trade-Offs • subscribe() • poll() • send() • flush() Consumer, Producer • filter() • join() • aggregate() Kafka Streams • Select…from… • Join…where… • Group by.. KSQL Flexibility Simplicity
  31. 31. Kafka Cluster JVM KSQL ServerKSQL CLI #1 STAND-ALONE AKA ‘LOCAL MODE’ How to run KSQL
  32. 32. How to run KSQL JVM KSQL Server KSQL CLI JVM KSQL Server JVM KSQL Server Kafka Cluster #2 CLIENT-SERVER
  33. 33. How to run KSQL Kafka Cluster JVM KSQL Server JVM KSQL Server JVM KSQL Server #3 AS A STANDALONE APPLICATION
  34. 34. How to run KSQL Kafka Cluster #4 EMBEDDED IN AN APPLICATION JVM App Instance KSQL Engine Application Code JVM App Instance KSQL Engine Application Code JVM App Instance KSQL Engine Application Code
  35. 35. Confluent 5.0 - Nested Types SELECT userid, address.city FROM users WHERE address.state = 'CA' https://github.com/confluentinc/ksql/pull/1114
  36. 36. Confluent 5.0 - Remaining joins SELECT orderid, shipmentid FROM orders INNER JOIN shipments ON order.id = shipmentid;
  37. 37. Confluent 5.0 - User defined functions SELECT address, stringLength(address->street) from orders;
  38. 38. KSQL Capacity Planning – Sizing https://docs.confluent.io/current/ksql/docs/capacity-planning.html
  39. 39. Do you demo! https://tinyurl.com/ksql-parisjug
  40. 40. Control Center
  41. 41. Control Center
  42. 42. Control Center
  43. 43. Resources – Community Slack and Mailing List https://slackpass.io/confluentcommunity https://groups.google.com/forum/#!forum/confluent-platform
  44. 44. Resources and Next Steps https://github.com/confluentinc/ksql http://confluent.io/ksql https://slackpass.io/confluentcommunity #ksql
  45. 45. Links • https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/ • https://gist.github.com/rmoff/7efa882dfd808dbab4eb7b8e6f9eda16 • https://asciinema.org/a/nNfYKsVFwzOKMIWfdj5L16uzW • https://github.com/confluentinc/ksql/blob/master/ksql-examples/src/main/java/io/confluent/ksql/datagen/ DataGen.java • https://github.com/confluentinc/ksql/blob/master/ksql-parser/src/main/antlr4/io/confluent/ksql/parser/SqlBase.g4 • https://github.com/confluentinc/ksql/blob/v5.1.0-beta201806200051/ksql-engine/src/main/java/io/confluent/ksql/ KsqlContext.java • https://github.com/mmolimar/ksql-jdbc-driver • https://twitter.com/alexott_en/status/1008384806693097472 • https://github.com/framiere/a-kafka-story • https://docs.confluent.io/current/ksql/docs/syntax-reference.html • https://github.com/confluentinc/cp-demo • https://github.com/confluentinc/cp-ansible • https://www.confluent.io/blog/introducing-the-confluent-operator-apache-kafka-on-kubernetes/ • https://www.confluent.io/blog/author/robin/
  46. 46. Links http://bit.ly/ksql-parisjug
  47. 47. @framiere florent@confluent.io

×