Chicago Kafka Meetup

KSQL Intro

Published in: Software

  1. KSQL: An Open Source Streaming SQL Engine for Apache Kafka
     Cliff Gilmore, Sr. Systems Engineer, cliff@confluent.io
  2. What is Kafka?
  3. The Vision for Kafka 1.0
  4. Streaming Data vs Big Data
     • Big Data was: The More the Better
     • Stream Data is: The Faster the Better
     • Stream Data can be: Big or Fast (Lambda)
     • Stream Data will be: Big AND Fast (Kappa)
     • Apache Kafka is the Enabling Technology of this Transition
     [Charts: value of data vs. age of data; value of data vs. volume of data]
  5. Kafka Architecture – Think of a Log!
  6. Kafka Architecture
  7. Kafka Streams
  8. The Streams API of Apache Kafka™
     Write standard Java applications and microservices to process your data in real-time
     ✓ No separate processing cluster required
     ✓ Develop on Mac, Linux, Windows
     ✓ Deploy to containers, VMs, bare metal, cloud
     ✓ Powered by Kafka: elastic, scalable, distributed, battle-tested
     ✓ Perfect for small, medium, large use cases
     ✓ Fully integrated with Kafka security
     ✓ Exactly-once processing semantics
     ✓ Part of Apache Kafka, included in Confluent Open Source

         KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");
         KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
             .groupByKey()
             .count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views");

     http://kafka.apache.org/documentation/streams
  9. Architecture
  10. Architecture
  11. KSQL
  12. Kafka Stream Processing Evolution
  13. From flexibility to simplicity:
     • Consumer, Producer: subscribe(), poll(), send(), flush()
     • Kafka Streams: mapValues(), filter(), punctuate()
     • KSQL: Select…from … join…where… group by..
  14. Why KSQL?
     • Expand access to Kafka Stream Processing to more people
       • More accessible
       • Less intimidating
       • Lower the barriers to entry
     Benefits
     • Enable stream processing with zero coding required
     • The simplest way to process streams of data in real-time
     • Powered by Kafka: scalable, distributed, battle-tested
  15. On the Shoulders of (Streaming) Giants
     • Native, 100%-compatible Kafka integration
     • Secure stream processing using Kafka’s security features
     • Elastic and highly scalable
     • Fault-tolerant
     • Stateful and stateless computations
     • Interactive queries
     • Time model
     • Supports late-arriving and out-of-order data
     • Windowing
     • Millisecond processing latency, no micro-batching
     • At-least-once and exactly-once processing guarantees
  16. KSQL Concepts
     ● STREAM and TABLE as first-class citizens
     ● Interpretations of topic content
     ● STREAM - data in motion
     ● TABLE - collected state of a stream
       • One record per key (per window)
       • Current values (compacted topic)
       • Changelog
     ● STREAM-TABLE Joins
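
The STREAM and TABLE interpretations above can be illustrated with a small, self-contained sketch in plain Java. It does not use KSQL or Kafka Streams; the class name `StreamTableSketch` and the sample records are hypothetical and exist only for illustration. The same sequence of keyed records is read once as a stream (every record kept) and once as a table (latest value per key).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of KSQL or Kafka Streams.
class StreamTableSketch {

    // STREAM view: every record is an independent fact, so all are kept in order.
    static List<Map.Entry<String, String>> asStream(List<Map.Entry<String, String>> topic) {
        return new ArrayList<>(topic);
    }

    // TABLE view: a later record with the same key overwrites the earlier one,
    // leaving one current value per key (similar in spirit to a compacted topic).
    static Map<String, String> asTable(List<Map.Entry<String, String>> topic) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : topic) {
            state.put(record.getKey(), record.getValue());
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> topic = List.of(
            Map.entry("alice", "Chicago"),
            Map.entry("bob", "Boston"),
            Map.entry("alice", "Denver"));
        System.out.println(asStream(topic).size()); // prints 3
        System.out.println(asTable(topic));         // prints {alice=Denver, bob=Boston}
    }
}
```

Read as a stream, the topic has three events; read as a table, it has two rows, because the second `alice` record replaces the first.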
  17. Schema & Format
     ● Start with message (value) format
       ● JSON - the simplest choice
       ● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built-in. Will be expanded.
       ● AVRO - requires that you also supply a schema-file (.avsc), Schema Registry support soon!
     ● Pseudo-columns are automatically provided
       • ROWKEY, ROWTIME - for querying the message key and timestamp
       • (PARTITION, OFFSET coming soon)

         CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar)
           WITH (value_format = 'delimited', kafka_topic='my_pageview_topic');
  18. Interactive Querying
     ● Great for iterative development
     ● LIST (or SHOW) STREAMS / TABLES
     ● DESCRIBE STREAM / TABLE
     ● SELECT
       • Selects rows from a KSQL stream or table.
       • The result of this statement will be printed out in the console.
       • To stop the continuous query in the CLI press Ctrl+C.
  19. What is it for?
     ● Streaming ETL
       ○ Kafka is popular for data pipelines.
       ○ KSQL enables easy transformations of data within the pipe

         CREATE STREAM vip_actions AS
           SELECT userid, page, action
           FROM clickstream c
           LEFT JOIN users u ON c.userid = u.user_id
           WHERE u.level = 'Platinum';
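
As a rough mental model of what the `vip_actions` statement above computes, here is a minimal sketch in plain Java. It is not how KSQL executes the query; the class `VipActionsSketch`, the array layout for clicks, and the sample data are assumptions made for illustration. Each click is looked up against a users table (the LEFT JOIN), and only clicks whose user has level `Platinum` pass the filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of KSQL or Kafka Streams.
class VipActionsSketch {

    // Each click is {userid, page, action}; users maps user_id to level.
    static List<String[]> vipActions(List<String[]> clickstream, Map<String, String> users) {
        List<String[]> out = new ArrayList<>();
        for (String[] click : clickstream) {
            // LEFT JOIN: level is null when no user row matches the click.
            String level = users.get(click[0]);
            // WHERE u.level = 'Platinum': unmatched (null) rows are dropped here too.
            if ("Platinum".equals(level)) {
                out.add(click);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> clicks = List.of(
            new String[] {"u1", "/home", "view"},
            new String[] {"u2", "/cart", "click"},
            new String[] {"u3", "/help", "view"});
        Map<String, String> users = Map.of("u1", "Platinum", "u2", "Gold");
        for (String[] c : vipActions(clicks, users)) {
            System.out.println(String.join(",", c)); // prints u1,/home,view
        }
    }
}
```

Note that the WHERE clause on the table side discards clicks with no matching user, so in this sketch the LEFT JOIN behaves like an inner join once the filter is applied.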
  20. Streaming
  21. What is it for?
     ● Anomaly Detection
       ○ Identifying patterns or anomalies in real-time data, surfaced in milliseconds

         CREATE TABLE possible_fraud AS
           SELECT card_number, count(*)
           FROM authorization_attempts
           WINDOW TUMBLING (SIZE 5 SECONDS)
           GROUP BY card_number
           HAVING count(*) > 3;
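
The windowed count in `possible_fraud` can be sketched in plain Java as follows. This is a simplified batch illustration, not the KSQL engine: the class `FraudWindowSketch`, the `Attempt` type, and the `card@windowStart` key format are hypothetical, and timestamps are assumed to be non-negative epoch milliseconds. Each attempt is assigned to the 5-second tumbling window that contains its timestamp, counts are kept per card and window, and only counts above 3 are retained.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of KSQL or Kafka Streams.
class FraudWindowSketch {
    static final long WINDOW_MS = 5_000L; // WINDOW TUMBLING (SIZE 5 SECONDS)

    static final class Attempt {
        final String cardNumber;
        final long timestampMs;

        Attempt(String cardNumber, long timestampMs) {
            this.cardNumber = cardNumber;
            this.timestampMs = timestampMs;
        }
    }

    // Returns "card@windowStart" -> count for every card/window with more than 3 attempts.
    static Map<String, Long> possibleFraud(List<Attempt> attempts) {
        Map<String, Long> counts = new TreeMap<>();
        for (Attempt a : attempts) {
            // Tumbling windows do not overlap: round the timestamp down to the window start.
            long windowStart = (a.timestampMs / WINDOW_MS) * WINDOW_MS;
            counts.merge(a.cardNumber + "@" + windowStart, 1L, Long::sum); // GROUP BY + count(*)
        }
        counts.values().removeIf(c -> c <= 3); // HAVING count(*) > 3
        return counts;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = List.of(
            new Attempt("4111", 0), new Attempt("4111", 1_000),
            new Attempt("4111", 2_000), new Attempt("4111", 3_000),
            new Attempt("4111", 5_000), new Attempt("5500", 4_000));
        System.out.println(possibleFraud(attempts)); // prints {4111@0=4}
    }
}
```

Unlike this batch sketch, a continuous query updates the counts incrementally as each record arrives.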
  22. What is it for?
     ● Real Time Monitoring
       ○ Log data monitoring, tracking and alerting
       ○ Sensor / IoT data

         CREATE TABLE error_counts AS
           SELECT error_code, count(*)
           FROM monitoring_stream
           WINDOW TUMBLING (SIZE 1 MINUTE)
           WHERE type = 'ERROR'
           GROUP BY error_code;
  23. What is it for?
     ● Simple Derivations of Existing Topics
       ○ One-liner to re-partition and/or re-key a topic for new uses

         CREATE STREAM views_by_userid
           WITH (PARTITIONS=6, VALUE_FORMAT='JSON', TIMESTAMP='view_time') AS
           SELECT * FROM clickstream PARTITION BY user_id;
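
To see why re-keying matters, the sketch below shows how a key determines a partition. It is a simplified stand-in written in plain Java: the class `RepartitionSketch` and its methods are hypothetical, and `String.hashCode()` is used in place of the murmur2 hash over serialized key bytes that Kafka's default partitioner applies. The point it illustrates is only that records with the same key always land in the same partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Kafka's actual partitioner.
class RepartitionSketch {

    // Stand-in for key-based partitioning: same key, same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Groups record keys by the partition they would be written to.
    static Map<Integer, List<String>> assign(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            partitions
                .computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>())
                .add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // With PARTITIONS=6 and user_id as the new key:
        System.out.println(assign(List.of("a", "b", "a"), 6)); // prints {1=[a, a], 2=[b]}
    }
}
```

Because all records for one `user_id` end up in one partition after `PARTITION BY user_id`, per-user aggregations and joins downstream can be computed by whichever instance owns that partition.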
  24. KSQL Components
     • CLI
       • Designed to be familiar to users of MySQL, Postgres, etc
     • Engine
       • Actually runs the Kafka Streams topologies
     • REST Server
       • HTTP interface allows an Engine to receive instructions from the CLI
  25. How to run KSQL - #1 Stand-alone aka 'local mode'
     • Starts a CLI, an Engine, and a REST server all in the same JVM
     • Ideal for laptop development
     • Start with default settings:
         > bin/ksql-cli local
     • Or with customized settings:
         > bin/ksql-cli local --properties-file foo/bar/ksql.properties
  26. How to run KSQL - #2 Client-Server
     • Start any number of Server nodes
         > bin/ksql-server-start
     • Start any number of CLIs and specify 'remote' server address
         > bin/ksql-cli remote http://myserver:8090
     • All running Engines share the processing load
       • Technically, instances of the same Kafka Streams Applications
       • Scale up/down without restart
  27. How to run KSQL - #3 as an Application
     How do you deploy applications?
  28. How to run KSQL - #3 as an Application
     • Start any number of Engine instances
       • Pass a file of KSQL statements to execute
         > bin/ksql-node foo/bar.sql
     • Ideal for streaming ETL application deployment
     • Version control your queries and transformations as code
     • All running Engines share the processing load
       • Technically, instances of the same Kafka Streams Applications
       • Scale up/down without restart
  29. Try it out!
     ● https://github.com/confluentinc/ksql
     ● > bin/ksql-cli local
  30. Questions? Feedback? Please contact me!
     Cliff Gilmore, Sr. Systems Engineer, cliff@confluent.io