SlideShare a Scribd company logo
1 of 25
KSQLOpen source streaming for Apache Kafka®
@tlberglund
Kafka Architecture
consumer
producer
consumer consumer
broker
broker
broker
broker
producer
Scalable Consumption
consumer
group A
producer
consumer
group A
consumer
group B
consumer
group B
…
…
…
partition 1
partition 2
partition 3
Partitioned Topic
Logs and Pub/Sub
consumer A
producer
consumer B
8764321 5
first record
latest record
Streaming Platform
Applications
Databases
Offline SystemsDWH
hdfs/
spark
Stream
Processors
Real-time
analytics Streaming Platform
Declarative
Stream
Language
Processing
KSQLis a
KSQLis the
Streaming
SQL Engine
for
Apache Kafka
Stream Processing by Analogy
Kafka Cluster
Connect API Stream Processing Connect API
$ cat < in.txt | grep “ksql” | tr a-z A-Z > out.txt
KSQL
are some
what
use cases?
KSQL for Data Exploration
SELECT status, bytes
FROM clickstream
WHERE user_agent =
‘Mozilla/5.0 (compatible; MSIE 6.0)’;
An easy way to inspect data in a running cluster
KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transforming data while moving from Kafka to another system.
CREATE STREAM vip_actions AS 

SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id 

WHERE u.level = 'Platinum';
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS

SELECT card_number, count(*)

FROM authorization_attempts 

WINDOW TUMBLING (SIZE 5 SECONDS)

GROUP BY card_number

HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS 

SELECT error_code, count(*) 

FROM monitoring_stream 

WINDOW TUMBLING (SIZE 1 MINUTE) 

WHERE type = 'ERROR' 

GROUP BY error_code;
KSQL for Data Transformation
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
VALUE_FORMAT=‘JSON’,
TIMESTAMP=‘view_time’) AS 

SELECT * FROM clickstream PARTITION BY user_id;
Make simple derivations of existing topics from the command line
Where is KSQL not such a great fit?
BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not
good with continuous results!)
Ad-hoc queries
• Limited span of time usually
retained in Kafka
• No indexes
CREATE STREAM clickstream (
time BIGINT,
url VARCHAR,
status INTEGER,
bytes INTEGER,
userid VARCHAR,
agent VARCHAR)
WITH (
value_format = ‘JSON’,
kafka_topic=‘my_clickstream_topic’
);
Creating a Stream
CREATE TABLE users (
user_id INTEGER,
registered_at LONG,
username VARCHAR,
name VARCHAR,
city VARCHAR,
level VARCHAR)
WITH (
key = ‘user_id',
kafka_topic=‘clickstream_users’,
value_format=‘JSON');
Creating a Table
CREATE STREAM vip_actions AS
SELECT userid, fullname, url, status 

FROM clickstream c 

LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
Joins for Enrichment
Demo
How to run KSQL
JVM
KSQL Server
KSQL CLI
JVM
KSQL Server
JVM
KSQL Server
Kafka Cluster
#2 CLIENT-SERVER
• Start any number of server nodes
bin/ksql-server-start
• Start one or more CLIs and point them to a server
bin/ksql-cli remote https://myksqlserver:8090
• All servers share the processing load
Technically, instances of the same Kafka Streams Applications
Scale up/down without restart
How to run KSQL#2 CLIENT-SERVER
How to run KSQL
Kafka Cluster
JVM
KSQL Server
JVM
KSQL Server
JVM
KSQL Server
#3 AS A STANDALONE APPLICATION
• Start any number of server nodes
Pass a file of KSQL statement to execute
bin/ksql-node query-file=foo/bar.sql
• Ideal for streaming ETL application deployment
Version-control your queries and transformations as code
• All running engines share the processing load
Technically, instances of the same Kafka Streams Applications
Scale up/down without restart
How to run KSQL#3 AS A STANDALONE APPLICATION
Resources and Next Steps
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
Thank you!
@tlberglund tim@confluent.io

More Related Content

What's hot

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsKetan Gote
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overviewconfluent
 
Getting Started with Infrastructure as Code
Getting Started with Infrastructure as CodeGetting Started with Infrastructure as Code
Getting Started with Infrastructure as CodeWinWire Technologies Inc
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Terraform modules and some of best-practices - March 2019
Terraform modules and some of best-practices - March 2019Terraform modules and some of best-practices - March 2019
Terraform modules and some of best-practices - March 2019Anton Babenko
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoDatabricks
 
Terraform Best Practices - DevOps Unicorns 2019
Terraform Best Practices - DevOps Unicorns 2019Terraform Best Practices - DevOps Unicorns 2019
Terraform Best Practices - DevOps Unicorns 2019Anton Babenko
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for PrometheusMitsuhiro Tanda
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producerconfluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kai Wähner
 

What's hot (20)

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
infrastructure as code
infrastructure as codeinfrastructure as code
infrastructure as code
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overview
 
Getting Started with Infrastructure as Code
Getting Started with Infrastructure as CodeGetting Started with Infrastructure as Code
Getting Started with Infrastructure as Code
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Terraform modules and some of best-practices - March 2019
Terraform modules and some of best-practices - March 2019Terraform modules and some of best-practices - March 2019
Terraform modules and some of best-practices - March 2019
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
 
Terraform Best Practices - DevOps Unicorns 2019
Terraform Best Practices - DevOps Unicorns 2019Terraform Best Practices - DevOps Unicorns 2019
Terraform Best Practices - DevOps Unicorns 2019
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Terraform
TerraformTerraform
Terraform
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Terraform 101
Terraform 101Terraform 101
Terraform 101
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 

Similar to KSQL: Open Source Streaming for Apache Kafka

Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLFlorent Ramiere
 
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Michael Noll
 
Exploring KSQL Patterns
Exploring KSQL Patterns Exploring KSQL Patterns
Exploring KSQL Patterns confluent
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database Systemconfluent
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshopconfluent
 
KSQL- Streaming Sql for Kafka
KSQL- Streaming Sql for KafkaKSQL- Streaming Sql for Kafka
KSQL- Streaming Sql for KafkaKnoldus Inc.
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Paolo Castagna
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLconfluent
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLScyllaDB
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLSteps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLconfluent
 
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...Michael Noll
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020Maheedhar Gunturu
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 
APAC ksqlDB Workshop
APAC ksqlDB WorkshopAPAC ksqlDB Workshop
APAC ksqlDB Workshopconfluent
 

Similar to KSQL: Open Source Streaming for Apache Kafka (20)

Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQL
 
KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
 
Exploring KSQL Patterns
Exploring KSQL Patterns Exploring KSQL Patterns
Exploring KSQL Patterns
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 
KSQL- Streaming Sql for Kafka
KSQL- Streaming Sql for KafkaKSQL- Streaming Sql for Kafka
KSQL- Streaming Sql for Kafka
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLSteps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
 
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
Jug - ecosystem
Jug -  ecosystemJug -  ecosystem
Jug - ecosystem
 
Chti jug - 2018-06-26
Chti jug - 2018-06-26Chti jug - 2018-06-26
Chti jug - 2018-06-26
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
APAC ksqlDB Workshop
APAC ksqlDB WorkshopAPAC ksqlDB Workshop
APAC ksqlDB Workshop
 

More from confluent

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Eraconfluent
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flinkconfluent
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...confluent
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluentconfluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 

Recently uploaded

Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsUXDXConf
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreelreely ones
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024TopCSSGallery
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024Stephanie Beckett
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 

Recently uploaded (20)

Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreel
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 

KSQL: Open Source Streaming for Apache Kafka

  • 1. KSQLOpen source streaming for Apache Kafka® @tlberglund
  • 3. Scalable Consumption consumer group A producer consumer group A consumer group B consumer group B … … … partition 1 partition 2 partition 3 Partitioned Topic
  • 4. Logs and Pub/Sub consumer A producer consumer B 8764321 5 first record latest record
  • 8. Stream Processing by Analogy Kafka Cluster Connect API Stream Processing Connect API $ cat < in.txt | grep “ksql” | tr a-z A-Z > out.txt
  • 10. KSQL for Data Exploration SELECT status, bytes FROM clickstream WHERE user_agent = ‘Mozilla/5.0 (compatible; MSIE 6.0)’; An easy way to inspect data in a running cluster
  • 11. KSQL for Streaming ETL • Kafka is popular for data pipelines. • KSQL enables easy transformations of data within the pipe. • Transforming data while moving from Kafka to another system. CREATE STREAM vip_actions AS 
 SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id 
 WHERE u.level = 'Platinum';
  • 12. KSQL for Anomaly Detection CREATE TABLE possible_fraud AS
 SELECT card_number, count(*)
 FROM authorization_attempts 
 WINDOW TUMBLING (SIZE 5 SECONDS)
 GROUP BY card_number
 HAVING count(*) > 3; Identifying patterns or anomalies in real-time data, surfaced in milliseconds
  • 13. KSQL for Real-Time Monitoring • Log data monitoring, tracking and alerting • Sensor / IoT data CREATE TABLE error_counts AS 
 SELECT error_code, count(*) 
 FROM monitoring_stream 
 WINDOW TUMBLING (SIZE 1 MINUTE) 
 WHERE type = 'ERROR' 
 GROUP BY error_code;
  • 14. KSQL for Data Transformation CREATE STREAM views_by_userid WITH (PARTITIONS=6, VALUE_FORMAT=‘JSON’, TIMESTAMP=‘view_time’) AS 
 SELECT * FROM clickstream PARTITION BY user_id; Make simple derivations of existing topics from the command line
  • 15. Where is KSQL not such a great fit? BI reports (Tableau etc.) • No indexes • No JDBC (most BI tools are not good with continuous results!) Ad-hoc queries • Limited span of time usually retained in Kafka • No indexes
  • 16. CREATE STREAM clickstream ( time BIGINT, url VARCHAR, status INTEGER, bytes INTEGER, userid VARCHAR, agent VARCHAR) WITH ( value_format = ‘JSON’, kafka_topic=‘my_clickstream_topic’ ); Creating a Stream
  • 17. CREATE TABLE users ( user_id INTEGER, registered_at LONG, username VARCHAR, name VARCHAR, city VARCHAR, level VARCHAR) WITH ( key = ‘user_id', kafka_topic=‘clickstream_users’, value_format=‘JSON'); Creating a Table
  • 18. CREATE STREAM vip_actions AS SELECT userid, fullname, url, status 
 FROM clickstream c 
 LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum'; Joins for Enrichment
  • 19. Demo
  • 20. How to run KSQL JVM KSQL Server KSQL CLI JVM KSQL Server JVM KSQL Server Kafka Cluster #2 CLIENT-SERVER
  • 21. • Start any number of server nodes bin/ksql-server-start • Start one or more CLIs and point them to a server bin/ksql-cli remote https://myksqlserver:8090 • All servers share the processing load Technically, instances of the same Kafka Streams Applications Scale up/down without restart How to run KSQL#2 CLIENT-SERVER
  • 22. How to run KSQL Kafka Cluster JVM KSQL Server JVM KSQL Server JVM KSQL Server #3 AS A STANDALONE APPLICATION
  • 23. • Start any number of server nodes Pass a file of KSQL statement to execute bin/ksql-node query-file=foo/bar.sql • Ideal for streaming ETL application deployment Version-control your queries and transformations as code • All running engines share the processing load Technically, instances of the same Kafka Streams Applications Scale up/down without restart How to run KSQL#3 AS A STANDALONE APPLICATION
  • 24. Resources and Next Steps http://confluent.io/ksql https://slackpass.io/confluentcommunity #ksql