REAL TIME STREAM PROCESSING WITH KSQL AND KAFKA
DAVID PETERSON
Systems Engineer - Confluent APAC
@davidseth
Changing Architectures
Kafka?
Stream Processing
KSQL
KSQL in Production
QUICK INTRO TO CONFLUENT
69% of active Kafka Committers
Founded September 2014
Technology developed while at LinkedIn
Founded by the creators of Apache Kafka
76% of Kafka code created by Confluent team
Changing
Architectures
Events
A Sale An Invoice A Trade A Customer
Experience
DATA FLOW
CHANGING ARCHITECTURES
WE ARE CHALLENGING OLD ASSUMPTIONS...
Big Data was: The More the Better
[Chart: value of data vs. volume of data]
Stream Data is: The Faster the Better
[Chart: value of data vs. age of data]
CHANGING ARCHITECTURES
WE ARE CHALLENGING OLD ARCHITECTURES…
Lambda: Big OR Fast
[Diagram labels: DB, Streams, Hadoop, Speed Table, Batch Table]
Kappa: Big AND Fast
[Diagram labels: Kafka, Topic A, KSQL Stream, Microservice, HDFS, Cassandra, Elastic]
A CHANGE OF MINDSET...
KAFKA: EVENT CENTRIC THINKING
A CHANGE OF MINDSET...
AN EVENT-DRIVEN ENTERPRISE
● Everything is an event
● Available instantly to all applications in a company
● Ability to query data as it arrives vs when it is too late
● Simplifying the data architecture by deploying a single platform
What are the possibilities?
It’s a massively scalable
distributed, fault tolerant,
publish & subscribe
key/value datastore with
infinite data retention
computing unbounded,
streaming data in real time.
So, what is Kafka really?
It’s made up of 3 key primitives
Publish & Subscribe
Store
Process
So, what is Kafka really?
Producer & Consumer API: Open-source client libraries for numerous languages. Direct integration with your systems.
Connect API: Reliable and scalable integration of Kafka with other systems – no coding required.
Streams API: Low-level and DSL, create applications & microservices to process your data in real-time.
A Brief History of Apache Kafka and Confluent
[Timeline, 2012 to 2018]
0.7
0.8: Intra-cluster replication
0.9: Data integration
0.10: Stream processing
0.11: Exactly-once semantics
1.0: One<dot>Oh release!
CP 4.1: KSQL GA
2.0 ☺
Producers → Kafka cluster → Consumers
So, what exactly is a stream?
1. TOPIC
{"actor":"bear", "x":410, "y":20}
{"actor":"racoon", "x":380, "y":20}
{"actor":"bear", "x":380, "y":22}
{"actor":"racoon", "x":350, "y":22}
{"actor":"bear", "x":350, "y":25}
{"actor":"racoon", "x":330, "y":25}
{"actor":"racoon", "x":280, "y":32}
{"actor":"bear", "x":310, "y":32}
2. STREAM
3. TABLE
Exposure Sheet
Real Time stream processing with KSQL and Kafka SEP / API DAYS
Changelog stream – immutable events
Rebuild original table
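The two slides above describe the stream-table duality: each change to a table can be recorded as an immutable event, and replaying those events in order rebuilds the table. A minimal Python sketch of that idea follows. The function name `rebuild_table` and the use of `None` as a delete marker are assumptions made for illustration, not Kafka or KSQL APIs; the sample events reuse the actor positions from the topic example.

```python
from typing import Dict, Iterable, Optional, Tuple

# One changelog event: (key, new value). A value of None marks a delete
# (an assumption of this sketch, loosely modelled on Kafka tombstones).
Event = Tuple[str, Optional[dict]]


def rebuild_table(changelog: Iterable[Event]) -> Dict[str, dict]:
    """Replay immutable change events in order to rebuild the table."""
    table: Dict[str, dict] = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # delete the row if it exists
        else:
            table[key] = value    # insert or overwrite the row
    return table


changelog = [
    ("bear", {"x": 410, "y": 20}),
    ("racoon", {"x": 380, "y": 20}),
    ("bear", {"x": 380, "y": 22}),
    ("racoon", {"x": 350, "y": 22}),
]

table = rebuild_table(changelog)
print(table)
```

Because the events are never modified, replaying the same changelog always yields the same table, which is also what makes the state recovery described later in the deck possible.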
Stream Processing
KSQL- Streaming SQL for Apache Kafka
Confluent – Looking Forward JULY
Standard App
No need to create a separate cluster
Highly scalable, elastic, fault tolerant
Lives inside your application
Stream processing
Same data, but different use cases
Use case 1: Frequent traveler status? (KStream)
“Alice has been to SFO, NYC, Rio, Sydney, Beijing, Paris, and finally Berlin.”
Use case 2: Current location? (KTable)
“Alice is in SFO, NYC, Rio, Sydney, Beijing, Paris, Berlin right now.” (Only the latest location, Berlin, is current; each update replaces the previous one.)
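The contrast between the two use cases can be sketched in a few lines of Python. This is only an illustration of the semantics under simple assumptions (one user, events already in order, plain in-memory data structures); it does not use the Kafka Streams KStream or KTable classes.

```python
from collections import Counter

# Location events for one traveler, in arrival order (cities from the slide).
events = [("alice", city) for city in
          ["SFO", "NYC", "Rio", "Sydney", "Beijing", "Paris", "Berlin"]]

# Stream view: every event is a separate fact, so all of them are counted.
visits = Counter(user for user, _ in events)

# Table view: each event overwrites the previous value for its key.
current_location = {}
for user, city in events:
    current_location[user] = city

print(visits["alice"])            # number of trips (use case 1)
print(current_location["alice"])  # latest location (use case 2)
```

The same seven events answer both questions; what differs is whether a new event adds to the history or replaces the previous value for its key.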
KSQL
KSQL — get started fast with Stream Processing
[Diagram: KSQL (processing) reads from and writes to Kafka (data) over the network]
All you need is Kafka – no complex deployments of
bespoke systems for stream processing!
CREATE STREAM
CREATE TABLE
SELECT …and more…
● No need for source code deployment
○ Zero, none at all, not even one tiny file
● All the Kafka Streams capabilities out-of-the-box
○ Exactly Once Semantics
○ Windowing
○ Event-time aggregation
○ Late-arriving data
○ Distributed, fault-tolerant, scalable, ...
KSQL Concepts
KSQL — SELECT statement syntax
SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping_expression` ]
[ HAVING `having_expression` ]
[ LIMIT n ]

where from_item is one of the following:
stream_or_table_name [ [ AS ] alias ]
from_item LEFT JOIN from_item ON join_condition
What are some KSQL use cases?
10+5
KSQL — Data exploration
An easy way to inspect data in Kafka
SELECT page, user_id, status, bytes
FROM clickstream
WHERE user_agent LIKE 'Mozilla/5.0%';
SHOW TOPICS;
PRINT 'my-topic' FROM BEGINNING;
KSQL — Data enrichment
Join data from a variety of sources to see the full picture
CREATE STREAM enriched_payments AS
SELECT payment_id, u.country, total
FROM payments_stream p
LEFT JOIN users_table u
ON p.user_id = u.user_id;
(Stream-table join)
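To make the join semantics concrete, here is a small Python sketch of a stream-table LEFT JOIN. It assumes the users table is a static in-memory dictionary, whereas in KSQL the table keeps changing as new records arrive; the helper name `enrich_payments` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional


def enrich_payments(payments: Iterable[dict],
                    users_table: Dict[str, dict]) -> Iterator[dict]:
    """LEFT JOIN each payment event against the users table on user_id."""
    for payment in payments:
        user: Optional[dict] = users_table.get(payment["user_id"])
        yield {
            "payment_id": payment["payment_id"],
            # LEFT JOIN: keep the payment even when no user row matches.
            "country": user["country"] if user else None,
            "total": payment["total"],
        }


users_table = {"u1": {"country": "AU"}, "u2": {"country": "NZ"}}
payments = [
    {"payment_id": 1, "user_id": "u1", "total": 25},
    {"payment_id": 2, "user_id": "u9", "total": 40},  # unknown user
]

enriched = list(enrich_payments(payments, users_table))
print(enriched)
```

Each payment is looked up once, at the moment it arrives, so the output is itself a stream with one enriched record per input event.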
KSQL — Streaming ETL
Filter, cleanse, process data while it is moving
CREATE STREAM clicks_from_vip_users AS
SELECT c.user_id, u.country, page, action
FROM clickstream c
LEFT JOIN users u ON c.user_id = u.user_id
WHERE u.level = 'Platinum';
KSQL — Anomaly Detection
Aggregate data to identify patterns or anomalies in real-time
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING COUNT(*) > 3;
(Aggregate data … per 5 min windows)
TIME BUCKETS
STREAMING
TUMBLING
HOPPING
SESSION
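As a rough illustration of the tumbling case, the Python sketch below assigns each event to a fixed, non-overlapping 5-minute bucket and counts per card, mirroring the `possible_fraud` query. It is a batch simplification: KSQL updates the counts continuously as events arrive, and hopping and session windows are not covered. The function name, the millisecond timestamps and the sample data are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window


def possible_fraud(attempts: Iterable[Tuple[int, str]],
                   threshold: int = 3) -> Dict[Tuple[str, int], int]:
    """Count attempts per (card, window start); keep counts above threshold."""
    counts: Counter = Counter()
    for ts_ms, card in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one.
        window_start = ts_ms - (ts_ms % WINDOW_MS)
        counts[(card, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    # Four attempts for card 1111 inside the first window.
    (0, "1111"), (60_000, "1111"), (120_000, "1111"), (180_000, "1111"),
    # Two attempts for card 2222: below the threshold.
    (10_000, "2222"), (20_000, "2222"),
    # Four attempts for card 3333, split across a window boundary.
    (290_000, "3333"), (295_000, "3333"), (300_000, "3333"), (305_000, "3333"),
]

flagged = possible_fraud(attempts)
print(flagged)
```

Card 3333 shows a known limitation of tumbling windows: a burst that straddles a boundary is split into two smaller counts, which is one reason hopping windows exist.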
KSQL — Real time monitoring
Derive insights from events (IoT, sensors, etc.) and turn them into actions
CREATE TABLE failing_vehicles AS
SELECT vehicle, COUNT(*)
FROM vehicle_monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE event_type = 'ERROR'
GROUP BY vehicle
HAVING COUNT(*) >= 3;
KSQL — Data transformation
Quickly make derivations of existing data in Kafka
CREATE STREAM clicks_by_user_id
WITH (PARTITIONS=6,
TIMESTAMP='view_time',
VALUE_FORMAT='JSON') AS
SELECT * FROM clickstream
PARTITION BY user_id;
(PARTITION BY: Re-key the data. VALUE_FORMAT: Convert data to JSON.)
KSQL — Stream to Stream JOINs
Example: Detect late orders by matching every SHIPMENTS row with ORDERS rows that are within a 2-hour window.
CREATE STREAM late_orders AS
SELECT o.orderid, o.itemid FROM orders o
FULL OUTER JOIN shipments s WITHIN 2 HOURS
ON s.orderid = o.orderid WHERE s.orderid IS NULL;
INSERT INTO statement for Streams
Example: Compute daily sales per item across online and offline stores
CREATE STREAM sales_online (itemId BIGINT, price INTEGER, shipmentId BIGINT) WITH (...);
CREATE STREAM sales_offline (itemId BIGINT, price INTEGER, storeId BIGINT) WITH (...);
CREATE STREAM all_sales (itemId BIGINT, price INTEGER) WITH (...);

-- Merge the streams into `all_sales`
INSERT INTO all_sales SELECT itemId, price FROM sales_online;
INSERT INTO all_sales SELECT itemId, price FROM sales_offline;

CREATE TABLE daily_sales_per_item AS
SELECT itemId, SUM(price) FROM all_sales
WINDOW TUMBLING (SIZE 1 DAY) GROUP BY itemId;
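The merge-then-aggregate pattern can be sketched in Python as follows. This is a simplified batch model, assuming each record carries a millisecond timestamp in a hypothetical `ts_ms` field; the function name and sample records are also made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

DAY_MS = 24 * 60 * 60 * 1000  # 1-day tumbling window


def daily_sales_per_item(*streams: Iterable[dict]) -> Dict[Tuple[int, int], int]:
    """Merge several sales streams and sum price per (itemId, day start)."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for stream in streams:  # plays the role of one INSERT INTO per source
        for sale in stream:
            day_start = sale["ts_ms"] - (sale["ts_ms"] % DAY_MS)
            # Only itemId and price are carried over, as in `all_sales`.
            totals[(sale["itemId"], day_start)] += sale["price"]
    return dict(totals)


sales_online = [
    {"ts_ms": 1_000, "itemId": 1, "price": 10, "shipmentId": 77},
    {"ts_ms": DAY_MS + 5, "itemId": 1, "price": 10, "shipmentId": 78},
]
sales_offline = [
    {"ts_ms": 2_000, "itemId": 1, "price": 15, "storeId": 3},
    {"ts_ms": 3_000, "itemId": 2, "price": 7, "storeId": 3},
]

totals = daily_sales_per_item(sales_online, sales_offline)
print(totals)
```

The source-specific columns (`shipmentId`, `storeId`) are dropped during the merge, which is what lets two differently shaped streams feed one aggregate.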
KSQL — Demo
[Diagram labels: customers; Producer; Kafka Connect streams data in; KSQL processes table changes in real-time; Kafka Connect streams data out]
KSQL — Deep Learning for IoT Sensor Analytics
KSQL UDF using an analytic model under the hood
→ Write once, use in any KSQL statement
SELECT event_id,
anomaly(SENSORINPUT)
FROM health_sensor;
(anomaly: User Defined Function)
KSQL — User Defined Function (UDF)
Putting KSQL into Production
DEPLOYING
KSQL
CLI
REST
CODE
Fault-Tolerance, powered by Kafka
A key challenge of distributed stream processing is fault-tolerant state.
Server A (KSQL): “I do stateful stream processing, like tables, joins, aggregations.”
Changelog Topic (Kafka): “streaming backup” of A’s local state
State is automatically migrated in case of server failure: “streaming restore” of A’s local state to B
Server B (KSQL): “I restore the state and continue processing where server A stopped.”
Fault-Tolerance, powered by Kafka
Processing fails over automatically, without data loss or miscomputation.
#3 died so #1 and #2 take over:
1 Kafka consumer group rebalance is triggered
2 Processing and state of #3 is migrated via Kafka to remaining servers #1 + #2
#3 is back so the work is split again:
3 Kafka consumer group rebalance is triggered
4 Part of processing incl. state is migrated via Kafka from #1 + #2 to server #3
Elasticity and Scalability, powered by Kafka
You can add, remove, restart servers in KSQL clusters during live operations.
“We need more processing power!”
1 Kafka consumer group rebalance is triggered
2 Part of processing incl. state is migrated via Kafka to additional server processes
“Ok, we can scale down again.”
3 Kafka consumer group rebalance is triggered
4 Processing incl. state of stopped servers is migrated via Kafka to remaining servers
PARALLELISATION
KSQL is the Streaming SQL Engine for Apache Kafka
Resources and Next Steps
• Try the demo on GitHub :)
• Check out the code
• Play with the examples
Download Confluent Open Source: https://www.confluent.io/download/
Chat with us: https://slackpass.io/confluentcommunity #ksql
https://github.com/confluentinc/demo-scene
The World’s Best Streaming Platform — Everywhere
DAVID PETERSON
Systems Engineer - Confluent APAC
@davidseth

More Related Content

What's hot

Hadoop Interview Questions And Answers Part-2 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-2 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-2 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-2 | Big Data Interview Questions ...
Simplilearn
 
CI-CD with AWS Developer Tools and Fargate_AWSPSSummit_Singapore
CI-CD with AWS Developer Tools and Fargate_AWSPSSummit_SingaporeCI-CD with AWS Developer Tools and Fargate_AWSPSSummit_Singapore
CI-CD with AWS Developer Tools and Fargate_AWSPSSummit_Singapore
Amazon Web Services
 

What's hot (20)

Global Netflix Platform
Global Netflix PlatformGlobal Netflix Platform
Global Netflix Platform
 
Test automation proposal
Test automation proposalTest automation proposal
Test automation proposal
 
DevSecOps: Key Controls to Modern Security Success
DevSecOps: Key Controls to Modern Security SuccessDevSecOps: Key Controls to Modern Security Success
DevSecOps: Key Controls to Modern Security Success
 
Enterprise Governance: Build Your AWS Landing Zone (ENT351-R1) - AWS re:Inven...
Enterprise Governance: Build Your AWS Landing Zone (ENT351-R1) - AWS re:Inven...Enterprise Governance: Build Your AWS Landing Zone (ENT351-R1) - AWS re:Inven...
Enterprise Governance: Build Your AWS Landing Zone (ENT351-R1) - AWS re:Inven...
 
Protecting Agile Transformation through Secure DevOps (DevSecOps)
Protecting Agile Transformation through Secure DevOps (DevSecOps)Protecting Agile Transformation through Secure DevOps (DevSecOps)
Protecting Agile Transformation through Secure DevOps (DevSecOps)
 
Scaling DevSecOps Culture for Enterprise
Scaling DevSecOps Culture for EnterpriseScaling DevSecOps Culture for Enterprise
Scaling DevSecOps Culture for Enterprise
 
Test automation process
Test automation processTest automation process
Test automation process
 
Security Process in DevSecOps
Security Process in DevSecOpsSecurity Process in DevSecOps
Security Process in DevSecOps
 
Integrating Jira Software Cloud With the AWS Code Suite
Integrating Jira Software Cloud With the AWS Code SuiteIntegrating Jira Software Cloud With the AWS Code Suite
Integrating Jira Software Cloud With the AWS Code Suite
 
AWS solution Architect Associate study material
AWS solution Architect Associate study materialAWS solution Architect Associate study material
AWS solution Architect Associate study material
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Building a Scalable Architecture for web apps
Building a Scalable Architecture for web appsBuilding a Scalable Architecture for web apps
Building a Scalable Architecture for web apps
 
Azure Application Modernization
Azure Application ModernizationAzure Application Modernization
Azure Application Modernization
 
Hadoop Interview Questions And Answers Part-2 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-2 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-2 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-2 | Big Data Interview Questions ...
 
Practical DevSecOps Course - Part 1
Practical DevSecOps Course - Part 1Practical DevSecOps Course - Part 1
Practical DevSecOps Course - Part 1
 
Shift left - find defects earlier through automated test and deployment
Shift left - find defects earlier through automated test and deploymentShift left - find defects earlier through automated test and deployment
Shift left - find defects earlier through automated test and deployment
 
Azure DevOps
Azure DevOpsAzure DevOps
Azure DevOps
 
Everything you want to know about microservices
Everything you want to know about microservicesEverything you want to know about microservices
Everything you want to know about microservices
 
CI-CD with AWS Developer Tools and Fargate_AWSPSSummit_Singapore
CI-CD with AWS Developer Tools and Fargate_AWSPSSummit_SingaporeCI-CD with AWS Developer Tools and Fargate_AWSPSSummit_Singapore
CI-CD with AWS Developer Tools and Fargate_AWSPSSummit_Singapore
 
Getting started on your AWS migration journey
Getting started on your AWS migration journeyGetting started on your AWS migration journey
Getting started on your AWS migration journey
 

Similar to Real-Time Stream Processing with KSQL and Apache Kafka

Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Michael Noll
 
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Michael Noll
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
Kai Wähner
 
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
confluent
 

Similar to Real-Time Stream Processing with KSQL and Apache Kafka (20)

Real Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and KafkaReal Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and Kafka
 
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
Unlocking the world of stream processing with KSQL, the streaming SQL engine ...
 
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
Big, Fast, Easy Data: Distributed Stream Processing for Everyone with KSQL, t...
 
APAC ksqlDB Workshop
APAC ksqlDB WorkshopAPAC ksqlDB Workshop
APAC ksqlDB Workshop
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQL
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Big Data LDN 2017: Look Ma, No Code! Building Streaming Data Pipelines With A...
Big Data LDN 2017: Look Ma, No Code! Building Streaming Data Pipelines With A...Big Data LDN 2017: Look Ma, No Code! Building Streaming Data Pipelines With A...
Big Data LDN 2017: Look Ma, No Code! Building Streaming Data Pipelines With A...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
 
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
Use Apache Gradle to Build and Automate KSQL and Kafka Streams (Stewart Bryso...
 

More from confluent

More from confluent (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 

Recently uploaded

Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 

Recently uploaded (20)

IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 

Real-Time Stream Processing with KSQL and Apache Kafka

  • 1. KSQL AND KAFKA REAL TIME STREAM PROCESSING WITH
  • 2. DAVID PETERSON Systems Engineer - Confluent APAC @davidseth
  • 4. QUICK INTRO TO CONFLUENT 69% of active Kafka Committers Founded
 September 2014 Technology developed 
 while at LinkedIn Founded by the creators of Apache Kafka
  • 5. 76%of Kafka code created 
 by Confluent team
  • 7. Events A Sale An Invoice A Trade A Customer Experience
  • 9. CHANGING ARCHITECTURES WE ARE CHALLENGING OLD ASSUMPTIONS... Stream Data is
 The Faster the Better Big Data was
 The More the Better ValueofData Volume of Data ValueofData Age of Data
  • 10. CHANGING ARCHITECTURES WE ARE CHALLENGING OLD ARCHITECTURES… Lambda Big OR Fast Speed Table Batch Table DB Streams Hadoop Kappa 
 Big AND Fast KSQL Stream Kafka HDFSCassandra Elastic Topic A Micro- service
  • 11. A CHANGE OF MINDSET... KAFKA: EVENT CENTRIC THINKING
  • 12. A CHANGE OF MINDSET... AN EVENT-DRIVEN ENTERPRISE
 ● Everything is an event
 ● Available instantly to all applications in a company
 ● Ability to query data as it arrives vs when it is too late
 ● Simplifying the data architecture by deploying a single platform
 What are the possibilities?
  • 13.
  • 14. It’s a massively scalable, distributed, fault tolerant, publish & subscribe, key/value datastore with infinite data retention, computing unbounded, streaming data in real time.
  • 16. So, what is Kafka really?
  • 17. It’s made up of 3 key primitives
  • 19. So, what is Kafka really?
  • 20.
  • 21. Kafka's three APIs:
 ● Producer & Consumer API: Open-source client libraries for numerous languages. Direct integration with your systems.
 ● Connect API: Reliable and scalable integration of Kafka with other systems – no coding required.
 ● Streams API: Low-level and DSL, create applications & microservices to process your data in real-time
  • 22. A Brief History of Apache Kafka and Confluent (timeline, 2012 to 2018): 0.7; 0.8 Intra-cluster replication; 0.9 Data integration; 0.10 Stream processing; 0.11 Exactly-once semantics; 1.0 One<dot>Oh release!; CP 4.1 KSQL GA; 2.0 ☺
  • 24. So, what exactly is a stream?
  • 25.
  • 26.
  • 28.
  • 29.
  • 30.
  • 36.
  • 37.
  • 38.
  • 39.
  • 42. Real Time stream processing with KSQL and Kafka SEP / API DAYS 46 Changelog stream – immutable events
  • 43. Rebuild original table
  • 44.
  • 46. KSQL - Streaming SQL for Apache Kafka. Confluent – Looking Forward, JULY. Standard App: no need to create a separate cluster. Highly scalable, elastic, fault tolerant
  • 47. Lives inside your application. Stream processing
  • 48. Same data, but different use cases
 Use case 1: Frequent traveler status? KStream: “Alice has been to SFO, NYC, Rio, Sydney, Beijing, Paris, and finally Berlin.”
 Use case 2: Current location? KTable: “Alice is in SFO, NYC, Rio, Sydney, Beijing, Paris, Berlin right now.” (in the table view each location overwrites the previous one, so only Berlin is current)
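 The two readings of the same data on slide 48 can be illustrated with a small Python sketch. This is not Kafka Streams or KSQL code: a plain list stands in for a topic, and the names `events`, `visit_counts`, and `current_location` are hypothetical. Read as a stream, every event counts; read as a table, only the latest value per key survives.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical location events as (key, value) pairs, in arrival order.
events = [
    ("alice", "SFO"), ("alice", "NYC"), ("alice", "Rio"), ("alice", "Sydney"),
    ("alice", "Beijing"), ("alice", "Paris"), ("alice", "Berlin"),
]

def visit_counts(stream: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Stream view: each event is an independent fact, so all of them count."""
    return dict(Counter(key for key, _ in stream))

def current_location(stream: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Table view: a later event for a key replaces the earlier one."""
    table: Dict[str, str] = {}
    for key, value in stream:
        table[key] = value
    return table

print(visit_counts(events))      # {'alice': 7}
print(current_location(events))  # {'alice': 'Berlin'}
```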
  • 49. KSQL
  • 50. KSQL: get started fast with Stream Processing. All you need is Kafka – no complex deployments of bespoke systems for stream processing! Statements: CREATE STREAM, CREATE TABLE, SELECT …and more… (diagram: KSQL (processing) reads from and writes to Kafka (data) over the network)
  • 51. KSQL Concepts
 ● No need for source code deployment
 ○ Zero, none at all, not even one tiny file
 ● All the Kafka Streams capabilities out-of-the-box
 ○ Exactly Once Semantics
 ○ Windowing
 ○ Event-time aggregation
 ○ Late-arriving data
 ○ Distributed, fault-tolerant, scalable, ...
  • 52. KSQL: SELECT statement syntax
 SELECT `select_expr` [, ...]
 FROM `from_item` [, ...]
 [ WINDOW `window_expression` ]
 [ WHERE `condition` ]
 [ GROUP BY `grouping expression` ]
 [ HAVING `having_expression` ]
 [ LIMIT n ]
 where from_item is one of the following:
 stream_or_table_name [ [ AS ] alias]
 from_item LEFT JOIN from_item ON join_condition
  • 54. KSQL: Data exploration. An easy way to inspect data in Kafka
 SELECT page, user_id, status, bytes
 FROM clickstream
 WHERE user_agent LIKE 'Mozilla/5.0%';
 SHOW TOPICS;
 PRINT 'my-topic' FROM BEGINNING;
  • 55. KSQL: Data enrichment. Join data from a variety of sources to see the full picture (stream-table join)
 CREATE STREAM enriched_payments AS
 SELECT payment_id, u.country, total
 FROM payments_stream p
 LEFT JOIN users_table u ON p.user_id = u.user_id;
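 As a rough illustration of what the stream-table LEFT JOIN on slide 55 computes, here is a minimal Python sketch. It assumes the table is a static dict keyed by `user_id`, whereas a real KSQL table keeps changing as updates arrive. The function `enrich_payments` and the sample records are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional

def enrich_payments(payments: Iterable[dict],
                    users_table: Dict[str, dict]) -> Iterator[dict]:
    """Look up each payment's user in the table as the payment arrives."""
    for payment in payments:
        user: Optional[dict] = users_table.get(payment["user_id"])
        yield {
            "payment_id": payment["payment_id"],
            # LEFT JOIN: keep the payment even if no user row matches.
            "country": user["country"] if user else None,
            "total": payment["total"],
        }

# Hypothetical sample data.
users_table = {"u1": {"country": "AU"}}
payments = [
    {"payment_id": 1, "user_id": "u1", "total": 20},
    {"payment_id": 2, "user_id": "u2", "total": 35},
]
enriched = list(enrich_payments(payments, users_table))
print(enriched)
```

 The second payment has no matching user, so it is emitted with a null country rather than dropped.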
  • 56. KSQL: Streaming ETL. Filter, cleanse, process data while it is moving
 CREATE STREAM clicks_from_vip_users AS
 SELECT c.user_id, u.country, page, action
 FROM clickstream c
 LEFT JOIN users u ON c.user_id = u.user_id
 WHERE u.level = 'Platinum';
  • 57. KSQL: Anomaly Detection. Aggregate data to identify patterns or anomalies in real-time
 CREATE TABLE possible_fraud AS
 SELECT card_number, COUNT(*)
 FROM authorization_attempts
 WINDOW TUMBLING (SIZE 5 MINUTE)
 GROUP BY card_number
 HAVING COUNT(*) > 3;
 Callouts: Aggregate data … per 5 min windows
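 The tumbling-window aggregation on slide 57 can be sketched in plain Python. The sketch assumes events arrive as `(timestamp_ms, card_number)` pairs and processes them as a finished batch, while KSQL updates the counts continuously. The function `possible_fraud` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling windows

def possible_fraud(attempts: Iterable[Tuple[int, str]],
                   threshold: int = 3) -> Dict[Tuple[str, int], int]:
    """Count attempts per (card, window) and keep counts above the threshold."""
    counts: Counter = Counter()
    for ts_ms, card in attempts:
        # Tumbling windows do not overlap: each event falls in exactly one.
        window_start = ts_ms - ts_ms % WINDOW_MS
        counts[(card, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}

# Hypothetical sample data.
attempts = [
    (0, "1234"), (60_000, "1234"), (120_000, "1234"), (180_000, "1234"),
    (300_000, "1234"),                                # next window, count 1
    (10_000, "9999"), (20_000, "9999"), (30_000, "9999"),  # only 3 attempts
]
flagged = possible_fraud(attempts)
print(flagged)  # {('1234', 0): 4}
```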
  • 58.
  • 61. KSQL: Real time monitoring. Derive insights from events (IoT, sensors, etc.) and turn them into actions
 CREATE TABLE failing_vehicles AS
 SELECT vehicle, COUNT(*)
 FROM vehicle_monitoring_stream
 WINDOW TUMBLING (SIZE 1 MINUTE)
 WHERE event_type = 'ERROR'
 GROUP BY vehicle
 HAVING COUNT(*) >= 3;
  • 62. KSQL: Data transformation. Quickly make derivations of existing data in Kafka
 CREATE STREAM clicks_by_user_id
 WITH (PARTITIONS=6, TIMESTAMP='view_time', VALUE_FORMAT='JSON') AS
 SELECT * FROM clickstream PARTITION BY user_id;
 Callouts: Re-key the data (PARTITION BY); Convert data to JSON (VALUE_FORMAT)
  • 63. KSQL: Stream to Stream JOINs. Example: Detect late orders by matching every SHIPMENTS row with ORDERS rows that are within a 2-hour window.
 CREATE STREAM late_orders AS
 SELECT o.orderid, o.itemid
 FROM orders o
 FULL OUTER JOIN shipments s WITHIN 2 HOURS
 ON s.orderid = o.orderid
 WHERE s.orderid IS NULL;
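 To make the windowed join on slide 63 concrete, here is a batch-style Python approximation. It assumes all orders and shipments are already known and that timestamps are in milliseconds. It ignores when a streaming engine would emit the unmatched rows. The function `late_orders` and the tuple layouts are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

TWO_HOURS_MS = 2 * 60 * 60 * 1000

def late_orders(orders: Iterable[Tuple[str, str, int]],
                shipments: Iterable[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Return (orderid, itemid) for orders with no shipment within 2 hours."""
    shipped_at: Dict[str, List[int]] = {}
    for orderid, ts_ms in shipments:
        shipped_at.setdefault(orderid, []).append(ts_ms)

    late = []
    for orderid, itemid, ts_ms in orders:
        # A match needs the same orderid and timestamps at most 2 hours apart.
        matched = any(abs(s - ts_ms) <= TWO_HOURS_MS
                      for s in shipped_at.get(orderid, []))
        if not matched:
            late.append((orderid, itemid))
    return late

# Hypothetical sample data.
HOUR_MS = 60 * 60 * 1000
orders = [("o1", "i1", 0), ("o2", "i2", 0), ("o3", "i3", 0)]
shipments = [("o1", 1 * HOUR_MS), ("o2", 3 * HOUR_MS)]
print(late_orders(orders, shipments))  # [('o2', 'i2'), ('o3', 'i3')]
```

 Order o1 ships within the window, o2 ships too late, and o3 never ships, so the last two are reported.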
  • 64. INSERT INTO statement for Streams. Example: Compute daily sales per item across online and offline stores
 CREATE STREAM sales_online (itemId BIGINT, price INTEGER, shipmentId BIGINT) WITH (...);
 CREATE STREAM sales_offline (itemId BIGINT, price INTEGER, storeId BIGINT) WITH (...);
 CREATE STREAM all_sales (itemId BIGINT, price INTEGER) WITH (...);
 
 -- Merge the streams into `all_sales`
 INSERT INTO all_sales SELECT itemId, price FROM sales_online;
 INSERT INTO all_sales SELECT itemId, price FROM sales_offline;
 
 CREATE TABLE daily_sales_per_item AS
 SELECT itemId, SUM(price) FROM all_sales
 WINDOW TUMBLING (SIZE 1 DAY) GROUP BY itemId;
  • 65. KSQL: Demo (diagram: Producer; customers; Kafka Connect streams data in; KSQL processes table changes in real-time; Kafka Connect streams data out)
  • 66.
  • 67. KSQL: Deep Learning for IoT Sensor Analytics. KSQL UDF (User Defined Function) using an analytic model under the hood → Write once, use in any KSQL statement
 SELECT event_id, anomaly(SENSORINPUT)
 FROM health_sensor;
  • 68. KSQL: User Defined Function (UDF)
  • 72. Fault-Tolerance, powered by Kafka. A key challenge of distributed stream processing is fault-tolerant state. State is automatically migrated in case of server failure.
 Server A: “I do stateful stream processing, like tables, joins, aggregations.”
 Changelog Topic (KSQL to Kafka): “streaming backup” of A’s local state
 Changelog Topic (Kafka to KSQL): “streaming restore” of A’s local state to B
 Server B: “I restore the state and continue processing where server A stopped.”
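 The backup-and-restore idea on slide 72 can be sketched as follows. A Python list stands in for the changelog topic. The class `CountingServer` and its methods are hypothetical and are not part of KSQL or Kafka Streams.

```python
from typing import Dict, List, Tuple

class CountingServer:
    """Keeps per-key counts locally and backs up every update to a changelog."""

    def __init__(self, changelog: List[Tuple[str, int]]) -> None:
        self.changelog = changelog        # stand-in for the changelog topic
        self.state: Dict[str, int] = {}   # local state, lost if the server dies

    def restore(self) -> None:
        """'Streaming restore': replay the changelog to rebuild local state."""
        for key, value in self.changelog:
            self.state[key] = value

    def process(self, key: str) -> int:
        """Update local state and write the new value to the changelog."""
        count = self.state.get(key, 0) + 1
        self.state[key] = count
        self.changelog.append((key, count))  # 'streaming backup'
        return count

changelog_topic: List[Tuple[str, int]] = []

server_a = CountingServer(changelog_topic)
for key in ["x", "y", "x"]:
    server_a.process(key)
# Server A fails here; its in-memory state is gone.

server_b = CountingServer(changelog_topic)
server_b.restore()
print(server_b.process("x"))  # 3, continuing where server A stopped
```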
  • 73. Fault-Tolerance, powered by Kafka. Processing fails over automatically, without data loss or miscomputation.
 #3 died so #1 and #2 take over
 1 Kafka consumer group rebalance is triggered
 2 Processing and state of #3 is migrated via Kafka to remaining servers #1 + #2
 #3 is back so the work is split again
 3 Kafka consumer group rebalance is triggered
 4 Part of processing incl. state is migrated via Kafka from #1 + #2 to server #3
  • 74. Elasticity and Scalability, powered by Kafka. You can add, remove, restart servers in KSQL clusters during live operations.
 “We need more processing power!”
 1 Kafka consumer group rebalance is triggered
 2 Part of processing incl. state is migrated via Kafka to additional server processes
 “Ok, we can scale down again.”
 3 Kafka consumer group rebalance is triggered
 4 Processing incl. state of stopped servers is migrated via Kafka to remaining servers
  • 78. Resources and Next Steps
 • Try the demo on GitHub :) https://github.com/confluentinc/demo-scene
 • Check out the code
 • Play with the examples
 Download Confluent Open Source: https://www.confluent.io/download/
 Chat with us: https://slackpass.io/confluentcommunity #ksql
  • 79. The World’s Best Streaming Platform: Everywhere. DAVID PETERSON Systems Engineer - Confluent APAC @davidseth