SlideShare a Scribd company logo
1 of 30
Download to read offline
1Confidential
KSQL
An Open Source Streaming SQL Engine for Apache Kafka
Cliff Gilmore
Sr. Systems Engineer
cliff@confluent.io
2Confidential
What is Kafka?
3Confidential
The Vision for Kafka 1.0
4Confidential
Streaming Data vs Big Data
Stream Data is
The Faster the Better
Stream Data can be
Big or Fast (Lambda)
Stream Data will be
Big AND Fast (Kappa)
Apache Kafka is the Enabling Technology of this Transition
Big Data was
The More the Better
ValueofData
Age of Data Speed Table Batch Table
Database
Streams Hadoop
Job 1 Job 2
Streams
Table 1 Table 2
Database
ValueofData
Volume of Data
5Confidential
Kafka Architecture – Think of a Log!
6Confidential
Kafka Architecture
7Confidential
Kafka Streams
8Confidential
The Streams API of Apache Kafka™
✓ No separate processing cluster required
✓ Develop on Mac, Linux, Windows
✓ Deploy to containers, VMs, bare metal, cloud
✓ Powered by Kafka: elastic, scalable, distributed,
battle-tested
✓ Perfect for small, medium, large use cases
✓ Fully integrated with Kafka security
✓ Exactly-once processing semantics
✓ Part of Apache Kafka, included in
Confluent Open Source
Write standard Java applications and microservices
to process your data in real-time
KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");
KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
.groupByKey()
.count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views");
http://kafka.apache.org/documentation/streams
9Confidential
Architecture
10Confidential
Architecture
KSQL
12Confidential
Kafka Stream Processing Evolution
13Confidential
Consumer
, Producer
Kafka
Streams
KSQL
Flexibility Simplicity
subscribe(),
poll(), send(),
flush()
mapValues(),
filter(),
punctuate()
Select…from
…
join…where…
group by..
14Confidential
Why KSQL?
• Expand access to Kafka Stream Processing to more people
• More accessible
• Less intimidating
• Lower the barriers to entry
Benefits
• Enable stream processing with zero coding required
• The simplest way to process streams of data in real-time
• Powered by Kafka: scalable, distributed, battle-tested
15Confidential
On the Shoulders of (Streaming) Giants
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Supports late-arriving and out-of-order data
• Windowing
• Millisecond processing latency, no micro-batching
• At-least-once and exactly-once processing guarantees
16Confidential
KSQL Concepts
● STREAM and TABLE as first-class citizens
● Interpretations of topic content
● STREAM - data in motion
● TABLE - collected state of a stream
• One record per key (per window)
• Current values (compacted topic)
• Changelog
● STREAM – TABLE Joins
17Confidential
Schema & Format
●Start with message (value) format
● JSON - the simplest choice
● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built-in. Will be
expanded.
● AVRO - requires that you also supply a schema-file (.avsc), Schema Registry support soon!
●Pseudo-columns are automatically provided
• ROWKEY, ROWTIME - for querying the message key and timestamp
• (PARTITION, OFFSET coming soon)
• CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar) WITH
(value_format = 'delimited', kafka_topic='my_pageview_topic');
18Confidential
Interactive Querying
● Great for iterative development
● LIST (or SHOW) STREAMS / TABLES
● DESCRIBE STREAM / TABLE
● SELECT
• Selects rows from a KSQL stream or table.
• The result of this statement will be printed out in the console.
• To stop the continuous query in the CLI press Ctrl+C.
19Confidential
What is it for ?
● Streaming ETL
○ Kafka is popular for data pipelines.
○ KSQL enables easy transformations of data within the pipe
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM
clickstream c LEFT JOIN users u ON
c.userid = u.user_id
WHERE u.level = 'Platinum';
20Confidential
Streaming
21Confidential
What is it for ?
● Anomaly Detection
○ Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5
SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
22Confidential
What is it for ?
● Real Time Monitoring
○ Log data monitoring, tracking and alerting
○ Sensor / IoT data
CREATE TABLE error_counts
AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1
MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
23Confidential
What is it for ?
● Simple Derivations of Existing Topics
○ One-liner to re-partition and/or re-key a topic for new uses
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
VALUE_FORMAT=‘JSON’,
TIMESTAMP=‘view_time’) AS
SELECT *
FROM clickstream
PARTITION BY user_id;
24Confidential
KSQL Components
• CLI
• Designed to be familiar to users of MySQL, Postgres, etc
• Engine
• Actually runs the Kafka Streams topologies
• REST Server
• HTTP interface allows an Engine to receive instructions from the CLI
25Confidential
How to run KSQL - #1 Stand-alone aka ‘local mode’
• Starts a CLI, an Engine, and a REST server all in the same JVM
• Ideal for laptop development
• Start with default settings:
> bin/ksql-cli local
• Or with customized settings:
> bin/ksql-cli local –-properties-file foo/bar/ksql.properties
26Confidential
How to run KSQL - #2 Client-Server
• Start any number of Server nodes
• > bin/ksql-server-start
• Start any number of CLIs and specify ‘remote’ server address
• >bin/ksql-cli remote http://myserver:8090
• All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up/down without restart
27Confidential
How to run KSQL - #3 as an Application
How do you deploy applications
?
28Confidential
How to run KSQL - #3 as an Application
• Start any number of Engine instances
• Pass a file of KSQL statements to execute
> bin/ksql-node foo/bar.sql
• Ideal for streaming ETL application deployment
• Version control your queries and transformations as code
• All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up/down without restart
29Confidential
Try it out!
●https://github.com/confluentinc/ksql
●> bin/ksql-cli local
30Confidential
Questions? Feedback?
Please contact me!
Cliff Gilmore
Sr. Systems Engineer
cliff@confluent.io

More Related Content

What's hot

Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...confluent
 
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019confluent
 
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...confluent
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And TechniquesKnoldus Inc.
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingGuozhang Wang
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionAlexandre Tamborrino
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
 
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread confluent
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyAllen (Xiaozhong) Wang
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...confluent
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache PulsarMatteo Merli
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...DataWorks Summit/Hadoop Summit
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles BackupsScyllaDB
 

What's hot (20)

Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
kafka
kafkakafka
kafka
 
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
Kafka Summit NYC 2017 - Easy, Scalable, Fault-tolerant Stream Processing with...
 
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
 
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And Techniques
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Deep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumptionDeep dive into Apache Kafka consumption
Deep dive into Apache Kafka consumption
 
kafka
kafkakafka
kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
 
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka Journey
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache Pulsar
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase Update
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
How Scylla Manager Handles Backups
How Scylla Manager Handles BackupsHow Scylla Manager Handles Backups
How Scylla Manager Handles Backups
 

Similar to Chicago Kafka Meetup

Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLconfluent
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streamsYoni Farin
 
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentWebinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentKinetica
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafkaconfluent
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Amazon Web Services
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataChen Robert
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...HostedbyConfluent
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingYaroslav Tkachenko
 
KSQL- Streaming Sql for Kafka
KSQL- Streaming Sql for KafkaKSQL- Streaming Sql for Kafka
KSQL- Streaming Sql for KafkaKnoldus Inc.
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationYi Pan
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Paolo Castagna
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshopconfluent
 
Cassandra
CassandraCassandra
Cassandraexsuns
 

Similar to Chicago Kafka Meetup (20)

KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
 
Real time data pipline with kafka streams
Real time data pipline with kafka streamsReal time data pipline with kafka streams
Real time data pipline with kafka streams
 
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and ConfluentWebinar: Unlock the Power of Streaming Data with Kinetica and Confluent
Webinar: Unlock the Power of Streaming Data with Kinetica and Confluent
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
Serverless Analytics with Amazon Redshift Spectrum, AWS Glue, and Amazon Quic...
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Cassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting dataCassandra implementation for collecting data and presenting data
Cassandra implementation for collecting data and presenting data
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to StreamingBravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streaming
 
KSQL- Streaming Sql for Kafka
KSQL- Streaming Sql for KafkaKSQL- Streaming Sql for Kafka
KSQL- Streaming Sql for Kafka
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
SamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentationSamzaSQL QCon'16 presentation
SamzaSQL QCon'16 presentation
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 
Cassandra
CassandraCassandra
Cassandra
 

Recently uploaded

openEuler Community Overview - a presentation showing the current scale
openEuler Community Overview - a presentation showing the current scaleopenEuler Community Overview - a presentation showing the current scale
openEuler Community Overview - a presentation showing the current scaleShane Coughlan
 
Steps to Successfully Hire Ionic Developers
Steps to Successfully Hire Ionic DevelopersSteps to Successfully Hire Ionic Developers
Steps to Successfully Hire Ionic Developersmichealwillson701
 
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...telebusocialmarketin
 
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...jackiepotts6
 
Einstein Copilot Conversational AI for your CRM.pdf
Einstein Copilot Conversational AI for your CRM.pdfEinstein Copilot Conversational AI for your CRM.pdf
Einstein Copilot Conversational AI for your CRM.pdfCloudMetic
 
8 Steps to Build a LangChain RAG Chatbot.
8 Steps to Build a LangChain RAG Chatbot.8 Steps to Build a LangChain RAG Chatbot.
8 Steps to Build a LangChain RAG Chatbot.Ritesh Kanjee
 
Take Advantage of Mx Tracking Flight Scheduling Solutions to Streamline Your ...
Take Advantage of Mx Tracking Flight Scheduling Solutions to Streamline Your ...Take Advantage of Mx Tracking Flight Scheduling Solutions to Streamline Your ...
Take Advantage of Mx Tracking Flight Scheduling Solutions to Streamline Your ...MyFAA
 
Enterprise Content Managements Solutions
Enterprise Content Managements SolutionsEnterprise Content Managements Solutions
Enterprise Content Managements SolutionsIQBG inc
 
MinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
MinionLabs_Mr. Gokul Srinivas_Young EntrepreneurMinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
MinionLabs_Mr. Gokul Srinivas_Young EntrepreneurPriyadarshini T
 
MUT4SLX: Extensions for Mutation Testing of Stateflow Models
MUT4SLX: Extensions for Mutation Testing of Stateflow ModelsMUT4SLX: Extensions for Mutation Testing of Stateflow Models
MUT4SLX: Extensions for Mutation Testing of Stateflow ModelsUniversity of Antwerp
 
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptxCYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptxBarakaMuyengi
 
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdfFlutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdfMind IT Systems
 
Mobile App Development process | Expert Tips
Mobile App Development process | Expert TipsMobile App Development process | Expert Tips
Mobile App Development process | Expert Tipsmichealwillson701
 
Technical improvements. Reasons. Methods. Estimations. CJ
Technical improvements.  Reasons. Methods. Estimations. CJTechnical improvements.  Reasons. Methods. Estimations. CJ
Technical improvements. Reasons. Methods. Estimations. CJpolinaucc
 
Unlocking AI: Navigating Open Source vs. Commercial Frontiers
Unlocking AI:Navigating Open Source vs. Commercial FrontiersUnlocking AI:Navigating Open Source vs. Commercial Frontiers
Unlocking AI: Navigating Open Source vs. Commercial FrontiersRaphaël Semeteys
 
Splashtop Enterprise Brochure - Remote Computer Access and Remote Support Sof...
Splashtop Enterprise Brochure - Remote Computer Access and Remote Support Sof...Splashtop Enterprise Brochure - Remote Computer Access and Remote Support Sof...
Splashtop Enterprise Brochure - Remote Computer Access and Remote Support Sof...Splashtop Inc
 
BusinessGPT - SECURITY AND GOVERNANCE FOR GENERATIVE AI.pptx
BusinessGPT  - SECURITY AND GOVERNANCE  FOR GENERATIVE AI.pptxBusinessGPT  - SECURITY AND GOVERNANCE  FOR GENERATIVE AI.pptx
BusinessGPT - SECURITY AND GOVERNANCE FOR GENERATIVE AI.pptxAGATSoftware
 
8 key point on optimizing web hosting services in your business.pdf
8 key point on optimizing web hosting services in your business.pdf8 key point on optimizing web hosting services in your business.pdf
8 key point on optimizing web hosting services in your business.pdfOffsiteNOC
 
Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity
Large Scale Architecture -- The Unreasonable Effectiveness of SimplicityLarge Scale Architecture -- The Unreasonable Effectiveness of Simplicity
Large Scale Architecture -- The Unreasonable Effectiveness of SimplicityRandy Shoup
 

Recently uploaded (20)

openEuler Community Overview - a presentation showing the current scale
openEuler Community Overview - a presentation showing the current scaleopenEuler Community Overview - a presentation showing the current scale
openEuler Community Overview - a presentation showing the current scale
 
Steps to Successfully Hire Ionic Developers
Steps to Successfully Hire Ionic DevelopersSteps to Successfully Hire Ionic Developers
Steps to Successfully Hire Ionic Developers
 
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
Telebu Social -Whatsapp Business API : Mastering Omnichannel Business Communi...
 
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
03.2024_North America VMUG Optimizing RevOps using the power of ChatGPT in Ma...
 
Einstein Copilot Conversational AI for your CRM.pdf
Einstein Copilot Conversational AI for your CRM.pdfEinstein Copilot Conversational AI for your CRM.pdf
Einstein Copilot Conversational AI for your CRM.pdf
 
8 Steps to Build a LangChain RAG Chatbot.
8 Steps to Build a LangChain RAG Chatbot.8 Steps to Build a LangChain RAG Chatbot.
8 Steps to Build a LangChain RAG Chatbot.
 
Take Advantage of Mx Tracking Flight Scheduling Solutions to Streamline Your ...
Take Advantage of Mx Tracking Flight Scheduling Solutions to Streamline Your ...Take Advantage of Mx Tracking Flight Scheduling Solutions to Streamline Your ...
Take Advantage of Mx Tracking Flight Scheduling Solutions to Streamline Your ...
 
Enterprise Content Managements Solutions
Enterprise Content Managements SolutionsEnterprise Content Managements Solutions
Enterprise Content Managements Solutions
 
MinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
MinionLabs_Mr. Gokul Srinivas_Young EntrepreneurMinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
MinionLabs_Mr. Gokul Srinivas_Young Entrepreneur
 
MUT4SLX: Extensions for Mutation Testing of Stateflow Models
MUT4SLX: Extensions for Mutation Testing of Stateflow ModelsMUT4SLX: Extensions for Mutation Testing of Stateflow Models
MUT4SLX: Extensions for Mutation Testing of Stateflow Models
 
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptxCYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
CYBER SECURITY AND CYBER CRIME COMPLETE GUIDE.pLptx
 
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdfFlutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
Flutter the Future of Mobile App Development - 5 Crucial Reasons.pdf
 
Mobile App Development process | Expert Tips
Mobile App Development process | Expert TipsMobile App Development process | Expert Tips
Mobile App Development process | Expert Tips
 
Technical improvements. Reasons. Methods. Estimations. CJ
Technical improvements.  Reasons. Methods. Estimations. CJTechnical improvements.  Reasons. Methods. Estimations. CJ
Technical improvements. Reasons. Methods. Estimations. CJ
 
Unlocking AI: Navigating Open Source vs. Commercial Frontiers
Unlocking AI:Navigating Open Source vs. Commercial FrontiersUnlocking AI:Navigating Open Source vs. Commercial Frontiers
Unlocking AI: Navigating Open Source vs. Commercial Frontiers
 
Splashtop Enterprise Brochure - Remote Computer Access and Remote Support Sof...
Splashtop Enterprise Brochure - Remote Computer Access and Remote Support Sof...Splashtop Enterprise Brochure - Remote Computer Access and Remote Support Sof...
Splashtop Enterprise Brochure - Remote Computer Access and Remote Support Sof...
 
20140812 - OBD2 Solution
20140812 - OBD2 Solution20140812 - OBD2 Solution
20140812 - OBD2 Solution
 
BusinessGPT - SECURITY AND GOVERNANCE FOR GENERATIVE AI.pptx
BusinessGPT  - SECURITY AND GOVERNANCE  FOR GENERATIVE AI.pptxBusinessGPT  - SECURITY AND GOVERNANCE  FOR GENERATIVE AI.pptx
BusinessGPT - SECURITY AND GOVERNANCE FOR GENERATIVE AI.pptx
 
8 key point on optimizing web hosting services in your business.pdf
8 key point on optimizing web hosting services in your business.pdf8 key point on optimizing web hosting services in your business.pdf
8 key point on optimizing web hosting services in your business.pdf
 
Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity
Large Scale Architecture -- The Unreasonable Effectiveness of SimplicityLarge Scale Architecture -- The Unreasonable Effectiveness of Simplicity
Large Scale Architecture -- The Unreasonable Effectiveness of Simplicity
 

Chicago Kafka Meetup

  • 1. 1Confidential KSQL An Open Source Streaming SQL Engine for Apache Kafka Cliff Gilmore Sr. Systems Engineer cliff@confluent.io
  • 4. 4Confidential Streaming Data vs Big Data Stream Data is The Faster the Better Stream Data can be Big or Fast (Lambda) Stream Data will be Big AND Fast (Kappa) Apache Kafka is the Enabling Technology of this Transition Big Data was The More the Better ValueofData Age of Data Speed Table Batch Table Database Streams Hadoop Job 1 Job 2 Streams Table 1 Table 2 Database ValueofData Volume of Data
  • 8. 8Confidential The Streams API of Apache Kafka™ ✓ No separate processing cluster required ✓ Develop on Mac, Linux, Windows ✓ Deploy to containers, VMs, bare metal, cloud ✓ Powered by Kafka: elastic, scalable, distributed, battle-tested ✓ Perfect for small, medium, large use cases ✓ Fully integrated with Kafka security ✓ Exactly-once processing semantics ✓ Part of Apache Kafka, included in Confluent Open Source Write standard Java applications and microservices to process your data in real-time KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic"); KTable<Windowed<User>, Long> viewsPerUserSession = pageViews .groupByKey() .count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views"); http://kafka.apache.org/documentation/streams
  • 11. KSQL
  • 13. 13Confidential Consumer , Producer Kafka Streams KSQL Flexibility Simplicity subscribe(), poll(), send(), flush() mapValues(), filter(), punctuate() Select…from … join…where… group by..
  • 14. 14Confidential Why KSQL? • Expand access to Kafka Stream Processing to more people • More accessible • Less intimidating • Lower the barriers to entry Benefits • Enable stream processing with zero coding required • The simplest way to process streams of data in real-time • Powered by Kafka: scalable, distributed, battle-tested
  • 15. 15Confidential On the Shoulders of (Streaming) Giants • Native, 100%-compatible Kafka integration • Secure stream processing using Kafka’s security features • Elastic and highly scalable • Fault-tolerant • Stateful and stateless computations • Interactive queries • Time model • Supports late-arriving and out-of-order data • Windowing • Millisecond processing latency, no micro-batching • At-least-once and exactly-once processing guarantees
  • 16. 16Confidential KSQL Concepts ● STREAM and TABLE as first-class citizens ● Interpretations of topic content ● STREAM - data in motion ● TABLE - collected state of a stream • One record per key (per window) • Current values (compacted topic) • Changelog ● STREAM – TABLE Joins
  • 17. 17Confidential Schema & Format ●Start with message (value) format ● JSON - the simplest choice ● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built-in. Will be expanded. ● AVRO - requires that you also supply a schema-file (.avsc), Schema Registry support soon! ●Pseudo-columns are automatically provided • ROWKEY, ROWTIME - for querying the message key and timestamp • (PARTITION, OFFSET coming soon) • CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar) WITH (value_format = 'delimited', kafka_topic='my_pageview_topic');
  • 18. 18Confidential Interactive Querying ● Great for iterative development ● LIST (or SHOW) STREAMS / TABLES ● DESCRIBE STREAM / TABLE ● SELECT • Selects rows from a KSQL stream or table. • The result of this statement will be printed out in the console. • To stop the continuous query in the CLI press Ctrl+C.
  • 19. 19Confidential What is it for ? ● Streaming ETL ○ Kafka is popular for data pipelines. ○ KSQL enables easy transformations of data within the pipe CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
  • 21. 21Confidential What is it for ? ● Anomaly Detection ○ Identifying patterns or anomalies in real-time data, surfaced in milliseconds CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count(*) > 3;
  • 22. 22Confidential What is it for ? ● Real Time Monitoring ○ Log data monitoring, tracking and alerting ○ Sensor / IoT data CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
  • 23. 23Confidential What is it for ? ● Simple Derivations of Existing Topics ○ One-liner to re-partition and/or re-key a topic for new uses CREATE STREAM views_by_userid WITH (PARTITIONS=6, VALUE_FORMAT=‘JSON’, TIMESTAMP=‘view_time’) AS SELECT * FROM clickstream PARTITION BY user_id;
  • 24. 24Confidential KSQL Components • CLI • Designed to be familiar to users of MySQL, Postgres, etc • Engine • Actually runs the Kafka Streams topologies • REST Server • HTTP interface allows an Engine to receive instructions from the CLI
  • 25. 25Confidential How to run KSQL - #1 Stand-alone aka ‘local mode’ • Starts a CLI, an Engine, and a REST server all in the same JVM • Ideal for laptop development • Start with default settings: > bin/ksql-cli local • Or with customized settings: > bin/ksql-cli local –-properties-file foo/bar/ksql.properties
  • 26. 26Confidential How to run KSQL - #2 Client-Server • Start any number of Server nodes • > bin/ksql-server-start • Start any number of CLIs and specify ‘remote’ server address • >bin/ksql-cli remote http://myserver:8090 • All running Engines share the processing load • Technically, instances of the same Kafka Streams Applications • Scale up/down without restart
  • 27. 27Confidential How to run KSQL - #3 as an Application How do you deploy applications ?
  • 28. 28Confidential How to run KSQL - #3 as an Application • Start any number of Engine instances • Pass a file of KSQL statements to execute > bin/ksql-node foo/bar.sql • Ideal for streaming ETL application deployment • Version control your queries and transformations as code • All running Engines share the processing load • Technically, instances of the same Kafka Streams Applications • Scale up/down without restart
  • 30. 30Confidential Questions? Feedback? Please contact me! Cliff Gilmore Sr. Systems Engineer cliff@confluent.io