SlideShare a Scribd company logo
1 of 38
Scalable Stream
Processing with KSQL,
Kafka and ScyllaDB
Hojjat Jafarpour
Software Engineer
hojjat@confluent.io
@Hojjat
Bio
Hojjat Jafarpour
● Software Engineer @ Confluent
○ Creator of KSQL
● Previously at NEC Labs, Informatica, Quantcast and
Tidemark
○ Worked on various big data projects
● Ph.D. in Computer Science from UC Irvine
○ Scalable stream processing and Publish/Subscribe systems
● @Hojjat
Outline
● Stream Processing Use Case
● Introduction to KSQL
● Concepts and Features
● Where to go
Real time customer support
Ecommerce Site Event Stream
Real time customer support
Ecommerce Site Event Stream Hadoop ScyllaDB Customer Support
Real time customer support
Ecommerce Site Event Stream Hadoop ScyllaDB
TOO SLOW!!!
Customer Support
Real time customer support
Ecommerce Site Event Stream Hadoop ScyllaDB Customer Support
Real time customer support
Ecommerce Site Event Stream ScyllaDB Customer Support
Outline
● Stream Processing Use Case
● Introduction to KSQL
● Concepts and Features
● Where to go
KSQL: the Streaming SQL Engine for Apache Kafka
® from Confluent
● Enables stream processing with zero coding required
● The simplest way to process streams of data in real
time
● Powered by Kafka: scalable, distributed, battle-tested
● All you need is Kafka–no complex deployments of
bespoke systems for stream processing
What is it for?
Streaming ETL
● Kafka is popular for data pipelines.
● KSQL enables easy transformations of data within the pipe
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
What is it for?
Anomaly Detection
● Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
What is it for?
Real Time Monitoring
● Log data monitoring, tracking and alerting
● Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
Outline
● Stream Processing Use Case
● Introduction to KSQL
● Concepts and Features
● Where to go
Do you think that’s a table you
are querying ?
Stream/Table Duality
alice 1
alice 1
charlie 1
alice 2
charlie 1
alice 2
charlie 1
bob 1
TABLE STREAM TABLE
(“alice”, 1)
(“charlie”, 1)
(“alice”, 2)
(“bob”, 1)
alice 1
alice 1
charlie 1
alice 2
charlie 1
alice 2
charlie 1
bob 1
Streams & Tables
● STREAM and TABLE as first class citizens
○ Interpretation of the topic content
Streams & Tables
● STREAM and TABLE as first class citizens
○ Interpretation of the topic content
● STREAM: data in motion
Streams & Tables
● STREAM and TABLE as first class citizens
○ Interpretation of the topic content
● STREAM: data in motion
● TABLE: collected state of a stream
○ One record per key (per window)
○ Current values (compacted topic)
○ Changelog
Features
● Aggregation
● Window
○ Tumbling
○ Hopping
○ Session
● Join
○ Stream-Stream
○ Stream-Table
○ Table-Table
Features
● Aggregation
● Window
○ Tumbling
○ Hopping
○ Session
● Join
○ Stream-Stream
○ Stream-Table
○ Table-Table
● Nested data
○ STRUCT
● UDF/UDAF
● AVRO, JSON, CSV
○ More to come
● And many more...
Outline
● Stream Processing Use Case
● Introduction to KSQL
● Concepts and Features
● Where to go
Real time customer support
Ecommerce Site Event Stream ScyllaDB Customer Support
Define a STREAM
CREATE STREAM ratings (
rating_id long,
user_id int,
stars int,
route_id int,
rating_time long,
channel varchar,
message varchar
) WITH (
value_format=‘JSON',
kafka_topic=‘ratings');
SELECTing from the Stream
Let’s test our new stream definition by finding all the
low-scoring ratings from our iPhone app
SELECT *
FROM ratings
WHERE stars <= 2
SELECTing from the Stream
Let’s test our new stream definition by finding all the
low-scoring ratings from our iPhone app
SELECT *
FROM ratings
WHERE stars <= 2
AND lcase(channel) LIKE ‘%ios%’
SELECTing from the Stream
Let’s test our new stream definition by finding all the
low-scoring ratings from our iPhone app
SELECT *
FROM ratings
WHERE stars <= 2
AND lcase(channel) LIKE ‘%ios%’
LIMIT 10;
SELECTing from the Stream
And set this to run as a continuous transformation,
with results being saved into a new topic
SELECT *
FROM ratings
WHERE stars <= 2
AND lcase(channel) LIKE ‘%ios%’;
SELECTing from the Stream
And set this to run as a continuous transformation,
with results being saved into a new topic
SELECT *
FROM ratings
WHERE stars <= 2
AND lcase(channel) LIKE ‘%ios%’;
CREATE STREAM poor_ratings AS
Bring in Tables!
http://www.projectluangwa.org
Define reference tables
CREATE TABLE users (
uid int,
name varchar,
elite varchar)
WITH (
value_format=‘JSON’,
kafka_topic=‘mysql-users');
key = ‘uid’,
Joins for Enrichment
Enrich the ‘poor_ratings’ stream with data about each user, and derive a
stream of low quality ratings posted only by our Platinum Elite users
Joins for Enrichment
CREATE STREAM vip_poor_ratings AS
SELECT uid, name, elite,
stars, route_id, rating_time,
message
FROM poor_ratings r
LEFT JOIN users u ON r.user_id = u.uid
WHERE u.elite = 'P';
Enrich the ‘poor_ratings’ stream with data about each user, and derive a
stream of low quality ratings posted only by our Platinum Elite users
Aggregates and Windowing
● COUNT, SUM, MIN, MAX
● Windowing - Not strictly ANSI SQL
● Three window types supported:
○ TUMBLING
○ HOPPING (aka ‘sliding’)
○ SESSION
SELECT uid, name, count(*) as rating_count
FROM vip_poor_ratings
WINDOW TUMBLING(size 5 minutes)
GROUP BY uid, name;
Continuous Aggregates
Save the results of our aggregation to a TABLE
SELECT uid, name, count(*) as rating_count
FROM vip_poor_ratings
WINDOW TUMBLING(size 5 minutes)
GROUP BY uid, name
HAVING count(*) > 2;
Continuous Aggregates
Save the results of our aggregation to a TABLE
SELECT uid, name, count(*) as rating_count
FROM vip_poor_ratings
WINDOW TUMBLING(size 5 minutes)
GROUP BY uid, name
HAVING count(*) > 2;
CREATE TABLE sad_vips AS
Where to go from here?
Time to get involved!
● Download Confluent Platform
● Step through the QuickStart
● Play with the examples and demos
http://confluent.io/ksql
https://github.com/confluentinc/ksql
https://slackpass.io/confluentcommunity #ksql

More Related Content

What's hot

Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...DataStax Academy
 
Webinar how to build a highly available time series solution with kairos-db (1)
Webinar  how to build a highly available time series solution with kairos-db (1)Webinar  how to build a highly available time series solution with kairos-db (1)
Webinar how to build a highly available time series solution with kairos-db (1)Julia Angell
 
RDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsRDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsScyllaDB
 
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...ScyllaDB
 
Fast NoSQL from HDDs?
Fast NoSQL from HDDs? Fast NoSQL from HDDs?
Fast NoSQL from HDDs? ScyllaDB
 
Empowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorEmpowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorScyllaDB
 
Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%ScyllaDB
 
Scylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScyllaDB
 
Lessons Learned with Cassandra and Spark at the US Patent and Trademark Office
Lessons Learned with Cassandra and Spark at the US Patent and Trademark OfficeLessons Learned with Cassandra and Spark at the US Patent and Trademark Office
Lessons Learned with Cassandra and Spark at the US Patent and Trademark OfficeDataStax Academy
 
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScyllaDB
 
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIIntroducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIScyllaDB
 
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...ScyllaDB
 
Workshop - How to benchmark your database
Workshop - How to benchmark your databaseWorkshop - How to benchmark your database
Workshop - How to benchmark your databaseScyllaDB
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandraScyllaDB
 
Scylla Summit 2018: How Scylla Helps You to be a Better Application Developer
Scylla Summit 2018: How Scylla Helps You to be a Better Application DeveloperScylla Summit 2018: How Scylla Helps You to be a Better Application Developer
Scylla Summit 2018: How Scylla Helps You to be a Better Application DeveloperScyllaDB
 
Shift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraShift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraDataStax
 
How ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesHow ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesScyllaDB
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteScyllaDB
 
Lightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraLightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraScyllaDB
 

What's hot (20)

Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
 
Webinar how to build a highly available time series solution with kairos-db (1)
Webinar  how to build a highly available time series solution with kairos-db (1)Webinar  how to build a highly available time series solution with kairos-db (1)
Webinar how to build a highly available time series solution with kairos-db (1)
 
RDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful MigrationsRDBMS to NoSQL: Practical Advice from Successful Migrations
RDBMS to NoSQL: Practical Advice from Successful Migrations
 
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
Scylla Summit 2018: Adventures in AdTech: Processing 50 Billion User Profiles...
 
Fast NoSQL from HDDs?
Fast NoSQL from HDDs? Fast NoSQL from HDDs?
Fast NoSQL from HDDs?
 
Empowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with AlternatorEmpowering the AWS DynamoDB™ application developer with Alternator
Empowering the AWS DynamoDB™ application developer with Alternator
 
Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%
 
Scylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of Scylla
 
Lessons Learned with Cassandra and Spark at the US Patent and Trademark Office
Lessons Learned with Cassandra and Spark at the US Patent and Trademark OfficeLessons Learned with Cassandra and Spark at the US Patent and Trademark Office
Lessons Learned with Cassandra and Spark at the US Patent and Trademark Office
 
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS InsightScylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
Scylla Summit 2018: From SAP to Scylla - Tracking the Fleet at GPS Insight
 
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIIntroducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
 
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
WEBINAR - Introducing Scylla Open Source 3.0: Materialized Views, Secondary I...
 
Workshop - How to benchmark your database
Workshop - How to benchmark your databaseWorkshop - How to benchmark your database
Workshop - How to benchmark your database
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from Cassandra
 
Scylla Summit 2018: How Scylla Helps You to be a Better Application Developer
Scylla Summit 2018: How Scylla Helps You to be a Better Application DeveloperScylla Summit 2018: How Scylla Helps You to be a Better Application Developer
Scylla Summit 2018: How Scylla Helps You to be a Better Application Developer
 
Shift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraShift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to Cassandra
 
How ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B FilesHow ReversingLabs Serves File Reputation Service for 10B Files
How ReversingLabs Serves File Reputation Service for 10B Files
 
Seastar Summit 2019 Keynote
Seastar Summit 2019 KeynoteSeastar Summit 2019 Keynote
Seastar Summit 2019 Keynote
 
Lightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraLightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache Cassandra
 

Similar to Stream Processing with KSQL, Kafka and ScyllaDB

ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshopconfluent
 
Scala Frameworks for Web Application 2016
Scala Frameworks for Web Application 2016Scala Frameworks for Web Application 2016
Scala Frameworks for Web Application 2016takezoe
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafkaconfluent
 
Alternatives of JPA/Hibernate
Alternatives of JPA/HibernateAlternatives of JPA/Hibernate
Alternatives of JPA/HibernateSunghyouk Bae
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamNeville Li
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLFlorent Ramiere
 
Paris jug ksql - 2018-06-28
Paris jug ksql - 2018-06-28Paris jug ksql - 2018-06-28
Paris jug ksql - 2018-06-28Florent Ramiere
 
Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL
Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQLScylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL
Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQLScyllaDB
 
Learn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best PracticesLearn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best PracticesDriven Inc.
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify StoryNeville Li
 
Exploring KSQL Patterns
Exploring KSQL PatternsExploring KSQL Patterns
Exploring KSQL Patternsconfluent
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database Systemconfluent
 
Exploring KSQL Patterns
Exploring KSQL Patterns Exploring KSQL Patterns
Exploring KSQL Patterns confluent
 
Streaming Transformations - Putting the T in Streaming ETL
Streaming Transformations - Putting the T in Streaming ETLStreaming Transformations - Putting the T in Streaming ETL
Streaming Transformations - Putting the T in Streaming ETLconfluent
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017Monal Daxini
 

Similar to Stream Processing with KSQL, Kafka and ScyllaDB (20)

KSQL Intro
KSQL IntroKSQL Intro
KSQL Intro
 
ksqlDB Workshop
ksqlDB WorkshopksqlDB Workshop
ksqlDB Workshop
 
Scala Frameworks for Web Application 2016
Scala Frameworks for Web Application 2016Scala Frameworks for Web Application 2016
Scala Frameworks for Web Application 2016
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
Alternatives of JPA/Hibernate
Alternatives of JPA/HibernateAlternatives of JPA/Hibernate
Alternatives of JPA/Hibernate
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQL
 
Paris jug ksql - 2018-06-28
Paris jug ksql - 2018-06-28Paris jug ksql - 2018-06-28
Paris jug ksql - 2018-06-28
 
Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL
Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQLScylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL
Scylla Summit 2017: Streaming ETL in Kafka for Everyone with KSQL
 
Learn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best PracticesLearn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best Practices
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
Exploring KSQL Patterns
Exploring KSQL PatternsExploring KSQL Patterns
Exploring KSQL Patterns
 
Exploring Relay land
Exploring Relay landExploring Relay land
Exploring Relay land
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Exploring KSQL Patterns
Exploring KSQL Patterns Exploring KSQL Patterns
Exploring KSQL Patterns
 
Wider than rails
Wider than railsWider than rails
Wider than rails
 
Streaming Transformations - Putting the T in Streaming ETL
Streaming Transformations - Putting the T in Streaming ETLStreaming Transformations - Putting the T in Streaming ETL
Streaming Transformations - Putting the T in Streaming ETL
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 

Recently uploaded (20)

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 

Stream Processing with KSQL, Kafka and ScyllaDB

  • 1. Scalable Stream Processing with KSQL, Kafka and ScyllaDB Hojjat Jafarpour Software Engineer hojjat@confluent.io @Hojjat
  • 2. Bio Hojjat Jafarpour ● Software Engineer @ Confluent ○ Creator of KSQL ● Previously at NEC Labs, Informatica, Quantcast and Tidemark ○ Worked on various big data projects ● Ph.D. in Computer Science from UC Irvine ○ Scalable stream processing and Publish/Subscribe systems ● @Hojjat
  • 3. Outline ● Stream Processing Use Case ● Introduction to KSQL ● Concepts and Features ● Where to go
  • 4. Real time customer support Ecommerce Site Event Stream
  • 5. Real time customer support Ecommerce Site Event Stream Hadoop ScyllaDB Customer Support
  • 6. Real time customer support Ecommerce Site Event Stream Hadoop ScyllaDB TOO SLOW!!! Customer Support
  • 7. Real time customer support Ecommerce Site Event Stream Hadoop ScyllaDB Customer Support
  • 8. Real time customer support Ecommerce Site Event Stream ScyllaDB Customer Support
  • 9. Outline ● Stream Processing Use Case ● Introduction to KSQL ● Concepts and Features ● Where to go
  • 10. KSQL: the Streaming SQL Engine for Apache Kafka ® from Confluent ● Enables stream processing with zero coding required ● The simplest way to process streams of data in real time ● Powered by Kafka: scalable, distributed, battle-tested ● All you need is Kafka–no complex deployments of bespoke systems for stream processing
  • 11. What is it for? Streaming ETL ● Kafka is popular for data pipelines. ● KSQL enables easy transformations of data within the pipe CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
  • 12. What is it for? Anomaly Detection ● Identifying patterns or anomalies in real-time data, surfaced in milliseconds CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count(*) > 3;
  • 13. What is it for? Real Time Monitoring ● Log data monitoring, tracking and alerting ● Sensor / IoT data CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
  • 14. Outline ● Stream Processing Use Case ● Introduction to KSQL ● Concepts and Features ● Where to go
  • 15. Do you think that’s a table you are querying ?
  • 17. alice 1 alice 1 charlie 1 alice 2 charlie 1 alice 2 charlie 1 bob 1 TABLE STREAM TABLE (“alice”, 1) (“charlie”, 1) (“alice”, 2) (“bob”, 1) alice 1 alice 1 charlie 1 alice 2 charlie 1 alice 2 charlie 1 bob 1
  • 18. Streams & Tables ● STREAM and TABLE as first class citizens ○ Interpretation of the topic content
  • 19. Streams & Tables ● STREAM and TABLE as first class citizens ○ Interpretation of the topic content ● STREAM: data in motion
  • 20. Streams & Tables ● STREAM and TABLE as first class citizens ○ Interpretation of the topic content ● STREAM: data in motion ● TABLE: collected state of a stream ○ One record per key (per window) ○ Current values (compacted topic) ○ Changelog
  • 21. Features ● Aggregation ● Window ○ Tumbling ○ Hopping ○ Session ● Join ○ Stream-Stream ○ Stream-Table ○ Table-Table
  • 22. Features ● Aggregation ● Window ○ Tumbling ○ Hopping ○ Session ● Join ○ Stream-Stream ○ Stream-Table ○ Table-Table ● Nested data ○ STRUCT ● UDF/UDAF ● AVRO, JSON, CSV ○ More to come ● And many more...
  • 23. Outline ● Stream Processing Use Case ● Introduction to KSQL ● Concepts and Features ● Where to go
  • 24. Real time customer support Ecommerce Site Event Stream ScyllaDB Customer Support
  • 25. Define a STREAM CREATE STREAM ratings ( rating_id long, user_id int, stars int, route_id int, rating_time long, channel varchar, message varchar ) WITH ( value_format=‘JSON', kafka_topic=‘ratings');
  • 26. SELECTing from the Stream Let’s test our new stream definition by finding all the low-scoring ratings from our iPhone app SELECT * FROM ratings WHERE stars <= 2
  • 27. SELECTing from the Stream Let’s test our new stream definition by finding all the low-scoring ratings from our iPhone app SELECT * FROM ratings WHERE stars <= 2 AND lcase(channel) LIKE ‘%ios%’
  • 28. SELECTing from the Stream Let’s test our new stream definition by finding all the low-scoring ratings from our iPhone app SELECT * FROM ratings WHERE stars <= 2 AND lcase(channel) LIKE ‘%ios%’ LIMIT 10;
  • 29. SELECTing from the Stream And set this to run as a continuous transformation, with results being saved into a new topic SELECT * FROM ratings WHERE stars <= 2 AND lcase(channel) LIKE ‘%ios%’;
  • 30. SELECTing from the Stream And set this to run as a continuous transformation, with results being saved into a new topic SELECT * FROM ratings WHERE stars <= 2 AND lcase(channel) LIKE ‘%ios%’; CREATE STREAM poor_ratings AS
  • 32. Define reference tables CREATE TABLE users ( uid int, name varchar, elite varchar) WITH ( value_format=‘JSON’, kafka_topic=‘mysql-users'); key = ‘uid’,
  • 33. Joins for Enrichment Enrich the ‘poor_ratings’ stream with data about each user, and derive a stream of low quality ratings posted only by our Platinum Elite users
  • 34. Joins for Enrichment CREATE STREAM vip_poor_ratings AS SELECT uid, name, elite, stars, route_id, rating_time, message FROM poor_ratings r LEFT JOIN users u ON r.user_id = u.uid WHERE u.elite = 'P'; Enrich the ‘poor_ratings’ stream with data about each user, and derive a stream of low quality ratings posted only by our Platinum Elite users
  • 35. Aggregates and Windowing ● COUNT, SUM, MIN, MAX ● Windowing - Not strictly ANSI SQL ● Three window types supported: ○ TUMBLING ○ HOPPING (aka ‘sliding’) ○ SESSION SELECT uid, name, count(*) as rating_count FROM vip_poor_ratings WINDOW TUMBLING(size 5 minutes) GROUP BY uid, name;
  • 36. Continuous Aggregates Save the results of our aggregation to a TABLE SELECT uid, name, count(*) as rating_count FROM vip_poor_ratings WINDOW TUMBLING(size 5 minutes) GROUP BY uid, name HAVING count(*) > 2;
  • 37. Continuous Aggregates Save the results of our aggregation to a TABLE SELECT uid, name, count(*) as rating_count FROM vip_poor_ratings WINDOW TUMBLING(size 5 minutes) GROUP BY uid, name HAVING count(*) > 2; CREATE TABLE sad_vips AS
  • 38. Where to go from here? Time to get involved! ● Download Confluent Platform ● Step through the QuickStart ● Play with the examples and demos http://confluent.io/ksql https://github.com/confluentinc/ksql https://slackpass.io/confluentcommunity #ksql

Editor's Notes

  1. A quick intro to KSQL in case they missed Niel’s talk.
  2. STREAM and TABLE are both first-class citizens in KSQL Both of these are interpretations of topic content. Topics are what *are*. Streams and tables are KSQL abstractions.
  3. STREAM - data in motion. An unbounded sequence of facts (aka events, messages). TABLE - collected state of a stream. An evolving collection of facts. One record per key (per window) Changelog