SlideShare a Scribd company logo
Principles in
Data Stream Processing
Nick Dearden | Matthias J. Sax
Outline
2
• What do we mean by Stream Processing anyway ?
• Concepts
• Events & State
• Time
• (In)Completeness
• Future directions
Events vs State (Example)
3
Events vs State – Streams vs Tables
4
𝑠𝑡𝑎𝑡𝑒 𝑛𝑜𝑤 = )
!"#
$%&
𝑠𝑡𝑟𝑒𝑎𝑚 𝑡 d𝑡
𝑠𝑡𝑟𝑒𝑎𝑚 𝑡 =
d𝑠𝑡𝑎𝑡𝑒(𝑡)
d𝑡
* M. Kleppman, Designing Data-Intensive Applications
Batch vs Stream
Event vs State
Over
Batch vs Stream
Unifying Batch and Stream Processing?
7
No! Approach this from a different direction.
• Batch vs Stream is not just about “processing latency”, but fundamentally about semantics!
• What they have in common is that they both transform data
Instead of focusing on “batch” vs “incremental” data transformation, stream processing is about “data in motion”
in contrast to “data at rest”
• And the goal is to UNIFY data in motion and data at rest with a unique abstraction
We think of this as integrating state and events
• Streams and Tables in a unified processing model and programming interface / query language
Events and ”the Log”
8
Immutability as core concept
Use a log to store events
• Logs store immutable data
• Logs preserve order
Think stream-table join:, an input event is enriched to produce an output event
• After the output event is produced, it’s immutable;
• Thus, if the table is updated, the enriched event is not updated
Events and ”the Log”
9
Company City
Confluent PA
Oracle RWC
Events and ”the Log”
10
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
Events and ”the Log”
11
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
Events and ”the Log”
12
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
CFLT 10 Laptops PA
Events and ”the Log”
13
CFLT 3 Router
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
CFLT 3 Router
CFLT 10 Laptops PA PA
Events and ”the Log”
14
CFLT 3 Router
CFLT 10 Laptops
Company City
Confluent PA
Oracle RWC
Company City
Confluent MV
Oracle RWC
CFLT 3 Router
CFLT 10 Laptops PA PA
Events and ”the Log”
15
CFLT 3 Router
CFLT 10 Laptops CFLT 20 Mice
Company City
Confluent PA
Oracle RWC
Company City
Confluent MV
Oracle RWC
CFLT 3 Router
CFLT 10 Laptops CFLT 20 Mice
PA MV
PA
Everything is Time-based
Stream Processing and Time Semantics
17
Time is a first-class citizen
Stream processing engine needs to reason about time (not just a data dimension).
Batch processing ignores the time-dimension
User’s responsibility to reason about time – the processing engine does not understand time semantics.
Event Time
When did something happen (e.g. “click on ad”).
Events and State
Tables need a time dimension, too!
Event Time vs Processing Time
18
Event Time
Every event carries a timestamp embedded in the record. Captures when the event happened.
• Stream processing semantics are defined on event time.
Processing Time
When we process the data (last week, today, tomorrow).
• Must not impact the result.
Tables and Time Semantics
19
Classic DB systems don’t track time nor “versions”
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
Tables and Time Semantics (cont.)
20
Manually track time and versions
Or use newer SQL features like “system versioned temporal tables”.
KEY VALUE ValidFrom ValidTo
A 10 1 17
B 42 3 23
C 23 10 MAX_VALUE
A 20 17 40
B 73 23 MAX_VALUE
A <null> 40 MAX_VALUE
Versioned Tables
21
KEY VALUE
A 10
KEY VALUE
A 10
B 42
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
KEY VALUE
A 20
B 42
C 23
Event Time
t=1 t=3 t=10 t=17 t=23 t=40
A sequence of table versions over time:
t=5 t=20
Events and State in Time
22
CFLT 3 Router
CFLT 10 Laptops CFLT 20 Mice
Company City
Confluent PA
Oracle RWC
Company City
Confluent MV
Oracle RWC
CFLT 3 Router
CFLT 10 Laptops CFLT 20 Mice
PA MV
PA
t=10 t=15 t=30
Incomplete Information
Deterministic Processing
24
Log Order (aka offset order)
• messages are stored in append order (per partition)
• all consumers read message in the same order
• log can also be re-read multiple times, and the order of messages is always the same
Kafka’s strict ordering guarantees
allow to build consistent and deterministic application!
Log Order and Event Time
25
However
• stream processing semantics are defined on (logical) event time order
• infinite input and out-of-order data (event time) make the concept of completeness fuzzy
Consistency != Completeness
When is the Puzzle completed?
26
Limit “scope” via (time) windows
• Define puzzle “boundaries”
• How many puzzle pieces do we have?
When is which Puzzle (window) completed?
27
Event Time
Grace Period
28
Consistency != Completeness
Only you know when you are complete (or rather, how much out-of-order-ness you are prepared to tolerate).
Configurable grace period.
Eventual Completeness ?
Future Developments
Future of Stream Processing
30
Time-versioned tables:
CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25
SECONDS;
Future of Stream Processing
31
Time-versioned tables:
CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25
SECONDS;
KEY VALUE
A 10
KEY VALUE
A 10
B 42
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
KEY VALUE
A 20
B 42
C 23
Event Time
t=1 t=3 t=10 t=17 t=23 t=40
Future of Stream Processing
32
Time-versioned tables:
CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25
SECONDS;
KEY VALUE
A 10
KEY VALUE
A 10
B 42
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
KEY VALUE
A 20
B 42
C 23
Event Time
retention = 25
t=1 t=3 t=10 t=17 t=23 t=40
Future of Stream Processing
33
Time-versioned tables:
CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25
SECONDS;
KEY VALUE
A 10
KEY VALUE
A 10
B 42
KEY VALUE
A 10
B 42
C 23
KEY VALUE
A 20
B 42
C 23
KEY VALUE
A 20
B 73
C 23
KEY VALUE
A 20
B 42
C 23
Event Time
retention = 25
t=1 t=3 t=10 t=17 t=23 t=40
Future of Stream Processing (cont.)
34
Easier Reasoning about Completeness
• ksqlDB and Kafka Streams are distributed systems
• Vector clocks
DM32 Ad-hoc group on SQL extensions for streaming data
• Confluent, Oracle, IBM, Microsoft, Google, Alibaba
Learn More, Join In!
35
SIGMOD paper
https://www.confluent.io/resources/white-paper/distributed-
stream-processing-in-kafka
Apache Kafka Improvement Proposals (KIPs)
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improveme
nt+Proposals
Confluent blog
https://confluent.io/blog
Streaming Audio with Tim Berglund
https://developer.confluent.io/podcast/
Thanks! We are hiring!

More Related Content

What's hot

High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
Eric Sammer
 
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
HostedbyConfluent
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
Gyula Fóra
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
HostedbyConfluent
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
LibbySchulze
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Streaming Data from Cassandra into Kafka
Streaming Data from Cassandra into KafkaStreaming Data from Cassandra into Kafka
Streaming Data from Cassandra into Kafka
Abrar Sheikh
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache Flink
Knoldus Inc.
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa Architecture
Joey Bolduc-Gilbert
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Apache Flink Taiwan User Group
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
knowbigdata
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
InfluxData
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Vasia Kalavri
 
Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor API
confluent
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
confluent
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
Kostas Tzoumas
 

What's hot (20)

High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
 
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Stateful Distributed Stream Processing
Stateful Distributed Stream ProcessingStateful Distributed Stream Processing
Stateful Distributed Stream Processing
 
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, ConfluentTemporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
Temporal-Joins in Kafka Streams and ksqlDB | Matthias Sax, Confluent
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Streaming Data from Cassandra into Kafka
Streaming Data from Cassandra into KafkaStreaming Data from Cassandra into Kafka
Streaming Data from Cassandra into Kafka
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache Flink
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa Architecture
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
 
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
Flink Forward Berlin 2017: Pramod Bhatotia, Do Le Quoc - StreamApprox: Approx...
 
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
Frossie Economou & Angelo Fausti [Vera C. Rubin Observatory] | How InfluxDB H...
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Introduction to the Processor API
Introduction to the Processor APIIntroduction to the Processor API
Introduction to the Processor API
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 

Similar to Principles in Data Stream Processing | Matthias J Sax, Confluent

Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+Tables
C4Media
 
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
KafkaZone
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 
Scheduling
SchedulingScheduling
Scheduling
ahmad bassiouny
 
Scheduling
SchedulingScheduling
Scheduling
ahmad bassiouny
 
Scheduling
SchedulingScheduling
Scheduling
ahmad bassiouny
 
Foundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryFoundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theory
DataWorks Summit
 
Scheduling ppt @ doms
Scheduling ppt @ doms Scheduling ppt @ doms
Scheduling ppt @ doms
Babasab Patil
 
Scheduling ppt @ DOMS
Scheduling ppt @ DOMSScheduling ppt @ DOMS
Scheduling ppt @ DOMS
Babasab Patil
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
Aleksandr Kuboskin, CFA
 
scheduling_1.ppt
scheduling_1.pptscheduling_1.ppt
scheduling_1.ppt
VELMURUGANM19
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming era
Paris Carbone
 
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Build a minial DBMS from scratch by Rust
Build a minial DBMS from scratch by RustBuild a minial DBMS from scratch by Rust
Build a minial DBMS from scratch by Rust
安齊 劉
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
Stephan Ewen
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
Chris Riccomini
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward
 
Building Applications with Streams and Snapshots
Building Applications with Streams and SnapshotsBuilding Applications with Streams and Snapshots
Building Applications with Streams and Snapshots
J On The Beach
 

Similar to Principles in Data Stream Processing | Matthias J Sax, Confluent (20)

Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...Flink Forward SF 2017: Timo Walther -  Table & SQL API – unified APIs for bat...
Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for bat...
 
Streaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+TablesStreaming SQL Foundations: Why I ❤ Streams+Tables
Streaming SQL Foundations: Why I ❤ Streams+Tables
 
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)Introduction to ksqlDB and stream processing (Vish Srinivasan  - Confluent)
Introduction to ksqlDB and stream processing (Vish Srinivasan - Confluent)
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
Scheduling
SchedulingScheduling
Scheduling
 
Scheduling
SchedulingScheduling
Scheduling
 
Scheduling
SchedulingScheduling
Scheduling
 
Foundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theoryFoundations of streaming SQL: stream & table theory
Foundations of streaming SQL: stream & table theory
 
Scheduling ppt @ doms
Scheduling ppt @ doms Scheduling ppt @ doms
Scheduling ppt @ doms
 
Scheduling ppt @ DOMS
Scheduling ppt @ DOMSScheduling ppt @ DOMS
Scheduling ppt @ DOMS
 
A Brief History of Stream Processing
A Brief History of Stream ProcessingA Brief History of Stream Processing
A Brief History of Stream Processing
 
scheduling_1.ppt
scheduling_1.pptscheduling_1.ppt
scheduling_1.ppt
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming era
 
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
Flink Forward Berlin 2018: Paris Carbone - "Stream Loops on Flink: Reinventin...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Build a minial DBMS from scratch by Rust
Build a minial DBMS from scratch by RustBuild a minial DBMS from scratch by Rust
Build a minial DBMS from scratch by Rust
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Apache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedInApache Incubator Samza: Stream Processing at LinkedIn
Apache Incubator Samza: Stream Processing at LinkedIn
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Building Applications with Streams and Snapshots
Building Applications with Streams and SnapshotsBuilding Applications with Streams and Snapshots
Building Applications with Streams and Snapshots
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
HostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
HostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
HostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
HostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
HostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
HostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
HostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
HostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
HostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
HostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
HostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
HostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
HostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
HostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 

Recently uploaded (20)

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 

Principles in Data Stream Processing | Matthias J Sax, Confluent

  • 1. Principles in Data Stream Processing Nick Dearden | Matthias J. Sax
  • 2. Outline 2 • What do we mean by Stream Processing anyway ? • Concepts • Events & State • Time • (In)Completeness • Future directions
  • 3. Events vs State (Example) 3
  • 4. Events vs State – Streams vs Tables 4 𝑠𝑡𝑎𝑡𝑒 𝑛𝑜𝑤 = ) !"# $%& 𝑠𝑡𝑟𝑒𝑎𝑚 𝑡 d𝑡 𝑠𝑡𝑟𝑒𝑎𝑚 𝑡 = d𝑠𝑡𝑎𝑡𝑒(𝑡) d𝑡 * M. Kleppman, Designing Data-Intensive Applications
  • 7. Unifying Batch and Stream Processing? 7 No! Approach this from a different direction. • Batch vs Stream is not just about “processing latency”, but fundamentally about semantics! • What they have in common is that they both transform data Instead of focusing on “batch” vs “incremental” data transformation, stream processing is about “data in motion” in contrast to “data at rest” • And the goal is to UNIFY data in motion and data at rest with a unique abstraction We think of this as integrating state and events • Streams and Tables in a unified processing model and programming interface / query language
  • 8. Events and ”the Log” 8 Immutability as core concept Use a log to store events • Logs store immutable data • Logs preserve order Think stream-table join:, an input event is enriched to produce an output event • After the output event is produced, it’s immutable; • Thus, if the table is updated, the enriched event is not updated
  • 9. Events and ”the Log” 9 Company City Confluent PA Oracle RWC
  • 10. Events and ”the Log” 10 CFLT 10 Laptops Company City Confluent PA Oracle RWC
  • 11. Events and ”the Log” 11 CFLT 10 Laptops Company City Confluent PA Oracle RWC
  • 12. Events and ”the Log” 12 CFLT 10 Laptops Company City Confluent PA Oracle RWC CFLT 10 Laptops PA
  • 13. Events and ”the Log” 13 CFLT 3 Router CFLT 10 Laptops Company City Confluent PA Oracle RWC CFLT 3 Router CFLT 10 Laptops PA PA
  • 14. Events and ”the Log” 14 CFLT 3 Router CFLT 10 Laptops Company City Confluent PA Oracle RWC Company City Confluent MV Oracle RWC CFLT 3 Router CFLT 10 Laptops PA PA
  • 15. Events and ”the Log” 15 CFLT 3 Router CFLT 10 Laptops CFLT 20 Mice Company City Confluent PA Oracle RWC Company City Confluent MV Oracle RWC CFLT 3 Router CFLT 10 Laptops CFLT 20 Mice PA MV PA
  • 17. Stream Processing and Time Semantics 17 Time is a first-class citizen Stream processing engine needs to reason about time (not just a data dimension). Batch processing ignores the time-dimension User’s responsibility to reason about time – the processing engine does not understand time semantics. Event Time When did something happen (e.g. “click on ad”). Events and State Tables need a time dimension, too!
  • 18. Event Time vs Processing Time 18 Event Time Every event carries a timestamp embedded in the record. Captures when the event happened. • Stream processing semantics are defined on event time. Processing Time When we process the data (last week, today, tomorrow). • Must not impact the result.
  • 19. Tables and Time Semantics 19 Classic DB systems don’t track time nor “versions” KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23
  • 20. Tables and Time Semantics (cont.) 20 Manually track time and versions Or use newer SQL features like “system versioned temporal tables”. KEY VALUE ValidFrom ValidTo A 10 1 17 B 42 3 23 C 23 10 MAX_VALUE A 20 17 40 B 73 23 MAX_VALUE A <null> 40 MAX_VALUE
  • 21. Versioned Tables 21 KEY VALUE A 10 KEY VALUE A 10 B 42 KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23 KEY VALUE A 20 B 42 C 23 Event Time t=1 t=3 t=10 t=17 t=23 t=40 A sequence of table versions over time:
  • 22. t=5 t=20 Events and State in Time 22 CFLT 3 Router CFLT 10 Laptops CFLT 20 Mice Company City Confluent PA Oracle RWC Company City Confluent MV Oracle RWC CFLT 3 Router CFLT 10 Laptops CFLT 20 Mice PA MV PA t=10 t=15 t=30
  • 24. Deterministic Processing 24 Log Order (aka offset order) • messages are stored in append order (per partition) • all consumers read message in the same order • log can also be re-read multiple times, and the order of messages is always the same Kafka’s strict ordering guarantees allow to build consistent and deterministic application!
  • 25. Log Order and Event Time 25 However • stream processing semantics are defined on (logical) event time order • infinite input and out-of-order data (event time) make the concept of completeness fuzzy Consistency != Completeness
  • 26. When is the Puzzle completed? 26 Limit “scope” via (time) windows • Define puzzle “boundaries” • How many puzzle pieces do we have?
  • 27. When is which Puzzle (window) completed? 27 Event Time
  • 28. Grace Period 28 Consistency != Completeness Only you know when you are complete (or rather, how much out-of-order-ness you are prepared to tolerate). Configurable grace period. Eventual Completeness ?
  • 30. Future of Stream Processing 30 Time-versioned tables: CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25 SECONDS;
  • 31. Future of Stream Processing 31 Time-versioned tables: CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25 SECONDS; KEY VALUE A 10 KEY VALUE A 10 B 42 KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23 KEY VALUE A 20 B 42 C 23 Event Time t=1 t=3 t=10 t=17 t=23 t=40
  • 32. Future of Stream Processing 32 Time-versioned tables: CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25 SECONDS; KEY VALUE A 10 KEY VALUE A 10 B 42 KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23 KEY VALUE A 20 B 42 C 23 Event Time retention = 25 t=1 t=3 t=10 t=17 t=23 t=40
  • 33. Future of Stream Processing 33 Time-versioned tables: CREATE TABLE versionedTable <schema> VERSION RETENTION TIME 25 SECONDS; KEY VALUE A 10 KEY VALUE A 10 B 42 KEY VALUE A 10 B 42 C 23 KEY VALUE A 20 B 42 C 23 KEY VALUE A 20 B 73 C 23 KEY VALUE A 20 B 42 C 23 Event Time retention = 25 t=1 t=3 t=10 t=17 t=23 t=40
  • 34. Future of Stream Processing (cont.) 34 Easier Reasoning about Completeness • ksqlDB and Kafka Streams are distributed systems • Vector clocks DM32 Ad-hoc group on SQL extensions for streaming data • Confluent, Oracle, IBM, Microsoft, Google, Alibaba
  • 35. Learn More, Join In! 35 SIGMOD paper https://www.confluent.io/resources/white-paper/distributed- stream-processing-in-kafka Apache Kafka Improvement Proposals (KIPs) https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improveme nt+Proposals Confluent blog https://confluent.io/blog Streaming Audio with Tim Berglund https://developer.confluent.io/podcast/
  • 36. Thanks! We are hiring!