SlideShare a Scribd company logo
1 of 66
A Brief History of
Stream Processing
TimeSeries Meetup, Tallin, Estonia, Europe, Earth,
Milky Way, Universe …. 42
Who I Am
Riccardo Tommasini

Research Fellow @ UT

Future AssistantProfessor

Enthusiast!
Timeline
20152002 2005 2008 201820102006 2007 2014
Precision not Recall
The Origin
Timeline
20152002 2005 2008 201820102006 2007 2014
Precision not Recall
The Vision
Timeline
20152002 2005 2008 201820102006 2007 2014
Precision not Recall
Timeline
20152002 2005 2008 201820102006 2007 2014
Precision not Recall
Timeline
20152002 2005 2008 201820102006 2007 2014
Precision not Recall
Precision not Recall
Timeline
20152002 2005 2008 201820102006 2007 2014
Stream Processing + AI
(Deductive)
The Title
Timeline
20152002 2005 2008 201820102006 2007 2014
Precision not Recall
Timeline
20152002 2005 2008 201820102006 2007 2014
Stream Processing + AI
(Inductive)
Precision not Recall
Timeline
20152002 2005 2008 201820102006 2007 2014
Precision not Recall
Timeline
20152002 2005 2008 201820102006 2007 2014
Big Stream Processing
Starts
Precision not Recall
Timeline
20152002 2005 2008 201820102006 2007 2014
S. Murthy et al. Pulsar – Real-Time
Analytics at
Scale. Technical report, eBay,
2015.
Summingbird: A Framework for
Integrating Batch and Online
MapReduce Computations. 

Trill: A High-Performance
Incremental Query Processor for
Diverse Analytics. 

TelegraphCQ:
Continuous Dataflow
Processing. 

NiagaraCQ: A
Scalable
Continuous
Query System
for Internet
Databases. 

Aurora: A New
Model and
Architecture for Data
Stream
Management
Our Focus
https://www.google.com/url?
sa=i&source=images&cd=&ved=2ahUKE
wjaouS-
uK3mAhVEqaQKHfdOA4MQjRx6BAgBE
AQ&url=https%3A%2F%2Fmedium.com
%2Fpersonal-growth-lab%2Fshould-
you-set-realistic-or-highly-ambitious-
goals-7cf400505444&psig=AOvVaw3xPk
Wzwvy8SBXD9NS0o_Oe&ust=15761481
63494865
Is this too
ambitious?
A (Query) Language Perspective
It addresses the problem of manipulating and managing
data-streams. It stresses on the operations that are
necessary to build applications. Some Assumptions
are made on the underlying system(s).
It address the problem of dealing with unbounded
data. Discuss the systems’ primitives that are
necessary to guarantee low-latency and fault-
tolerance in presence of *uncertainty*, e.g., late
arrivals.
A System Perspective
A new hybrid hope perspective
It addresses the problems in between. How do we map
the abstraction of an high-level language to the
underlying system?
Goals
• Exploit lesson learnt
in RDBMS theory.

• simple, clear, yet
powerful language.

• efficiently combine
streams and
relations.
• low-latency is
paramount.

• take data distribution
and ordering into
account.

• avoid implicit operator
semantics.
CQL DSM DFM
• Correctness first with
sessions support.

• Latency requirements
vs resource cost
should dictate system
choice.

• Single processing
model for bounded
and unbounded data.
Time
• DSMS Clock Time, i.e.,
the timestamp
assigned by the
DSMS.

• Source Time, i.e., the
timestamp assigned by
the source.

• Source Heartbeat are
used to declare no
element with lower
timestamp will be sent.
• Logical Order induced
by the timestamp
assigned by the data
source.

• Physical Order
induced by the
timestamp assigned
by the system at
ingestion.
CQL DSM DFM
• Event Time is the time
at which the event itself
actually occurred. 

• Processing Time is the
time at which an event
is observed during
processing.
• Watermark, i.e., global
progress metrics that
tracks the skewness
between ET and PT.
Watermark
Abstractions
• Streams 

• (Time-Varying)
Relations*
• Streams 

• Change-log
Streams

• Records Streams

• Tables
CQL DSM DFM
• PCollections
(bounded and
unbounded)
CQL in 3 Slides
A Stream S is a possibly infinite multi-set of elements <s,t>
where s is a tuple belonging to the schema of S and t is a
timestamp.
Relation R is a set of tuples (d1, d2, ..., dn), where each
element dj is a member of Dj, a data domain1. 
1 a Data Domain refers to all the values which a data element may contain.
CQL in 3 4 Slides
A Stream S is a possibly infinite multi-set of elements <s,t>
where s is a tuple belonging to the schema of S and t is a
timestamp.
Relation R is a set of tuples (d1, d2, ..., dn), where each
element dj is a member of Dj, a data domain1. 
1 a Data Domain refers to all the values which a data element may contain.
X
Ok, CQL in 4 5 Slides
A Stream S is a possibly infinite multi-set of elements <s,t>
where s is a tuple belonging to the schema of S and t is a
timestamp.
Relation R is a mapping from each time instant in T to a
finite but unbounded bag of tuples belonging to the
schema of R.
1 a Data Domain refers to all the values which a data element may contain.
X
CQL in 5 Slides
A Stream S is a possibly infinite multi-set of elements <s,t>
where s is a tuple belonging to the schema of S and t is a
timestamp.
Relation R is a mapping from each time instant in T to a
finite but unbounded bag of tuples belonging to the
schema of R.
1 a Data Domain refers to all the values which a data element may contain.
CQL in 5 Slides
Streams Relations
…
<s,τ>
…
<s1>
<s2>
<s3>
infinite
unbounded
sequence finite
bag
Mapping: T ! R
stream-to-relation
relation-to-stream
relation-to-relation
Stream
Relation R(t)
Relational Algebra (Almost)
*Stream operators
Sliding windows
CQL in 5 6 Slides
Stream-to-Relation Operators:

• Sliding Window: 

FROM S [ RANGE 5 Minutes]

• Parametric Sliding Windows: 

FROM S [ RANGE 5 Minutes Slide 1 Min]
• Partitioned Windows: 

FROM S [PARTITIONED BY A1..An ROW m]
1 a Data Domain refers to all the values which a data element may contain.
X
CQL in 6 Slides
R2R operator
s3
s4 s5
s6
s7
s8
s9 s10
s11
s12S
s1
s2
W(ω,β)
β
ω
t
widthslide
CQL in 6 Slides
Relation-to-Stream Operators:

• Rstream: streams out all data in the last step

• Istream: streams out data in the last step that wasn’t
on the previous step, i.e. streams out what is new

• Dstream: streams out data in the previous step that
isn’t in the last step, i.e. streams out what is old
1 a Data Domain refers to all the values which a data element may contain.
• What results are being computed

• Where in event time are being computed

• When in processing time are being materialised

• How “earlier results” relate to “later refinements”
Reasoning about time
DFM in X<42* Slides
*overestimation
A Stream is represented as a possibly unbounded collection
of key-value pairs, i.e., a PCollection<K,V>.

PTransforms are PCollections-to-PCollections operations.
DFM in X Slides
(WHAT) Processing Model
DFM in X Slides
(WHAT) Processing Model
DFM in X Slides
(WHERE) Windowing Model
DFM in X Slides
(WHERE) Windowing Model
DFM in X Slides
• The DFM windowing model requires extended primitives than CQL’s one.

• Window Assignment, i.e., each element is assigned to a corresponding window.

• Window Merge:

• Drop Timestamp: only the window interval is relevant here on
• GroupBy Key: to enable parallel execution

• Window Merge (the merge logic depends on the window strategy)

• GroupByWindow: to ensure elements are processed in sequence window-
wise.

• ExpandToElements: assigned a valid timestamp to the elements.
(WHERE) Windowing Model
Wall of Text
Disclaimer
DFM in X Slides
• In order to build unaligned (apply across subset of the
data) event-time windows DFM is forced to decouple the
reporting of the window content.

• To do so, DFM introduces an orthogonal model that
allows to signal when a window is ready to be processed.
(WHEN) Triggering Model
DFM in X Slides
• In order to build unaligned (apply across subset of the
data) event-time windows DFM is forced to decouple the
reporting of the window content.

• To do so, DFM introduces an orthogonal model that
allows to signal when a window is ready to be processed.

• Triggers solve this issue allowing DFM users to write an
arbitrary logic to signal the completion of a window.
(WHEN) Triggering Model
DFM in X Slides
In addition to the triggering semantics, DFM introduces different
refinements models to deal with late arrivals

• Discarding, i.e., upon triggering, window contents are discarded
and later results bear no relation to previous results.

• Accumulating, i.e., upon triggering, window contents are left intact
in a persistent state and later results will become a refinement to
previous results

• Accumulating & Retracting, i.e, it extends the accumulating
semantics with retraction of the previous value.
(HOW) Triggering Model (part 2)
Wall of Text
Disclaimer
– Do the math
“X ~ 9 (7 + 2 WoT)”
https://www.google.com/url?
sa=i&source=images&cd=&ved=2ahUKE
wjaouS-
uK3mAhVEqaQKHfdOA4MQjRx6BAgBE
AQ&url=https%3A%2F%2Fmedium.com
%2Fpersonal-growth-lab%2Fshould-
you-set-realistic-or-highly-ambitious-
goals-7cf400505444&psig=AOvVaw3xPk
Wzwvy8SBXD9NS0o_Oe&ust=15761481
63494865
Was this too
ambitious?
ZZZZ
Dual Stream Model
The design space
Should I Take a Step back?
curtesy of Emanuele Della Valle - http://emanueledellavalle.org
A Conceptual View of Kafka
• Producers send messages on
topics

• Consumers read messages
from topics

• Messages are key-value pairs

• Topics are streams of
messages

• Kafka cluster manages topics

A Logical View of Kafka
• Brokers are the main
storage and messaging
components of the Kafka
cluster
curtesy of Emanuele Della Valle - http://emanueledellavalle.org
Reconciling the two views
of Kafka
• Topics are partitioned across
brokers

• Producers shard messages
over the partitions of a certain
topic

• Typically, the message key
determines which Partition a
message is assigned to
curtesy of Emanuele Della Valle - http://emanueledellavalle.org
Topic partitioning invites
distributed consumption
• Different Consumers can read data
from the same Topic

• By default, each
Consumer will receive all
the messages in the Topic

• Multiple Consumers can be
combined into a Consumer Group

• Consumer Groups provide
scaling capabilities

• Each Consumer is
assigned a subset of
Partitions for consumption
curtesy of Emanuele Della Valle - http://emanueledellavalle.org
Dual Stream Model
The intuition
Dual Stream Model
The intuition
https://www.michael-noll.com/blog/2018/04/05/of-stream-
and-tables-in-kafka-and-stream-processing-part1/
Dual Stream Model
The intuition
https://www.michael-noll.com/blog/2018/04/05/of-stream-
and-tables-in-kafka-and-stream-processing-part1/
Dual Stream Model
The Truth about Streams
Stream
Change-log Stream Record Stream
Unbounded and
ordered sequence
of key-value pairs
A streams whose
records are
updates to a table
A streams whose
records are facts
records are
identified by a
primary key
records are not
identified by a
primary key
Dual Stream Model
A table is a collection of table versions; one version for each
point in time using the timestamp as a version number.
The Truth about Tables
T(1)={T5 ={⟨A,7.2⟩}}

T(2) = {T5 = {⟨A,7.2⟩},T6 = {⟨B,14.7⟩}}

T(3) = {T5 = {⟨A,7.2⟩}, T6 = {⟨A,8.9⟩,⟨B,14.7⟩}}
T(4) = {T3 = {⟨B,12.1⟩},T5 = {⟨A,7.2⟩,⟨B,12.1⟩}, T6 = {⟨A, 8.9⟩, ⟨B, 14.7⟩}}

T(5) = {T3 = {⟨B,12.1⟩},T5 = {⟨A,7.2⟩,⟨B,12.1⟩}, T6 = {⟨A, 8.9⟩, ⟨B,
14.7⟩},T8 = {⟨A, 8.9⟩, ⟨B, 16.7⟩}}
Dual Stream Model
The Truth about Operations
Stateless Operations
Dual Stream Model
The Truth about Operations
Stateful Operations
Dual Stream Model
The Complete Picture
https://www.michael-noll.com/blog/2018/04/05/of-stream-
and-tables-in-kafka-and-stream-processing-part1/
Dual Stream Model
The DSM simplifies the reasoning about the
transformations, but does not solve the unboundedness
problem.

We still need infinite memory for processing an infinite
stream.

DSM introduces the retention time to make the trade-off
explicit.
Result Correctness vs Runtime Cost
Mapping to Kafka
Minimal
https://www.michael-noll.com/blog/2018/04/05/of-stream-
and-tables-in-kafka-and-stream-processing-part1/
Mapping to Kafka
Minimal
Concept Partitioned Unbounded Ordering Mutable Unique key
constraint
Schema
Topic Yes Yes Yes No No No (raw
bytes)
Stream Yes Yes Yes No No Yes
Table Yes Yes No Yes Yes Yes
Concept Kafka
Streams
KSQL Java Scala Python
Topic - - List/Stream List/Stream[(Array[Byte],
Array[Byte])]
[]
Stream KStream STREAM List/Stream List/Stream[(K, V)] []
Table KTable TABLE HashMap mutable.Map[K, V] {}
https://www.michael-noll.com/blog/2018/04/05/of-stream-
and-tables-in-kafka-and-stream-processing-part1/
CQL
Esper
Kafka

Streams
Stream

Duality
DataFlow
KSQL
Flink
Models
& Issues
SECRET 

Model
Complex Event
Processing
Stream Processing
+
AI
Meetup, Tallin
First Event April/March
By Confluent, 

so free beers
Come and talk about
your Kafka experience!
Thanks!
Questions?

rictomm.me

@rictomm

this@rictomm.me

I’m Hiring PhD
Students!!

More Related Content

What's hot

Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Araf Karsh Hamid
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyJosh Baer
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain confluent
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Design Guidelines for Data Mesh and Decentralized Data Organizations
Design Guidelines for Data Mesh and Decentralized Data OrganizationsDesign Guidelines for Data Mesh and Decentralized Data Organizations
Design Guidelines for Data Mesh and Decentralized Data OrganizationsDenodo
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry confluent
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDatabricks
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 
Réussir une montée en charge avec MongoDB
Réussir une montée en charge avec MongoDBRéussir une montée en charge avec MongoDB
Réussir une montée en charge avec MongoDB MongoDB
 
Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit London 2019
Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit London 2019Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit London 2019
Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit London 2019confluent
 
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...HostedbyConfluent
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
 
NiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseNiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseGregory Keys
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022Kai Wähner
 
Building Dynamic Data Pipelines in Azure Data Factory (Microsoft Ignite 2019)
Building Dynamic Data Pipelines in Azure Data Factory (Microsoft Ignite 2019)Building Dynamic Data Pipelines in Azure Data Factory (Microsoft Ignite 2019)
Building Dynamic Data Pipelines in Azure Data Factory (Microsoft Ignite 2019)Cathrine Wilhelmsen
 
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...Databricks
 

What's hot (20)

Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics Apache Flink, AWS Kinesis, Analytics
Apache Flink, AWS Kinesis, Analytics
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Design Guidelines for Data Mesh and Decentralized Data Organizations
Design Guidelines for Data Mesh and Decentralized Data OrganizationsDesign Guidelines for Data Mesh and Decentralized Data Organizations
Design Guidelines for Data Mesh and Decentralized Data Organizations
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Delta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the HoodDelta Lake Streaming: Under the Hood
Delta Lake Streaming: Under the Hood
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Réussir une montée en charge avec MongoDB
Réussir une montée en charge avec MongoDBRéussir une montée en charge avec MongoDB
Réussir une montée en charge avec MongoDB
 
Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit London 2019
Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit London 2019Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit London 2019
Kafka Streams at Scale (Deepak Goyal, Walmart Labs) Kafka Summit London 2019
 
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
The Flux Capacitor of Kafka Streams and ksqlDB (Matthias J. Sax, Confluent) K...
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
NiFi Best Practices for the Enterprise
NiFi Best Practices for the EnterpriseNiFi Best Practices for the Enterprise
NiFi Best Practices for the Enterprise
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
Building Dynamic Data Pipelines in Azure Data Factory (Microsoft Ignite 2019)
Building Dynamic Data Pipelines in Azure Data Factory (Microsoft Ignite 2019)Building Dynamic Data Pipelines in Azure Data Factory (Microsoft Ignite 2019)
Building Dynamic Data Pipelines in Azure Data Factory (Microsoft Ignite 2019)
 
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
Using AI to Build a Self-Driving Query Optimizer with Shivnath Babu and Adria...
 

Similar to A Brief History of Stream Processing

Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkDongwon Kim
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Flink Forward
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.pptAlpha474815
 
lecture1.ppt
lecture1.pptlecture1.ppt
lecture1.pptSagarDR5
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingMatthew Dennis
 
CS3114_09212011.ppt
CS3114_09212011.pptCS3114_09212011.ppt
CS3114_09212011.pptArumugam90
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
#NoEstimates project planning using Monte Carlo simulation
#NoEstimates project planning using Monte Carlo simulation#NoEstimates project planning using Monte Carlo simulation
#NoEstimates project planning using Monte Carlo simulationDimitar Bakardzhiev
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Julian Hyde
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Vincenzo Gulisano
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...Flink Forward
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17spark-project
 
Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015wolf4ood
 

Similar to A Brief History of Stream Processing (20)

Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Predictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache FlinkPredictive Maintenance with Deep Learning and Apache Flink
Predictive Maintenance with Deep Learning and Apache Flink
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.ppt
 
lecture1.ppt
lecture1.pptlecture1.ppt
lecture1.ppt
 
Dataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data ProcessingDataflow - A Unified Model for Batch and Streaming Data Processing
Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Cassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data ModelingCassandra NYC 2011 Data Modeling
Cassandra NYC 2011 Data Modeling
 
Cs 331 Data Structures
Cs 331 Data StructuresCs 331 Data Structures
Cs 331 Data Structures
 
cb streams - gavin pickin
cb streams - gavin pickincb streams - gavin pickin
cb streams - gavin pickin
 
CS3114_09212011.ppt
CS3114_09212011.pptCS3114_09212011.ppt
CS3114_09212011.ppt
 
Spanner osdi2012
Spanner osdi2012Spanner osdi2012
Spanner osdi2012
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
#NoEstimates project planning using Monte Carlo simulation
#NoEstimates project planning using Monte Carlo simulation#NoEstimates project planning using Monte Carlo simulation
#NoEstimates project planning using Monte Carlo simulation
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015Time Series With OrientDB - Fosdem 2015
Time Series With OrientDB - Fosdem 2015
 

Recently uploaded

George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024eCommerce Institute
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptxBasil Achie
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxFamilyWorshipCenterD
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@vikas rana
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...NETWAYS
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 

Recently uploaded (20)

George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Rohini Delhi 💯Call Us 🔝8264348440🔝
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
LANDMARKS  AND MONUMENTS IN NIGERIA.pptxLANDMARKS  AND MONUMENTS IN NIGERIA.pptx
LANDMARKS AND MONUMENTS IN NIGERIA.pptx
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptxGenesis part 2 Isaiah Scudder 04-24-2024.pptx
Genesis part 2 Isaiah Scudder 04-24-2024.pptx
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@call girls in delhi malviya nagar @9811711561@
call girls in delhi malviya nagar @9811711561@
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 

A Brief History of Stream Processing

  • 1. A Brief History of Stream Processing TimeSeries Meetup, Tallin, Estonia, Europe, Earth, Milky Way, Universe …. 42
  • 2. Who I Am Riccardo Tommasini Research Fellow @ UT Future AssistantProfessor Enthusiast!
  • 3. Timeline 20152002 2005 2008 201820102006 2007 2014 Precision not Recall
  • 5. Timeline 20152002 2005 2008 201820102006 2007 2014 Precision not Recall
  • 7. Timeline 20152002 2005 2008 201820102006 2007 2014 Precision not Recall
  • 8. Timeline 20152002 2005 2008 201820102006 2007 2014 Precision not Recall
  • 9.
  • 10. Timeline 20152002 2005 2008 201820102006 2007 2014 Precision not Recall
  • 11. Precision not Recall Timeline 20152002 2005 2008 201820102006 2007 2014 Stream Processing + AI (Deductive)
  • 13. Timeline 20152002 2005 2008 201820102006 2007 2014 Precision not Recall
  • 14. Timeline 20152002 2005 2008 201820102006 2007 2014 Stream Processing + AI (Inductive) Precision not Recall
  • 15. Timeline 20152002 2005 2008 201820102006 2007 2014 Precision not Recall
  • 16.
  • 17. Timeline 20152002 2005 2008 201820102006 2007 2014 Big Stream Processing Starts Precision not Recall
  • 18. Timeline 20152002 2005 2008 201820102006 2007 2014 S. Murthy et al. Pulsar – Real-Time Analytics at Scale. Technical report, eBay, 2015.
Summingbird: A Framework for Integrating Batch and Online MapReduce Computations. 
 Trill: A High-Performance Incremental Query Processor for Diverse Analytics. 
 TelegraphCQ: Continuous Dataflow Processing. 
 NiagaraCQ: A Scalable Continuous Query System for Internet Databases. 
 Aurora: A New Model and Architecture for Data Stream Management
  • 21. A (Query) Language Perspective It addresses the problem of manipulating and managing data-streams. It stresses on the operations that are necessary to build applications. Some Assumptions are made on the underlying system(s). It address the problem of dealing with unbounded data. Discuss the systems’ primitives that are necessary to guarantee low-latency and fault- tolerance in presence of *uncertainty*, e.g., late arrivals. A System Perspective A new hybrid hope perspective It addresses the problems in between. How do we map the abstraction of an high-level language to the underlying system?
  • 22. Goals • Exploit lesson learnt in RDBMS theory. • simple, clear, yet powerful language. • efficiently combine streams and relations. • low-latency is paramount. • take data distribution and ordering into account. • avoid implicit operator semantics. CQL DSM DFM • Correctness first with sessions support. • Latency requirements vs resource cost should dictate system choice. • Single processing model for bounded and unbounded data.
  • 23. Time • DSMS Clock Time, i.e., the timestamp assigned by the DSMS. • Source Time, i.e., the timestamp assigned by the source. • Source Heartbeat are used to declare no element with lower timestamp will be sent. • Logical Order induced by the timestamp assigned by the data source. • Physical Order induced by the timestamp assigned by the system at ingestion. CQL DSM DFM • Event Time is the time at which the event itself actually occurred. • Processing Time is the time at which an event is observed during processing. • Watermark, i.e., global progress metrics that tracks the skewness between ET and PT.
  • 25. Abstractions • Streams • (Time-Varying) Relations* • Streams • Change-log Streams • Records Streams • Tables CQL DSM DFM • PCollections (bounded and unbounded)
  • 26. CQL in 3 Slides A Stream S is a possibly infinite multi-set of elements <s,t> where s is a tuple belonging to the schema of S and t is a timestamp. Relation R is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain1.  1 a Data Domain refers to all the values which a data element may contain.
  • 27. CQL in 3 4 Slides A Stream S is a possibly infinite multi-set of elements <s,t> where s is a tuple belonging to the schema of S and t is a timestamp. Relation R is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain1.  1 a Data Domain refers to all the values which a data element may contain. X
  • 28. Ok, CQL in 4 5 Slides A Stream S is a possibly infinite multi-set of elements <s,t> where s is a tuple belonging to the schema of S and t is a timestamp. Relation R is a mapping from each time instant in T to a finite but unbounded bag of tuples belonging to the schema of R. 1 a Data Domain refers to all the values which a data element may contain. X
  • 29. CQL in 5 Slides A Stream S is a possibly infinite multi-set of elements <s,t> where s is a tuple belonging to the schema of S and t is a timestamp. Relation R is a mapping from each time instant in T to a finite but unbounded bag of tuples belonging to the schema of R. 1 a Data Domain refers to all the values which a data element may contain.
  • 30. CQL in 5 Slides Streams Relations … <s,τ> … <s1> <s2> <s3> infinite unbounded sequence finite bag Mapping: T ! R stream-to-relation relation-to-stream relation-to-relation Stream Relation R(t) Relational Algebra (Almost) *Stream operators Sliding windows
  • 31. CQL in 5 6 Slides Stream-to-Relation Operators: • Sliding Window: 
 FROM S [ RANGE 5 Minutes] • Parametric Sliding Windows: 
 FROM S [ RANGE 5 Minutes Slide 1 Min] • Partitioned Windows: 
 FROM S [PARTITIONED BY A1..An ROW m] 1 a Data Domain refers to all the values which a data element may contain. X
  • 32. CQL in 6 Slides R2R operator s3 s4 s5 s6 s7 s8 s9 s10 s11 s12S s1 s2 W(ω,β) β ω t widthslide
  • 33. CQL in 6 Slides Relation-to-Stream Operators: • Rstream: streams out all data in the last step • Istream: streams out data in the last step that wasn’t on the previous step, i.e. streams out what is new • Dstream: streams out data in the previous step that isn’t in the last step, i.e. streams out what is old 1 a Data Domain refers to all the values which a data element may contain.
  • 34. • What results are being computed • Where in event time are being computed • When in processing time are being materialised • How “earlier results” relate to “later refinements” Reasoning about time DFM in X<42* Slides *overestimation
  • 35. A Stream is represented as a possibly unbounded collection of key-value pairs, i.e., a PCollection<K,V>. PTransforms are PCollections-to-PCollections operations. DFM in X Slides (WHAT) Processing Model
  • 36. DFM in X Slides (WHAT) Processing Model
  • 37. DFM in X Slides (WHERE) Windowing Model
  • 38. DFM in X Slides (WHERE) Windowing Model
  • 39. DFM in X Slides • The DFM windowing model requires extended primitives than CQL’s one. • Window Assignment, i.e., each element is assigned to a corresponding window. • Window Merge: • Drop Timestamp: only the window interval is relevant here on • GroupBy Key: to enable parallel execution • Window Merge (the merge logic depends on the window strategy) • GroupByWindow: to ensure elements are processed in sequence window- wise. • ExpandToElements: assigned a valid timestamp to the elements. (WHERE) Windowing Model Wall of Text Disclaimer
  • 40. DFM in X Slides • In order to build unaligned (apply across subset of the data) event-time windows DFM is forced to decouple the reporting of the window content. • To do so, DFM introduces an orthogonal model that allows to signal when a window is ready to be processed. (WHEN) Triggering Model
  • 41. DFM in X Slides • In order to build unaligned (apply across subset of the data) event-time windows DFM is forced to decouple the reporting of the window content. • To do so, DFM introduces an orthogonal model that allows to signal when a window is ready to be processed. • Triggers solve this issue allowing DFM users to write an arbitrary logic to signal the completion of a window. (WHEN) Triggering Model
  • 42. DFM in X Slides In addition to the triggering semantics, DFM introduces different refinements models to deal with late arrivals • Discarding, i.e., upon triggering, window contents are discarded and later results bear no relation to previous results. • Accumulating, i.e., upon triggering, window contents are left intact in a persistent state and later results will become a refinement to previous results • Accumulating & Retracting, i.e, it extends the accumulating semantics with retraction of the previous value. (HOW) Triggering Model (part 2) Wall of Text Disclaimer
  • 43. – Do the math “X ~ 9 (7 + 2 WoT)”
  • 45. ZZZZ
  • 46.
  • 47. Dual Stream Model The design space
  • 48. Should I Take a Step back?
  • 49. curtesy of Emanuele Della Valle - http://emanueledellavalle.org A Conceptual View of Kafka • Producers send messages on topics • Consumers read messages from topics • Messages are key-value pairs • Topics are streams of messages • Kafka cluster manages topics

  • 50. A Logical View of Kafka • Brokers are the main storage and messaging components of the Kafka cluster curtesy of Emanuele Della Valle - http://emanueledellavalle.org
  • 51. Reconciling the two views of Kafka • Topics are partitioned across brokers • Producers shard messages over the partitions of a certain topic • Typically, the message key determines which Partition a message is assigned to curtesy of Emanuele Della Valle - http://emanueledellavalle.org
  • 52. Topic partitioning invites distributed consumption • Different Consumers can read data from the same Topic • By default, each Consumer will receive all the messages in the Topic • Multiple Consumers can be combined into a Consumer Group • Consumer Groups provide scaling capabilities • Each Consumer is assigned a subset of Partitions for consumption curtesy of Emanuele Della Valle - http://emanueledellavalle.org
  • 54. Dual Stream Model The intuition https://www.michael-noll.com/blog/2018/04/05/of-stream- and-tables-in-kafka-and-stream-processing-part1/
  • 55. Dual Stream Model The intuition https://www.michael-noll.com/blog/2018/04/05/of-stream- and-tables-in-kafka-and-stream-processing-part1/
  • 56. Dual Stream Model The Truth about Streams Stream Change-log Stream Record Stream Unbounded and ordered sequence of key-value pairs A streams whose records are updates to a table A streams whose records are facts records are identified by a primary key records are not identified by a primary key
  • 57. Dual Stream Model A table is a collection of table versions; one version for each point in time using the timestamp as a version number. The Truth about Tables T(1)={T5 ={⟨A,7.2⟩}}
 T(2) = {T5 = {⟨A,7.2⟩},T6 = {⟨B,14.7⟩}}
 T(3) = {T5 = {⟨A,7.2⟩}, T6 = {⟨A,8.9⟩,⟨B,14.7⟩}} T(4) = {T3 = {⟨B,12.1⟩},T5 = {⟨A,7.2⟩,⟨B,12.1⟩}, T6 = {⟨A, 8.9⟩, ⟨B, 14.7⟩}}
 T(5) = {T3 = {⟨B,12.1⟩},T5 = {⟨A,7.2⟩,⟨B,12.1⟩}, T6 = {⟨A, 8.9⟩, ⟨B, 14.7⟩},T8 = {⟨A, 8.9⟩, ⟨B, 16.7⟩}}
  • 58. Dual Stream Model The Truth about Operations Stateless Operations
  • 59. Dual Stream Model The Truth about Operations Stateful Operations
  • 60. Dual Stream Model The Complete Picture https://www.michael-noll.com/blog/2018/04/05/of-stream- and-tables-in-kafka-and-stream-processing-part1/
  • 61. Dual Stream Model The DSM simplifies the reasoning about the transformations, but does not solve the unboundedness problem. We still need infinite memory for processing an infinite stream. DSM introduces the retention time to make the trade-off explicit. Result Correctness vs Runtime Cost
  • 63. Mapping to Kafka Minimal Concept Partitioned Unbounded Ordering Mutable Unique key constraint Schema Topic Yes Yes Yes No No No (raw bytes) Stream Yes Yes Yes No No Yes Table Yes Yes No Yes Yes Yes Concept Kafka Streams KSQL Java Scala Python Topic - - List/Stream List/Stream[(Array[Byte], Array[Byte])] [] Stream KStream STREAM List/Stream List/Stream[(K, V)] [] Table KTable TABLE HashMap mutable.Map[K, V] {} https://www.michael-noll.com/blog/2018/04/05/of-stream- and-tables-in-kafka-and-stream-processing-part1/
  • 65. Meetup, Tallin First Event April/March By Confluent, 
 so free beers Come and talk about your Kafka experience!