SlideShare a Scribd company logo
Streaming, Database &
Distributed Systems:
Bridging the Divide
Ben Stopford (@benstopford)
Codemesh 2016
Event Driven
Systems
Most stateful systems have to pull
from these three worlds
Today we have 2 goals
1.  Understand Stateful Stream
Processing (now & near future)
2.  Case for SSP as a general framework
for building data-centric systems.
Data systems come in
different forms
•  Database (OLTP)
•  Analytics Database (OLAP/Hadoop)
•  Messaging
•  Distributed log
•  Stream Processing
•  Stateful Stream Processing
Database (OLTP)
Focuses on providing a consistent view that
supports updates and queries on individual tuples.
Analytics Database (OLAP/Hadoop)
1.  Focuses on aggregations via table scans.
2.  Executes as distributed system
Messaging
Focuses on asynchronous information transfer with limited
state
Distributed Log
1.  Similar to messaging, but data can be retained
2.  Executes as distributed system (scale + fault tolerance)
Stream Processing
Manipulate concurrent streams of events
Comes from CEP background (ephemeral)
Stateful Stream Processing
Moves stream processing to be a more general
framework for building data-centric systems.
What is stream processing?
Data
Index
Query
Engine
Query
Engine
vs
Database
Finite source
Stream Processor
Infinite source
Infinite streams need
windows
How many items will we bring into the machine at
one time?
Windows bound a computation
How many items will we bring into the machine at
one time?
Buffering allows us to handle
late events
How many items will we bring into the machine at
one time?
Some query
Over some time window
Emitting at some frequency
Continually executing query
Stream(s)
Stream Processing Engine
Derived Stream
Avg(p.time – o.time)
From orders, payment
Group by payment.region
over 1 day window
emitting every second
Stream Processing
orders!
payments!
Completion time,
by region!
Avg(o.time – p.time)
From orders, payment
Group by payment.region
over 1 day window
emitting every second
Materialised View (DB )
Query
orders!
payments!
Completion time,
by region!
Avg(o.time – p.time)
From orders, payment, user
Group by user.region
over 1 day window
emitting every second
Stateful Stream Processing
Streams
Stream Processing Engine
Derived Stream
Query
Derived “Table”
Table
“View” is output as
table or stream
Table == Stream + Window0
n
== 0 N
Table is a stream with an infinite window (i.e. buffer from 0 -> now)
window !
SSP is about creating
materialised views.
Materialised as a table, or
materialised as a stream
Features: similar to database query
engine
Join Filter
Aggr-
egate
View
Windowed
Streams
Can distribute over many machines
in two dimensions
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Scale Out Scale Forward
Stateful Stream Processing engines typically
use Kafka (a distributed commit log)
Join Filter
Aggr-
egate
View
Kafka (a distributed log)
A log is very simple idea
Messages are added at the end of the log
Just think of the log as a file
Old New
Readers have a position & scan
Sally
is here
George
is here
Fred
is here
Old New
Scan Scan
Scan
Can “Rewind & Replay” the log
Rewind & Replay
Compacted Log
(Tabular View)
Version 3
Version 2
Version 1
Version 2
Version 1
Version 5
Version 4
Version 3
Version 2
Version 1
Version 2
Version 3
Version 5
STEAM
(All versions)
COMPACTED STREAM
(Latest Key only)
The log is a
Distributed System
For scalability and fault tolerance
Shard on the way in
Producers
Kafka
Consumers
Each shard is a queue
Producers
Kafka
Consumers
Producers
Kafka
Many consumers
share partitions
in one topic
Consumers share consumption of a
single topic
The Log reassigns data on failure
Producers
Kafka
Many consumers
share partitions in
one topic
Kafka supplies two levels of
leader election
Replicas in Kafka have
an elected leader
Consumers in Kafka
have an elected leader
The log is important for SSP
Maintains History: Acts like a “push based” distributed file system
The log is important: Two Primitives
Stream
Compacted Stream (‘table’)
The Log is, to a streaming
engine, what HDFS is to Hadoop
But it’s a bit more than a HDFS
replacement: Processors inherit the
idea of “membership” from the log
So stateful Stream Processors use
the Log
Join Filter
Aggr-
egate
View
Kafka (Distributed Log)
They also use local storage
Join Filter
Aggr-
egate
View
(1) a Kafka
(2) Local KV Store
Local KV store has a few uses
(1)  It caches streams on disk
(2) It caches “tables” on disk
Join Filter
Aggr-
egate
View
This makes join operations fast as they’re entirely local
Streams just cache recent
messages to help with joins
Tables are fully
“realised” locally
Stateful Stream Processing
stream
Compacted
stream
Join
Stream data
Stream-Tabular
Data
Infinite
Stream
Locally Cached
Table
(disk resident)
KafkaKafka Streams
e.g. Useful for Enrichment
stream
Compacted
stream
Join
Orders
Customers
KafkaKafka Streams
Local DB
Aggregates need intermediary state
stream
Compacted
stream
Join
Orders
Customers
KafkaSum(orders)
group by region
Persist current value,
in case we fail
State store inherits durability from
the log
State store flushes
back to the log
Join Filter
Aggr-
egate
View
Separate Data, Processing & View
View
OrdersPayments View
View
Storage Layer
(a Kafka)
Processing & View
Query
You can query the views from
anywhere
View
OrdersPayments View
View
Storage Layer
(a Kafka)
Processing & View
Query
So what happens on failure?
View
OrdersPayments View
View
Storage Layer
(a Kafka)
Processing & View
Clustering Reroutes Data to
surviving node
View
OrdersPayments View
View
Storage Layer
(Kafka)
Ownership of partitions is re-routed from dead node
Processing & View
But what about state?
View
OrdersPayments View
View
Storage Layer
(Kafka)
“Cold” replica of state
takes over
Processing & View
Primitives for sharding &
replication
Stock
OrdersPayments Stock
Stock
Redundant copies are
cached on other nodes
Sharding spread data
over processors
So processors inherit much
from the log
Clustering comes
from the log
You just write the
functional bit
General framework for distributed, realtime data
computation
Protection from
broker failure
Protection from
engine failure
Join tables & streams
(in process)
Event Driven
Create views which
can be queried
Query
But stream
processing has a
problem
Correctness Guarantees in multi
layer topologies
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Duplicates are a side effect of all at-least-once delivery mechanisms
Data is rerouted, on failure, which
can cause duplicates
Idempotance isn’t enough
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Filter
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Distributed Snapshots*
(transactions)
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Join Filter
Aggr-
egate
View
Transaction markers:
[Begin], [Prepare], [Commit], [Abort]
Buffer
Chandy, Lamport - Distributed Snapshots: Determining Global States of Distributed Systems
*In development in Kafka
So why use these
tools?
(1) Streaming is a
superset of batch
Databases look backwards
Batch == Streaming from offset 0
Query
Query
Query
Distributed File
System (HDFS)
Query
Query
QueryDistributed Log
(Kafka)
MPP Batch System MPP Streaming System
Streaming is the superset of batch
Streaming
Batch
Database
Global, Linearisible
consistency model
(2) Separates store & view
“Engine” part is lightweight
but stateful
Storage Just a java process
which uses a library
Log handles fault
tolerance of both layers
Separates Concerns of
Model & View – Think MVC
Storage
View & Controller
Model
Physically Separates Read &
Write – Think CQRS
Storage
View & Controller
Model
Database vs SSP
Data
Index
Query
Engine
Query
Engine
vs
Database Stateful Stream Processor
Query
Query
View
Index Data
(3) Decentralised approaches
are more general
Rather than pushing processing
into an “appliance”
(code -> data)
Centralised Processing
App
Data Decentric Architecture
Distributed
Log
Decentralised Processing over many
user-specific views
This more general
than than just
analytics use cases
It’s more than taking a
database and adding push
notifications
Whether you’re building a hulking,
multistage, analytic platform
Query
Final View
Intermediary View (2)
Intermediary View (1)
Or a simple microservice that
needs to run hot-hot & scale
Business Logic
Manage local
state
Join various
streams
Hot secondary
instance
Composable Primatives
Declarative
Function
Traditional DB
Work
Distribution
Replication
Sharding
Query
Engine
Distributed DB Distributed Systems
Membership
Global
Consistency
General framework for distributed, event-
driven data computation
Protection from
broker failure
Protection from
engine failure
Join tables & streams
(in process)
Event Driven
Create views which
can be queried
Query
Stateful Stream Processing
Framework for building a streaming data
systems, just for you “~)
Find out more:
•  http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
•  https://martin.kleppmann.com/2015/02/11/database-inside-out-at-salesforce.html
•  http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
•  https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cidr07p42.pdf
•  http://highscalability.com/blog/2015/5/4/elements-of-scale-composing-and-scaling-data-
platforms.html
•  https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-
kafka-streams
•  https://timothyrenner.github.io/engineering/2016/08/11/kafka-streams-not-looking-at-facebook.html
•  https://www.madewithtea.com/processing-tweets-with-kafka-streams.html
•  http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
•  http://www.slideshare.net/zacharycox/updating-materialized-views-and-caches-using-kafka
The end
@benstopford
http://benstopford.com

More Related Content

What's hot

Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of Presto
Taro L. Saito
 
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Flink Forward
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
spark-project
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
Databricks
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Anant Corporation
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
Databricks
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
Michael Kogan
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
Databricks
 
Tajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWSTajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWS
Gruter
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream Processing
Suneel Marthi
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
datamantra
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
DataWorks Summit
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
Guido Schmutz
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
Spark Summit
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
Ryan Blue
 

What's hot (20)

Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of Presto
 
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
Flink Forward Berlin 2017: Aris Kyriakos Koliopoulos - Drivetribe's Kappa Arc...
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Some Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdfSome Iceberg Basics for Beginners (CDP).pdf
Some Iceberg Basics for Beginners (CDP).pdf
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Tajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWSTajo TPC-H Benchmark Test on AWS
Tajo TPC-H Benchmark Test on AWS
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream Processing
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?Kafka as your Data Lake - is it Feasible?
Kafka as your Data Lake - is it Feasible?
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 

Viewers also liked

Data Pipelines with Apache Kafka
Data Pipelines with Apache KafkaData Pipelines with Apache Kafka
Data Pipelines with Apache Kafka
Ben Stopford
 
JAX London Slides
JAX London SlidesJAX London Slides
JAX London Slides
Ben Stopford
 
Microservices for a Streaming World
Microservices for a Streaming WorldMicroservices for a Streaming World
Microservices for a Streaming World
Ben Stopford
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the Log
Ben Stopford
 
Big iron 2 (published)
Big iron 2 (published)Big iron 2 (published)
Big iron 2 (published)Ben Stopford
 
A little bit of clojure
A little bit of clojureA little bit of clojure
A little bit of clojureBen Stopford
 
The return of big iron?
The return of big iron?The return of big iron?
The return of big iron?Ben Stopford
 
Big Data & the Enterprise
Big Data & the EnterpriseBig Data & the Enterprise
Big Data & the Enterprise
Ben Stopford
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
Brendan Gregg
 
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear ScalabilityBeyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Ben Stopford
 
Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011
Ben Stopford
 
Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?
Ben Stopford
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafka
confluent
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystem
confluent
 
Ideas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental DivideIdeas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental Divide
Ben Stopford
 
Test-Oriented Languages: Is it time for a new era?
Test-Oriented Languages: Is it time for a new era?Test-Oriented Languages: Is it time for a new era?
Test-Oriented Languages: Is it time for a new era?
Ben Stopford
 
The Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and ServicesThe Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and Services
confluent
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
jimriecken
 
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Michael Noll
 
Apache Flink's Table & SQL API - unified APIs for batch and stream processing
Apache Flink's Table & SQL API - unified APIs for batch and stream processingApache Flink's Table & SQL API - unified APIs for batch and stream processing
Apache Flink's Table & SQL API - unified APIs for batch and stream processing
Timo Walther
 

Viewers also liked (20)

Data Pipelines with Apache Kafka
Data Pipelines with Apache KafkaData Pipelines with Apache Kafka
Data Pipelines with Apache Kafka
 
JAX London Slides
JAX London SlidesJAX London Slides
JAX London Slides
 
Microservices for a Streaming World
Microservices for a Streaming WorldMicroservices for a Streaming World
Microservices for a Streaming World
 
The Power of the Log
The Power of the LogThe Power of the Log
The Power of the Log
 
Big iron 2 (published)
Big iron 2 (published)Big iron 2 (published)
Big iron 2 (published)
 
A little bit of clojure
A little bit of clojureA little bit of clojure
A little bit of clojure
 
The return of big iron?
The return of big iron?The return of big iron?
The return of big iron?
 
Big Data & the Enterprise
Big Data & the EnterpriseBig Data & the Enterprise
Big Data & the Enterprise
 
Linux Performance Tools
Linux Performance ToolsLinux Performance Tools
Linux Performance Tools
 
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear ScalabilityBeyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability
 
Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011Coherence Implementation Patterns - Sig Nov 2011
Coherence Implementation Patterns - Sig Nov 2011
 
Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?Refactoring tested code - has mocking gone wrong?
Refactoring tested code - has mocking gone wrong?
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafka
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystem
 
Ideas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental DivideIdeas for Distributing Skills Across a Continental Divide
Ideas for Distributing Skills Across a Continental Divide
 
Test-Oriented Languages: Is it time for a new era?
Test-Oriented Languages: Is it time for a new era?Test-Oriented Languages: Is it time for a new era?
Test-Oriented Languages: Is it time for a new era?
 
The Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and ServicesThe Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and Services
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
Introducing Kafka Streams, the new stream processing library of Apache Kafka,...
 
Apache Flink's Table & SQL API - unified APIs for batch and stream processing
Apache Flink's Table & SQL API - unified APIs for batch and stream processingApache Flink's Table & SQL API - unified APIs for batch and stream processing
Apache Flink's Table & SQL API - unified APIs for batch and stream processing
 

Similar to Streaming, Database & Distributed Systems Bridging the Divide

Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
Ding Li
 
10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka
Ben Stopford
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Dibyendu Bhattacharya
 
10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices
Ben Stopford
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Alluxio, Inc.
 
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
Luan Moreno Medeiros Maciel
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
Kuberton
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
Gabriele Modena
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
DataWorks Summit
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
Tilani Gunawardena PhD(UNIBAS), BSc(Pera), FHEA(UK), CEng, MIESL
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
David Groozman
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
Palani Kumar
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
Richard Claassens CIPPE
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonStephan Ewen
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Julian Hyde
 
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
nadine39280
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
Ryan Bosshart
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL ServerStephen Rose
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Databricks
 

Similar to Streaming, Database & Distributed Systems Bridging the Divide (20)

Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizaçþes de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin  Monitoring&Logging - Stanislav Kolenkin
Monitoring&Logging - Stanislav Kolenkin
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
 
Clustering van IT-componenten
Clustering van IT-componentenClustering van IT-componenten
Clustering van IT-componenten
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
Data all over the place! How SQL and Apache Calcite bring sanity to streaming...
 
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFrames
 

More from Ben Stopford

The Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and ServerlessThe Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and Serverless
Ben Stopford
 
A Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices GenerationA Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices Generation
Ben Stopford
 
Building Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka StreamsBuilding Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka Streams
Ben Stopford
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
Ben Stopford
 
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Ben Stopford
 
Building Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful StreamsBuilding Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful Streams
Ben Stopford
 
Devoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful StreamsDevoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful Streams
Ben Stopford
 
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2:  Building Event-Driven Services with Apache KafkaEvent Driven Services Part 2:  Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Ben Stopford
 
Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy
Ben Stopford
 
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Ben Stopford
 
Strata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data DichotomyStrata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data Dichotomy
Ben Stopford
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Ben Stopford
 
Advanced databases ben stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopfordBen Stopford
 
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
Ben Stopford
 
Balancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java DatabaseBalancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java Database
Ben Stopford
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
Ben Stopford
 
Architecting for Change: An Agile Approach
Architecting for Change: An Agile ApproachArchitecting for Change: An Agile Approach
Architecting for Change: An Agile Approach
Ben Stopford
 

More from Ben Stopford (17)

The Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and ServerlessThe Future of Streaming: Global Apps, Event Stores and Serverless
The Future of Streaming: Global Apps, Event Stores and Serverless
 
A Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices GenerationA Global Source of Truth for the Microservices Generation
A Global Source of Truth for the Microservices Generation
 
Building Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka StreamsBuilding Event Driven Services with Kafka Streams
Building Event Driven Services with Kafka Streams
 
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017  - The Data Dichotomy- Rethinking Data and Services with StreamsNDC London 2017  - The Data Dichotomy- Rethinking Data and Services with Streams
NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams
 
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
 
Building Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful StreamsBuilding Event Driven Services with Stateful Streams
Building Event Driven Services with Stateful Streams
 
Devoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful StreamsDevoxx London 2017 - Rethinking Services With Stateful Streams
Devoxx London 2017 - Rethinking Services With Stateful Streams
 
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2:  Building Event-Driven Services with Apache KafkaEvent Driven Services Part 2:  Building Event-Driven Services with Apache Kafka
Event Driven Services Part 2: Building Event-Driven Services with Apache Kafka
 
Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy Event Driven Services Part 1: The Data Dichotomy
Event Driven Services Part 1: The Data Dichotomy
 
Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...Event Driven Services Part 3: Putting the Micro into Microservices with State...
Event Driven Services Part 3: Putting the Micro into Microservices with State...
 
Strata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data DichotomyStrata Software Architecture NY: The Data Dichotomy
Strata Software Architecture NY: The Data Dichotomy
 
Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012Where Does Big Data Meet Big Database - QCon 2012
Where Does Big Data Meet Big Database - QCon 2012
 
Advanced databases ben stopford
Advanced databases   ben stopfordAdvanced databases   ben stopford
Advanced databases ben stopford
 
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
A Paradigm Shift: The Increasing Dominance of Memory-Oriented Solutions for H...
 
Balancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java DatabaseBalancing Replication and Partitioning in a Distributed Java Database
Balancing Replication and Partitioning in a Distributed Java Database
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
Architecting for Change: An Agile Approach
Architecting for Change: An Agile ApproachArchitecting for Change: An Agile Approach
Architecting for Change: An Agile Approach
 

Recently uploaded

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
ViralQR
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 

Recently uploaded (20)

The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.Welocme to ViralQR, your best QR code generator.
Welocme to ViralQR, your best QR code generator.
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 

Streaming, Database & Distributed Systems Bridging the Divide

  • 1. Streaming, Database & Distributed Systems: Bridging the Divide Ben Stopford (@benstopford) Codemesh 2016
  • 2.
  • 3. Event Driven Systems Most stateful systems have to pull from these three worlds
  • 4. Today we have 2 goals 1.  Understand Stateful Stream Processing (now & near future) 2.  Case for SSP as a general framework for building data-centric systems.
  • 5. Data systems come in different forms •  Database (OLTP) •  Analytics Database (OLAP/Hadoop) •  Messaging •  Distributed log •  Stream Processing •  Stateful Stream Processing
  • 6. Database (OLTP) Focuses on providing a consistent view that supports updates and queries on individual tuples.
  • 7. Analytics Database (OLAP/Hadoop) 1.  Focuses on aggregations via table scans. 2.  Executes as distributed system
  • 8. Messaging Focuses on asynchronous information transfer with limited state
  • 9. Distributed Log 1.  Similar to messaging, but data can be retained 2.  Executes as distributed system (scale + fault tolerance)
  • 10. Stream Processing Manipulate concurrent streams of events Comes from CEP background (ephemeral)
  • 11. Stateful Stream Processing Moves stream processing to be a more general framework for building data-centric systems.
  • 12. What is stream processing? Data Index Query Engine Query Engine vs Database Finite source Stream Processor Infinite source
  • 13. Infinite streams need windows How many items will we bring into the machine at one time?
  • 14. Windows bound a computation How many items will we bring into the machine at one time?
  • 15. Buffering allows us to handle late events How many items will we bring into the machine at one time?
  • 16. Some query Over some time window Emitting at some frequency Continually executing query Stream(s) Stream Processing Engine Derived Stream
  • 17. Avg(p.time – o.time) From orders, payment Group by payment.region over 1 day window emitting every second Stream Processing orders! payments! Completion time, by region!
  • 18. Avg(o.time – p.time) From orders, payment Group by payment.region over 1 day window emitting every second Materialised View (DB ) Query orders! payments! Completion time, by region!
  • 19. Avg(o.time – p.time) From orders, payment, user Group by user.region over 1 day window emitting every second Stateful Stream Processing Streams Stream Processing Engine Derived Stream Query Derived “Table” Table “View” is output as table or stream
  • 20. Table == Stream + Window0 n == 0 N Table is a stream with an infinite window (i.e. buffer from 0 -> now) window !
  • 21. SSP is about creating materialised views. Materialised as a table, or materialised as a stream
  • 22. Features: similar to database query engine Join Filter Aggr- egate View Windowed Streams
  • 23. Can distribute over many machines in two dimensions Join Filter Aggr- egate View Join Filter Aggr- egate View Join Filter Aggr- egate View Scale Out Scale Forward
  • 24. Stateful Stream Processing engines typically use Kafka (a distributed commit log) Join Filter Aggr- egate View Kafka (a distributed log)
  • 25. A log is very simple idea Messages are added at the end of the log Just think of the log as a file Old New
  • 26. Readers have a position & scan Sally is here George is here Fred is here Old New Scan Scan Scan
  • 27. Can “Rewind & Replay” the log Rewind & Replay
  • 28. Compacted Log (Tabular View) Version 3 Version 2 Version 1 Version 2 Version 1 Version 5 Version 4 Version 3 Version 2 Version 1 Version 2 Version 3 Version 5 STEAM (All versions) COMPACTED STREAM (Latest Key only)
  • 29. The log is a Distributed System For scalability and fault tolerance
  • 30. Shard on the way in Producers Kafka Consumers
  • 31. Each shard is a queue Producers Kafka Consumers
  • 32. Producers Kafka Many consumers share partitions in one topic Consumers share consumption of a single topic
  • 33. The Log reassigns data on failure Producers Kafka Many consumers share partitions in one topic
  • 34. Kafka supplies two levels of leader election Replicas in Kafka have an elected leader Consumers in Kafka have an elected leader
  • 35. The log is important for SSP Maintains History: Acts like a “push based” distributed file system
  • 36. The log is important: Two Primitives Stream Compacted Stream (‘table’)
  • 37. The Log is, to a streaming engine, what HDFS is to Hadoop
  • 38. But it’s a bit more than a HDFS replacement: Processors inherit the idea of “membership” from the log
  • 39. So stateful Stream Processors use the Log Join Filter Aggr- egate View Kafka (Distributed Log)
  • 40. They also use local storage Join Filter Aggr- egate View (1) a Kafka (2) Local KV Store
  • 41. Local KV store has a few uses (1)  It caches streams on disk (2) It caches “tables” on disk Join Filter Aggr- egate View This makes join operations fast as they’re entirely local Streams just cache recent messages to help with joins Tables are fully “realised” locally
  • 42. Stateful Stream Processing stream Compacted stream Join Stream data Stream-Tabular Data Infinite Stream Locally Cached Table (disk resident) KafkaKafka Streams
  • 43. e.g. Useful for Enrichment stream Compacted stream Join Orders Customers KafkaKafka Streams Local DB
  • 44. Aggregates need intermediary state stream Compacted stream Join Orders Customers KafkaSum(orders) group by region Persist current value, in case we fail
  • 45. State store inherits durability from the log State store flushes back to the log Join Filter Aggr- egate View
  • 46. Separate Data, Processing & View View OrdersPayments View View Storage Layer (a Kafka) Processing & View Query
  • 47. You can query the views from anywhere View OrdersPayments View View Storage Layer (a Kafka) Processing & View Query
  • 48. So what happens on failure? View OrdersPayments View View Storage Layer (a Kafka) Processing & View
  • 49. Clustering Reroutes Data to surviving node View OrdersPayments View View Storage Layer (Kafka) Ownership of partitions is re-routed from dead node Processing & View
  • 50. But what about state? View OrdersPayments View View Storage Layer (Kafka) “Cold” replica of state takes over Processing & View
  • 51. Primitives for sharding & replication Stock OrdersPayments Stock Stock Redundant copies are cached on other nodes Sharding spread data over processors
  • 52. So processors inherit much from the log Clustering comes from the log You just write the functional bit
  • 53. General framework for distributed, realtime data computation Protection from broker failure Protection from engine failure Join tables & streams (in process) Event Driven Create views which can be queried Query
  • 55. Correctness Guarantees in multi layer topologies Join Filter Aggr- egate View Join Filter Aggr- egate View Join Filter Aggr- egate View Join Filter Aggr- egate View Join Filter Aggr- egate View
  • 56. Join Filter Aggr- egate View Join Filter Aggr- egate View Join Filter Aggr- egate View Join Filter Aggr- egate View Duplicates are a side effect of all at-least-once delivery mechanisms Data is rerouted, on failure, which can cause duplicates
  • 57. Idempotance isn’t enough Join Filter Aggr- egate View Join Filter Aggr- egate View Filter Join Filter Aggr- egate View Join Filter Aggr- egate View
  • 58. Distributed Snapshots* (transactions) Join Filter Aggr- egate View Join Filter Aggr- egate View Join Filter Aggr- egate View Transaction markers: [Begin], [Prepare], [Commit], [Abort] Buffer Chandy, Lamport - Distributed Snapshots: Determining Global States of Distributed Systems *In development in Kafka
  • 59.
  • 60.
  • 61. So why use these tools?
  • 62. (1) Streaming is a superset of batch
  • 64. Batch == Streaming from offset 0 Query Query Query Distributed File System (HDFS) Query Query QueryDistributed Log (Kafka) MPP Batch System MPP Streaming System
  • 65. Streaming is the superset of batch Streaming Batch Database Global, Linearisible consistency model
  • 67. “Engine” part is lightweight but stateful Storage Just a java process which uses a library Log handles fault tolerance of both layers
  • 68. Separates Concerns of Model & View – Think MVC Storage View & Controller Model
  • 69. Physically Separates Read & Write – Think CQRS Storage View & Controller Model
  • 70. Database vs SSP Data Index Query Engine Query Engine vs Database Stateful Stream Processor Query Query View Index Data
  • 72. Rather than pushing processing into an “appliance” (code -> data) Centralised Processing App
  • 73. Data Decentric Architecture Distributed Log Decentralised Processing over many user-specific views
  • 74. This more general than than just analytics use cases
  • 75. It’s more than taking a database and adding push notifications
  • 76. Whether you’re building a hulking, multistage, analytic platform Query Final View Intermediary View (2) Intermediary View (1)
  • 77. Or a simple microservice that needs to run hot-hot & scale Business Logic Manage local state Join various streams Hot secondary instance
  • 79. General framework for distributed, event- driven data computation Protection from broker failure Protection from engine failure Join tables & streams (in process) Event Driven Create views which can be queried Query
  • 80. Stateful Stream Processing Framework for building a streaming data systems, just for you “~)
  • 81. Find out more: •  http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/ •  https://martin.kleppmann.com/2015/02/11/database-inside-out-at-salesforce.html •  http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf •  https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/cidr07p42.pdf •  http://highscalability.com/blog/2015/5/4/elements-of-scale-composing-and-scaling-data- platforms.html •  https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with- kafka-streams •  https://timothyrenner.github.io/engineering/2016/08/11/kafka-streams-not-looking-at-facebook.html •  https://www.madewithtea.com/processing-tweets-with-kafka-streams.html •  http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/ •  http://www.slideshare.net/zacharycox/updating-materialized-views-and-caches-using-kafka