Big Data LDN 2018: STREAM PROCESSING TAKES ON EVERYTHING
© 2018 data Artisans
2. About Data Artisans
Original Creators of Apache Flink®
Enterprise Ready Real Time Stream Processing
3. Stream Processing
[Diagram: a stream of events (...) flows through "Your Code", which performs one-at-a-time event processing with access to State]
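A minimal sketch of the idea on this slide, assuming a hypothetical stream of (key, amount) events: user code sees one event at a time and keeps its own state between events. The class name `RunningTotal` and the event layout are illustrative assumptions, not part of Apache Flink.

```python
from collections import defaultdict


class RunningTotal:
    """Processes events one at a time and keeps per-key state in memory."""

    def __init__(self):
        # State: one running total per key (hypothetical layout).
        self.totals = defaultdict(int)

    def process(self, event):
        """Handle a single (key, amount) event and return the updated total."""
        key, amount = event
        self.totals[key] += amount
        return key, self.totals[key]


# Hypothetical input stream.
events = [("alice", 5), ("bob", 2), ("alice", 7)]
processor = RunningTotal()
outputs = [processor.process(e) for e in events]
print(outputs)
```

Each output depends on the events seen so far for that key, which is what makes the processing stateful rather than stateless.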
4. Streaming Applications over Time
Application types shown:
● Real time Approximate Analytics
● Fraud Detection
● Online Machine Learning
● Realtime ETL
● Intrusion Prevention
● Financial Risk & Reporting
● Real time dashboards
● Anomaly Detection
● Logistics Tracking
● Recommender Systems
● Web Applications
● Masterdata Management
Timeline labels:
● Continuous Processing
● Continuous Processing and Analytics
● Unification of Analytics and Applications
● Data-driven Applications
5. Enablers of new Applications
Abstractions, APIs
Consistency
Event-time, streaming SQL, state & time, CEP
Exactly-once, savepoints
Interoperability
Deployment, Connectors, Operations
Scalability
high parallelism, large state
Scalable timers, dynamic scaling, local recovery, …
Framework and Library, REST-ified Flink, SQL Client, …
6. STREAM PROCESSING TAKES ON ACID
7. Exactly-once Changed Applications
CRUD / request/response Applications (Stateless Application + K/V Store)
… become …
Stateful Stream Processing Applications (Streaming Application + State)
8. Some Applications don't move to Stream Processing
[Diagram: Application backed by a Relational Database]
9. Limitation of Current Stream Processors
The Limitation: All stream processors so far can update only a single key at a time with correctness guarantees (exactly once)
Example: Transferring money from one account (key) to another with transactional guarantees is not feasible
10. With dA Streaming Ledger you can ...
• … access and update state with multiple keys at the same time
• … maintain full isolation/correctness for the multi-key operations
• … operate on multiple states at the same time
• … share the states between multiple streams
11. ACID Transactions for Multi-key Stream Processing
‒ Atomicity: the transfer affects either both accounts or neither
‒ Consistency: the transfer must happen only if the account has sufficient funds
‒ Isolation: no other operation can interfere and cause an incorrect result
‒ Durability: the result of the transfer is durable
Streaming Ledger provides ACID guarantees across multiple states, rows, and streams
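The account-transfer guarantees listed above can be illustrated with a small single-process sketch. This is not the Streaming Ledger API; the `Ledger` class, its method names, and the account names are hypothetical. It shows atomicity (all checks happen before any balance changes), consistency (the sufficient-funds rule), and isolation (a lock serializes transfers). Durability is out of scope for an in-memory example.

```python
import threading


class InsufficientFunds(Exception):
    """Raised when the source account cannot cover the transfer."""


class Ledger:
    """In-memory account balances with all-or-nothing transfers."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, source, target, amount):
        if amount <= 0 or source == target:
            raise ValueError("invalid transfer")
        with self._lock:  # Isolation: only one transfer runs at a time.
            if source not in self._balances or target not in self._balances:
                raise KeyError("unknown account")
            # Consistency: reject before touching any balance.
            if self._balances[source] < amount:
                raise InsufficientFunds(source)
            # Atomicity: both updates happen, or (after a failed check) neither.
            self._balances[source] -= amount
            self._balances[target] += amount

    def balance(self, account):
        with self._lock:
            return self._balances[account]


ledger = Ledger({"A": 100, "B": 20})
ledger.transfer("A", "B", 30)
try:
    ledger.transfer("B", "A", 500)  # B holds only 50, so this is rejected.
    rejected = False
except InsufficientFunds:
    rejected = True
print(ledger.balance("A"), ledger.balance("B"), rejected)
```

The rejected transfer leaves both balances unchanged, which is the "either both accounts or neither" behaviour named under Atomicity.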
12. Example: Transferring Cash/Assets between Accounts
13. Example: Position-keeping, Reporting, Risk Management in Investment Banking
14. A Library on top of Apache Flink
• https://github.com/dataArtisans/da-streamingledger
• No additional dependencies needed
• Seamlessly integrates and composes with DataStream API and SQL
• Reads from and writes to all Flink connectors
• Supports savepoints for upgrades
15. STREAM PROCESSING TAKES ON SQL
16. StreamSQL in Flink
[Diagram: Flink APIs (Stream/Batch Processing; Java/Scala) on top of the Runtime (Distributed Streaming Data Flow)]
17. StreamSQL in Flink
[Diagram: Flink APIs (Stream/Batch Processing; Java/Scala and SQL) on top of the Runtime (Distributed Streaming Data Flow)]
● SQL Command Line Client
○ https://github.com/dataArtisans/sql-training
● Event & Processing Time
● Configuration in YAML
● Source/Sink Definition in YAML
● User-defined functions
● Streaming and Batch
18. https://github.com/dataArtisans/sql-training
19. Enrichment Joins
[Diagram: a stream of buy/sell events passes through a Join with a lookup table ($ 17, £ 42, 12.5₪) and leaves as an enriched stream of buy/sell events]
20. Temporal Table Joins
[Diagram: a stream of buy/sell events passes through a Join with a versioned table; values shown: 1453, 31753, 14]
curr  rate  time
£     42    3
£     12    17
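A minimal sketch of a temporal table join, assuming the "time" column marks when each rate became valid: every event is joined with the rate version that was current at the event's own time, not with the latest rate. The two £ rows are taken from the slide; the order events, the function name `rate_as_of`, and the data layout are hypothetical.

```python
from bisect import bisect_right

# Rate history per currency as (valid_from_time, rate), sorted by time.
# The two £ versions are the rows shown on the slide.
rate_history = {"£": [(3, 42), (17, 12)]}


def rate_as_of(currency, event_time):
    """Return the latest rate whose time is <= event_time, or None."""
    versions = rate_history.get(currency, [])
    times = [t for t, _ in versions]
    idx = bisect_right(times, event_time)
    return versions[idx - 1][1] if idx > 0 else None


# Hypothetical orders: (event_time, currency, amount).
orders = [(5, "£", 10), (20, "£", 10), (1, "£", 10)]
joined = [(t, cur, amt, rate_as_of(cur, t)) for t, cur, amt in orders]
print(joined)
```

The order at time 5 sees rate 42, the order at time 20 sees rate 12, and the order at time 1 finds no rate because no version existed yet.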
21. Complex Event Processing with SQL
SELECT * from ?
22. Introducing MATCH_RECOGNIZE
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
  PARTITION BY driverId
  ORDER BY rideTime
  MEASURES
    S.rideId AS sRideId
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (S M{2,} E)
  DEFINE
    S AS S.isStart = true,
    M AS M.rideId <> S.rideId,
    E AS E.isStart = false
      AND E.rideId = S.rideId
)
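To make the pattern in the query concrete, here is a plain-Python sketch of what `PATTERN (S M{2,} E)` selects: a ride start (S), at least two rows belonging to other rides (M), then the end of the ride that S started (E), evaluated per driver in ride-time order. It is an illustration of the matching logic only, not how Flink evaluates the query; the function names and sample events are hypothetical, and NULL handling is ignored.

```python
from collections import defaultdict


def match_rides(rows):
    """Find S M{2,} E matches in one driver's rows, ordered by ride time.

    Returns the rideId of S for every match (the sRideId measure).
    """
    matches = []
    i = 0
    while i < len(rows):
        s = rows[i]
        if not s["isStart"]:  # S must be a start event.
            i += 1
            continue
        # M{2,}: consume rows that belong to a different ride than S.
        j = i + 1
        while j < len(rows) and rows[j]["rideId"] != s["rideId"]:
            j += 1
        m_count = j - (i + 1)
        # E: the next row has S's rideId (loop exit) and must be an end event.
        if m_count >= 2 and j < len(rows) and not rows[j]["isStart"]:
            matches.append(s["rideId"])
            i = j + 1  # AFTER MATCH SKIP PAST LAST ROW
        else:
            i += 1
    return matches


def match_all(events):
    """PARTITION BY driverId, ORDER BY rideTime, then match per driver."""
    by_driver = defaultdict(list)
    for e in sorted(events, key=lambda e: e["rideTime"]):
        by_driver[e["driverId"]].append(e)
    return {driver: match_rides(rows) for driver, rows in by_driver.items()}


# Hypothetical events: driver 7 starts ride 2 before ride 1 has ended.
events = [
    {"driverId": 7, "rideTime": 1, "rideId": 1, "isStart": True},
    {"driverId": 7, "rideTime": 2, "rideId": 2, "isStart": True},
    {"driverId": 8, "rideTime": 2, "rideId": 9, "isStart": True},
    {"driverId": 7, "rideTime": 3, "rideId": 2, "isStart": False},
    {"driverId": 7, "rideTime": 4, "rideId": 1, "isStart": False},
    {"driverId": 8, "rideTime": 5, "rideId": 9, "isStart": False},
]
result = match_all(events)
print(result)
```

Because the M and E conditions are mutually exclusive (a row's rideId either equals S's rideId or it does not), a single forward scan is enough here; a general pattern matcher would need backtracking.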
24. The Promise of Stateful Stream Processing
Resource Efficient, Scalable, Real-Time Services
based on
Parallel Computation on Local State
25. Local State leads to Scalability
Stateless Application + Database
● database needs to be scaled in addition to the application
● in practice the database often becomes the bottleneck
… become …
Streaming Application + State
● compute and storage are scaled together
26. Local State leads to Performance
Stateless Application + Database
● reads/writes across tier boundaries
… become …
Streaming Application + State
● local state
● asynchronous writes of large blobs for durability
27. Local State leads to Simpler Operations
Stateless Application + Database
● additional database operations
● each new service requires an additional database
… become …
Streaming Application + State
● only additional backup storage needed
28. Local State leads to Consistency
Stateless Application + Database
● distributed transactions
● at scale, usually low isolation and consistency guarantees
… become …
Streaming Application + State
● exactly-once
● multi-key, multi-table serializable transactions
29. The Promise of Stateful Stream Processing
Resource Efficient, Scalable, Real-Time Services
based on
Parallel Computation on Local State
● Scalability
● Performance
● Operational Simplicity
● Consistency
32. DATA ARTISANS PLATFORM OVERVIEW
Disclaimer: Apache Flink is not a product of data Artisans. It is a project by the Apache Software Foundation.
33. Performance (early results) - Scalability (parallelism)
200 million rows
100% update queries
4 rows written/query
34. Performance (early results) – Key Contention
Artificial case of extreme contention: 4 x 200,000 = 800,000 updates/sec on the same 1,000 keys
Slowdown, but does not break down like optimistic concurrency approaches
100% update queries
4 rows written/query