Processing Semantically-Ordered
Streams in Financial Services
Addressing the Lack of Ordering Guarantees in Stream Processing
ATOMICWIRE.IO/FLINK-FORWARD-2022
3 August 2022 © 2022 Atomic Wire Technology Limited
Introduction
Patrick Lucas
• Previously team lead and software engineer at Yelp.
• Moved to Berlin in 2017 to join data Artisans (now Ververica).
• An original team member developing Ververica Platform.
• Co-founder at Atomic Wire, focusing on stream processing in
financial services.
Atomic Wire
• Originally spun out of a large Swiss investment bank.
• 100% focused on the financial services vertical.
• Currently building cloud-native post-trade services for Tier 1 banks.
• A key challenge we’ve encountered is achieving in-order processing
in Flink and Beam, especially when consuming sequenced streams.
Problem statement
Today’s stream processing frameworks are fundamentally about analytics, using millisecond-resolution
timestamps for bucketing events (e.g. into a time slice). In-order processing can be achieved, but it is
awkward and slow.
Currently provided by stream processors vs. typical requirements in financial services:
• Primary application focus
○ Stream processors: analytics (dividing, transforming, and recombining continuously arriving data)
○ Financial services: critical-path applications, workflows, observing state changes and lifecycles
• Guarantees
○ Stream processors: ordering within a topic / channel
○ Financial services: global ordering across topics / channels
• Latency
○ Stream processors: sub-second (>100ms)
○ Financial services: sub-millisecond (<10μs)
• Data streams
○ Stream processors: real-time / batch, Change Data Capture (CDC) [1]
○ Financial services: sequenced streams [2]
• Ordering mechanism
○ Stream processors: time-based (event time, processing time)
○ Financial services: semantically ordered
[1] CDC streams are typically ordered according to the sequence in which transactions are committed to a database. This is similar to the concept of a sequenced stream, insofar as the ordering of events in the commit log enables downstream applications to replicate state. However, a database does not guarantee the order in which transaction requests are committed. In fact, there are many reasons why transaction requests may be reordered, for example to optimise performance, or when rejected transactions are retried. This is somewhat different to the intent of a sequenced stream, where the order is intended to represent some notion of causality with respect to the order of events in some real-world business process.
[2] A sequenced stream represents a set of events that have a total global order (see following slide).
What are “sequenced streams”?
A sequenced stream represents a set of events that have a total global order, and where preserving
causal / deterministic ordering really matters for correct in-order processing
[Diagram labels: App 1, App 2, … App n; SEQUENCER; Unsequenced App Outputs; External Inputs & Outputs (e.g. Client Orders, Market Data); Sequenced streams]
• A common architectural pattern with origins going back
to Lamport scalar clocks [2]:
○ Every input is assigned a globally unique
monotonic sequence number and timestamp by a
central component known as a sequencer.
○ The stream is disseminated to all nodes /
applications in the system, which only operate on
these sequenced inputs and never on other
external inputs that have not been sequenced.
○ All nodes can have a perfectly synchronised state
by virtue of processing identical inputs in identical
order.
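The pattern above can be sketched in a few lines. This is a minimal illustration, not the design of any particular sequencer: the names `Sequencer`, `SequencedEvent`, `AccountReplica` and `replay` are hypothetical, and the in-memory list stands in for the disseminated stream.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class SequencedEvent:
    seq: int           # globally unique, gapless, monotonically increasing
    timestamp_ns: int  # informational here; ordering relies on seq only
    payload: Any


class Sequencer:
    """Single component that stamps every input before any node may act on it."""

    def __init__(self) -> None:
        self._next_seq = itertools.count()
        self.stream: List[SequencedEvent] = []

    def submit(self, payload: Any) -> SequencedEvent:
        event = SequencedEvent(next(self._next_seq), time.time_ns(), payload)
        self.stream.append(event)
        return event


class AccountReplica:
    """Hypothetical application node that only consumes sequenced events."""

    def __init__(self) -> None:
        self.balance = 0

    def apply(self, event: SequencedEvent) -> None:
        kind, amount = event.payload
        if kind == "deposit":
            self.balance += amount
        elif kind == "withdraw" and self.balance >= amount:
            self.balance -= amount  # withdrawals exceeding the balance are ignored


def replay(stream: List[SequencedEvent]) -> AccountReplica:
    """Build a replica's state by applying events strictly in sequence order."""
    replica = AccountReplica()
    for event in sorted(stream, key=lambda e: e.seq):
        replica.apply(event)
    return replica


sequencer = Sequencer()
for payload in [("deposit", 100), ("withdraw", 150), ("deposit", 100)]:
    sequencer.submit(payload)

replica_1 = replay(sequencer.stream)
replica_2 = replay(list(reversed(sequencer.stream)))  # delivery order differs
```

Both replicas end in the same state because each applies the same inputs in sequence-number order, regardless of the order in which the events were delivered to it.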
How is this different to Kafka?
• The primary difference is that topic- or channel-based
middleware does not maintain the relative ordering of
messages across topics or channels.
• You can think of a sequencer as an extremely fast
single-topic broker with persistence and at-least-once
delivery semantics.
Figure title: THE SEQUENCED STREAM [1]
[1] P. Sanghvi, Proof Engineering: The Algorithmic Trading Platform, June 2021.
[2] L. Lamport, Time, clocks, and the ordering of events in a distributed system, July 1978.
What precisely does “in-order processing” mean?
… and how can it be achieved in Flink?
We want to (a) define a stateful callback that is invoked for each incoming event for a particular key,
and (b) guarantee that the callback is invoked on events in the correct order, however that order is defined.
In pseudocode: [code shown on the original slide is not captured in this transcript]
• This actually works in Flink, with caveats, if your data arrives already in order. It is not particularly
satisfying, however, with regard to the guarantees and API the framework provides. We’ll come back to this later.
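As an illustration of requirement (a), here is a minimal sketch of a stateful per-key callback. It is not the pseudocode from the original slide and does not use the Flink API. `KeyedProcessor`, `on_event` and the order-lifecycle `TRANSITIONS` table are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Hashable, Optional

# Hypothetical order lifecycle: which event is legal in which state.
TRANSITIONS = {
    (None, "new"): "open",
    ("open", "fill"): "filled",
    ("open", "cancel"): "cancelled",
}


def on_event(state: Optional[str], event: str) -> str:
    """Stateful callback: returns the next state, or rejects the event."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"event {event!r} is not valid in state {state!r}")
    return TRANSITIONS[(state, event)]


class KeyedProcessor:
    """Keeps one piece of state per key and invokes the callback per event."""

    def __init__(self, callback: Callable[[Optional[str], str], str]) -> None:
        self._callback = callback
        self._state: Dict[Hashable, Optional[str]] = {}

    def process(self, key: Hashable, event: str) -> str:
        new_state = self._callback(self._state.get(key), event)
        self._state[key] = new_state
        return new_state

    def state_of(self, key: Hashable) -> Optional[str]:
        return self._state.get(key)


processor = KeyedProcessor(on_event)
processor.process("order-1", "new")
processor.process("order-1", "fill")
```

The callback is only correct if requirement (b) also holds: delivering "fill" before "new" for the same key raises an error here, which is the kind of failure that out-of-order delivery causes in lifecycle-style applications.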
Achieving in-order processing with Flink
Case study: Flink CEP
• Flink CEP (short for “complex event processing”) is a library for identifying patterns in streams of events,
somewhat like a regular expression (e.g. this one or more times followed by that).
• It clearly needs to be able to process events in order, but which order?
• In Flink CEP, events are processed in timestamp order, with processing driven by watermarks.
• This is implemented using ProcessFunction, a low-level user function in Flink that gives you access to
state and timers, which are needed to accomplish this.
“… in CEP the order in which elements are processed matters. To guarantee that elements
are processed in the correct order when working in event time, an incoming element is initially
put in a buffer where elements are sorted in ascending order based on their timestamp, and
when a watermark arrives, all the elements in this buffer with timestamps smaller than that of
the watermark are processed. This implies that elements between watermarks are processed
in event-time order.” (Flink CEP documentation)
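The buffering behaviour described in the quote can be sketched outside Flink as a priority queue drained by watermarks. This is a simplified model, not Flink CEP's implementation. `WatermarkBuffer` is a hypothetical name, late events are not handled, and ties on the timestamp are broken by arrival order as an assumption.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class WatermarkBuffer:
    """Buffers elements sorted by timestamp and releases them on watermarks."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._arrival = itertools.count()  # tie-breaker for equal timestamps

    def on_element(self, timestamp: int, event: Any) -> None:
        heapq.heappush(self._heap, (timestamp, next(self._arrival), event))

    def on_watermark(self, watermark: int) -> List[Any]:
        """Release all buffered events with timestamp smaller than the watermark."""
        released = []
        while self._heap and self._heap[0][0] < watermark:
            _, _, event = heapq.heappop(self._heap)
            released.append(event)
        return released


# Event IDs 1..6 arrive in this order with event times t5, t2, t4, t3, t1, t5.
buffer = WatermarkBuffer()
for event_id, event_time in [(1, 5), (2, 2), (3, 4), (4, 3), (5, 1), (6, 5)]:
    buffer.on_element(event_time, event_id)

first_batch = buffer.on_watermark(4)   # events with event time 1, 2, 3
second_batch = buffer.on_watermark(6)  # events with event time 4, 5, 5
```

Nothing is processed until a watermark arrives, so the latency of this approach is bounded below by the watermark interval.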
Achieving in-order processing with Flink
Time-ordering in Flink CEP
[Figure: time-ordering in Flink CEP (tn: event time; x: event ID)]
Stream (out of order), first event first: 1 (t5), 2 (t2), 3 (t4), 4 (t3), 5 (t1), 6 (t5)
Flink CEP buffer, by event timestamp: t1: 5; t2: 2; t3: 4; t4: 3; t5: 1, 6
Processing in event-time order, processed first on the left: 5, 2, 4, 3, 1, 6
Achieving in-order processing with Flink
Pseudocode example with ProcessFunction
1. Receive events and store them in MapState<Timestamp, List<Event>>
2. Set a timer at the event's timestamp
3. When the timer fires, read the events for that timestamp and clear the state
4. Sort them again by timestamp (for sub-millisecond resolution)
5. Invoke the callback
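The five steps can be modelled with plain Python data structures. This is a toy stand-in, not Flink code: a dict plays the role of MapState, a set plays the role of the registered timers, and `advance_watermark` imitates timers firing. The class name, the `ts_ns` field and the nanosecond resolution are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set


class ReorderingFunction:
    """Toy model of the buffer-and-timer approach for a single key."""

    def __init__(self, callback: Callable[[dict], None]) -> None:
        self._callback = callback
        self._buffer: Dict[int, List[dict]] = defaultdict(list)  # ms -> events
        self._timers: Set[int] = set()                           # ms timestamps

    def process_element(self, event: dict) -> None:
        ts_ms = event["ts_ns"] // 1_000_000
        self._buffer[ts_ms].append(event)  # step 1: store in state
        self._timers.add(ts_ms)            # step 2: set a timer at the timestamp

    def advance_watermark(self, watermark_ms: int) -> None:
        due = sorted(t for t in self._timers if t <= watermark_ms)
        for ts_ms in due:
            self._timers.remove(ts_ms)
            events = self._buffer.pop(ts_ms)       # step 3: read and clear state
            events.sort(key=lambda e: e["ts_ns"])  # step 4: sub-millisecond sort
            for event in events:
                self._callback(event)              # step 5: invoke the callback


processed: List[str] = []
fn = ReorderingFunction(lambda e: processed.append(e["id"]))
fn.process_element({"id": "c", "ts_ns": 2_000_900})  # 2 ms + 900 ns
fn.process_element({"id": "a", "ts_ns": 1_000_000})  # 1 ms
fn.process_element({"id": "b", "ts_ns": 2_000_100})  # 2 ms + 100 ns
fn.advance_watermark(2)
```

Events "b" and "c" share a millisecond, so the timer alone cannot order them. The secondary sort in step 4 does that, which is why each event needs ordering information beyond the millisecond timestamp.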
Achieving in-order processing with Flink
Problems using ProcessFunction
• A downside of this approach is that the order it processes in can only be expressed through the
(millisecond-resolution) event timestamps.
i. You have to trust that your timestamps actually represent the correct order.
ii. You need additional information on each event in case you ever have two events that occurred within the same millisecond.
iii. You are setting ~1 timer per input event, and each input event has to be stored in state, which is expensive.
Order-preserving processing would obviate this expensive approach to reordering,
and decouple the order of processing from the event timestamps.
Circuit switching vs. packet switching
Can we achieve in-order processing without Flink CEP-style time ordering?
Dataflow w/ Beam: Packet Switching
Bundle distribution (bundles can be reordered)
• A central broker distributes bundles of work (“packets”) to a pool of
workers, and the bundles can be reordered arbitrarily
between any two PTransforms.
• In practice, sub-pipelines without a shuffle often get
joined (called “fusion” in Dataflow or “chaining” in Flink).
Data doesn’t get reordered in those cases, but that is
not guaranteed.
[Diagram labels: Broker; Worker 1, Worker 2, Worker 3]
Flink: Circuit Switching
Serial channels (messages are not reordered within a channel)
• Fixed, preallocated network channels (“circuits”) mean that data
sharded in a particular way cannot be reordered as it
flows from task to task.
[Diagram labels: Task 1.1, Task 1.2]
What if my data is already in order?
Processing data from order-preserving sources
What if our data source preserves ordering, and that original ordering is the
precise order that we want to process in?
Is this possible today?
Flink
• Implicitly possible today in Flink: due to “circuit switching”, events will
not get reordered and user code callbacks will fire in the original order.
• Behaviour is implicit due to an implementation detail of Flink, and is
not a guarantee built into the public API layer.
Dataflow w/ Beam
• Not possible today in Dataflow: due to “packet switching”, you can never
rely on the ordering of your input data to be preserved.
• No guarantees.
Achieving in-order processing in Dataflow
Back to CEP-style reordering
Experimenting with Beam on Dataflow
• We ran experiments using both BagState with a single timer and MapState with a TimerMap.
• It is certainly possible to get the expected reordering behavior, but it is far from ideal:
○ Dataflow reserves the right to reorder at every DoFn boundary.
○ So it only works assuredly when all processing is performed within the reordering code itself, or when
you assume Dataflow will always fuse your reordering step with your processing step.
○ This is not good Beam application design, where all processing is meant to be expressed by chaining
or composing PTransforms.
Dealbreaker: Extremely high latency
When testing with Google Pub/Sub as the source, we experienced delays of 15 to 75 seconds.
TL;DR: it is possible to implement essentially the same approach as Flink CEP in Beam,
but it’s too slow for most use cases.
Where do we go from here?
How to improve stream processing frameworks for in-order processing
01. When using an order-preserving source like Pulsar or Kafka, there should be an
explicit option to preserve the source ordering for processing and to
configure the topic sharding key to use.
02. When the input data follows the sequenced stream pattern (using
monotonically increasing, gapless sequence IDs), there should be an
explicit option to indicate this to the framework, so that the expressed
sequence can be reconstructed and preserved during processing within the
streaming application.
We still want all the great features we get today from frameworks like Flink …
i. Distributed processing
ii. Horizontal scaling
iii. Failure recovery
iv. State management
… etc.
… but the frameworks need an explicit option to preserve the order of processing within each
key of a keyed stream.
Where do we go from here?
Preserving source ordering
[Figure: two source partitions (a: event ID; K: sharding key), listed with the first event in each partition first]
Partition 1 (ID/key): a/1, b/3, c/3, d/1, e/5, f/1
Partition 2 (ID/key): g/4, h/4, i/2, j/0, k/4, l/0
Processing order per key:
0: j, l
1: a, d, f
2: i
3: b, c
4: g, h, k
5: e
Order of processing within each key matches source
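The figure's result can be reproduced with a short sketch. It assumes, as the figure suggests, that every sharding key lives in exactly one source partition. The function name `process_in_source_order` is hypothetical, and the partitions are plain lists rather than Kafka or Pulsar topics.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (event ID, sharding key) pairs in source order, taken from the figure.
PARTITIONS: List[List[Tuple[str, int]]] = [
    [("a", 1), ("b", 3), ("c", 3), ("d", 1), ("e", 5), ("f", 1)],
    [("g", 4), ("h", 4), ("i", 2), ("j", 0), ("k", 4), ("l", 0)],
]


def process_in_source_order(
    partitions: List[List[Tuple[str, int]]],
) -> Dict[int, List[str]]:
    """Return the per-key processing order, preserving each partition's order."""
    per_key: Dict[int, List[str]] = defaultdict(list)
    owner: Dict[int, int] = {}  # key -> index of the partition that carries it
    for index, partition in enumerate(partitions):
        for event_id, key in partition:
            # Per-key order is only well defined if a key never spans partitions.
            if owner.setdefault(key, index) != index:
                raise ValueError(f"key {key} appears in more than one partition")
            per_key[key].append(event_id)
    return dict(per_key)


order = process_in_source_order(PARTITIONS)
```

No ordering is implied between events of different keys, so the partitions could be consumed concurrently without changing the result.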
Where do we go from here?
Processing sequenced streams
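[Figure: three input partitions carrying sequenced events (n: sequence ID; a, b: financial instrument), listed first event first]
Partition 1: 0 (a), 4 (b), 9 (a), 14 (b), 11 (a)
Partition 2: 1 (b), 2 (a), 8 (b), 7 (a), 12 (b)
Partition 3: 3 (a), 5 (b), 6 (a), 10 (b), 13 (a)
Processing is held back to wait for out-of-order data.
Processing order per key:
a: 0, 2, 3, 6, 7, 9, 11, 13
b: 1, 4, 5, 8, 10, 12, 14
Order of processing within each
key matches sequence ID order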
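Because the sequence IDs are gapless, a consumer can tell exactly when it is safe to release an event without relying on timestamps or watermarks. The sketch below is one possible way to do this, not an existing framework feature. `Resequencer` is a hypothetical name, and the round-robin interleaving of the three partitions is an assumed arrival order.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Any, Dict, List, Tuple


class Resequencer:
    """Holds events back until every lower sequence ID has been released."""

    def __init__(self, first_seq: int = 0) -> None:
        self._next_seq = first_seq
        self._pending: Dict[int, Any] = {}

    def offer(self, seq: int, event: Any) -> List[Tuple[int, Any]]:
        """Accept one event and return those that are now releasable, in order."""
        if seq < self._next_seq or seq in self._pending:
            return []  # duplicate, e.g. from at-least-once delivery
        self._pending[seq] = event
        released = []
        while self._next_seq in self._pending:
            released.append((self._next_seq, self._pending.pop(self._next_seq)))
            self._next_seq += 1
        return released


# (sequence ID, instrument) per partition, first event first, as in the figure.
PARTITIONS = [
    [(0, "a"), (4, "b"), (9, "a"), (14, "b"), (11, "a")],
    [(1, "b"), (2, "a"), (8, "b"), (7, "a"), (12, "b")],
    [(3, "a"), (5, "b"), (6, "a"), (10, "b"), (13, "a")],
]

resequencer = Resequencer()
per_instrument: Dict[str, List[int]] = defaultdict(list)
for batch in zip_longest(*PARTITIONS):  # assumed round-robin arrival order
    for item in batch:
        if item is None:
            continue
        for seq, instrument in resequencer.offer(*item):
            per_instrument[instrument].append(seq)
```

The buffer only grows while a gap is outstanding, so for a stream that is mostly in order it stores far less state than keeping every event until a timer fires.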
Call for input
What do you think? How have you solved this?
A. Is it possible to build in-order processing guarantees with Flink?
B. Or is the notion of time too ingrained to accomplish this without cutting all
the way to the core engine?
We’d like to hear your views on this and learn about your
solutions or workarounds to this problem.
We’re Hiring!
Stream Processing for Financial Services
ATOMICWIRE.IO/CONTACT

More Related Content

What's hot

Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
Till Rohrmann
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Altinity Ltd
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
confluent
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Ververica
 
Advanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applicationsAdvanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applications
Aljoscha Krettek
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
Yaroslav Tkachenko
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
Ververica
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
Flink Forward
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
DataWorks Summit
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
Aljoscha Krettek
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 

What's hot (20)

Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
 
Advanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applicationsAdvanced Flink Training - Design patterns for streaming applications
Advanced Flink Training - Design patterns for streaming applications
 
Storing State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your AnalyticsStoring State Forever: Why It Can Be Good For Your Analytics
Storing State Forever: Why It Can Be Good For Your Analytics
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 

Similar to Processing Semantically-Ordered Streams in Financial Services

Telecoms Service Assurance & Service Fulfillment with Neo4j Graph Database
Telecoms Service Assurance & Service Fulfillment with Neo4j Graph DatabaseTelecoms Service Assurance & Service Fulfillment with Neo4j Graph Database
Telecoms Service Assurance & Service Fulfillment with Neo4j Graph Database
Neo4j
 
Enabling Cloud Storage Auditing with Key Exposure Resistance
Enabling Cloud Storage Auditing with Key Exposure ResistanceEnabling Cloud Storage Auditing with Key Exposure Resistance
Enabling Cloud Storage Auditing with Key Exposure Resistance
IRJET Journal
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0
Matt Lucas
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
Deepak Shankar
 
Hlb private cloud rules of engagement idc
Hlb private cloud rules of engagement   idcHlb private cloud rules of engagement   idc
Hlb private cloud rules of engagement idc
Yew Jin Kang
 
Enhancing Data Security in Cloud Storage Auditing With Key Abstraction
Enhancing Data Security in Cloud Storage Auditing With Key AbstractionEnhancing Data Security in Cloud Storage Auditing With Key Abstraction
Enhancing Data Security in Cloud Storage Auditing With Key Abstraction
paperpublications3
 
SMTAI PowerPoint: Blockchain for High Tech
SMTAI PowerPoint: Blockchain for High Tech SMTAI PowerPoint: Blockchain for High Tech
SMTAI PowerPoint: Blockchain for High Tech
Quentin Samelson
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
Flink Forward
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
makker_nl
 
IT Problems & Problem Management
IT Problems & Problem ManagementIT Problems & Problem Management
IT Problems & Problem Management
Apalytics
 
IT Performance Problems
IT Performance Problems IT Performance Problems
IT Performance Problems
Apalytics
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalSub Szabolcs Feczak
 
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
HostedbyConfluent
 
SECURE FILE STORAGE IN THE CLOUD WITH HYBRID ENCRYPTION
SECURE FILE STORAGE IN THE CLOUD WITH HYBRID ENCRYPTIONSECURE FILE STORAGE IN THE CLOUD WITH HYBRID ENCRYPTION
SECURE FILE STORAGE IN THE CLOUD WITH HYBRID ENCRYPTION
IRJET Journal
 
Full Consistency Lag and its Applications
Full Consistency Lag and its ApplicationsFull Consistency Lag and its Applications
Full Consistency Lag and its Applications
Cassandra Austin
 
Encode Club workshop slides
Encode Club workshop slidesEncode Club workshop slides
Encode Club workshop slides
Vanessa Lošić
 
Network Time Synchronization
Network Time SynchronizationNetwork Time Synchronization
Network Time Synchronization
Ben Rothke
 
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
Peter Broadhurst
 
Spark Streaming Early Warning Use Case
Spark Streaming Early Warning Use CaseSpark Streaming Early Warning Use Case
Spark Streaming Early Warning Use Case
random_chance
 
SECURE AUDITING AND DEDUPLICATING DATA IN CLOUD
SECURE AUDITING AND DEDUPLICATING DATA IN CLOUDSECURE AUDITING AND DEDUPLICATING DATA IN CLOUD
SECURE AUDITING AND DEDUPLICATING DATA IN CLOUD
Nexgen Technology
 

Similar to Processing Semantically-Ordered Streams in Financial Services (20)

Telecoms Service Assurance & Service Fulfillment with Neo4j Graph Database
Telecoms Service Assurance & Service Fulfillment with Neo4j Graph DatabaseTelecoms Service Assurance & Service Fulfillment with Neo4j Graph Database
Telecoms Service Assurance & Service Fulfillment with Neo4j Graph Database
 
Enabling Cloud Storage Auditing with Key Exposure Resistance
Enabling Cloud Storage Auditing with Key Exposure ResistanceEnabling Cloud Storage Auditing with Key Exposure Resistance
Enabling Cloud Storage Auditing with Key Exposure Resistance
 
IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0IBM Blockchain Platform - Architectural Good Practices v1.0
IBM Blockchain Platform - Architectural Good Practices v1.0
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
 
Hlb private cloud rules of engagement idc
Hlb private cloud rules of engagement   idcHlb private cloud rules of engagement   idc
Hlb private cloud rules of engagement idc
 
Enhancing Data Security in Cloud Storage Auditing With Key Abstraction
Enhancing Data Security in Cloud Storage Auditing With Key AbstractionEnhancing Data Security in Cloud Storage Auditing With Key Abstraction
Enhancing Data Security in Cloud Storage Auditing With Key Abstraction
 
SMTAI PowerPoint: Blockchain for High Tech
SMTAI PowerPoint: Blockchain for High Tech SMTAI PowerPoint: Blockchain for High Tech
SMTAI PowerPoint: Blockchain for High Tech
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
 
... No it's Apache Kafka!
... No it's Apache Kafka!... No it's Apache Kafka!
... No it's Apache Kafka!
 
IT Problems & Problem Management
IT Problems & Problem ManagementIT Problems & Problem Management
IT Problems & Problem Management
 
IT Performance Problems
IT Performance Problems IT Performance Problems
IT Performance Problems
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
 
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
Building Retry Architectures in Kafka with Compacted Topics | Matthew Zhou, V...
 
SECURE FILE STORAGE IN THE CLOUD WITH HYBRID ENCRYPTION
SECURE FILE STORAGE IN THE CLOUD WITH HYBRID ENCRYPTIONSECURE FILE STORAGE IN THE CLOUD WITH HYBRID ENCRYPTION
SECURE FILE STORAGE IN THE CLOUD WITH HYBRID ENCRYPTION
 
Full Consistency Lag and its Applications
Full Consistency Lag and its ApplicationsFull Consistency Lag and its Applications
Full Consistency Lag and its Applications
 
Encode Club workshop slides
Encode Club workshop slidesEncode Club workshop slides
Encode Club workshop slides
 
Network Time Synchronization
Network Time SynchronizationNetwork Time Synchronization
Network Time Synchronization
 
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
 
Spark Streaming Early Warning Use Case
Spark Streaming Early Warning Use CaseSpark Streaming Early Warning Use Case
Spark Streaming Early Warning Use Case
 
SECURE AUDITING AND DEDUPLICATING DATA IN CLOUD
SECURE AUDITING AND DEDUPLICATING DATA IN CLOUDSECURE AUDITING AND DEDUPLICATING DATA IN CLOUD
SECURE AUDITING AND DEDUPLICATING DATA IN CLOUD
 

More from Flink Forward

Introducing the Apache Flink Kubernetes Operator
Processing Semantically-Ordered Streams in Financial Services

  • 1. Processing Semantically-Ordered Streams in Financial Services: Addressing the Lack of Ordering Guarantees in Stream Processing. ATOMICWIRE.IO/FLINK-FORWARD-2022
  • 2. Introduction
       3 August 2022 © 2022 Atomic Wire Technology Limited
       Patrick Lucas
       • Previously team lead and software engineer at Yelp.
       • Moved to Berlin in 2017 to join data Artisans (now Ververica).
       • An original team member developing Ververica Platform.
       • Co-founder at Atomic Wire, focusing on stream processing in financial services.
       Atomic Wire
       • Originally spun out of a large Swiss investment bank.
       • 100% focused on the financial services vertical.
       • Currently building cloud-native post-trade services for Tier 1 banks.
       • A key challenge we’ve encountered is achieving in-order processing in Flink and Beam, especially when consuming sequenced streams.
  • 3. Problem statement
       Today’s stream processing frameworks are fundamentally about analytics, using millisecond-resolution timestamps for bucketing events (e.g. into a time slice). In-order processing can be achieved, but it is awkward and slow.
       Currently provided by stream processors vs. typical requirements in financial services:
       • Primary application focus: analytics (dividing, transforming, and recombining continuously arriving data) vs. critical-path applications, workflows, observing state changes and lifecycles.
       • Guarantees: ordering within a topic / channel vs. global ordering across topics / channels.
       • Latency: sub-second (>100ms) vs. sub-millisecond (<10μs).
       • Data streams: real-time / batch and Change Data Capture (CDC) [1] vs. sequenced streams [2].
       • Ordering mechanism: time-based (event time, processing time) vs. semantically-ordered.
       [1] CDC streams are typically ordered according to the sequence in which transactions are committed to a database. This is similar to the concept of a sequenced stream, insofar as the ordering of events in the commit log enables downstream applications to replicate state. However, a database does not guarantee the order in which transaction requests are committed. In fact, there are many reasons why transaction requests may be reordered, for example to optimise performance, or when rejected transactions are retried. This is somewhat different to the intent of a sequenced stream, where the order is intended to represent some notion of causality with respect to the order of events in some real-world business process.
       [2] A sequenced stream represents a set of events that have a total global order (see following slide).
  • 4. What are “sequenced streams”?
       A sequenced stream represents a set of events that have a total global order, and where preserving causal / deterministic ordering really matters for correct in-order processing.
       [Figure: “The sequenced stream” [1]. Labels: External Inputs & Outputs (e.g. Client Orders, Market Data), Unsequenced App Outputs, SEQUENCER, Sequenced streams, App 1, App 2, App n.]
       Sequenced streams
       • A common architectural pattern with origins going back to Lamport scalar clocks [2]:
         ○ Every input is assigned a globally unique monotonic sequence number and timestamp by a central component known as a sequencer.
         ○ The stream is disseminated to all nodes / applications in the system, which only operate on these sequenced inputs and never on other external inputs that have not been sequenced.
         ○ All nodes can have a perfectly synchronised state by virtue of processing identical inputs in identical order.
       How is this different to Kafka?
       • The primary difference is that topic- or channel-based middleware does not maintain the relative ordering of messages across topics or channels.
       • You can think of a sequencer as an extremely fast single-topic broker with persistence and at-least-once delivery semantics.
       [1] P. Sanghvi, Proof Engineering: The Algorithmic Trading Platform, June 2021.
       [2] L. Lamport, Time, clocks, and the ordering of events in a distributed system, July 1978.
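The sequencer pattern on this slide can be illustrated with a small sketch. It is not taken from the talk: the `Sequencer` and `Replica` classes, the `(account, amount)` payload, and the in-memory `log` are all hypothetical stand-ins chosen for illustration. It shows a single component stamping every input with a sequence number and timestamp, and two replicas reaching identical state by applying the same inputs in the same order.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class SequencedEvent:
    seq: int           # globally unique, monotonically increasing
    timestamp_ns: int  # assigned by the sequencer, not by the producer
    payload: tuple     # hypothetical (account, amount) input


class Sequencer:
    """Single component that stamps every input before any app may act on it."""

    def __init__(self):
        self._next_seq = itertools.count()
        self.log = []  # stands in for the persisted, replayable stream

    def submit(self, payload):
        event = SequencedEvent(next(self._next_seq), time.time_ns(), payload)
        self.log.append(event)
        return event


class Replica:
    """An application that only ever consumes sequenced events."""

    def __init__(self):
        self.balances = {}
        self.last_seq = -1

    def apply(self, event):
        if event.seq <= self.last_seq:
            return  # duplicate, which at-least-once delivery allows
        if event.seq != self.last_seq + 1:
            raise ValueError(f"gap: expected {self.last_seq + 1}, got {event.seq}")
        account, amount = event.payload
        self.balances[account] = self.balances.get(account, 0) + amount
        self.last_seq = event.seq


sequencer = Sequencer()
for payload in [("acct-1", 100), ("acct-2", 50), ("acct-1", -30)]:
    sequencer.submit(payload)

replica_a, replica_b = Replica(), Replica()
for event in sequencer.log:
    replica_a.apply(event)
    replica_b.apply(event)
    replica_b.apply(event)  # a redelivered duplicate is ignored
```

Because both replicas see the same gapless sequence, their state is a pure function of the log; dropping duplicates by sequence number is what makes at-least-once delivery safe here.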
  • 5. What precisely does “in-order processing” mean? … and how can it be achieved in Flink?
       We want to (a) define a stateful callback that is invoked for each incoming event for a particular key, and (b) guarantee the callback is invoked on events in the correct order, however that order is defined.
       In pseudocode:
       • This actually works in Flink with caveats if your data arrives already in order, but it’s not particularly satisfying with regard to the guarantees and API the framework provides; we’ll come back to this later.
  • 6. Achieving in-order processing with Flink. Case study: Flink CEP
       • Flink CEP (short for “complex event processing”) is a library for identifying patterns in streams of events, somewhat like a regular expression (e.g. this one or more times followed by that).
       • It clearly needs to be able to process events in order, but which order?
       “… in CEP the order in which elements are processed matters. To guarantee that elements are processed in the correct order when working in event time, an incoming element is initially put in a buffer where elements are sorted in ascending order based on their timestamp, and when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed. This implies that elements between watermarks are processed in event-time order.”
       • So, in Flink CEP, events are processed in event-time order, with processing driven by watermarks.
       • This is implemented using ProcessFunction, a low-level user function in Flink that gives you access to state and timers, which are needed to accomplish this.
  • 7. Achieving in-order processing with Flink. Time-ordering in Flink CEP
       [Figure: an out-of-order stream of six events (event IDs 1 to 6, event 1 arriving first) enters Flink CEP. Legend: tn = event time, x = event ID. Event times: t1 = event 5, t2 = event 2, t3 = event 4, t4 = event 3, t5 = events 1 and 6. Processing in event-time order: 5, 2, 4, 3, 1, 6 (event 5 is processed first).]
  • 8. Achieving in-order processing with Flink. Pseudocode example with ProcessFunction
       1. Receive events and store them in MapState<Timestamp, List<Event>>.
       2. Set a timer at the event’s timestamp.
       3. When the timer fires, read the events for that timestamp and clear the state.
       4. Sort them again by timestamp (for sub-millisecond resolution).
       5. Invoke the callback.
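The five steps above can be sketched in plain Python. This is a simulation of the idea rather than Flink code: `EventTimeReorderer`, `on_event`, `on_watermark`, and the microsecond field `timestamp_us` are hypothetical names, a dictionary stands in for `MapState<Timestamp, List<Event>>`, and a timer is assumed to fire once the watermark reaches its millisecond timestamp. Late events (arriving behind the watermark) are not handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: int
    timestamp_us: int  # hypothetical microsecond-resolution event time


class EventTimeReorderer:
    def __init__(self, callback):
        self.callback = callback
        self.buffer = defaultdict(list)  # role of MapState<Timestamp, List<Event>>
        self.timers = set()              # registered millisecond timestamps

    def on_event(self, event):
        ts_ms = event.timestamp_us // 1000
        self.buffer[ts_ms].append(event)  # step 1: store in state
        self.timers.add(ts_ms)            # step 2: set timer @ timestamp

    def on_watermark(self, watermark_ms):
        # Fire every timer the watermark has reached, oldest first.
        for ts_ms in sorted(t for t in self.timers if t <= watermark_ms):
            self.timers.discard(ts_ms)
            events = self.buffer.pop(ts_ms)            # step 3: read and clear
            events.sort(key=lambda e: e.timestamp_us)  # step 4: sub-ms sort
            for event in events:
                self.callback(event)                   # step 5: invoke callback


# Arrival order and event times follow the figure on the previous slide:
# events 1..6 arrive with millisecond event times 5, 2, 4, 3, 1, 5.
arrivals = [Event(1, 5000), Event(2, 2000), Event(3, 4000),
            Event(4, 3000), Event(5, 1000), Event(6, 5500)]

processed = []
reorderer = EventTimeReorderer(lambda e: processed.append(e.event_id))
for event in arrivals:
    reorderer.on_event(event)
reorderer.on_watermark(3)  # releases event times 1..3
released_first = list(processed)
reorderer.on_watermark(5)  # releases event times 4..5
```

Every event is written to state and gets a timer, which is the cost the next slide calls out.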
  • 9. Achieving in-order processing with Flink. Problems using ProcessFunction
       • A downside of this approach is that the order it processes in can only be expressed through the (millisecond-resolution) event timestamps.
         i. You have to trust that your timestamps actually represent the correct order.
         ii. You need additional information on each event in case you ever have two events that occurred within the same millisecond.
         iii. You are setting ~1 timer per input event, and each input event has to be stored in state, which is expensive.
       Order-preserving processing would obviate this expensive approach to reordering, and decouple the order of processing from the event timestamps.
  • 10. Circuit switching vs. packet switching. Can we achieve in-order processing without Flink CEP-style time ordering?
       Dataflow w/ Beam: packet switching. Bundle distribution (bundles can be reordered).
       • A central broker distributes bundles of work to a pool of workers, and the bundles can be reordered arbitrarily between any two PTransforms.
       • In practice, sub-pipelines without a shuffle often get joined, called “fusion” in Dataflow or “chaining” in Flink. Data is not reordered in those cases, but this is not guaranteed.
       Flink: circuit switching. Serial channels (messages are not reordered within a channel).
       • Fixed, preallocated network channels mean that data sharded in a particular way cannot be reordered as it flows from task to task.
       [Figure: “Packets” panel with a Broker distributing work to Worker 1, Worker 2, and Worker 3; “Circuits” panel with Task 1.1 and Task 1.2 connected by fixed channels.]
  • 11. What if my data is already in order? Processing data from order-preserving sources
       What if our data source preserves ordering, and that original ordering is the precise order that we want to process in? Is this possible today?
       • Flink: implicitly possible today. Due to “circuit switching”, events will not get reordered and user code callbacks will fire in the original order. The behaviour is implicit due to an implementation detail of Flink, and is not a guarantee built into the public API layer.
       • Dataflow w/ Beam: not possible today. Due to “packet switching”, you can never rely on the ordering of your input data to be preserved. No guarantees.
  • 12. Achieving in-order processing in Dataflow. Back to CEP-style reordering
       Experimenting with Beam on Dataflow
       • We ran experiments using both BagState with a single timer and MapState with a TimerMap.
       • It is certainly possible to get the expected reordering behavior, but it is far from ideal:
         ○ Dataflow reserves the right to reorder at every DoFn boundary.
         ○ So, it only works assuredly when performing all processing within the reordering code itself, or by assuming Dataflow will always fuse your reordering step with your processing step.
         ○ This is not good Beam application design, where all processing is meant to be expressed by chaining or composing PTransforms.
       Dealbreaker: extremely high latency
       • When testing with Google Pub/Sub as the source, we experienced delays of 15 to 75 seconds.
       TL;DR: it is possible to implement essentially the same approach as Flink CEP in Beam, but it’s too slow for most use cases.
  • 13. Where do we go from here? How to improve stream processing frameworks for in-order processing
       01 When using an order-preserving source like Pulsar or Kafka, there should be an explicit option to preserve the source ordering for processing and to configure the topic sharding key to use.
       02 When the input data follows the sequenced stream pattern (using monotonically-increasing, gapless sequence IDs) there should be an explicit option to indicate this to the framework such that the expressed sequence can be reconstructed and preserved during processing within the streaming application.
       We still want all the great features we get today from frameworks like Flink:
         i. Distributed processing
         ii. Horizontal scaling
         iii. Failure recovery
         iv. State management
         … etc.
       … but the frameworks need an explicit option to preserve the order of processing within each key of a keyed stream.
  • 14. Where do we go from here? Preserving source ordering
       [Figure: two source partitions, each read from its first event onwards. Legend: a = event ID, K = sharding key. Partition 1 (ID/key): a/1, b/3, c/3, d/1, e/5, f/1. Partition 2 (ID/key): g/4, h/4, i/2, j/0, k/4, l/0. Processing order per key: key 0: j, l; key 1: a, d, f; key 2: i; key 3: b, c; key 4: g, h, k; key 5: e.]
       Order of processing within each key matches the source.
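A minimal sketch of why per-key order survives when the source ordering is preserved, using the event IDs and sharding keys from this slide's figure. The helper names `process` and `round_robin` are hypothetical, and the sketch assumes that each key lives in exactly one partition and that each partition is consumed first-in, first-out. Under those assumptions, the per-key processing order is the same however the two partitions are interleaved.

```python
from collections import defaultdict
from itertools import zip_longest

# (event ID, sharding key) pairs in source order, first event first.
partition_1 = [("a", 1), ("b", 3), ("c", 3), ("d", 1), ("e", 5), ("f", 1)]
partition_2 = [("g", 4), ("h", 4), ("i", 2), ("j", 0), ("k", 4), ("l", 0)]


def process(events):
    """Invoke a per-key 'callback' (here: append) in the order events arrive."""
    per_key = defaultdict(list)
    for event_id, key in events:
        per_key[key].append(event_id)
    return dict(per_key)


def round_robin(*partitions):
    """Interleave partitions one event at a time, keeping each one FIFO."""
    return [event
            for group in zip_longest(*partitions)
            for event in group
            if event is not None]


# Two different interleavings of the same two partitions.
one_after_the_other = process(partition_1 + partition_2)
interleaved = process(round_robin(partition_1, partition_2))
```

Both results map key 0 to `j, l`, key 1 to `a, d, f`, key 3 to `b, c`, and so on, matching the figure.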
  • 15. Where do we go from here? Processing sequenced streams
       [Figure: three input partitions carrying events with global sequence IDs 0 to 14. Legend: n = sequence ID, a / b = financial instrument. Partition contents in arrival order (ID/instrument): 0/a, 4/b, 9/a, 14/b, 11/a; 1/b, 2/a, 8/b, 7/a, 12/b; 3/a, 5/b, 6/a, 10/b, 13/a. Processing order per instrument: a: 0, 2, 3, 6, 7, 9, 11, 13; b: 1, 4, 5, 8, 10, 12, 14.]
       Processing is held back to wait for out-of-order data.
       Order of processing within each key matches sequence ID order.
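The hold-back behaviour on this slide can be sketched as a small resequencer. This is an illustration only, not an existing Flink or Beam API: the `Resequencer` class and its `on_event` method are hypothetical, and the sketch relies on the sequence IDs being gapless and starting at 0, as in the figure. Events that arrive early are parked until the next expected ID shows up, so no timers or timestamps are needed.

```python
from collections import defaultdict


class Resequencer:
    """Releases events strictly in sequence-ID order, buffering early arrivals."""

    def __init__(self, callback, first_seq=0):
        self.callback = callback
        self.next_seq = first_seq
        self.pending = {}  # seq -> key, for events that arrived too early

    def on_event(self, seq, key):
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate delivery, already seen
        self.pending[seq] = key
        # Drain everything that is now contiguous with what was released.
        while self.next_seq in self.pending:
            self.callback(self.next_seq, self.pending.pop(self.next_seq))
            self.next_seq += 1


# One possible interleaving of the three partitions in the figure.
arrivals = [(0, "a"), (1, "b"), (3, "a"), (4, "b"), (2, "a"),
            (5, "b"), (9, "a"), (8, "b"), (6, "a"), (14, "b"),
            (7, "a"), (10, "b"), (11, "a"), (12, "b"), (13, "a")]

per_instrument = defaultdict(list)
resequencer = Resequencer(lambda seq, key: per_instrument[key].append(seq))
for seq, key in arrivals:
    resequencer.on_event(seq, key)
```

The buffer only holds events that are ahead of the next expected ID, in contrast to the timer-per-event approach, where every event passes through state.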
  • 16. Call for input. What do you think? How have you solved this?
       A. Is it possible to build in-order processing guarantees with Flink?
       B. Or is the notion of time too ingrained to accomplish this without cutting all the way to the core engine?
       We’d like to hear your views on this and learn about your solutions or workarounds to this problem.
  • 17. We’re Hiring! Stream Processing for Financial Services ATOMICWIRE.IO/CONTACT