Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Speakers
● Xiang Zhang, Stripe
● Pratyush Sharma, Stripe
● Xiaoman Dong, StarTree
Agenda
1. Problem: a near real-time, end-to-end exactly-once processing pipeline at scale
2. The architecture: Kafka, Flink, Pinot and how to connect them all together
3. Operational challenges and learnings
The problem to solve: the Ledger dataset
Ledger is a dataset that Stripe maintains to record all money movements.
Requirements for the Ledger pipeline
1. Near real-time processing to meet SLO targets (p99 on the order of minutes; p90 < 1 minute)
2. Process events at scale
3. No missing transactions: a single transaction can be worth millions of dollars
4. No duplicate transactions across the entire history
   ● Duplicates are inevitable on the source side (deployments, restarts, accidental duplicate job executions, etc.)
In short: near real-time, end-to-end exactly-once processing at scale!
2. The architecture: Kafka, Flink, Pinot and how to connect them all together
High-Level Pipeline
[Diagram]
The Deduplicator
In reality, we store transaction IDs in Flink state for deduplication.
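The deduplication idea can be sketched as a plain-Python model; this is not Flink code. A dict stands in for Flink keyed state (in a real job this would be something like a `ValueState<Boolean>` keyed by transaction ID inside a `KeyedProcessFunction`), and the `LedgerEvent` shape and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class LedgerEvent:
    # Hypothetical event shape; only the transaction ID matters for deduplication.
    transaction_id: str
    amount_cents: int


class Deduplicator:
    """Toy model of an all-time deduplicator keyed by transaction ID."""

    def __init__(self) -> None:
        # Stand-in for Flink keyed state: one "seen" flag per transaction ID.
        # Entries are never expired, because duplicates must be caught across
        # the entire history.
        self._seen: Dict[str, bool] = {}

    def process(self, events: Iterable[LedgerEvent]) -> Iterator[LedgerEvent]:
        for event in events:
            if self._seen.get(event.transaction_id):
                continue  # already emitted once: drop the duplicate
            self._seen[event.transaction_id] = True
            yield event


dedup = Deduplicator()
events = [LedgerEvent("tx1", 100), LedgerEvent("tx2", 50), LedgerEvent("tx1", 100)]
unique_ids = [e.transaction_id for e in dedup.process(events)]
print(unique_ids)  # ['tx1', 'tx2']
```

Because nothing is ever evicted, the state grows with the number of transactions ever seen, which is why all-time deduplication leads to very large state.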
Flink End-to-End Exactly-Once Processing: Flink Deduplicator (1/3 to 3/3)
[Diagrams of Flink's two-phase commit protocol]
Source: https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
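The two-phase commit flow from the cited Flink post can be summarized as a small in-memory simulation. The method names loosely mirror Flink's `TwoPhaseCommitSinkFunction` (pre-commit on the checkpoint barrier, commit on checkpoint completion, abort of uncommitted work on failure), but the class itself is a hypothetical sketch, not Flink or Kafka code.

```python
from typing import Dict, List


class TwoPhaseCommitSink:
    """Toy model of a transactional sink driven by Flink checkpoints."""

    def __init__(self) -> None:
        self.committed: List[str] = []            # visible to read_committed consumers
        self._current: List[str] = []             # open transaction
        self._pending: Dict[int, List[str]] = {}  # pre-committed, by checkpoint ID

    def invoke(self, record: str) -> None:
        # Records are written into the currently open transaction.
        self._current.append(record)

    def snapshot_state(self, checkpoint_id: int) -> None:
        # Phase 1 (pre-commit): on the checkpoint barrier, flush and park the
        # open transaction, then start a new one for subsequent records.
        self._pending[checkpoint_id] = self._current
        self._current = []

    def notify_checkpoint_complete(self, checkpoint_id: int) -> None:
        # Phase 2 (commit): only once every operator has completed the checkpoint.
        for cid in sorted(c for c in self._pending if c <= checkpoint_id):
            self.committed.extend(self._pending.pop(cid))

    def recover(self, last_completed_checkpoint: int) -> None:
        # On restore, transactions pre-committed for the restored checkpoint are
        # committed (the notification may have been lost); newer work is aborted
        # and replayed from the source offsets stored in that checkpoint.
        self.notify_checkpoint_complete(last_completed_checkpoint)
        self._pending.clear()
        self._current = []


sink = TwoPhaseCommitSink()
sink.invoke("a")
sink.invoke("b")
sink.snapshot_state(1)                          # "a", "b" pre-committed
sink.invoke("c")
visible_before_commit = list(sink.committed)    # nothing visible yet
sink.notify_checkpoint_complete(1)
visible_after_commit = list(sink.committed)     # "a", "b" visible
sink.snapshot_state(2)                          # "c" pre-committed
sink.invoke("d")
sink.recover(last_completed_checkpoint=1)       # checkpoint 2 never completed
visible_after_recovery = list(sink.committed)   # "c", "d" will be replayed
```

The key property is that output only becomes visible once the checkpoint that covers it has completed, so a replay after failure never exposes the same record twice.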
Pinot Exactly-Once Ingestion (1/5)
Pinot Exactly-Once Ingestion (2/5)
● Pinot table rows are stored in immutable chunks (batches) called segments.
● Real-time segments that are still being indexed are mutable. Once full, a segment is “sealed” and becomes immutable, and a new mutable segment is created to continue indexing.
Pinot Exactly-Once Ingestion (3/5)
We can treat Pinot's latest (consuming) segment as one database transaction:
● The transaction begins when the segment is created
● The transaction is committed when the segment is “sealed”
● The Kafka offset is stored atomically along with the Pinot segment metadata
● If any exception happens, the whole transaction (segment) is aborted and restarted
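The segment-as-transaction idea can be modeled in a few lines. This is a hypothetical sketch, not Pinot's implementation: a dict stands in for the ZooKeeper metadata store, the metadata keys mirror those in the example on the next slide, and the end offset is treated as exclusive (the first offset not yet in the segment).

```python
from typing import Dict, List

# Stand-in for ZooKeeper: segment name -> metadata "node". Replacing a dict
# entry in a single assignment models ZooKeeper's atomic node update.
MetadataStore = Dict[str, Dict[str, object]]


class SegmentTransaction:
    """Hypothetical toy model of a consuming Pinot segment as one transaction."""

    def __init__(self, name: str, start_offset: int) -> None:
        # The transaction begins at segment creation.
        self.name = name
        self.start_offset = start_offset
        self.rows: List[str] = []
        self.next_offset = start_offset

    def consume(self, offset: int, row: str) -> None:
        if offset != self.next_offset:
            raise ValueError(f"expected offset {self.next_offset}, got {offset}")
        self.rows.append(row)
        self.next_offset = offset + 1

    def seal(self, store: MetadataStore) -> None:
        # Commit: status and Kafka offsets are written in one metadata update,
        # so a reader never sees "DONE" without the matching end offset.
        store[self.name] = {
            "segment.realtime.status": "DONE",
            "segment.realtime.startOffset": self.start_offset,
            "segment.realtime.endOffset": self.next_offset,  # exclusive
            "segment.total.docs": len(self.rows),
        }

    def abort(self) -> None:
        # Abort: discard everything indexed so far and rewind to the start
        # offset (in Pinot, the Kafka consumer's seek()).
        self.rows = []
        self.next_offset = self.start_offset


store: MetadataStore = {}
segment = SegmentTransaction("mytable__8__0__20220325T1811Z", start_offset=10240)
for offset in range(10240, 10250):
    segment.consume(offset, f"row-{offset}")
segment.abort()  # e.g. the server crashed: nothing reached the metadata store
for offset in range(10240, 10264):
    segment.consume(offset, f"row-{offset}")
segment.seal(store)
```

Nothing indexed before `abort()` is ever visible in the metadata store, which is what makes the retry safe.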
Pinot Exactly-Once Ingestion (4/5)
{
"segment.crc": "3251475672",
"segment.creation.time": "1648231912328",
"segment.download.url":
"s3://some/table/mytable__8__0__20220325T1811Z",
"segment.end.time": "1388707200000",
"segment.flush.threshold.size": "4166",
"segment.index.version": "v3",
"segment.realtime.endOffset": "10264",
"segment.realtime.numReplicas": "2",
"segment.realtime.startOffset": "10240",
"segment.realtime.status": "DONE",
"segment.start.time": "1388707200000",
"segment.time.unit": "MILLISECONDS",
"segment.total.docs": "24"
}
● Each segment has a single ZooKeeper node storing its metadata
● Kafka offsets are stored inside the segment metadata
● Atomicity
  ○ A ZooKeeper node update is atomic
  ○ The Kafka offset is updated at the same time as the segment status changes to “DONE”
Pinot Exactly-Once Ingestion (5/5)
If a Pinot server is restarted or crashes:
● The whole consuming segment is discarded
● The segment is re-created starting from the offset right after the last one committed in Segment_0
● The Kafka consumer's seek() is called to rewind to that offset
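The restart position can be read straight from the committed segment's metadata. The sketch below parses a subset of the example metadata from slide 4/5 and assumes, as the example's numbers suggest (offsets 10240 to 10264 for 24 docs), that `segment.realtime.endOffset` is exclusive. The helper names are hypothetical. Note that with transactional Kafka producers, commit markers and aborted records also occupy offsets, so in general the offset span can exceed the document count.

```python
import json

# Subset of the example segment metadata from slide 4/5; ZooKeeper stores
# the values as strings.
SEGMENT_0 = json.loads("""
{
  "segment.realtime.startOffset": "10240",
  "segment.realtime.endOffset": "10264",
  "segment.realtime.status": "DONE",
  "segment.total.docs": "24"
}
""")


def resume_offset(previous_segment: dict) -> int:
    """Offset a re-created consuming segment should seek() to (hypothetical helper)."""
    if previous_segment["segment.realtime.status"] != "DONE":
        raise ValueError("previous segment was never committed")
    # Assumed exclusive end offset: the first offset not covered by a committed segment.
    return int(previous_segment["segment.realtime.endOffset"])


def offset_span(segment: dict) -> int:
    """Number of Kafka offsets covered by a committed segment."""
    return int(segment["segment.realtime.endOffset"]) - int(segment["segment.realtime.startOffset"])


print(resume_offset(SEGMENT_0))                                 # 10264
print(offset_span(SEGMENT_0), SEGMENT_0["segment.total.docs"])  # 24 24
```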
3. Operational challenges and learnings
Caveats of exactly-once: nothing is free!
1. Exactly-once is not bulletproof. Data loss or duplicates can still happen.
2. It might give users a false sense of security.
3. It is hard to add further layers to the architecture, because every new layer must preserve the transactional guarantee.
4. Latency, and therefore the SLO, depends on the checkpoint interval: transactional output becomes visible to downstream read_committed consumers only after the checkpoint completes.
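The effect of the checkpoint interval on freshness can be estimated with simple arithmetic. This is a rough model under stated assumptions (records arrive uniformly over time, and a record becomes visible when the checkpoint that pre-committed it completes); it ignores upstream lag and Pinot's own consumption delay, and the function name is hypothetical.

```python
from typing import Tuple


def visibility_delay_s(checkpoint_interval_s: float, checkpoint_duration_s: float) -> Tuple[float, float]:
    """Rough (average, worst-case) delay added by transactional commits, in seconds."""
    # Worst case: a record lands right after a barrier, waits a full interval
    # for the next barrier, then waits for that checkpoint to complete.
    worst = checkpoint_interval_s + checkpoint_duration_s
    # On average it waits half an interval for the next barrier.
    average = checkpoint_interval_s / 2 + checkpoint_duration_s
    return average, worst


print(visibility_delay_s(30, 5))  # (20.0, 35)
```

Under this model, a p90 target below one minute leaves room only for checkpoint intervals well under a minute once the other pipeline stages are accounted for.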
Potential data loss in two-phase commit
[Diagrams, 2 slides]
Source: https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
The transaction can expire in Kafka! If Flink has pre-committed a transaction but cannot commit it before the Kafka transaction timeout elapses (for example, because the job is down for too long), Kafka aborts the transaction and its records are lost.
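The timeout settings can be sanity-checked up front. Two Kafka facts drive the check: a broker rejects a producer whose `transaction.timeout.ms` exceeds the broker's `transaction.max.timeout.ms` (default 15 minutes), and a transaction that outlives `transaction.timeout.ms` is aborted. The helper below is hypothetical, and its lower bound (checkpoint interval plus the longest downtime you expect) is a rough rule of thumb, not an official Flink formula.

```python
from typing import List

BROKER_DEFAULT_MAX_TXN_TIMEOUT_MS = 15 * 60 * 1000  # Kafka broker transaction.max.timeout.ms default


def check_transaction_timeout(
    producer_txn_timeout_ms: int,
    broker_max_txn_timeout_ms: int,
    checkpoint_interval_ms: int,
    max_expected_downtime_ms: int,
) -> List[str]:
    """Return the problems found with these settings (empty list if none)."""
    problems: List[str] = []
    if producer_txn_timeout_ms > broker_max_txn_timeout_ms:
        problems.append(
            "transaction.timeout.ms exceeds the broker's transaction.max.timeout.ms; "
            "the producer will fail to initialize transactions"
        )
    # A transaction stays open for about one checkpoint interval before it is
    # pre-committed, and must then survive any downtime until Flink commits it.
    if producer_txn_timeout_ms < checkpoint_interval_ms + max_expected_downtime_ms:
        problems.append(
            "transaction.timeout.ms is shorter than checkpoint interval + downtime; "
            "Kafka may abort pre-committed transactions (data loss)"
        )
    return problems


# A 1-hour producer timeout against a broker left at its 15-minute default.
print(check_transaction_timeout(60 * 60 * 1000, BROKER_DEFAULT_MAX_TXN_TIMEOUT_MS, 60_000, 30 * 60 * 1000))
```

In practice this usually means raising `transaction.max.timeout.ms` on the brokers so that the producer timeout can be large enough to cover realistic job outages.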
Optimizing large state hydration at recovery time
1. The ledger deduplicator app maintains tens of terabytes of state to deduplicate across all time.
2. Task-local recovery doesn't work with multiple disks mounted (FLINK-10954)
   ● The entire state must be hydrated every time the job is rescheduled (job failure, host failures/restarts/recycling)
   ● This impacts end-to-end latency
3. Even if we made local recovery work, Stripe recycles hosts periodically
   ● The pipeline is only as fast as the slowest host to recover its state
Optimizing large state hydration at recovery time
● Increasing task parallelism to the rescue: the more threads, the faster the state is downloaded and the local state DB is rebuilt!
  ○ Increasing parallelism requires redistributing state.
  ○ Flink uses the key group as the atomic unit of state distribution.
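Increasing parallelism from 180 to 270 doesn't work
● Before (parallelism 180): each task gets 2 key groups assigned.
● After (parallelism 270):
  1. 180 tasks have 1 key group
  2. 90 tasks have 2 key groups
● The tasks that still hold 2 key groups recover no faster than before, and the pipeline is only as fast as its slowest task.
What we want is even distribution
● From each task getting 2 key groups assigned to each task getting 1 key group assigned.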
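The uneven split can be reproduced with the arithmetic Flink uses to map key groups to subtasks. The sketch ports the range formula from Flink's `KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex` to Python (worth double-checking against the Flink version you run). The max parallelism of 360 is an assumption inferred from the slide: 180 tasks with 2 key groups each.

```python
from collections import Counter
from typing import Dict, List, Tuple


def key_group_range(max_parallelism: int, parallelism: int, operator_index: int) -> Tuple[int, int]:
    """Inclusive (start, end) key-group range owned by one subtask."""
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism
    return start, end


def key_groups_per_task(max_parallelism: int, parallelism: int) -> Dict[int, int]:
    """Histogram mapping 'key groups per task' -> 'number of tasks'."""
    sizes: Counter = Counter()
    for i in range(parallelism):
        start, end = key_group_range(max_parallelism, parallelism, i)
        sizes[end - start + 1] += 1
    return dict(sizes)


def even_parallelisms(max_parallelism: int, minimum: int = 1) -> List[int]:
    """Parallelisms that give every task the same number of key groups."""
    return [p for p in range(minimum, max_parallelism + 1) if max_parallelism % p == 0]


MAX_PARALLELISM = 360  # assumption: 180 tasks x 2 key groups each

print(key_groups_per_task(MAX_PARALLELISM, 180))        # {2: 180}
print(key_groups_per_task(MAX_PARALLELISM, 270))        # {2: 90, 1: 180}
print(even_parallelisms(MAX_PARALLELISM, minimum=181))  # [360]
```

With 360 key groups, no parallelism strictly between 180 and 360 divides evenly, so the next step up that actually shrinks the largest per-task state is 360.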
Monitoring Large State Size
● Flink can report native RocksDB metrics.
● State backend latency tracking metrics can help with debugging.
● A large backlog of pending RocksDB compactions can hurt performance.
Linux OOM Kills Causing Job Restarts
● Before 1.12, Flink's Docker images used glibc to allocate memory, which leads to memory fragmentation.
● Combined with the large state required by the deduplicator app, this consistently caused OOM kills.
● Given the large number of task managers and the time it takes to rehydrate state, each restart hurts the latency SLO.
jemalloc Everywhere
● Flink 1.12 switched the default memory allocator in its Docker images to jemalloc.
[Chart: pre-jemalloc vs. post-jemalloc]
Data Quality Monitoring
● Pinot is an analytics platform that runs SQL blazingly fast, so we can use SQL for data quality checks:
  ○ Duplicate detection:
    ■ SELECT primary_key, COUNT(*) AS cnt FROM mytable GROUP BY primary_key HAVING COUNT(*) > 1
    ■ Query only the real-time part of the table by using the special table name with the _REALTIME suffix (e.g., mytable_REALTIME); this improves query performance.
    ■ Pinot returns only 10 rows by default, so add an explicit LIMIT when more duplicates may exist.
  ○ Missing entry detection:
    ■ Bucket rows by time and count rows per bucket
    ■ JOIN/compare the counts with the source of truth (upstream metric in the data warehouse)
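The missing-entry check boils down to comparing per-bucket counts from two systems. In practice the counts would come from a GROUP BY query in Pinot and from the data warehouse; here they are computed in memory, and all names and sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bucket_counts(timestamps_ms: Iterable[int], bucket_ms: int) -> Dict[int, int]:
    """Count rows per time bucket, keyed by the bucket's start timestamp."""
    return dict(Counter(ts - ts % bucket_ms for ts in timestamps_ms))


def count_mismatches(pinot: Dict[int, int], source_of_truth: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Buckets whose counts differ, as bucket -> (pinot_count, expected_count).

    Fewer rows than expected suggests missing entries; more suggests duplicates.
    """
    buckets = set(pinot) | set(source_of_truth)
    return {
        b: (pinot.get(b, 0), source_of_truth.get(b, 0))
        for b in sorted(buckets)
        if pinot.get(b, 0) != source_of_truth.get(b, 0)
    }


HOUR_MS = 3_600_000
pinot_counts = bucket_counts([0, 1_000, 3_600_000], HOUR_MS)
warehouse_counts = {0: 2, HOUR_MS: 2}  # hypothetical upstream metric
mismatches = count_mismatches(pinot_counts, warehouse_counts)
print(mismatches)  # {3600000: (1, 2)}
```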
How to repair data in Pinot?
● If some range of data is corrupted (contains duplicates):
  ○ Find the duplicated data with a SQL query.
  ○ Delete and rebuild the Pinot segments that contain duplicates.
  ○ Pinot virtual columns such as $segmentName help locate those segments.
● Best practices
  ○ A reliable, exactly-once Kafka archive (backup) comes in handy when fighting a fire.
  ○ Build a stable, reliable timestamp into the primary key and use that timestamp as the Pinot time column.
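Once the duplicated keys are known, deciding which segments to rebuild is a grouping problem. The sketch assumes the input is (primary key, segment name) pairs, for example from a query along the lines of `SELECT primary_key, $segmentName FROM mytable WHERE ...` restricted to the flagged keys; the helper name and the segment names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def segments_to_rebuild(rows: Iterable[Tuple[str, str]]) -> Set[str]:
    """Segments holding any copy of a primary key that appears more than once."""
    segments_by_key: Dict[str, List[str]] = defaultdict(list)
    for primary_key, segment_name in rows:
        segments_by_key[primary_key].append(segment_name)
    affected: Set[str] = set()
    for segment_names in segments_by_key.values():
        if len(segment_names) > 1:  # the key occurs more than once: duplicate
            affected.update(segment_names)
    return affected


rows = [
    ("tx1", "mytable__0__5"),
    ("tx2", "mytable__0__5"),
    ("tx1", "mytable__0__6"),  # duplicate of tx1 in another segment
    ("tx3", "mytable__1__2"),
]
print(sorted(segments_to_rebuild(rows)))  # ['mytable__0__5', 'mytable__0__6']
```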
Lessons Learned
● Flink
  ○ Set the Kafka transaction timeout large enough to cover any job downtime.
  ○ Choose a parallelism that evenly divides the max parallelism.
  ○ Use jemalloc in Flink.
● Pinot
  ○ Committing Kafka transactions more often and shortening Flink checkpoint intervals improve end-to-end data freshness in Pinot.
  ○ Beware of bogus message counts: many Kafka internal metrics include messages from aborted transactions.
  ○ Duplicate monitoring is a must for critical apps.

Using Queryable State for Fun and Profit
 
Changelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache FlinkChangelog Stream Processing with Apache Flink
Changelog Stream Processing with Apache Flink
 
Large Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior DetectionLarge Scale Real Time Fraudulent Web Behavior Detection
Large Scale Real Time Fraudulent Web Behavior Detection
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 

Recently uploaded

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 

Exactly-Once Financial Data Processing at Scale with Flink and Pinot

  • 1. Exactly-Once Financial Data Processing at Scale with Flink and Pinot
  • 3. Agenda
      1. Problem: a near real-time, end-to-end exactly-once processing pipeline at scale
      2. The architecture: Kafka, Flink, Pinot, and how to connect them all together
      3. Operational challenges and learnings
  • 4. The problem to solve: the Ledger dataset
      Ledger is a dataset that Stripe maintains to record all money movements.
  • 5-9. Requirements for the Ledger pipeline
      1. Near real-time processing to meet SLO targets (p99 on the order of minutes; p90 < 1 minute)
      2. Able to process events at scale
      3. No missing transactions: a single transaction can be worth millions of dollars
      4. No duplicate transactions across the entire history:
         ● Duplicates are inevitable on the source side (deployments, restarts, accidental duplicate job executions, etc.)
      In short: near real-time, end-to-end exactly-once processing at scale!
  • 13. The Deduplicator
      In reality, we store transaction IDs in Flink state for deduplication.
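A minimal plain-Python sketch of all-time deduplication by transaction ID. It is a hypothetical simulation, not Flink code: a Python set stands in for Flink keyed state, and `Transaction`, `txn_id`, and `Deduplicator` are invented names. In the real job the seen IDs live in Flink state, which is checkpointed and therefore survives restarts.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class Transaction:
    txn_id: str        # hypothetical unique transaction identifier
    amount_cents: int


class Deduplicator:
    """Emits each transaction ID at most once over the whole stream history.

    `seen` stands in for Flink keyed state (an assumption for illustration).
    """

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def process(self, events: Iterable[Transaction]) -> Iterator[Transaction]:
        for event in events:
            if event.txn_id in self.seen:
                continue  # duplicate from a retry, restart, or re-run upstream job
            self.seen.add(event.txn_id)
            yield event


dedup = Deduplicator()
batch1 = [Transaction("t1", 100), Transaction("t2", 250), Transaction("t1", 100)]
batch2 = [Transaction("t2", 250), Transaction("t3", 75)]  # t2 is replayed much later
out = list(dedup.process(batch1)) + list(dedup.process(batch2))
print([t.txn_id for t in out])  # ['t1', 't2', 't3']
```

Because a duplicate may arrive at any point in the history, the seen-set is never expired, which is why this state grows with the entire transaction history.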
  • 14. Flink End-to-End Exactly-Once Processing: Flink Deduplicator (1/3). Source: https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
  • 15. Flink End-to-End Exactly-Once Processing: Flink Deduplicator (2/3). Source: https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
  • 16. Flink End-to-End Exactly-Once Processing: Flink Deduplicator (3/3). Source: https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
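The two-phase commit protocol described in the referenced Flink post can be illustrated with a toy simulation. This is a hypothetical sketch, not Flink's `TwoPhaseCommitSinkFunction` or the Kafka client API: `FakeBroker`, `TwoPhaseCommitSink`, and their methods are invented, though the hooks loosely mirror Flink's (pre-commit on snapshot, commit on checkpoint-complete notification). It shows why a record written after the last completed checkpoint ("c") is neither lost nor duplicated when the job crashes and replays from the checkpointed source offset.

```python
from typing import Dict, List


class FakeBroker:
    """Toy stand-in for a transactional log such as Kafka (not its real API)."""

    def __init__(self) -> None:
        self.committed: List[str] = []             # visible to read_committed readers
        self.open_txns: Dict[str, List[str]] = {}  # txn id -> buffered records

    def begin(self, txn: str) -> None:
        self.open_txns[txn] = []

    def write(self, txn: str, record: str) -> None:
        self.open_txns[txn].append(record)

    def commit(self, txn: str) -> None:
        records = self.open_txns.pop(txn, None)  # no-op if already committed/aborted
        if records is not None:
            self.committed.extend(records)

    def abort(self, txn: str) -> None:
        self.open_txns.pop(txn, None)


class TwoPhaseCommitSink:
    def __init__(self, broker: FakeBroker, attempt: int) -> None:
        self.broker = broker
        self.attempt = attempt
        self.counter = 0
        self.pending: Dict[int, str] = {}  # checkpoint id -> pre-committed txn
        self.current = self._begin()

    def _begin(self) -> str:
        self.counter += 1
        txn = f"attempt{self.attempt}-txn{self.counter}"
        self.broker.begin(txn)
        return txn

    def invoke(self, record: str) -> None:
        self.broker.write(self.current, record)

    def snapshot_state(self, checkpoint_id: int) -> None:
        # Phase 1 (pre-commit): on the checkpoint barrier, park the open txn.
        self.pending[checkpoint_id] = self.current
        self.current = self._begin()

    def notify_checkpoint_complete(self, checkpoint_id: int) -> None:
        # Phase 2 (commit): only after the whole checkpoint has succeeded.
        for cp in sorted(c for c in self.pending if c <= checkpoint_id):
            self.broker.commit(self.pending.pop(cp))


source = ["a", "b", "c", "d"]
broker = FakeBroker()

# Attempt 1: checkpoint 1 covers records a, b (source offset 2).
sink = TwoPhaseCommitSink(broker, attempt=1)
for rec in source[:2]:
    sink.invoke(rec)
sink.snapshot_state(1)
restored_offset = 2  # source offset stored in checkpoint 1
sink.notify_checkpoint_complete(1)
sink.invoke(source[2])      # "c" goes into an open txn...
broker.abort(sink.current)  # ...then the job crashes and that txn is aborted

# Attempt 2: restore checkpoint 1 (nothing pending: its txn was committed)
# and replay the source from the restored offset.
sink = TwoPhaseCommitSink(broker, attempt=2)
for rec in source[restored_offset:]:
    sink.invoke(rec)
sink.snapshot_state(2)
sink.notify_checkpoint_complete(2)

print(broker.committed)  # ['a', 'b', 'c', 'd']
```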
  • 18. Pinot Exactly-Once Ingestion (1/5)
  • 19. Pinot Exactly-Once Ingestion (2/5)
      ● Pinot table rows are stored in immutable chunks (batches) called segments.
      ● Real-time segments that are still being indexed are mutable. Once full, they are “sealed” and become immutable, and a new mutable segment is created to continue indexing.
  • 20. Pinot Exactly-Once Ingestion (3/5)
      We can treat Pinot's latest (consuming) segment as one database transaction:
      ● The transaction begins when the segment is created
      ● The transaction is committed when the segment is “sealed”
      ● The Kafka offset is stored atomically along with the Pinot segment metadata
      ● If any exception happens, the whole transaction (segment) is aborted and restarted
  • 21. Pinot Exactly-Once Ingestion (4/5)
      {
        "segment.crc": "3251475672",
        "segment.creation.time": "1648231912328",
        "segment.download.url": "s3://some/table/mytable__8__0__20220325T1811Z",
        "segment.end.time": "1388707200000",
        "segment.flush.threshold.size": "4166",
        "segment.index.version": "v3",
        "segment.realtime.endOffset": "10264",
        "segment.realtime.numReplicas": "2",
        "segment.realtime.startOffset": "10240",
        "segment.realtime.status": "DONE",
        "segment.start.time": "1388707200000",
        "segment.time.unit": "MILLISECONDS",
        "segment.total.docs": "24"
      }
      ● Each segment has a single ZooKeeper node storing its metadata
      ● Kafka offsets are stored inside the segment metadata
      ● Atomicity
        ○ A ZooKeeper node update is atomic
        ○ The Kafka offset is updated in the same write that sets the segment status to “DONE”
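A small sketch that reads the offset fields from the segment metadata above and checks the invariants this slide relies on. It is hypothetical helper code, not a Pinot API. The segment-name layout (`table__partition__sequence__creationTime`) and treating `endOffset` as exclusive (the first offset of the next segment) are assumptions that match Pinot's low-level real-time consumer conventions as we understand them.

```python
import json
from typing import Dict, Tuple

# A subset of the segment metadata shown on the slide.
SEGMENT_METADATA: Dict[str, str] = json.loads("""
{
  "segment.download.url": "s3://some/table/mytable__8__0__20220325T1811Z",
  "segment.realtime.startOffset": "10240",
  "segment.realtime.endOffset": "10264",
  "segment.realtime.status": "DONE",
  "segment.total.docs": "24"
}
""")


def parse_llc_segment_name(name: str) -> Tuple[str, int, int, str]:
    """Split a name like mytable__8__0__20220325T1811Z (assumed layout)."""
    table, partition, sequence, created = name.split("__")
    return table, int(partition), int(sequence), created


def next_consuming_offset(meta: Dict[str, str]) -> int:
    """Offset at which the next segment of this partition should start."""
    if meta["segment.realtime.status"] != "DONE":
        raise ValueError("segment not committed; its offsets must not be trusted")
    # Assumption: endOffset is exclusive, i.e. the first offset NOT in this segment.
    return int(meta["segment.realtime.endOffset"])


name = SEGMENT_METADATA["segment.download.url"].rsplit("/", 1)[-1]
print(parse_llc_segment_name(name))  # ('mytable', 8, 0, '20220325T1811Z')
start = int(SEGMENT_METADATA["segment.realtime.startOffset"])
end = next_consuming_offset(SEGMENT_METADATA)
print(end - start, SEGMENT_METADATA["segment.total.docs"])  # 24 24
```

Here the offset span (10264 - 10240) equals `segment.total.docs`. With transactional Kafka producers, commit and abort markers also occupy offsets, so the span can exceed the document count.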
  • 22. Pinot Exactly-Once Ingestion (5/5)
      If a Pinot server restarts or crashes:
      ● The whole consuming segment is discarded
      ● The segment is recreated, starting from the next offset after those committed in Segment_0
      ● The Kafka consumer's seek() is called to move to that offset
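The crash-and-restart behavior can be simulated with a toy model of one Kafka partition feeding one Pinot table. `PartitionConsumer` and its methods are hypothetical, not Pinot classes: the `committed` list stands in for sealed segments plus their ZooKeeper metadata, and `restart()` plays the role of discarding the consuming segment and calling `seek()` to the last committed end offset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommittedSegment:
    start_offset: int
    end_offset: int  # exclusive
    rows: List[str]


@dataclass
class PartitionConsumer:
    """Toy model of one Pinot real-time partition (hypothetical, not Pinot code)."""
    log: List[str]                   # the Kafka partition
    rows_per_segment: int
    committed: List[CommittedSegment] = field(default_factory=list)
    consuming: List[str] = field(default_factory=list)  # mutable, in memory only
    position: int = 0

    def restart(self) -> None:
        # Discard the consuming segment and seek() to the last committed end offset.
        self.consuming = []
        self.position = self.committed[-1].end_offset if self.committed else 0

    def consume(self, max_records: int) -> None:
        for _ in range(max_records):
            if self.position >= len(self.log):
                return
            self.consuming.append(self.log[self.position])
            self.position += 1
            if len(self.consuming) == self.rows_per_segment:
                self._seal()

    def _seal(self) -> None:
        # Rows and end offset become durable together (one atomic metadata write in Pinot).
        start = self.committed[-1].end_offset if self.committed else 0
        self.committed.append(CommittedSegment(start, self.position, self.consuming))
        self.consuming = []

    def all_rows(self) -> List[str]:
        return [r for s in self.committed for r in s.rows] + self.consuming


p = PartitionConsumer(log=[f"m{i}" for i in range(10)], rows_per_segment=4)
p.consume(6)   # seals segment [0, 4); m4 and m5 sit in the consuming segment
p.restart()    # crash: m4 and m5 are dropped and re-read from offset 4
p.consume(10)
print(p.all_rows() == [f"m{i}" for i in range(10)])  # True
```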
  • 24-27. Caveats of exactly-once: nothing is free!
      1. Exactly-once is not bulletproof. Data loss or duplicates can still happen.
      2. It might give users a false sense of security.
      3. It is hard to add more layers to the architecture, because every layer must preserve the transactional guarantee.
      4. Latency and SLOs are bounded by checkpoint intervals: output is only committed, and therefore visible, when a checkpoint completes.
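As a rough illustration of the last caveat, here is back-of-the-envelope arithmetic for how long a record stays invisible to `read_committed` consumers when the sink commits once per checkpoint. It is a simplified model under stated assumptions (records arrive uniformly, the next barrier comes one interval later, and downstream ingestion lag is a single constant); the function names and numbers are hypothetical.

```python
def worst_case_visibility_s(checkpoint_interval_s: float,
                            checkpoint_duration_s: float,
                            downstream_lag_s: float = 0.0) -> float:
    """Rough upper bound on the delay before a record is visible downstream.

    A record written just after a checkpoint barrier waits about one full
    interval for the next barrier, then for that checkpoint to complete
    (which triggers the transaction commit), then for the consumer
    (e.g. Pinot) to ingest the committed data.
    """
    return checkpoint_interval_s + checkpoint_duration_s + downstream_lag_s


def average_visibility_s(checkpoint_interval_s: float,
                         checkpoint_duration_s: float,
                         downstream_lag_s: float = 0.0) -> float:
    # With uniformly arriving records, the mean wait for the next barrier is half an interval.
    return checkpoint_interval_s / 2 + checkpoint_duration_s + downstream_lag_s


# Hypothetical numbers: 30 s interval, 10 s checkpoint, 5 s ingestion lag.
print(worst_case_visibility_s(30.0, 10.0, 5.0))  # 45.0
print(average_visibility_s(30.0, 10.0, 5.0))     # 30.0
```

Under this model, a p90 target under one minute effectively caps the checkpoint interval well below a minute.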
  • 28-29. Potential data loss in two-phase commit. Source: https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
      The transaction can expire in Kafka! If the job stays down longer than the Kafka transaction timeout, the broker aborts the pre-committed transaction, and its data is lost even though the checkpoint succeeded.
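A hypothetical configuration sanity check for this failure mode. The Kafka settings named in the comments are real (`transaction.timeout.ms` on the producer, `transaction.max.timeout.ms` on the broker, which defaults to 15 minutes), and the Flink Kafka connector documentation notes that Flink's producer defaults `transaction.timeout.ms` to 1 hour, so the broker limit must be raised. The dataclass, the validator, and the "planned outage" budget are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TxnTimeoutConfig:
    producer_transaction_timeout_ms: int    # Kafka producer setting transaction.timeout.ms
    broker_transaction_max_timeout_ms: int  # Kafka broker setting transaction.max.timeout.ms
    checkpoint_interval_ms: int             # Flink checkpoint interval
    max_expected_downtime_ms: int           # hypothetical: longest outage you plan for


def validate(cfg: TxnTimeoutConfig) -> List[str]:
    """Return a list of problems; an empty list means the settings look safe."""
    problems = []
    if cfg.producer_transaction_timeout_ms > cfg.broker_transaction_max_timeout_ms:
        problems.append("producer timeout exceeds broker transaction.max.timeout.ms")
    # A pre-committed transaction may need to stay open for roughly one checkpoint
    # interval plus the whole outage before the restarted job commits it.
    needed_ms = cfg.checkpoint_interval_ms + cfg.max_expected_downtime_ms
    if cfg.producer_transaction_timeout_ms <= needed_ms:
        problems.append(f"transactions may expire before commit (need > {needed_ms} ms)")
    return problems


HOUR_MS = 3_600_000
# 1 h producer default vs. 15 min broker default, planning for a 2 h outage.
defaults = TxnTimeoutConfig(HOUR_MS, 15 * 60_000, 60_000, 2 * HOUR_MS)
# Both raised to 4 h (hypothetical values).
raised = TxnTimeoutConfig(4 * HOUR_MS, 4 * HOUR_MS, 60_000, 2 * HOUR_MS)
print(validate(defaults))  # two problems
print(validate(raised))    # []
```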
  • 30-32. Optimizing large state hydration at recovery time
      1. The ledger deduplicator app maintains tens of terabytes of state to perform all-time deduplication.
      2. Task-local recovery doesn't work with multiple disks mounted (FLINK-10954)
         ● The entire state must be hydrated every time the job is rescheduled (job failure, host failure/restart/recycle)
         ● This impacts end-to-end latency
      3. Even if we make local recovery work, Stripe recycles hosts periodically
         ● The pipeline is only as fast as the slowest host to recover its state
  • 33. Optimizing large state hydration at recovery time
      ● Increasing task parallelism to the rescue: the more threads, the faster the state is downloaded and the local state DB rebuilt!
        ○ Increasing parallelism requires state redistribution.
        ○ Flink uses the key group as the atomic unit of state distribution.
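  • 34. Increasing parallelism from 180 to 270 doesn't work
      At parallelism 180, each task gets 2 key groups assigned. At parallelism 270:
      1. 180 tasks have 1 key group
      2. 90 tasks have 2 key groups
      The tasks that still hold 2 key groups take as long to recover as before, so the slowest task, and therefore recovery time, does not improve.
  • 35. What we want is even distribution
      Before: each task gets 2 key groups assigned. After: each task gets 1 key group assigned.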
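The key-group arithmetic behind these two slides can be checked with a short Python port of the range formula in Flink's `KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex`. The max parallelism of 360 is inferred from the slides (180 tasks times 2 key groups each), not a stated setting, and the helper names are hypothetical.

```python
from collections import Counter


def key_group_range(max_parallelism: int, parallelism: int, operator_index: int) -> range:
    """Key groups owned by one subtask (same integer arithmetic as Flink's
    KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex)."""
    start = (operator_index * max_parallelism + parallelism - 1) // parallelism
    end = ((operator_index + 1) * max_parallelism - 1) // parallelism  # inclusive
    return range(start, end + 1)


def key_groups_per_task(max_parallelism: int, parallelism: int) -> Counter:
    """Histogram: number of key groups per task -> number of tasks with that many."""
    return Counter(len(key_group_range(max_parallelism, parallelism, i))
                   for i in range(parallelism))


def even_parallelisms(max_parallelism: int) -> list:
    """Parallelisms that give every task the same number of key groups."""
    return [p for p in range(1, max_parallelism + 1) if max_parallelism % p == 0]


MAX_PARALLELISM = 360  # inferred from the slides: 180 tasks x 2 key groups
for parallelism in (180, 270, 360):
    dist = key_groups_per_task(MAX_PARALLELISM, parallelism)
    # The slowest task to recover is the one holding the most key groups.
    print(parallelism, dict(sorted(dist.items())), max(dist))
# 180 {2: 180} 2
# 270 {1: 180, 2: 90} 2
# 360 {1: 360} 1
```

Any parallelism that divides 360 (for example 180 or 360) gives every task the same number of key groups; 270 does not, which is why the most loaded tasks still carry 2 key groups.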
  • 36. Monitoring Large State Size
      ● Flink can report native RocksDB metrics.
      ● State backend latency tracking metrics can help with debugging.
      ● A large number of pending RocksDB compactions can affect performance.
  • 37. Linux OOM Kills Causing Job Restarts
      ● Flink < 1.12 uses glibc to allocate memory, which leads to memory fragmentation.
      ● Combined with the large state required by the deduplicator app, this consistently causes OOM kills.
      ● With a large number of task managers and the time it takes to rehydrate state, each restart hurts the latency SLO.
  • 38. jemalloc Everywhere
      ● In Flink 1.12, the Docker images switched their default memory allocator to jemalloc.
      (Chart panels: Pre jemalloc | Post jemalloc)
  • 39. Data Quality Monitoring
      ● Pinot is an analytics platform that runs SQL blazingly fast, so…
        ○ Duplicate detection:
          ■ SELECT primary_key, count(*) as cnt FROM mytable GROUP BY primary_key HAVING cnt > 1
          ■ Query only the REALTIME part of the table, using the special table name mytable_REALTIME, to improve query performance
        ○ Missing entry detection:
          ■ Bucket rows by time and count the rows in each bucket
          ■ JOIN/compare with the source of truth (an upstream metric in the data warehouse)
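The two checks can be prototyped offline before turning them into Pinot queries. This is a hypothetical in-memory sketch: the row schema `(primary_key, event_time_ms)`, hourly buckets, and the warehouse counts are assumptions, and the duplicate check applies the same logic as the `GROUP BY ... HAVING` query on the slide.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Row = Tuple[str, int]  # hypothetical schema: (primary_key, event_time_ms)
HOUR_MS = 3_600_000


def duplicate_keys(rows: Iterable[Row]) -> Dict[str, int]:
    """Same logic as GROUP BY primary_key ... HAVING cnt > 1."""
    counts = Counter(pk for pk, _ in rows)
    return {pk: n for pk, n in counts.items() if n > 1}


def hourly_counts(rows: Iterable[Row]) -> Counter:
    """Row count per time bucket (bucket = whole hours since the epoch)."""
    return Counter(ts // HOUR_MS for _, ts in rows)


def short_buckets(pinot: Counter, source_of_truth: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Buckets where Pinot holds fewer rows than upstream: bucket -> (pinot, expected)."""
    return {b: (pinot.get(b, 0), expected)
            for b, expected in source_of_truth.items()
            if pinot.get(b, 0) < expected}


rows = [("t1", 0), ("t2", 1_000), ("t2", 1_000), ("t3", HOUR_MS)]
warehouse = {0: 2, 1: 2}  # hypothetical upstream counts per hour
print(duplicate_keys(rows))                           # {'t2': 2}
print(short_buckets(hourly_counts(rows), warehouse))  # {1: (1, 2)}
```

A duplicate and a missing row in the same bucket cancel out in a plain count comparison, so running both checks is safer than relying on either alone.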
  • 40. How to repair data in Pinot?
      ● If some range of data is corrupted (contains duplicates):
        ○ Find the duplicated data with a SQL query.
        ○ Delete and rebuild the Pinot segments containing duplicates.
        ○ Pinot virtual columns like $segmentName help locate those segments.
      ● Best practices
        ○ A reliable exactly-once Kafka archive (backup) will come in handy in a fire.
        ○ Build a stable, reliable timestamp into the primary key, and use that timestamp as the Pinot timestamp.
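A hypothetical sketch of the repair bookkeeping: given `(primary_key, segment_name)` pairs for the suspect range, for example obtained by selecting the primary key together with the `$segmentName` virtual column, collect the segments to delete and rebuild. The function names, the policy of rebuilding every segment that holds any copy of a duplicated key, and the key format `<timestamp>:<txn_id>` are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def segments_to_rebuild(rows: Iterable[Tuple[str, str]]) -> Set[str]:
    """rows: (primary_key, segment_name) pairs for the suspect time range.

    Returns every segment holding a copy of a duplicated key; without more
    information the extra copy cannot be singled out, so all are rebuilt.
    """
    copies: DefaultDict[str, int] = defaultdict(int)
    segments: DefaultDict[str, Set[str]] = defaultdict(set)
    for pk, segment in rows:
        copies[pk] += 1
        segments[pk].add(segment)
    return {s for pk, segs in segments.items() if copies[pk] > 1 for s in segs}


def make_primary_key(txn_id: str, created_at_ms: int) -> str:
    """Hypothetical key format embedding a stable, upstream-assigned timestamp.

    Because the timestamp does not change on replay, a duplicate carries the
    same Pinot time value as the original, which bounds the range to repair.
    """
    return f"{created_at_ms}:{txn_id}"


rows = [("k1", "seg_a"), ("k2", "seg_a"), ("k2", "seg_b"), ("k3", "seg_c")]
print(sorted(segments_to_rebuild(rows)))           # ['seg_a', 'seg_b']
print(make_primary_key("txn_123", 1648231912328))  # 1648231912328:txn_123
```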
  • 41. Lessons Learned
      ● Flink
        ○ Set a Kafka transaction timeout large enough to account for any job downtime.
        ○ Choose a parallelism that evenly divides the max parallelism, so every task gets the same number of key groups.
        ○ Use jemalloc in Flink.
      ● Pinot
        ○ Higher Kafka transaction frequency and shorter Flink checkpoint intervals improve end-to-end data freshness in Pinot.
        ○ Beware of bogus message counts: many Kafka internal metrics include messages from failed transactions.
        ○ Duplicate monitoring is a must for critical apps.

Editor's Notes

  1. Transitional slides from the deduplicator discussion to Pinot ingestion.