Buckle up!
Field notes for transitioning your daily batch jobs
into realtime architecture.
Current 2022
Valerie Burchby (Game Data Engineering)
Xinran Waibel (Personalization Data Engineering)
NETFLIX
CURRENT 2022
About us
Valerie
● Senior Data Engineer @ Netflix
● Domain: Games Data Engineering
Xinran
● Senior Data Engineer @ Netflix
● Domain: Personalization Data Engineering
Batch to streaming…
more disruptive than any previous migration.
Data at rest vs data in motion
Rethinking data movement for your
● ETLs
● Infrastructure
● Team
An analytics team and platform built
for batch will need to make significant
investments to interoperate smoothly
in a streaming environment.
Agenda
❖ Episode 1: Rethinking Data Flow
❖ Episode 2: Data Quality
❖ Episode 3: Output Optimization
❖ Episode 4: Backfill
❖ Season Finale: TL;DR
Episode 1: Rethinking Data Flow
📦 Data flow in batch processing
Here’s a data flow which probably looks familiar…
[Diagram: Ingress, Data Store (partitions), Batch App, Data Store (partitions), Batch App, Output]
🌊 Data flow in stream processing
How hard could this be?
[Diagram: Ingress, Streaming App, Data Store, Batch App, Output (two outputs shown)]
📦 Data flow in batch processing
Closer look…
Assumptions:
Data completeness can be reasonably inferred from job progression.
Jobs are idempotent and can be safely and deterministically rerun.
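A minimal sketch of the idempotency assumption, not taken from the talk: a batch job that recomputes one hour partition and overwrites it, so a rerun leaves the target unchanged. The function name `run_hourly_job`, the `amount` field, and the in-memory `target` dict are hypothetical stand-ins for a real job and table.

```python
from collections import defaultdict


def run_hourly_job(source_rows, target, event_hour):
    """Recompute one event-hour partition from scratch and overwrite it."""
    totals = defaultdict(int)
    for row in source_rows:
        if row["event_hour"] == event_hour:
            totals[row["sales_region"]] += row["amount"]
    # Overwrite, never append: a rerun replaces the partition instead of adding to it.
    target[event_hour] = dict(totals)
    return target


source = [
    {"event_hour": 21, "sales_region": "EMEA", "amount": 5},
    {"event_hour": 21, "sales_region": "EMEA", "amount": 7},
    {"event_hour": 22, "sales_region": "APAC", "amount": 3},
]
target = {}
run_hourly_job(source, target, 21)
first_run = dict(target)
run_hourly_job(source, target, 21)  # rerun of the same partition
```

Because the partition is replaced wholesale, `target` after the rerun equals `first_run`.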
📦 Batch Data Transfer
[Diagram: Source Table and Target Table, each partitioned by event hour (21, 22, 23, 00) and sales region; individual partitions are marked ✅ or ⛔]
Deterministic batch size and cadence
👀 Dependencies ready?
👀 Audits passing?
NETFLIX
CURRENT 2022
Losing the “freebies” of batch processing…
🌊 Stream Data Transfer
Streaming = microbatches of individual records
😵 Job state is fluid, non-deterministic
😵 Lowest latency is least complete
[Diagram: Source Kafka topic with events carrying event times 21, 22, 23, 00; Target partitioned by event hour (21, 22, 23, 00)]
🌊 Data flow in stream processing
Common complexity: job signaling
Problems
This streaming app could go down. How will your batch job know?
The streaming app cannot hook into the batch environment's audit frameworks.
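One way to picture job signaling, as a sketch under assumptions rather than the approach described in the talk: the streaming app periodically publishes a watermark and a heartbeat, and the batch job checks both before reading an hour. `StreamSignal`, `partition_ready`, and the 300-second silence limit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StreamSignal:
    watermark: int       # latest event time (epoch seconds) the app considers complete
    last_heartbeat: int  # wall-clock time (epoch seconds) of the app's last report


def partition_ready(signal, hour_end, now, max_silence=300):
    """Return (ready, reason) for a batch job that wants to read one hour."""
    if now - signal.last_heartbeat > max_silence:
        return False, "streaming app silent"
    if signal.watermark < hour_end:
        return False, "watermark behind"
    return True, "ok"


healthy = partition_ready(StreamSignal(3600, 3900), hour_end=3600, now=4000)
silent = partition_ready(StreamSignal(3600, 3900), hour_end=3600, now=5000)
behind = partition_ready(StreamSignal(3000, 3990), hour_end=3600, now=4000)
```

The heartbeat check covers the "app went down" case; the watermark check covers "app is up but has not finished the hour".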
🌊 Data flow in stream processing
Common complexity: creating logical boundaries for stateful apps
Problems
Events must be held in state while waiting for group members.
Longer hold is more complete, but with a latency penalty.
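The trade-off can be sketched with a toy grouping function (an illustration only; `group_with_hold` and its tuple layout are hypothetical). A group is emitted once the hold has elapsed since its first member arrived, and anything arriving afterwards is counted as late.

```python
def group_with_hold(events, hold_seconds):
    """Group (arrival_time, key, value) tuples, sorted by arrival_time.

    A group is emitted once hold_seconds have passed since its first member
    arrived; members that show up after that are reported as late.
    """
    open_groups = {}  # key -> (first_arrival, [values])
    closed = set()
    emitted, late = [], []
    for arrival, key, value in events:
        # Close every group whose hold has expired by this arrival time.
        expired = [k for k, (first, _) in open_groups.items()
                   if arrival - first >= hold_seconds]
        for k in expired:
            emitted.append((k, open_groups.pop(k)[1]))
            closed.add(k)
        if key in closed:
            late.append((key, value))
        elif key in open_groups:
            open_groups[key][1].append(value)
        else:
            open_groups[key] = (arrival, [value])
    # End of input: flush whatever is still held in state.
    for k, (_, values) in open_groups.items():
        emitted.append((k, values))
    return emitted, late


events = [(0, "s1", "a"), (5, "s1", "b"), (40, "s1", "c")]
short_hold = group_with_hold(events, hold_seconds=10)
long_hold = group_with_hold(events, hold_seconds=60)
```

With a 10-second hold the group leaves early without `"c"`; with a 60-second hold it is complete but waits longer.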
🌊 Data flow in stream processing
Common complexity: joining data
Problems
Delay in the streaming app may cause joins to enrich with incorrect data.
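A small sketch of why a delayed join can enrich with the wrong value, and one possible mitigation that is an assumption here, not something the slides prescribe: look the dimension up as of the event time instead of taking the latest value. `VersionedLookup` and the sample data are hypothetical.

```python
import bisect


class VersionedLookup:
    """Dimension values per key, each with the time it became effective."""

    def __init__(self):
        self._times = {}   # key -> sorted list of effective times
        self._values = {}  # key -> values aligned with _times

    def put(self, key, effective_time, value):
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        i = bisect.bisect_right(times, effective_time)
        times.insert(i, effective_time)
        values.insert(i, value)

    def as_of(self, key, event_time):
        """Value in effect at event_time, or None if none existed yet."""
        times = self._times.get(key, [])
        i = bisect.bisect_right(times, event_time)
        return self._values[key][i - 1] if i else None

    def latest(self, key):
        values = self._values.get(key)
        return values[-1] if values else None


plans = VersionedLookup()
plans.put("user-1", 100, "basic")
plans.put("user-1", 200, "premium")

# The event happened at t=150 but is processed late, after the upgrade at t=200.
event = {"user": "user-1", "event_time": 150}
naive = plans.latest(event["user"])
as_of_event = plans.as_of(event["user"], event["event_time"])
```

The naive lookup attaches the plan that was current at processing time; the as-of lookup attaches the plan that was current when the event occurred.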
🌊 Data flow in stream processing
Common complexity: duplicates
Problems
Restarts and job instability will cause duplicates in the data store.
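As an illustration (an assumed mitigation, not one stated on the slide), writing by a unique event key makes a replay harmless, whereas an append-only write doubles the data. The `event_id` field and `write_idempotent` helper are hypothetical.

```python
def write_idempotent(store, records):
    """Upsert by event_id so replaying the same records leaves the store unchanged."""
    for record in records:
        store[record["event_id"]] = record
    return store


batch = [{"event_id": "e1", "v": 1}, {"event_id": "e2", "v": 2}]
append_only = []
keyed = {}
for _ in range(2):  # the second pass simulates a replay after a restart
    append_only.extend(batch)
    write_idempotent(keyed, batch)
```

After the simulated restart, `append_only` holds four rows while `keyed` still holds two.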
🌊 Data flow in stream processing
And a more complex one…
To learn more details: Beyond Daily Batch Processing
[Diagram: a longer pipeline combining Ingress, two Streaming Apps, two Batch Apps, two Data Stores, and several Outputs, annotated with !, ♻♻♻, ⏰⏰⏰, and ❓❓❓ markers]
Assumptions for batch 📦
● Source data is static,
materialized, stored in logical
groupings
● ETLs can operate on a fixed
increment (daily, hourly)
● Processing delays do not
materially impact data outcomes
Assumptions for streaming 🌊
● Source data is ephemeral and risks
becoming lossy
● Streaming applications are
constantly consuming and
producing data, and require
attention to uptime
● Delays to processing can cause
non-deterministic results
Episode 2: Data Quality
Batch Data Quality 📦
Write-Audit-Publish (WAP)
● Write batch job output to a
temporary location
● Audit the output by querying it and
comparing it with historical data
● Publish the data if the audits pass
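The three WAP steps can be sketched as below. This is a toy version under assumptions: the audits (a null check on a hypothetical `amount` field and a row-count comparison against a historical average), the `tolerance` value, and the in-memory lists are all illustrative.

```python
def write_audit_publish(new_rows, published, history_counts, tolerance=0.5):
    """Return True and publish only if every audit passes."""
    # 1. Write: stage the output where consumers cannot see it yet.
    staging = list(new_rows)

    # 2. Audit: null check plus a row-count comparison against history.
    baseline = sum(history_counts) / len(history_counts)
    no_nulls = all(row.get("amount") is not None for row in staging)
    count_ok = abs(len(staging) - baseline) <= tolerance * baseline

    # 3. Publish: expose the staged rows only when the audits pass.
    if no_nulls and count_ok:
        published.extend(staging)
        return True
    return False


history = [100, 110, 90]
published = []
good = [{"amount": i} for i in range(95)]
bad = [{"amount": i} for i in range(10)]  # suspiciously small output
good_result = write_audit_publish(good, published, history)
bad_result = write_audit_publish(bad, published, history)
```

The failed run never reaches consumers, which is the property a continuously writing streaming job cannot get for free.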
Stream Data Quality 🌊
Alerting on real-time metrics:
● Job health:
○ Restarts
○ Checkpointing
○ Consumer lag
○ Watermark
● Custom app metrics:
○ Filtered in vs. out
○ Parsing failures
○ Invalid field values
○ Output volume
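A sketch of threshold-based alerting over a few of the metrics listed above. The metric names, threshold names, and values are hypothetical; consumer lag is computed here as the log end offset minus the committed offset.

```python
def evaluate_alerts(metrics, thresholds):
    """Return the names of the checks that breached their thresholds."""
    alerts = []
    consumer_lag = metrics["log_end_offset"] - metrics["committed_offset"]
    if consumer_lag > thresholds["max_consumer_lag"]:
        alerts.append("consumer_lag")
    if metrics["restarts_last_hour"] > thresholds["max_restarts_per_hour"]:
        alerts.append("restarts")
    seen = metrics["records_in"]
    if seen and metrics["parse_failures"] / seen > thresholds["max_parse_failure_rate"]:
        alerts.append("parse_failures")
    if seen and metrics["records_out"] / seen < thresholds["min_output_ratio"]:
        alerts.append("output_volume")
    return alerts


metrics = {
    "log_end_offset": 10_500,
    "committed_offset": 10_000,
    "restarts_last_hour": 5,
    "records_in": 1_000,
    "records_out": 900,
    "parse_failures": 50,
}
thresholds = {
    "max_consumer_lag": 1_000,
    "max_restarts_per_hour": 3,
    "max_parse_failure_rate": 0.01,
    "min_output_ratio": 0.5,
}
alerts = evaluate_alerts(metrics, thresholds)
```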
Stream data anomaly
How to cure unhealthy jobs…
💊 Redeployment (automated or manual) → solves 80% of problems
💊 Cluster resource tuning: e.g. out of memory or disk space.
💊 The worst case requires a hotfix
How to handle data inaccuracy:
🛠 Avoid full job failure and log info for debugging
🛠 For problematic events:
○ Skip, evict, or reprocess
○ Tip: Consider the nature of the data and use cases.
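A minimal sketch of avoiding full job failure: a bad event is caught, logged into a side list with its error, and skipped, so it can be inspected or reprocessed later. The names `process_events`, `to_minutes`, and `dead_letters` are hypothetical, and whether to skip, evict, or reprocess still depends on the data and use case.

```python
import json


def process_events(raw_events, handle):
    """Apply handle to each event; set aside failures instead of crashing."""
    outputs, dead_letters = [], []
    for raw in raw_events:
        try:
            outputs.append(handle(json.loads(raw)))
        except (json.JSONDecodeError, KeyError, ValueError) as err:
            # Keep the raw payload and the error for debugging or reprocessing.
            dead_letters.append({"raw": raw, "error": repr(err)})
    return outputs, dead_letters


def to_minutes(event):
    return {"user": event["user"], "minutes": int(event["seconds"]) // 60}


raw = ['{"user": "u1", "seconds": "120"}', "not json", '{"user": "u2"}']
outputs, dead_letters = process_events(raw, to_minutes)
```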
Episode 3: Output
Optimizing Tables 📦
● File format: columnar formats
(incl. Parquet, ORC)
● Partition columns:
○ Consumer query patterns
○ Data size
○ Key distribution
○ Often by event time
Optimizing Streams 🌊
● File format: row-based formats
(incl. Avro)
● Partition strategy:
○ Ordering guarantee
○ Throughput
○ Key distribution
○ Random partitioning is the go-to
● Online and real-time data stores
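The partitioning trade-off for streams can be sketched as follows. Both partitioners are illustrative assumptions: CRC32 stands in for whatever hash a real producer uses, and round-robin is used as a deterministic stand-in for random assignment.

```python
import itertools
import zlib


def keyed_partition(key, num_partitions):
    """Same key always maps to the same partition, preserving per-key order."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def round_robin_partitioner(num_partitions):
    """Spread records evenly regardless of key; no per-key ordering guarantee."""
    counter = itertools.count()
    return lambda _key: next(counter) % num_partitions


spread = round_robin_partitioner(2)
keyed = [keyed_partition("user-1", 8) for _ in range(3)]
spread_out = [spread("user-1") for _ in range(4)]
```

Keyed partitioning gives ordering per key but inherits any skew in the key distribution; even spreading maximizes throughput when ordering is not needed.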
Reminder…
Streaming systems have batch consumers too.
Batch consumers of streaming systems
[Diagram: Real-time Source, Streaming App, Output Kafka Topic, Output Data Lake, Streaming Consumers, Batch Consumers; one consumer path is marked ✅ and the other ⚠]
😰 But batch output is often partitioned by processing time.
To optimize read performance for batch consumers, we need to adopt a
meaningful partitioning strategy for batch output.
Event-time partitioning for batch output
Challenge: Late-arriving data is small in volume but spans numerous event-time partitions,
leading to many small files and a large amount of memory held for file writing.
Solution: Add a batching mechanism that holds late records and flushes them
only periodically, thereby writing bigger files to older partitions (without
compromising the latency of recent events).
To learn more details: Streaming Event-Time Partitioning With Flink and Iceberg
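A toy sketch of the batching idea, not the Flink and Iceberg implementation referenced above. `LateRecordBatcher` is hypothetical; it flushes by record count rather than on a timer, treats hours as plain integers, and writes on-time records immediately, all of which are simplifications.

```python
class LateRecordBatcher:
    """Write on-time records right away; hold late ones and flush them in bulk."""

    def __init__(self, flush_late_every):
        self.flush_late_every = flush_late_every
        self.files = []        # each entry models one written file: (hour, records)
        self.late_buffer = {}  # event hour -> held late records
        self.held = 0

    def write(self, record, current_hour):
        hour = record["event_hour"]
        if hour >= current_hour:
            # Recent events keep their low latency.
            self.files.append((hour, [record]))
            return
        self.late_buffer.setdefault(hour, []).append(record)
        self.held += 1
        if self.held >= self.flush_late_every:
            self.flush_late()

    def flush_late(self):
        # One larger file per old partition instead of one file per late record.
        for hour, records in sorted(self.late_buffer.items()):
            self.files.append((hour, records))
        self.late_buffer = {}
        self.held = 0


writer = LateRecordBatcher(flush_late_every=4)
for hour in [20, 20, 21, 20]:
    writer.write({"event_hour": hour}, current_hour=23)
```

Four late records end up in two files (one per old partition) rather than four.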
Episode 4: Backfill
Why backfill?
Data applications can fail and produce incorrect
output for many reasons:
● Unexpected input data changes
● Dependent system outage
● Source/sink failures
After failures, we need to backfill to mitigate
downstream impact.
Plus, backfilling is often required when building new
datasets or bootstrapping streaming apps.
Backfilling is a no-brainer for batch jobs.
… but what about streaming?
Option #1: Replaying source events
[Diagram: Real-time Source, Streaming App, Output]
The easiest way to backfill is to re-run the streaming job to reprocess source
events from the problematic period.
Problems
😭 Troubleshooting can take hours or days, and source data can expire.
😭 Increasing Kafka retention is very expensive.
• Tiered storage could help reduce cost
Option #2: Lambda Architecture
[Diagram: Real-time Source, Streaming App (Prod), Data Lake Source, Batch App (Backfill), Output]
Build a batch application (e.g. a Spark job) that is equivalent to the streaming
application but reads from the data lake.
Problems
😵 Maintaining two applications in parallel demands significant engineering effort.
Option #3: Unified batch and streaming
[Diagram: Real-time Source, Data Lake Source, Unified App (Streaming Mode, Batch Mode), Output]
Data processing frameworks such as Apache Flink and Apache Beam offer both batch
and streaming modes.
Problems
😭 Flink requires significant code changes to run in batch mode.
😭 Beam has only partial support for state, timers, and watermarks.
Option #4: Kappa Architecture (Netflix’s Choice)
[Diagram: Real-time Source, Data Lake Source, Streaming App (Prod Stack, Backfill Stack), Output Kafka Topic, Output Data Lake]
The same streaming application streams from Kafka sources for production but
reads from the data lake for backfill.
To learn more details: Backfill Streaming Data Pipelines in Kappa Architecture
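The core idea, one application with a swappable source, can be sketched as below. This is an illustration only: `streaming_app`, `live_source`, and `backfill_source` are hypothetical, and the two sources are plain Python iterables standing in for a Kafka consumer and a data lake scan.

```python
def streaming_app(events):
    """The single application logic; it only sees an iterator of events."""
    for event in events:
        if event["type"] == "play":
            yield {"user": event["user"], "minutes": event["seconds"] // 60}


def live_source(topic_records):
    """Stand-in for consuming a Kafka topic in production."""
    yield from topic_records


def backfill_source(lake_rows, start, end):
    """Stand-in for scanning data lake rows with start <= ts < end."""
    return (row for row in lake_rows if start <= row["ts"] < end)


records = [
    {"ts": 10, "type": "play", "user": "u1", "seconds": 120},
    {"ts": 20, "type": "pause", "user": "u1", "seconds": 0},
    {"ts": 30, "type": "play", "user": "u2", "seconds": 600},
]
prod_output = list(streaming_app(live_source(records)))
backfill_output = list(streaming_app(backfill_source(records, start=25, end=40)))
```

Because the processing logic is shared, a backfill over a time range produces the same records the production stack would have produced for that range.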
Season Finale: TL;DR
Batch → Streaming: TL;DR
(a.k.a. Oops, I fell asleep…)
💬 Rethink data and processing
💬 Lost gifts from batch
💬 Completeness vs. latency
💬 Be ready for failures and recovery
💬 Ops burden is high (tooling is new)
Batch + Streaming = Better Together
Thank You.
Contacts
Valerie Burchby (LinkedIn)
Xinran Waibel (LinkedIn)

Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Buckle Up! With Valerie Burchby and Xinran Waibel | Current 2022

  • 1. Buckle up! Field notes for transitioning your daily batch jobs into realtime architecture. Current 2022 Valerie Burchby (Game Data Engineering) Xinran Waibel (Personalization Data Engineering)
  • 2. NETFLIX CURRENT 2022 About us Valerie ● Senior Data Engineer @ Netflix ● Domain: Games Data Engineering Xinran ● Senior Data Engineer @ Netflix ● Domain: Personalization Data Engineering
  • 3. Batch to streaming… more disruptive than any previous migration.
  • 4. Data at rest vs data in motion. Rethinking data movement for your ● ETLs ● Infrastructure ● Team. An analytics team and platform built for batch will need to make significant investments to interoperate smoothly in a streaming environment.
  • 5. Agenda ❖ Episode 1: Rethinking Data Flow ❖ Episode 2: Data Quality ❖ Episode 3: Output Optimization ❖ Episode 4: Backfill ❖ Season Finale: TL;DR
  • 7. 📦 Data flow in batch processing. Here’s a data flow which probably looks familiar… [Diagram labels: Ingress, Data Store, Partitions, Batch App, Output]
  • 8. 🌊 Data flow in stream processing. How hard could this be? [Diagram labels: Ingress, Streaming App, Data Store, Batch App, Output]
  • 9. 📦 Data flow in batch processing, a closer look… Assumptions: Data completeness can be reasonably inferred from job progression. Jobs are idempotent and can be safely and deterministically rerun. [Diagram labels: Ingress, Data Store, Partitions, Batch App, Output]
  • 10. 📦 Batch Data Transfer. Source table and target table are both partitioned by event hour and sales region. Deterministic batch size and cadence. 👀 Dependencies ready? 👀 Audits passing? [Diagram: hourly partitions 21, 22, 23, 00 in each table, with ✅/⛔ status markers]
  • 11. Losing the “freebies” of batch processing…
  • 12. 🌊 Stream Data Transfer. Streaming = microbatches of individual records. 😵 Job state is fluid, non-deterministic. 😵 Lowest latency is least complete. [Diagram: source Kafka topic and target partitioned by event hour; event times 21, 22, 23, 00]
  • 13. 🌊 Data flow in stream processing. Common complexity: job signaling. Problems: This streaming app could go down. How will your batch job know? The streaming app cannot hook into batch environment audit frameworks. [Diagram labels: Ingress, Streaming App, Data Store, Batch App, Output; 📢 marks the signaling point]
  • 14. 🌊 Data flow in stream processing. Common complexity: creating logical boundaries for stateful apps. Problems: Events must be held in state while waiting for group members. A longer hold is more complete, but carries a latency penalty. [Diagram labels: Ingress, Streaming App, Data Store, Batch App, Output; 🗡 marks the boundary point]
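The hold-versus-latency trade-off on slide 14 can be sketched in plain Python. This is only an illustration, not code from the talk: the `GroupBuffer` class, its `expected_size` and `max_hold_seconds` parameters, and the use of explicit `now` timestamps are all hypothetical. A real stateful streaming app would typically rely on its framework's keyed state and timers instead.

```python
from dataclasses import dataclass, field


@dataclass
class GroupBuffer:
    """Holds events per group key until the group is complete or has waited too long."""
    expected_size: int        # hypothetical: number of members in a complete group
    max_hold_seconds: float   # longer hold = more complete output, higher latency
    _events: dict = field(default_factory=dict)      # key -> buffered events
    _first_seen: dict = field(default_factory=dict)  # key -> arrival time of first event

    def add(self, key, event, now):
        """Buffer one event; return the whole group if it just became complete."""
        self._events.setdefault(key, []).append(event)
        self._first_seen.setdefault(key, now)
        if len(self._events[key]) >= self.expected_size:
            self._first_seen.pop(key)
            return self._events.pop(key)
        return None

    def expire(self, now):
        """Emit (possibly incomplete) groups held for at least max_hold_seconds."""
        expired = [k for k, t in self._first_seen.items()
                   if now - t >= self.max_hold_seconds]
        emitted = {}
        for key in expired:
            emitted[key] = self._events.pop(key)
            self._first_seen.pop(key)
        return emitted
```

Raising `max_hold_seconds` lets more groups complete before they are emitted, at the cost of holding more state and delaying output.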
  • 15. 🌊 Data flow in stream processing. Common complexity: joining data. Problems: Delay in the streaming app may cause joins to enrich with incorrect data. [Diagram labels: Ingress, Streaming App (two), Data Store, Batch App, Output; ⏰ marks the join point]
  • 16. 🌊 Data flow in stream processing. Common complexity: duplicates. Problems: Restarts and job instability will cause duplicates in the data store. [Diagram labels: Ingress, Streaming App, Data Store, Batch App, Output; ♻ marks the duplicate-prone point]
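One common way to tolerate the duplicates described on slide 16 is to make the write idempotent by keying on a unique event identifier. The sketch below assumes every event carries an `event_id` field and uses an in-memory dict as a stand-in for the data store; both are assumptions for illustration, not details from the talk.

```python
class IdempotentStore:
    """In-memory stand-in for a data store that ignores replayed events."""

    def __init__(self):
        self.rows = {}               # event_id -> event
        self.duplicates_dropped = 0  # useful as a custom app metric

    def write(self, event):
        """Store the event once; return False if it was already written."""
        event_id = event["event_id"]  # assumed unique per logical event
        if event_id in self.rows:
            self.duplicates_dropped += 1
            return False
        self.rows[event_id] = event
        return True


# Simulate a restart that replays the last two events from the source.
store = IdempotentStore()
first_run = [{"event_id": 1}, {"event_id": 2}, {"event_id": 3}]
replay_after_restart = [{"event_id": 2}, {"event_id": 3}, {"event_id": 4}]
for event in first_run + replay_after_restart:
    store.write(event)
```

After the replay, the store holds four rows and has dropped two duplicates, so downstream batch consumers see each event once.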
  • 17. 🌊 Data flow in stream processing. And a more complex one… To learn more details: Beyond Daily Batch Processing. [Diagram: several Streaming Apps, Batch Apps, and Data Stores between Ingress and Output, annotated with ♻, ⏰, and ❓]
  • 19. Assumptions for batch 📦 ● Source data is static, materialized, stored in logical groupings ● ETLs can operate on a fixed increment (daily, hourly) ● Processing delays do not materially impact data outcomes
  • 20. Assumptions for batch 📦 ● Source data is static, materialized, stored in logical groupings ● ETLs can operate on a fixed increment (daily, hourly) ● Processing delays do not materially impact data outcomes. Assumptions for streaming 🌊 ● Source data is ephemeral, risks becoming lossy ● Streaming applications are constantly consuming and producing data, require attention to “up” time ● Delays to processing can cause non-deterministic results
  • 22. Batch Data Quality 📦 Write-Audit-Publish (WAP) ● Write batch job output to a temporary location ● Audit by querying the output and comparing it with historical data ● Publish the data if the audits pass
  • 23. Batch Data Quality 📦 Write-Audit-Publish (WAP) ● Write batch job output to a temporary location ● Audit by querying the output and comparing it with historical data ● Publish the data if the audits pass. Stream Data Quality 🌊 Alerting on real-time metrics: ● Job health: ○ Restarts ○ Checkpointing ○ Consumer lag ○ Watermark ● Custom app metrics: ○ Filtered in vs. out ○ Parsing failures ○ Invalid field values ○ Output volume
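A minimal sketch of the Write-Audit-Publish steps on these slides, in Python. The in-memory dicts standing in for the temporary and published locations, the row-count audit, and the `tolerance` parameter are all assumptions for illustration; a real audit would query actual tables and would usually check more than volume.

```python
def write_audit_publish(rows, staging, published, partition,
                        history_counts, tolerance=0.5):
    """Write rows to staging, audit them against history, publish only on success.

    history_counts: row counts of previous partitions (hypothetical audit baseline).
    tolerance: allowed relative deviation from the historical average.
    Returns True if the partition was published.
    """
    # Write: output lands in a temporary location first.
    staging[partition] = list(rows)

    # Audit: compare output volume with historical data.
    count = len(staging[partition])
    baseline = sum(history_counts) / len(history_counts) if history_counts else None
    passed = count > 0 and (
        baseline is None or abs(count - baseline) <= tolerance * baseline
    )

    # Publish: move the data to the consumer-facing location only if audits passed.
    if passed:
        published[partition] = staging.pop(partition)
    return passed


staging, published = {}, {}
ok = write_audit_publish(range(100), staging, published, "2022-10-04", [90, 110])
bad = write_audit_publish(range(10), staging, published, "2022-10-05", [90, 110])
```

In this run the first partition is published, while the second stays in `staging` for inspection because its volume falls far below the historical average.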
  • 24. Stream data anomaly. How to cure unhealthy jobs… 💊 (Auto) manual redeployment → solves 80% of problems 💊 Cluster resource tuning, e.g. when out of memory or disk space 💊 The worst case requires a hotfix
  • 25. Stream data anomaly. How to cure unhealthy jobs… 💊 (Auto) manual redeployment → solves 80% of problems 💊 Cluster resource tuning, e.g. when out of memory or disk space 💊 The worst case requires a hotfix. How to handle data inaccuracy: 🛠 Avoid full job failure and log info for debugging 🛠 For problematic events: ○ Skip, evict, or reprocess ○ Tip: Consider the nature of the data and use cases.
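The advice to avoid full job failure and log information for debugging could look roughly like the following sketch. The JSON event format, the `amount` field, and the `dead_letters` list (a stand-in for wherever skipped events are kept for later reprocessing) are hypothetical choices made for this example.

```python
import json
import logging

logger = logging.getLogger("stream_app")


def process_events(raw_events, dead_letters):
    """Parse and validate events; skip bad ones instead of failing the whole job."""
    good = []
    for raw in raw_events:
        try:
            event = json.loads(raw)
            if event["amount"] < 0:  # hypothetical field-level validation
                raise ValueError("negative amount")
            good.append(event)
        except (KeyError, TypeError, ValueError) as exc:
            # json.JSONDecodeError is a ValueError, so parse failures land here too.
            logger.warning("skipping problematic event %r: %r", raw, exc)
            dead_letters.append({"raw": raw, "error": repr(exc)})
    return good


dead_letters = []
valid = process_events(
    ['{"amount": 5}', "not json", '{"amount": -1}', "{}"],
    dead_letters,
)
```

Whether to skip, evict, or reprocess the entries in `dead_letters` depends on the nature of the data and its use cases, as the slide notes; the length of that list is also a natural custom metric to alert on.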
  • 27. Optimizing Tables 📦 ● File format: columnar formats (incl. Parquet, ORC) ● Partition columns: ○ Consumer query patterns ○ Data size ○ Key distribution ○ Often by event time
  • 28. Optimizing Tables 📦 ● File format: columnar formats (incl. Parquet, ORC) ● Partition columns: ○ Consumer query patterns ○ Data size ○ Key distribution ○ Often by event time. Optimizing Streams 🌊 ● File format: row-based formats (incl. Avro) ● Partition strategy: ○ Ordering guarantee ○ Throughput ○ Key distribution ○ Random partitioning is the go-to ● Online and real-time data stores
  • 30. Batch consumers of streaming systems. 😰 But batch output is often partitioned by processing time. [Diagram labels: Real-time Source, Streaming App, Output Kafka Topic, Output Data Lake, Streaming Consumers, Batch Consumers; markers ⚠ and ✅]
  • 31. Batch consumers of streaming systems. In order to optimize read performance for batch consumers, we need to adopt a meaningful partition strategy for batch output. [Diagram labels: Real-time Source, Streaming App, Output Kafka Topic, Output Data Lake, Streaming Consumers, Batch Consumers; markers ⚠ and ✅]
  • 32. Event-time partitioning for batch output. Challenge: Late-arriving data is small in volume but spans numerous event-time partitions, leading to many small files and a large amount of memory held for file writing. Solution: Add a batching mechanism that holds late records and flushes them only periodically, thereby writing bigger files to older partitions (without compromising the latency of recent events). To learn more details: Streaming Event-Time Partitioning With Flink and Iceberg
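The batching idea on slide 32 can be sketched as follows. This is not the Flink and Iceberg implementation the slide links to; `LateRecordBatcher`, the integer hour values, and the `recent_window_hours` parameter are hypothetical, and `flush` is assumed to be called by some periodic timer.

```python
from collections import defaultdict


class LateRecordBatcher:
    """Passes recent records straight through and holds late ones for batched writes."""

    def __init__(self, recent_window_hours):
        # Records at most this many hours old are considered "recent".
        self.recent_window_hours = recent_window_hours
        self.held = defaultdict(list)  # event hour -> held late records

    def route(self, event_hour, record, current_hour):
        """Return (partition, record) to write now, or None if the record is held."""
        if current_hour - event_hour <= self.recent_window_hours:
            return (event_hour, record)  # recent: no added latency
        self.held[event_hour].append(record)  # late: wait for the next flush
        return None

    def flush(self):
        """Called periodically: return all held records grouped by partition."""
        batches = dict(self.held)
        self.held.clear()
        return batches


batcher = LateRecordBatcher(recent_window_hours=1)
immediate = batcher.route(100, "a", current_hour=100)
batcher.route(90, "b", current_hour=100)
batcher.route(90, "c", current_hour=100)
batcher.route(80, "d", current_hour=100)
late_batches = batcher.flush()
```

Here the recent record is written right away, while the three late records are grouped into one write per old partition, which is what keeps the file count down.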
  • 34. Why backfill? Data applications can fail and produce incorrect output for many reasons: ● Unexpected input data changes ● Dependent system outage ● Source/sink failures. After failures, we need to backfill to mitigate downstream impact. Plus, backfilling is often required when building new datasets or bootstrapping streaming apps.
  • 36. Backfilling is a no-brainer for batch jobs… but what about streaming?
  • 37. Option #1: Replaying source events. The easiest way to backfill is to re-run the streaming job to reprocess source events from the problematic period. Problems: 😭 Troubleshooting can take hours or days, and source data can expire. 😭 Increasing Kafka retention is very expensive (tiered storage could help reduce cost). [Diagram labels: Real-time Source, Streaming App, Output]
  • 38. Option #2: Lambda Architecture. Build a batch application (e.g. a Spark job) that is equivalent to the streaming application but reads from the data lake. Problems: 😵 Maintaining two applications in parallel demands significant engineering effort. [Diagram labels: Real-time Source, Streaming App (Prod), Data Lake Source, Batch App (Backfill), Output]
  • 39. Option #3: Unified batch and streaming. Data processing frameworks such as Apache Flink and Beam offer both batch and streaming modes. Problems: 😭 Flink requires significant code changes to run in batch mode. 😭 Beam has only partial support for state, timers, and watermarks. [Diagram labels: Real-time Source, Data Lake Source, Unified App (Streaming Mode, Batch Mode), Output]
  • 40. Option #4: Kappa Architecture (Netflix’s Choice). The same streaming application streams from Kafka sources for production but reads from the data lake for backfill. To learn more details: Backfill Streaming Data Pipelines in Kappa Architecture. [Diagram labels: Real-time Source, Data Lake Source, Streaming App (Prod Stack, Backfill Stack), Output Kafka Topic, Output Data Lake]
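The core idea of Option #4, one set of processing logic fed by interchangeable sources, can be illustrated with the sketch below. It is not Netflix's implementation: the `enrich` transformation, the `duration_sec` field, and the two generator functions standing in for a Kafka consumer and a data lake reader are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def enrich(event: dict) -> dict:
    """The single piece of business logic shared by production and backfill."""
    return {**event, "is_long_session": event["duration_sec"] >= 600}


def run_pipeline(source: Iterable[dict], sink: Callable[[dict], None]) -> int:
    """Run the same application code against whichever source is plugged in."""
    processed = 0
    for event in source:
        sink(enrich(event))
        processed += 1
    return processed


def realtime_source(messages: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for consuming a Kafka topic in the production stack."""
    yield from messages


def data_lake_source(partitions: Dict[str, List[dict]],
                     start: str, end: str) -> Iterator[dict]:
    """Stand-in for reading data lake partitions for the period being backfilled."""
    for hour in sorted(partitions):
        if start <= hour <= end:
            yield from partitions[hour]


lake = {
    "2022-10-04T21": [{"duration_sec": 30}],
    "2022-10-04T22": [{"duration_sec": 900}],
    "2022-10-04T23": [{"duration_sec": 45}],
}
prod_out, backfill_out = [], []
run_pipeline(realtime_source([{"duration_sec": 700}]), prod_out.append)
run_pipeline(data_lake_source(lake, "2022-10-04T22", "2022-10-04T23"),
             backfill_out.append)
```

Because `enrich` is shared, a fix to the logic applies to both production and backfill runs, which avoids the two-application maintenance cost listed under Option #2.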
  • 42. Batch → Streaming: TL;DR (aka Oops I fell asleep…) 💬 Rethink data and processing 💬 Lost gifts from batch 💬 Completeness vs. latency 💬 Be ready for failures and recovery 💬 Ops burden is high (tooling is new)