Building Stream Processing
as a Service (SPaaS)
Steven Wu @stevenzwu
Why stream processing?
Unbounded user activity stream
[diagram: activity timelines for Alice, Bob, and Jane]
Unbounded data - batch
[diagram, shown in several builds: the same activity timelines chopped into fixed daily windows at the Feb 25 / Feb 26 boundary. Sessions are bounded by a gap of inactivity, but one of Bob's sessions crosses the day boundary, so the two daily batches incorrectly split it into two sessions; the correct computation would group them as one.]
Unbounded data - stream
[diagram: the same timelines processed continuously as events arrive; sessions close on inactivity gaps rather than day boundaries]
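The code samples later in this deck use Flink's Java DataStream API. As a minimal sketch of the sessionization above (the UserEvent type, its merge method, and the 30-minute gap are assumptions, not from the deck):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<UserEvent> events = ...; // unbounded user activity stream

events
    .keyBy(e -> e.getUserId())                                  // one timeline per user
    .window(EventTimeSessionWindows.withGap(Time.minutes(30)))  // session = gap of inactivity
    .reduce((a, b) -> a.merge(b));                              // fold a session's events together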
Agenda
● Introduction
● Apache Flink primer
● SPaaS Overview
● Keystone Router
● Custom Stream Processing Applications
● Backfill and Rewind
Real Time Data Infrastructure
[diagram: producers and consumers on either side of two building blocks, Messaging as a Service (Kafka) and Stream Processing as a Service (Flink); the Keystone data pipeline builds on both and provides topics, routing, transformations (filter, projection, UDF), and schema / hygiene]
Stream Processing
[same diagram, with the components provided by stream processing highlighted]
[diagram: event producers feed the pipeline, which delivers to sinks]
Highly available ingest pipelines: the backbone of a real-time data infrastructure
Agenda
● Introduction
● Apache Flink primer
● SPaaS Overview
● Keystone Router
● Custom Stream Processing Applications
● Backfill and Rewind
Exactly-once semantics for stateful computation
Source: http://flink.apache.org/
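A sketch of how a job opts into this guarantee (the 30-second interval is an assumed value); end-to-end exactly-once additionally needs an idempotent or transactional sink:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Snapshot all operator state every 30 seconds; on failure the job rewinds
// to the last completed checkpoint, so state is never updated twice.
env.enableCheckpointing(30_000, CheckpointingMode.EXACTLY_ONCE);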
Flexible windowing
Source: http://flink.apache.org/
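For illustration, the three window families on a keyed stream, following the deck's own `<key selector>` placeholder convention (the window sizes are arbitrary):

stream.keyBy(<key selector>)
      .window(TumblingEventTimeWindows.of(Time.minutes(1)));      // time window
stream.keyBy(<key selector>)
      .countWindow(100);                                          // count window
stream.keyBy(<key selector>)
      .window(EventTimeSessionWindows.withGap(Time.minutes(30))); // session window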
Event time semantics
Source: http://flink.apache.org/
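A sketch, using the API of the Flink versions current when this deck was written, of extracting timestamps from the events themselves so out-of-order or late-arriving events are windowed by when they happened (the event type and timestamp field are assumed):

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

// Tolerate events arriving up to 10 seconds out of order.
DataStream<UserEvent> withTimestamps = events.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<UserEvent>(Time.seconds(10)) {
        @Override
        public long extractTimestamp(UserEvent e) {
            return e.getEventTimeMillis(); // assumed timestamp field
        }
    });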
State backends and checkpointing
Available
● Memory
● File system
● RocksDB (supports incremental checkpoints)
Source: http://flink.apache.org/
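A sketch of selecting the RocksDB backend with incremental checkpoints enabled (the checkpoint URI is a placeholder):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;

// State lives in embedded RocksDB on local disk; snapshots go to a durable
// store. `true` enables incremental checkpoints, so each checkpoint uploads
// only the RocksDB files that changed since the previous one.
env.setStateBackend(new RocksDBStateBackend("s3://bucket/checkpoints", true));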
[diagram: a checkpoint barrier flowing through the job graph]
Source: http://flink.apache.org/
Checkpointing is lightweight
Source: Stephan Ewen
Agenda
● Introduction
● Apache Flink primer
● SPaaS Overview
● Keystone Router
● Custom Stream Processing Applications
● Backfill and Rewind
Offerings by complexity
● Simple drag and drop: filter, projection, data hygiene
○ Available now via Keystone router
● Medium: SQL, UDF (User Defined Function)
○ Coming 2018
● Advanced: custom stream processing applications
○ Available now
Ease of use vs. capability
[chart, shown in several builds: ease of use (easy to difficult) plotted against capability (limited to full feature). The Keystone Router is the easiest but most limited offering, a custom SPaaS app is the most capable but hardest to use (both available now); SQL sits in between (coming 2018). UDFs push both the Keystone Router and SQL further along the capability axis (coming 2018).]
SPaaS running on Titus
(Netflix’s in-house container runtime)
Job isolation: single job
[diagram: one Titus job runs the Flink job manager, a second Titus job runs the task managers; together they form a standalone Flink cluster that executes exactly one Flink job]
Agenda
● Introduction
● Apache Flink primer
● SPaaS Overview
● Keystone Router
● Custom Stream Processing Applications
● Backfill and Rewind
Events are published to fronting Kafka directly or via proxy
[pipeline diagram, shown step by step: Event Producer → (HTTP / gRPC) → KSGateway → Fronting Kafka → Flink Router → Consumer Kafka → Stream Consumers, all provisioned through Keystone Management]
Events land in the fronting Kafka cluster
The router polls events and applies filters and projections
The router sends events to their destinations
Keystone pipeline system boundary
[diagram: the boundary encloses KSGateway, fronting Kafka, the Flink router, and consumer Kafka; event producers sit upstream and sinks downstream]
Highly available ingest pipelines
Keystone scale
● >1 trillion unique events ingested per day
● >99.9999% delivery rate
Demo: provision a data stream
(mini pipeline)
Configure
outputs
Drag-and-drop Keystone router
● Stateless and embarrassingly parallel
● ~2,000 jobs in prod
● Self serve and fully managed
● At-least-once delivery semantics
● Isolation
Agenda
● Introduction
● Apache Flink primer
● SPaaS Overview
● Keystone Router
● Custom Stream Processing Applications
● Backfill and Rewind
Out-Of-The-Box Functionality
● Templates (Java / Scala)
● Build and deployment tooling
● Connectors
● Dashboards
● Logs
● Alerts
● Titus integration
● Capacity management
Demo: SPaaS project bootstrap
Skeleton code
createSource("example-kafka-source")
    .addSink(getSink("null-sink")).name("null-sink");
Add business logic
createSource("example-kafka-source")
    .keyBy(<key selector>)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .reduce(<window function>)
    .addSink(getSink("hive-sink")).name("hive-sink");
Demo: create a new Flink job
Override source config
Override Kafka cluster VIP
kafka-test:2181 → kafka-prod:2181
Override job config
Configure resources
Configure multiple sources or sinks
kafka-test:2181 kafka-prod:2181
Deep links
Duplo blocks
● Filter
● Projector
● Data Hygiene
● Connectors
Supported Source and Sink Connectors
Sources
● Kafka
● Hive
Sinks
● Elasticsearch
● Kafka
● Hive
● Keystone
Agenda
● Introduction
● Apache Flink primer
● SPaaS Overview
● Keystone Router
● Custom Stream Processing Applications
● Backfill and Rewind
Things can go wrong
Application bug
[diagram: Source → Flink Streaming Job → Sink; the job itself has a bug]
Sink failure
[diagram: Source → Flink Streaming Job → Sink; the sink fails]
Dependency service failure
[diagram: Source → Flink Streaming Job → Sink; the job calls a microservice for data enrichment and that service fails]
How to recover
● Backfill (available now)
● Rewind Flink job (coming soon)
How to recover
● Backfill
● Rewind Flink job
Live job continues
[timeline diagram: Kafka → Live Job → Sink; the outage period lies in the past, before now, and the live job keeps processing current events]
Hive as backfill source
[diagram: the live job keeps reading from Kafka while a separate, short-lived backfill job replays the outage period from Hive; both write to the sink]
Choose Hive source
Configure Hive source
Not a lambda architecture
● Single streaming code base
● Just switch the source from Kafka to Hive (see the sketch below)
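A sketch of what "single code base" means in practice, reusing the platform's createSource helper from the earlier skeleton; the config key and the Hive source name are hypothetical:

boolean isBackfill = config.getBoolean("job.backfill", false); // hypothetical config key
String sourceName = isBackfill
        ? "example-hive-source"    // hypothetical Hive source for backfill
        : "example-kafka-source";  // the live source from the skeleton above

// The job body is identical either way; only the configured source name
// decides whether it reads the live Kafka stream or replays Hive data.
createSource(sourceName)
    .keyBy(<key selector>)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
    .reduce(<window function>)
    .addSink(getSink("hive-sink")).name("hive-sink");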
Hive backfill probably not for stateful jobs
● Warm-up issue
● Ordering issue
Stateful stream processor
Image adapted from Stephan Ewen
Warm-up period
[timeline diagram: Backfill Job → Sink; a warm-up period precedes the outage period, so the job processes extra data to rebuild its state]
No output emitted during warm-up
[same diagram: the job emits no output during the warm-up period and starts emitting only once it reaches the outage period]
Hive backfill probably not for stateful jobs
● Warm-up issue
● Ordering issue
Kafka: messages ordered within a partition
Source: kafka.apache.org
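A producer-side sketch of how this guarantee is commonly exploited: keying records by a stable id routes all of a user's events to the same partition, so their relative order is preserved (the topic name and properties are assumed):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = ...; // broker list and serializers

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// Records with the same key hash to the same partition, and Kafka preserves
// order within a partition, so each user's events stay in order.
producer.send(new ProducerRecord<>("user-activity", userId, payload));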
Hadoop input split
[diagram, shown in several builds:
files:  f0 f1 f2 f3 f4
splits: s0 s1 s2 s3 s4 s5 s6 s7 s8 s9
The job manager performs the split calculation and ships the full split list (s0 … s9) to the task managers; each task manager then runs the same split-assignment algorithm to decide which splits it reads.]
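A toy illustration of the deterministic assignment step (not Flink's actual code; allSplits, numTaskManagers, and taskManagerIndex are assumed): every task manager receives the full split list and keeps only the splits that round-robin to its own index, so no coordination is needed:

import java.util.ArrayList;
import java.util.List;

// Split s_i goes to task manager (i mod numTaskManagers); each task manager
// runs this same computation locally and keeps only its own splits.
List<InputSplit> mine = new ArrayList<>();
for (int i = 0; i < allSplits.size(); i++) {
    if (i % numTaskManagers == taskManagerIndex) {
        mine.add(allSplits.get(i));
    }
}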
Where is the order?
files:  f0 (hour=0), f1, f2 (hour=23), f3 (hour=12), f4 (hour=3)
splits: s0 s1 s2 s3 s4 s5 s6 s7 s8 s9
[diagram: one task manager ends up reading s0 (hour=0), then s3 (hour=23), s6 (hour=12), and s9 (hour=3); splits arrive in arbitrary hour order]
Does time/ordering matter?
● Probably not for stateless computation
● Probably important for stateful
computation
Window with allowed lateness
DataStream<T> input = ...;
input
.keyBy(<key selector>)
.window(<window assigner>)
.allowedLateness(<time>)
.<windowed transformation>(<window function>);
Source: flink.apache.org
How to recover
● Backfill
● Rewind Flink job
Flink checkpoint and fault tolerance
[timeline: Checkpoint x-1 … Checkpoint x … Now; on failure, the job automatically rewinds to the last successful checkpoint, Checkpoint x]
Flink rewind
[timeline: Checkpoint y lies before the outage window, Checkpoints x and x+1 fall in or after it; to recover, rewind the job to Checkpoint y, taken before the outage began]
Kafka retention
[timeline: the Kafka retention window bounds how far back we can go; the rewind point must still fall inside it]
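A sketch of the knob this relies on, for the Flink versions current when this deck was written: completed checkpoints must be retained so an operator can later restart the job from one (e.g. with the `flink run -s <checkpoint-path>` flag); Kafka retention itself is a broker-side topic setting:

import org.apache.flink.streaming.api.environment.CheckpointConfig;

// Keep completed checkpoints around even after the job is cancelled, so the
// job can later be rewound to a checkpoint taken before the outage.
env.getCheckpointConfig().enableExternalizedCheckpoints(
        CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);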
Hive backfill vs. Flink rewind

                 Hive backfill   Flink rewind
Warm-up issue    Yes             No
Ordering issue   Yes             No
Data retention   Months          Hours or days
Applicability    Stateless       Stateless and stateful
Pros for Hive backfill source
● Long-term storage (a few months)
● Fast recovery
○ S3 is very scalable
○ Runs in parallel with live job
Today's recommendation
Stateless → Hive backfill
Stateful → Flink rewind
Is this the future?
[Hive backfill covering both stateless and stateful jobs]
Or is this the future?
[Flink rewind covering both stateless and stateful jobs]
Caveats for reprocessing
● Do not overwhelm external services
● Non-retractable sink output
● Non-replayable dependencies
Do not overwhelm external services
[diagram: Source → Flink Streaming Job → Sink, with a microservice dependency; reprocessing a backlog can put 10x load on both the microservice and the sink]
Non-retractable sink output
● Duplicates are ok
● Idempotent sink
● Cleanable sink
○ e.g. drop the Hive partition with bad data
Non-replayable dependencies
[diagram sequence, shown in several builds: Source → Flink Streaming Job → Sink, with the job calling an A/B service:
1. The job processes live msg X and asks the A/B service which test cell Alice is in; the answer is Cell A.
2. Later, Alice's allocation changes, and then an outage occurs.
3. After the outage, the job rewinds and reprocesses old msg X; this time the A/B service answers Cell B, so the recomputed result differs from the original.]
Convert table to stream
[diagram: previously, the job did a table lookup against the A/B service for each event]
[diagram: instead, the A/B data is consumed as a second source and joined with the event stream, so the A/B data becomes part of the application state]
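A minimal sketch of that join (all types and field names are assumed): both streams are keyed by user, and the latest A/B allocation is kept in Flink keyed state, so it is checkpointed, and rewound, together with the rest of the job state:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

events.keyBy(e -> e.getUserId())
    .connect(allocations.keyBy(a -> a.getUserId()))
    .process(new CoProcessFunction<UserEvent, Allocation, EnrichedEvent>() {
        private transient ValueState<String> cell; // latest A/B cell per user

        @Override
        public void open(Configuration conf) {
            cell = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("abCell", String.class));
        }

        @Override
        public void processElement1(UserEvent e, Context ctx,
                                    Collector<EnrichedEvent> out) throws Exception {
            // Enrich from local state instead of calling the A/B service.
            out.collect(new EnrichedEvent(e, cell.value()));
        }

        @Override
        public void processElement2(Allocation a, Context ctx,
                                    Collector<EnrichedEvent> out) throws Exception {
            cell.update(a.getCell()); // remember the latest allocation
        }
    });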
Stream Kong
Putting it together: the SPaaS layered cake
[diagram, bottom to top:
Amazon EC2
Titus Container Runtime
Stream Processing Platform (Flink Streaming Engine, Config Management)
Reusable Components: Source & Sink Connectors, Filtering, Projection, etc.
Keystone Routers (with UDF) | SQL (with UDF) | Streaming Jobs]
Geico caveman, https://memegenerator.net
Thank you!
@stevenzwu
Editor's Notes
  1. Who has used any stream processing framework? What about Apache Flink?
  2. We care about stream processing at Netflix, because we have real business needs for it.
  3. If you go to the Netflix home screen, it shows rows of recommended TV shows and movies, personalized to your taste. Most of those recommendations are computed by offline batch processing jobs. One row on the home screen, "Trending Now", shows the videos that are trending on Netflix right now; it is also personalized. "Trending Now" is computed by a stream processing application in real time. Here now really means now, not an hour ago, not yesterday.
  4. In some cases, streaming is simply a more natural way to process unbounded data than batch. In this simple example, we have an unbounded user activity stream and we want to compute sessions. A session is bounded by a gap of inactivity.
  5. In the batch world, we split the input data into fixed-size windows (e.g. daily windows) of bounded data, then repeatedly process each of those fixed-size windows.
  6. Here is how a batch job would compute the sessions.
  7. Here is the problem for Bob. One of his sessions crosses the day boundary. Because its events are processed on different days, they are incorrectly divided into two sessions.
  8. The correct computation should group them as one session
  9. In the streaming world, we don't artificially chunk and process the data at daily boundaries. We just process events as they come in. It is a more natural mapping to real-world activity.
  10. Last but not least, we will take a deep look at how to recover a streaming job from an extended outage. We will discuss two solutions: backfill and rewind.
  11. We have a small team of 10 people. At a high level, we provide three platforms for the company. At the bottom layer, we have two building blocks: messaging as a service, built on Apache Kafka, and stream processing as a service, built on Apache Flink. On top of those two building blocks sits the Keystone data pipeline.
  12. The components provided by stream processing are highlighted in red.
  13. At the 10,000-foot level, the Keystone data pipeline is responsible for moving data from producers to sinks for data consumption. We will get into more details of the Keystone pipeline when we talk about the Keystone router later.
  14. Why did we choose Apache Flink as our stream processing engine? Here are some of the key features that Flink provides.
  15. Flink guarantees exactly-once semantics for stateful computations. In other words, Flink ensures that operators do not perform duplicate updates to their state in the event of failures. We need a rewindable source like Kafka to achieve exactly-once application state. If a job writes to a sink (which most jobs do), there are two ways to achieve end-to-end exactly-once. First, an idempotent sink. Second, if the sink supports transactions, Flink can do a two-phase commit and commit the transaction as part of the checkpoint.
  16. Windows are at the heart of processing unbounded streams. Windows split the stream into “buckets” of finite size. Then we can apply computations on those buckets. Flink supports flexible windowing based on time, count, or sessions. Windows can also be customized with flexible triggering conditions to support sophisticated streaming patterns.
  17. Flink supports stream processing and windowing with event time semantics. Event time makes it easy to compute accurate results with out-of-order or late-arrival events.
  18. A Flink state backend defines the data structure that holds the state. It also implements the logic to take a snapshot of the job state and store that snapshot, as part of a checkpoint, in a distributed file system like HDFS or S3. Checkpointing is how Flink achieves fault tolerance. Flink also supports incremental checkpointing for the RocksDB state backend, which greatly reduces the snapshot size in many cases.
  19. As the checkpoint barrier flows through the job graph as part of the data stream, each operator takes a snapshot and uploads it. Most of the snapshot work happens asynchronously in the background. This allows the system to maintain high throughput while achieving fault tolerance. Flink's checkpoint performance is one of the biggest reasons we chose Flink as our stream processing engine. It really enables large-state jobs.
  20. It provides multiple levels of programming APIs, from the low-level process function to high-level SQL.
  21. Let’s look at those offerings plotted in the two dimensions of “ease of use” and “capability”.
  22. We first migrated our Keystone router to the Flink platform. The Keystone router is arguably our most important use case. It is also probably the simplest one. We have ~2,000 routing jobs in Keystone globally, so we have to make this use case as easy as possible. We attacked the Keystone router use case first because it showcases that we can run Flink at very large scale. That builds up our users' confidence.
  23. Then we built our platform to let users write custom stream processing jobs. It gives users total control, but it is not necessarily the simplest way to get things done.
  24. That's why we are exploring SQL. Industry trends show SQL can increase the adoption of stream processing. Users just write a SQL query; my team manages the execution of SQL jobs, so users don't need to worry about operations.
  25. User-defined functions extend the capability of the Keystone router, and SQL UDFs extend the expressiveness of SQL; that should help the adoption of SQL. Right now, our router only supports simple XPath-based filters and top-level field projection. If users want more sophisticated filtering or transformation logic, they currently have to write and manage their own custom applications. We want to make that simpler. If we can plug user-defined functions (e.g. in JavaScript) into our router and manage those routing jobs, users won't need to worry about operations.
  26. Titus is a container orchestration platform developed in house, similar to Kubernetes. Our Flink jobs run on top of the Titus platform.
  27. We start two Titus jobs for the Flink runtime: one for the job manager and one for the task managers. Together they form one Flink cluster. We submit only one job to each Flink cluster, for isolation.
  28. Pretty much every application publishes some data to our data pipeline.
  29. We ingest over 1 trillion unique events per day. That is about ~3 PB.
  31. One of our largest data streams has a fan-out of 13 outputs. It used to be 20 outputs.
  32. We have ~2,000 routing jobs running in the prod environment. That's why it is so important to make the platform self-serve and fully automated; otherwise, my team would be overwhelmed just dealing with operational work. That's also why our initial focus for stream processing was the Keystone router: it is our biggest use case. Once users provision routing jobs via our self-service UI, my team is responsible for operating those jobs. We also run each routing job in isolation as its own Flink cluster. Such isolation is important for dealing with outages and backpressure: if one ElasticSearch cluster is having issues, we don't want it to affect routing to other ElasticSearch clusters or to the Kafka or Hive sinks.
  33. After showing you the simple point-and-click Keystone router, let's see how we enable users to write custom stream processing jobs.
  34. Dashboards: system metrics (CPU, memory, disk), JVM metrics (e.g. GC pauses), Flink metrics, Kafka metrics, etc.
  35. That greatly reduces the barrier to entry and takes care of the boilerplate work. What used to take hours now takes literally a minute. Some of our data engineers said that now it is so easy to create a streaming job that they don't really need SQL right now, or at least it's not an urgent need.
  36. We tested the code locally. Now we are ready to deploy it in the cloud.
  37. We automatically extract the application config from the app's Docker image and present it in the UI. Users can override any setting at deployment time. E.g. it's very common that the Kafka cluster name/VIP differs between the test and prod environments, so when users deploy the same image in prod, they very likely need to override the Kafka cluster VIP.
  38. When we build a platform, we have to consider the non-happy path. In the last section, we take a deep look at how a streaming job should recover from failure. Failure recovery is one of my favorite topics.
  39. Things can fail.
  40. Your job may need to query an external service to enrich data, and that dependency may fail.
  41. We offer two solutions for failure recovery. The first is to launch a backfill job running in parallel with the live job and let it reprocess data from the outage period. The second is to rewind the regular Flink job to a checkpoint taken before the outage and resume processing from that point.
  42. Most of our data streams land in Hive tables in addition to the Kafka topics that stream processing applications read from. Our Hive data warehouse usually stores data for a few months, which makes it a perfect source for backfill: we can launch a backfill job replaying Hive data from the outage period. Note that the backfill job processes a finite dataset from Hive, so it finishes once it is done processing the outage period.
  43. It is very easy to deploy a backfill job using the same Docker image as the live job. In this example, I configured two sources, but the job actually reads from only one, based on which source is chosen. For the long-running live job, I would choose the Kafka source; for the short-lived backfill job, the Hive source. This UI is a little misleading, because users may think the job reads from both Kafka and Hive. We will improve it very soon.
  44. To read from the Hive source, users need to provide the Hive database and table name, plus a SQL where clause to specify the Hive partitions. Most of our Hive tables are partitioned by date and hour, so the where clause is typically the time range of the data you want to reprocess.
  45. What is Lambda architecture?
  46. This all sounds great. So what’s the catch?
  47. With a stateful stream processor, each input record is evaluated against application state accumulated over time. When we launch a brand-new backfill job, we need to build up that state before we can emit correct output.
  48. In addition to processing the data from the outage period, we also need to process additional data from a warm-up period. How long the warm-up period needs to be depends heavily on the use case: it could be as short as a few minutes or as long as hours or days.
  49. We added warm-up data, but here is the chicken-and-egg problem: computation during the warm-up period also produces incorrect results. It's up to the application to make sure no output is emitted during the warm-up period. That is added complexity for the application.
  50. Kafka consumers read messages in the same order they were written to a partition. If data is evenly distributed across all partitions, the system achieves good message ordering.
  51. However, ordering is very problematic for the Hive source. To understand why, we should look at how Hive reads data from S3. Here we are reading a Hive partition with 5 S3 files.
  52. Hadoop MapReduce usually divides input data into logical chunks called splits. In this example, those 5 S3 files are split into 10 logical chunks.
  53. In our implementation, the Flink job manager does the split calculation and ships the result to all task managers as part of the job graph info.
  54. Each task manager then runs the same split assignment algorithm (e.g. round-robin) to decide which splits it should read.
  55. Hive doesn't track or maintain ordering among those files; that is not required in a batch system. If ordering is needed, users sort the data, e.g. with the "orderBy" operator in SQL. If we look at the first task manager, it processes splits from hours out of order: hour 0 first, then hours 23, 12, and 3.
  56. Note that we are not asking for total ordering in a distributed system. But a stream processing job usually expects a certain behavior or threshold for late events, and users set the allowed lateness for the live job accordingly (e.g. to 10 minutes), which is very reasonable for a live source like Kafka. For a backfill job whose Hive source emits records in arbitrary time order, that can break computation correctness and produce wrong results.
  57. We talked about Flink checkpoints earlier; they are how Flink implements fault tolerance. Upon job failure, Flink automatically rewinds the job to the last successful checkpoint. Rewinding the application state also rewinds the Kafka offsets.
  58. We talked about Flink checkpoints earlier; they are how Flink implements fault tolerance. Upon job failure, Flink automatically rewinds the job to the last successful checkpoint. Rewinding the application state also rewinds the Kafka offsets.
  59. Now let's extend that basic idea to more general failure recovery. With Flink rewind, we have neither the warm-up issue nor the ordering issue. Kafka offsets are part of the application state, so Flink rewinds them along with the rest of the state; after the rewind, the job resumes from a correct application state, and there is no warm-up issue. And since we are still working with the Kafka source, ordering is maintained. Rewind works for all situations, stateless or stateful.
  60. Now let's extend that basic idea to more general failure recovery. With Flink rewind, we have neither the warm-up issue nor the ordering issue. Kafka offsets are part of the application state, so Flink rewinds them along with the rest of the state; after the rewind, the job resumes from a correct application state, and there is no warm-up issue. And since we are still working with the Kafka source, ordering is maintained. Rewind works for all situations, stateless or stateful.
  61. Obviously this requires users to resolve the issue before the Kafka retention expires; otherwise we can't rewind. Our typical Kafka retention is hours or days.
  62. Here is a comparison of these two solutions
  63. You may wonder, if Flink rewind works for all use cases, why bother providing Hive backfill? We can only keep relatively short retention for Kafka data, typically hours or days, while our Hive data warehouse stores data in S3, usually with a few months of retention, so the data should be there when we need it. And with Hive backfill we can probably achieve fast recovery: S3 is super scalable, the backfill job runs in parallel with the live job, and we can even run the backfill job on a much bigger cluster so it processes events at a much higher rate.
  64. Today we are recommending Hive backfill for stateless jobs and Flink rewind for stateful jobs. But we are still exploring if we can push the boundary.
  65. E.g. can we solve the ordering issue for the Hive source? Can we take care of the warm-up issue at the platform level? If we can, we can probably apply Hive backfill to stateful jobs too.
  66. Or maybe we just forget Hive as a backfill source, make sure we have enough retention on our Kafka topics, and rely only on Flink rewind. We don't know for sure where we will ultimately land; we need to do more evaluation.
  67. Whether it's Hive backfill or Flink rewind, there are some caveats to watch out for.
  68. At steady state, when there is no lag, the live job processes events as they come into Kafka, and throughput is determined by the incoming message rate. During a recovery phase it's a different story, since now there is a backlog to clear: the streaming job will process events as fast as it can, and as mentioned earlier, we can also run the backfill job with a bigger cluster to speed up recovery. Reprocessing can thus put 10x the load on dependency services or the sink, and we need to watch out for that impact. You may wonder why this is an issue: won't back-pressure kick in and slow the streaming job down naturally? The problem is the impact on those external services, because they may be serving requests on the critical path of video streaming for Netflix customers, which we don't want to affect. We should define a reasonable recovery time and size the job cluster properly.
  69. During the outage period, the streaming job probably continues to write output to the sink, and that output may be incorrect. Such side effects can't be undone by the stream processing platform; it's up to the application to handle them. Duplicates may be ok: consumers can deal with them. An idempotent sink helps: e.g. ElasticSearch, Cassandra, and EVCache have last-write-wins. With a cleanable sink, "bad" data can be wiped out: e.g. drop the Hive table partitions containing "bad" data and populate them again via backfill or rewind.
  70. If a Flink job calls an external service, that service is unlikely to participate in the rewind. In this example, we have an application calling an A/B service to enrich the original data stream with A/B test cell allocation information.
  71. Initially everything is fine; the job is processing the latest msg X.
  72. Msg X contains user Alice, so our streaming job asks the A/B service which A/B test cell Alice is allocated to. The A/B service returns cell A.
  73. Msg X contains user Alice, so our streaming job asks the A/B service which A/B test cell Alice is allocated to. The A/B service returns cell A.
  74. An outage also happens.
  75. An outage also happens.
  76. An outage also happens.
  77. When we query the A/B service again, it now returns a different answer: cell B. This can affect computation correctness. How much of an issue this becomes depends heavily on the use case.
  78. One possible way to solve this issue is to convert the A/B data from a table to a stream. Previously, we were essentially doing a table lookup against the A/B service.
  79. If we can convert the A/B data to a stream source and join it with the original data stream, the A/B data also becomes part of the application state. It is checkpointed together with the rest of the application state, and when a rewind happens, the A/B data is rewound along with it.
  80. Netflix is a pioneer in chaos engineering: we constantly trigger or simulate failures to evaluate how our systems react. Similarly for stream processing applications, we encourage our users to regularly exercise the backfill and rewind tooling. Practice, practice, practice: users get familiar with the recovery process, and we make sure the tooling works when we need it.
  81. We want to make stream processing so easy that a caveman can do it. That is our slogan.
  82. Before opening up for questions, I want to mention that I will be at the O'Reilly booth between 3 and 4 pm this afternoon. If you have more questions or would just like to chat, please drop by.