The need to glean answers from data in real time is moving from a nicety to a necessity. There are few options for analyzing a never-ending stream of unbounded data at scale. Let's compare and contrast the core principles and technologies of the different open source solutions available to help with this endeavor, and consider where processing engines need to evolve to solve processing needs at scale. These findings are based on the experience of continuing to build a scalable solution in the cloud to process over 700 billion events at Netflix, and how we are embarking on the next journey to evolve unbounded data processing engines.
1. Deep Dive into Unbounded Data Processing Systems
Monal Daxini
Engineering Manager, Stream Processing
Real Time Data Infrastructure
@monaldax @Netflix #keystone
Sep 17, 2016
3. ● Streaming is used to mean
○ infinite set (streaming data) or
○ type of data processing engine (stream processor)
Overloaded Term
4. ● Batch is used to mean
○ finite set (batched) or
○ the type of execution engine (batch processor)
Overloaded Term
5. ● Unbounded → Infinite data elements (order not implied)
● Bounded → Finite data elements (order not implied)
● Streaming / Batch → used exclusively to describe
the execution engine
Let’s be precise
6. ● Streaming execution engine could process
out-of-order unbounded or bounded data
● Batch execution engine could process
out-of-order unbounded or bounded data
Hence..
7. Based on tradeoffs between
● Latency
● Accuracy (correctness)
● Cost
How do I choose an engine?
Choose an engine that lets you make these tradeoffs for each use case.
8. ● At-most once processing
● At-least once processing
● Exactly-once processing*
Processing semantics
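As an illustrative sketch (not from the deck): one common way to get exactly-once results on top of at-least-once delivery is an idempotent sink that deduplicates on a unique event id, so redelivered events do not double-count.

```python
class IdempotentSink:
    """Deduplicate on event id so at-least-once redelivery does not double-count."""
    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def write(self, event_id, value):
        if event_id in self.seen_ids:
            return  # duplicate redelivery under at-least-once; ignore it
        self.seen_ids.add(event_id)
        self.total += value

sink = IdempotentSink()
for event_id, value in [("e1", 5), ("e2", 3), ("e1", 5)]:  # "e1" redelivered
    sink.write(event_id, value)
print(sink.total)  # 8, not 13
```

The exactly-once asterisk above usually means exactly this: the engine guarantees at-least-once delivery plus consistent state, and end-to-end correctness comes from idempotent or transactional sinks.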
9. Easy, just
● Reprocess the finite set again on failure
○ More efficient with checkpointing
Accurate bounded data processing
10. Needs
1. Consistent state (correctness)
- across failure via checkpointing
2. Tools / techniques to reason about time
Accurate unbounded data processing
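A minimal sketch of point 1, assuming a single stateful operator (names are illustrative, not the deck's implementation): snapshot the state periodically, and on failure restore the last snapshot instead of reprocessing everything.

```python
import pickle

class CountingOperator:
    """Stateful operator whose state can be snapshotted and restored."""
    def __init__(self):
        self.count = 0

    def process(self, event):
        self.count += 1

    def checkpoint(self) -> bytes:
        return pickle.dumps(self.count)  # durable snapshot of operator state

    def restore(self, snapshot: bytes):
        self.count = pickle.loads(snapshot)

op = CountingOperator()
for e in range(3):
    op.process(e)
snap = op.checkpoint()
for e in range(2):       # work done after the checkpoint is lost in the "failure"
    op.process(e)
op.restore(snap)         # recover consistent state; replay resumes from here
print(op.count)          # 3
```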
11. ● Event Time - time event was created
● Ingest Time - time event was ingested into the engine
● Processing Time - time the event is processed
Events & Time
13. ● Not needed for operating on each element
● Needed for some operations on unbounded data
○ grouping: aggregations, outer joins
Windowing
14. ● Aligned - count or time based
○ Sliding
○ Fixed - Tumbling, Hopping
● Unaligned
○ Session / Dynamic
Windowing Types
15. Event-Time Based Tumbling Windows
[diagram: input events plotted by processing time vs. event time (10:00 to 15:00); output grouped into fixed hourly event-time windows]
Adapted from: The Apache Beam Model, Tyler Akidau, Frances Perry
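The hourly event-time tumbling windows in the figure amount to assigning each event's timestamp to a fixed, non-overlapping interval. A small sketch (illustrative names, Python for brevity):

```python
from datetime import datetime, timedelta

def tumbling_window(event_time: datetime, size: timedelta):
    """Assign an event to its tumbling window [start, start + size) by event time."""
    epoch = datetime(1970, 1, 1)
    offset = (event_time - epoch) % size   # how far into the current window we are
    start = event_time - offset
    return (start, start + size)

# An event at 12:07 falls into the 12:00-13:00 window (1-hour tumbling).
start, end = tumbling_window(datetime(2016, 9, 17, 12, 7), timedelta(hours=1))
print(start.time(), end.time())  # 12:00:00 13:00:00
```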
16. Processing-Time Based Sliding Windows
[diagram: input events plotted along processing time (10:00 to 15:00); output emitted from overlapping sliding windows]
Adapted from: The Apache Beam Model, Tyler Akidau, Frances Perry
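Sliding windows overlap, so one event can belong to several windows at once. A sketch of the assignment, assuming 1-hour windows sliding every 30 minutes (illustrative names):

```python
from datetime import datetime, timedelta

def sliding_windows(ts: datetime, size: timedelta, period: timedelta):
    """All sliding windows [start, start + size) that contain timestamp ts."""
    epoch = datetime(1970, 1, 1)
    start = ts - (ts - epoch) % period   # most recent window start at or before ts
    windows = []
    while start + size > ts:             # walk back while the window still covers ts
        windows.append((start, start + size))
        start -= period
    return windows

# An event at 12:45 lands in two 1-hour windows sliding every 30 minutes.
for w in sliding_windows(datetime(2016, 9, 17, 12, 45),
                         timedelta(hours=1), timedelta(minutes=30)):
    print(w[0].time(), w[1].time())
```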
17. Session Windows (Unaligned)
[diagram: input events plotted by processing time vs. event time; output merged into per-session event-time windows separated by the gap duration]
Adapted from: The Apache Beam Model, Tyler Akidau, Frances Perry
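Session windows are unaligned: they are derived from the data itself by merging events until a silence of at least the gap duration closes the session. A sketch (illustrative names):

```python
from datetime import datetime, timedelta

def session_windows(event_times, gap: timedelta):
    """Merge sorted event times into sessions; a silence >= gap closes a session."""
    sessions = []
    for t in sorted(event_times):
        if sessions and t - sessions[-1][-1] < gap:
            sessions[-1].append(t)   # within the gap: extend the current session
        else:
            sessions.append([t])     # gap exceeded: start a new, unaligned window
    return sessions

clicks = [datetime(2016, 9, 17, 12, m) for m in (0, 1, 2, 30, 31)]
print([len(s) for s in session_windows(clicks, timedelta(minutes=10))])  # [3, 2]
```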
18. ● When to compute and materialize window results
○ Before a window (early firing)
○ At window completion
○ After window completion (late firing)
Triggers
20. Click impressions for a movie within a row
● What - click count per listed movie, enriched with movie metadata
● Where - 4-minute event-time windows
● When - trigger every 2 mins (wall clock) & every 4 mins (event-time)
● How - update the count for late events (mobile reconnects)
Click Count Example
Reference: Dataflow Model
21. Watermark - Reasoning about Completeness
[diagram: event time vs. processing time, with the watermark tracking event-time progress along the ideal line]
Watermarks describe event time progress:
"No timestamp earlier than the watermark will be seen"
Adapted from: The Apache Beam Model, Tyler Akidau, Frances Perry
23. Watermark in Practice
[diagram: event time vs. processing time, showing skew between the watermark and the ideal line]
● Often heuristic-based.
○ Events with timestamp < watermark can still show up.
● Too slow? Results are delayed.
● Too fast? Some data is late.
Adapted from: The Apache Beam Model, Tyler Akidau, Frances Perry
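The heuristic can be sketched as a bounded-out-of-orderness watermark, the same idea behind Flink's BoundedOutOfOrdernessTimestampExtractor (this sketch uses plain seconds and illustrative names, not the deck's code):

```python
class HeuristicWatermark:
    """Watermark = max event time seen - allowed out-of-orderness (seconds)."""
    def __init__(self, max_out_of_orderness):
        self.bound = max_out_of_orderness
        self.max_event_time = float("-inf")

    def on_event(self, event_time):
        self.max_event_time = max(self.max_event_time, event_time)

    def current(self):
        return self.max_event_time - self.bound

    def is_late(self, event_time):
        # too small a bound -> more late data; too large -> delayed results
        return event_time < self.current()

wm = HeuristicWatermark(max_out_of_orderness=5)
for t in (100, 103, 110):
    wm.on_event(t)
print(wm.current())      # 105
print(wm.is_late(104))   # True: timestamp < watermark still showed up
```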
24. Watermark is heuristic-based. For late data:
● Emit the late click count; let the sink or a downstream app accumulate it
● Emit the correct value - fetch the earlier count, add the late click count, emit to the sink
Late Data Handling (challenging)
Reference: Dataflow Model
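The second option (emit the corrected value) can be sketched as an upserting sink keyed by window; all names here are illustrative:

```python
class UpsertingSink:
    """Store per-window counts; a late firing fetches the earlier count,
    adds the late increment, and stores the corrected total."""
    def __init__(self):
        self.counts = {}

    def emit(self, window_key, count_delta):
        corrected = self.counts.get(window_key, 0) + count_delta
        self.counts[window_key] = corrected
        return corrected

sink = UpsertingSink()
sink.emit("12:00-12:04", 41)        # on-time firing
print(sink.emit("12:00-12:04", 1))  # late click arrives -> corrected total 42
```

The first option (emit only the delta and let downstream accumulate) trades sink-side reads for downstream complexity; this sketch shows the read-modify-write variant.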
27. Dataflow Functionality (review)
● Time support
○ Event, Processing, and Ingestion time
● Windowing
○ Fixed, Sliding, Session / Dynamic
● Watermark (completeness)
● Dealing with late data
● Event processing semantics
● Checkpoints / Savepoints
○ Metadata & Data
28. ● Map, filter, projection, grouping, etc.
● Joining - streams with other streams or static data
● Chain functionality - DAG of transformations
● Support for different event sources and sinks
● Streaming SQL
Functional Features
84. SPaaS Vision
● Self Service
● Multi-tenant support for stateful stream processing apps
● Autoscaling managed infrastructure
● Support for schemas
85. SPaaS Architecture
[diagram: a user submits a Job DSL (SQL) from the Tooling / Dashboard to the SPaaS Manager; the manager (1) creates a Dockerized job against a framework-specific API or the common API (Beam), (2) submits it, and (3) launches the runner as a running job on the Titus container runtime]
86. Why Apache Beam?
○ Portable API layer to build sophisticated data processing apps
■ Support multiple execution engines
○ Unified model API over bounded and unbounded data sources
○ MillWheel, FlumeJava, Dataflow model lineage
SPaaS - "Beam Me Up, Scotty!"
87. Iterative build out: then
● First - Flink on Titus in VPC, AWS
○ Titus is a cloud runtime platform for container based jobs
● Next - Apache Beam and Flink runner
SPaaS - Pilot
88. Keystone SPaaS - Flink Pilot Use Cases
[diagram: Event Producer → Fronting Kafka → Flink Router, feeding three pilot paths: (1) Stream Consumers, (2) EMR, and (3) Consumer Kafka with Demux / Merge; a Control Plane with a Self Service UI manages the pipeline]
89. Flink Program Deployment (prod shadow)
[diagram: inside an AWS VPC, one Titus job runs the Flink Job Managers (master and standby) coordinated via Zookeeper; a second Titus job runs Task Managers spread across Titus hosts, each container with its own ENI-backed IP]
90. Titus High Level Architecture
[diagram: Titus UI and CI/CD clients call the Titus API (Rhea), backed by Cassandra; the Titus Master handles job management and scheduling (Fenzo), coordinating via Zookeeper, the Mesos Master, and the EC2 Autoscaling API; Titus Agents run Docker containers (including SPaaS-Flink) under a Titus executor alongside metrics and logging agents, zfs, and the mesos agent; images are pulled from the Docker Registry, with S3 for storage]
94. Flink Router perf test (YMMV)
○ Note
■ The tests were performed on one specific use case,
■ running in a specific environment,
■ with one specific event stream and setup.
96. Details..
○ Different runtimes for Flink & Samza routers
○ Massively parallel use-case
■ Per element processing
○ Focused on net outcomes
97. Flink (1.2) Router
[diagram: the same Titus deployment as slide 89 - Job Managers (master and standby) coordinated via Zookeeper in one Titus job, Task Managers across Titus hosts in another, each with an ENI-backed IP inside the AWS VPC - with backed state]