Stateful Stream Processing at In-Memory Speed
Jamie Grier
@jamiegrier
jamie@data-artisans.com
Who am I?
• Director of Applications Engineering at data Artisans
• Previously working on streaming computation at Twitter, Gnip and Boulder Imaging
• Involved in various kinds of stream processing for about a decade
• High-speed video, social media streaming, general frameworks for stream processing
Overview
• In stateful stream processing the bottleneck has often been the key-value store
• Accuracy has been sacrificed for speed
• The Lambda Architecture was developed to address shortcomings of stream processors
• Can we remove the key-value store bottleneck and enable processing at in-memory speeds?
• Can we do this accurately, without the Lambda Architecture?
Problem statement
• Incoming message rate: 1.5 million messages/sec
• Group by several dimensions and aggregate over 1-hour event-time windows
• Write hourly time-series data to a database
• Respond to queries over both historical data and the live in-flight aggregates
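To make the shape of this concrete, here is a minimal sketch of such a pipeline in Flink's DataStream API (Flink itself is introduced later in the deck). Everything here is illustrative rather than the production code from the talk: the `TweetEvent` POJO, the inline test data, and printing instead of writing to a real database are all assumptions.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class HourlyTweetCounts {

    // Assumed event shape: tweet-id, event type (url-click, impression, ...), event time in ms.
    public static class TweetEvent {
        public long tweetId;
        public String eventType;
        public long timestampMillis;
        public TweetEvent() {}
        public TweetEvent(long id, String type, long ts) {
            this.tweetId = id; this.eventType = type; this.timestampMillis = ts;
        }
    }

    // Incremental aggregation: keep only a running count per (tweet-id, event-type, hour).
    public static class CountAgg implements AggregateFunction<TweetEvent, Long, Long> {
        public Long createAccumulator()         { return 0L; }
        public Long add(TweetEvent e, Long acc) { return acc + 1; }
        public Long getResult(Long acc)         { return acc; }
        public Long merge(Long a, Long b)       { return a + b; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                new TweetEvent(1, "url-click",  3_661_000L),   // 01:01:01
                new TweetEvent(2, "url-click",  3_662_000L),   // 01:01:02
                new TweetEvent(1, "impression", 3_663_000L))   // 01:01:03
            // event-time processing needs timestamps and watermarks (see the later sketch)
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<TweetEvent>forBoundedOutOfOrderness(Duration.ofMinutes(1))
                    .withTimestampAssigner((e, ts) -> e.timestampMillis))
            // group by the aggregation dimensions
            .keyBy(e -> e.tweetId + "|" + e.eventType)
            // 1-hour tumbling windows in event time
            .window(TumblingEventTimeWindows.of(Time.hours(1)))
            .aggregate(new CountAgg())
            // in the real system the completed hourly counts would go to a database sink
            .print();

        env.execute("hourly tweet-event counts (sketch)");
    }
}
```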
Input and Queries

Stream:
  tweet-id: 1, event: url-click,   time: 01:01:01
  tweet-id: 2, event: url-click,   time: 01:01:02
  tweet-id: 1, event: impression,  time: 01:01:03
  tweet-id: 2, event: url-click,   time: 02:01:01
  tweet-id: 1, event: impression,  time: 02:02:02

Query -> Result:
  tweet-id: 1, event: url-click,   time: 01:00:00  ->  1
  tweet-id: 1, event: *,           time: 01:00:00  ->  2
  tweet-id: *, event: *,           time: 01:00:00  ->  3
  tweet-id: *, event: impression,  time: 02:00:00  ->  1
  tweet-id: 2, event: *,           time: 02:00:00  ->  1
Input and Queries
(The same events, now arriving out of order; the per-hour results are unchanged.)

Stream:
  tweet-id: 1, event: url-click,   time: 01:01:03
  tweet-id: 2, event: url-click,   time: 01:01:02
  tweet-id: 1, event: impression,  time: 01:01:01
  tweet-id: 2, event: url-click,   time: 02:02:01
  tweet-id: 1, event: impression,  time: 02:01:02

Query -> Result:
  tweet-id: 1, event: url-click,   time: 01:00:00  ->  1
  tweet-id: 1, event: *,           time: 01:00:00  ->  2
  tweet-id: *, event: *,           time: 01:00:00  ->  3
  tweet-id: *, event: impression,  time: 02:00:00  ->  1
  tweet-id: 2, event: *,           time: 02:00:00  ->  1
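In Flink, counting these out-of-order events in the correct hour is handled by event-time timestamps and watermarks. Below is a minimal sketch, reusing the hypothetical `TweetEvent` POJO from the earlier pipeline sketch; the one-minute out-of-orderness bound is an assumption for illustration, not a figure from the talk.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class Watermarks {
    // Watermarks declare "events more than 1 minute older than the newest timestamp seen
    // are no longer expected", which lets 1-hour event-time windows close correctly even
    // when events arrive out of order, as in the stream above.
    static WatermarkStrategy<HourlyTweetCounts.TweetEvent> outOfOrderStrategy() {
        return WatermarkStrategy
            .<HourlyTweetCounts.TweetEvent>forBoundedOutOfOrderness(Duration.ofMinutes(1))
            .withTimestampAssigner((event, recordTimestamp) -> event.timestampMillis);
    }
    // Used as: events.assignTimestampsAndWatermarks(outOfOrderStrategy())
    // before the keyBy(...).window(...) step.
}
```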
Time Series Data
[Chart: Tweet Impressions per hour, 01:00:00–04:00:00, y-axis 0–125, series for Tweet 1 and Tweet 2]
Any questions so far?
Legacy System (Lambda Architecture)
[Diagram build: a Stream Processor (Streaming path) and Hadoop (Batch path) running side by side]
Legacy System (Lambda Architecture)
• Aggregates built directly in the key/value store
• Read/modify/write against the store for every message
• Inaccurate: double-counting, lost pre-aggregated data
• A Hadoop job improves the results after 24 hours
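To illustrate why that design is both slow and fragile, here is a rough sketch of the per-message read/modify/write loop the slide describes. The `KeyValueStore` interface and the key format are hypothetical stand-ins for whatever external store the legacy system actually used.

```java
// Hypothetical external key/value store client (one network round trip per call).
interface KeyValueStore {
    Long get(String key);            // null if the key does not exist yet
    void put(String key, Long value);
}

class LegacyAggregator {
    private final KeyValueStore store;

    LegacyAggregator(KeyValueStore store) { this.store = store; }

    // Called once per incoming message: two network round trips per event.
    void onEvent(long tweetId, String eventType, String hourBucket) {
        String key = tweetId + "|" + eventType + "|" + hourBucket;

        Long current = store.get(key);                 // network read
        long updated = (current == null ? 0 : current) + 1;
        store.put(key, updated);                       // network write

        // Problems the slide calls out:
        //  - throughput is bounded by the store, not by the CPU
        //  - if a message is replayed after a failure, the increment runs twice (double-counting)
        //  - if the write is lost, the pre-aggregated value is lost
    }
}
```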
Any questions so far?
Goals for Prototype System
• Feature parity with the existing system
• Attempt to reduce the hardware footprint by 100x
• Exactly-once semantics: compute correct results in real time, with or without failures. Failures should not lead to missing data or double counting
• Satisfy real-time queries with low latency
• One system: no Lambda Architecture!
• Eliminate the key/value store bottleneck (the big win)
My road to Apache Flink
• Interested in Google Cloud Dataflow
• Google nailed the semantics for stream processing
• Unified batch and stream processing with one model
• Dataflow didn’t exist in open source at the time (or so I thought), and I wanted to build it
• My wife wouldn’t let me quit my job!
• The Dataflow SDK is now open source as Apache Beam, and Flink is the most complete runner
Why Apache Flink?
• Basically identical semantics to Google Cloud Dataflow
• Flink is a true fault-tolerant, stateful stream processor
• Exactly-once guarantees for state updates
• The state management features might allow us to eliminate the key-value store
• Windowing is built in, which makes time series easy
• Native event-time support / correct time-based aggregations
• Very fast data shuffling in benchmarks: 83 million msgs/sec on 30 machines
• Flink “just works” with no tuning, even at scale!
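The exactly-once guarantee for state updates comes from Flink's checkpointing mechanism. A minimal sketch of turning it on follows; the 60-second interval is illustrative, not a setting from the talk.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnableExactlyOnce {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Periodic, consistent snapshots of all operator state: on failure, state and the
        // input read positions roll back together, so counts are neither lost nor applied twice.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
    }
}
```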
Prototype System
[Diagram build: an Apache Flink streaming job]
We now have a sharded key/value store inside the stream processor.
Why not just query that? [The diagram adds a Query Service]
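The deck doesn't say exactly how the query service reaches into the running job, so here is one way it could be wired up, using Flink's queryable state feature (available in several Flink releases, later deprecated). The state name `"hourly-counts"`, the key format, and the host/port are all assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.common.JobID;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.queryablestate.client.QueryableStateClient;

public class HourlyCountQuery {
    public static void main(String[] args) throws Exception {
        // Inside the job, the keyed state holding the running counts would be registered
        // as queryable, e.g. with descriptor.setQueryable("hourly-counts").
        ValueStateDescriptor<Long> descriptor =
            new ValueStateDescriptor<>("hourly-counts", Long.class);

        // The query service talks to the state proxy on a TaskManager (host/port assumed).
        QueryableStateClient client = new QueryableStateClient("taskmanager-host", 9069);

        JobID jobId = JobID.fromHexString(args[0]);   // id of the running Flink job
        String key = "1|url-click|01:00:00";          // tweet-id | event | hour bucket (assumed format)

        CompletableFuture<ValueState<Long>> result = client.getKvState(
            jobId, "hourly-counts", key, BasicTypeInfo.STRING_TYPE_INFO, descriptor);

        // The in-flight aggregate, served straight from the stream processor's state.
        System.out.println(key + " -> " + result.get().value());

        client.shutdownAndWait();
    }
}
```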
Prototype System
• Eliminates the key-value store bottleneck
• Eliminates the batch layer
• No more Lambda Architecture!
• Real-time queries over in-flight aggregates
• Hourly aggregates written to the database
The Results
• Uses 0.5% of the resources of the legacy system: an improvement of 200x with zero tuning!
• Exactly-once analytics in real time
• Complete elimination of the batch layer and the Lambda Architecture
• Successfully eliminated the key-value store bottleneck
How is a 200x improvement possible?
• The key is making use of fault-tolerant state inside the stream processor (see the sketch below)
• Computation proceeds at in-memory speed
• No need to make requests over the network to update values in an external store
• Dramatically less load on the database, because only the completed window aggregates are written there
• Flink is extremely efficient at network I/O and data shuffling, and has a highly optimized serialization architecture
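For contrast with the legacy read/modify/write sketch earlier, here is roughly what the per-message update looks like when the counts live in Flink's own keyed state: a local, in-memory update that Flink checkpoints for exactly-once state semantics. Again a sketch with assumed names (it reuses the hypothetical `TweetEvent` POJO), not the actual job code; the production pipeline used windows, whereas this uses a plain `KeyedProcessFunction` just to show the state access.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Counts events per key entirely in local, fault-tolerant keyed state.
// No network round trip per message; state is checkpointed for exactly-once updates.
public class LocalCountFunction
        extends KeyedProcessFunction<String, HourlyTweetCounts.TweetEvent, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
            new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(HourlyTweetCounts.TweetEvent event, Context ctx,
                               Collector<Long> out) throws Exception {
        Long current = count.value();          // local read (memory or local disk)
        long updated = (current == null ? 0 : current) + 1;
        count.update(updated);                 // local write, checkpointed by Flink
        out.collect(updated);
    }
}
```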
Does this matter at smaller scale?
• YES it does!
• Much larger problems on the same hardware investment
• Exactly-once semantics and state management are important at any scale!
• Engineering time can be expensive at any scale if things don’t “just work”
Summary
• Used the stateful operator features in Flink to remove the key/value store bottleneck
• Dramatic reduction in hardware costs (200x)
• Maintained feature parity by providing low-latency queries over in-flight aggregates as well as long-term storage of hourly time-series data
• Actually improved the accuracy of aggregations: exactly-once vs. at-least-once semantics
Questions?
Thanks!


Editor's Notes

  • #14–#22: Aggregates built directly in key/value store. Inaccurate: double-counting, lost aggregates. A Hadoop batch job “fixes” the results later (Lambda Architecture); the Hadoop job runs every 24 hours.