Stream processing consists of ingesting and processing continuously generated data, often from end users in web applications or from more challenging settings where devices such as servers and sensors generate events at a high rate. Such scenarios often demand the use of a software stack that is able to scale and accommodate changes to the characteristics of the application.
One of the major challenges with processing data streams is adapting to workload variations (e.g., due to daily cycles or growth in the population of sources). Systems that ingest stream data typically parallelize it by sharding the incoming messages and events according to a routing key. Parallelizing ingestion this way is very effective, but future changes to the workload (very often unknown beforehand) can make the initial choice of the degree of parallelism inadequate, even for short-term spikes. Consequently, the ability to scale by adapting parallelism to the workload, while preserving important API properties such as per-key order, is highly desirable for mission-critical workloads.
In this presentation, we explain how to accommodate changes to workloads in and with Pravega, an open-source stream store built to ingest and serve stream data. Pravega stores and manipulates segments (append-only byte sequences) and forms streams by creating and composing segments; this composition is what enables stream scaling. Stream scaling in Pravega is automatic and transparent to the application, but a change to the ingestion volume might also require the application to scale its resources downstream (e.g., the operators of an Apache Flink job) to accommodate the new volume. Pravega signals such changes to the application so that it can react accordingly. This cooperation between Pravega and the downstream application is crucial for building an effective stream data pipeline.
2. Profile: Flavio Junqueira
• Director at Dell EMC
• Leads the Pravega team
• Background
  • Distributed computing
  • Research: Microsoft, Yahoo!
  • Worked on various Apache projects
    • E.g., Apache ZooKeeper, Apache BookKeeper
3. Pravega
• Pravega is a stream store: the stream is the storage primitive
• The foundation is segments
  • Segments enable a flexible composition of streams
  • Segments enable stream scaling
• Stream scaling
  • Streams adapt to changes in the incoming workload
  • Changes the number of segments dynamically
  • Respects order
5. Streams ahoy!
• Social networks, online shopping
• Stream of user events
  • Status updates
  • Online transactions
6. Streams ahoy!
• Social networks, online shopping, server monitoring
• Stream of user events
  • Status updates
  • Online transactions
• Stream of server events
  • CPU, memory, disk utilization
7. Streams ahoy!
• Social networks, online shopping, server monitoring, sensors (IoT)
• Stream of user events
  • Status updates
  • Online transactions
• Stream of server events
  • CPU, memory, disk utilization
• Stream of sensor events
  • Temperature samples
  • Samples from radar and image sensors in cars
9. Changes to the source
[Diagram: servers, sensors, etc. emitting events]
10. Workload cycles and spikes
[Charts: seasonal spikes over the months of a year; daily cycles over the hours of a day; weekly cycles; unplanned spikes]
12. Event processing
[Diagram: Source → append-only log (segment in Pravega) → Processor 1; colors represent event keys]
• Source emits 2 events/second
• Processor processes 3 events/second
14. Event processing
[Diagram: Source → append-only log (segment in Pravega) → Processor 1; colors represent event keys]
✓ Source rate increases; new rate: 4 events/second
✓ Processor still processes 3 events/second
✓ Can't keep up with the source rate
15. Event processing
[Diagram: Source → append-only log (segment in Pravega) → Processor 1 and Processor 2; colors represent event keys]
✓ Source rate increases; new rate: 4 events/second
✓ Add a second processor
✓ Each processor processes 3 events/second
✓ Can keep up with the rate
16. Event processing
[Diagram: both processors read from the same append-only log]
✓ Source rate increases; new rate: 4 events/second
✓ Add a second processor; each processes 3 events/second
✓ Can keep up with the rate
Problem: Key order
17. Event processing
[Diagram: events e2 and e1 with the same key sit in the same log but are handed to different processors; colors represent event keys]
✓ Source rate increases; new rate: 4 events/second
✓ Add a second processor; each processes 3 events/second
✓ Can keep up with the rate
Problem: Key order
✓ e1 can be processed after e2
18. Event processing
[Diagram: Source → input split across two append-only logs (segments in Pravega) → Processor 1 and Processor 2]
✓ Source rate increases; new rate: 4 events/second
✓ Split the input and add processors
✓ Each processor processes 3 events/second
✓ Can keep up with the rate
19. Event processing
[Diagram: events e1 and e2 with the same key end up in different logs after the split]
✓ Source rate increases; new rate: 4 events/second
✓ Split the input and add processors
✓ Each processor processes 3 events/second
✓ Can keep up with the rate
Problem: Key order
✓ e1 can be processed after e2
20. Event processing
[Diagram: Source → two append-only logs (segments in Pravega) → Processor 1 and Processor 2]
✓ Source rate increases; new rate: 4 events/second
✓ Split the input and add processors
✓ Each processor processes 3 events/second
✓ Can keep up with the rate
Processor 2 only starts once earlier events have been processed
21. Scaling in Pravega
• Changes the number of segments dynamically
• Triggered according to incoming traffic
• Orders segments to prevent inconsistencies
23. Pravega
• Storing data streams
• Open source
• Under active development
• http://pravega.io
• http://github.com/pravega/pravega
24. Anatomy of a stream
[Timeline: distant past → recent past → present]
25. Anatomy of a stream
[Timeline: distant past → recent past → present, covered piecewise by bulk store, pub-sub, and messaging systems]
26. Anatomy of a stream
[Timeline: distant past → recent past → present, all covered by Pravega]
27. Anatomy of a stream
[Timeline: distant past → recent past → present, covered by Pravega]
• Unbounded amount of data
• Ingestion rate might vary
28. Pravega and Streams
[Diagram: writers append byte sequences to Pravega; readers read them back]
• Ingest stream data: append
• Process stream data: read
29. Pravega and Streams
[Diagram: event writers append to Pravega; a group of event readers reads]
• Ingest stream data: event writers append
• Process stream data: event readers read as a group (API sketch below)
  • Load balance
  • Grow and shrink
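To make the append and read paths concrete, here is a minimal sketch using Pravega's Java client (EventStreamClientFactory, ReaderGroupManager, and related types are the client's API; the controller URI, scope, stream, and group names are placeholders, and the scope and stream are assumed to already exist):

import java.net.URI;
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.EventRead;
import io.pravega.client.stream.EventStreamReader;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.ReaderConfig;
import io.pravega.client.stream.ReaderGroupConfig;
import io.pravega.client.stream.impl.UTF8StringSerializer;

public class WriteReadSketch {
    public static void main(String[] args) throws Exception {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")) // placeholder endpoint
                .build();

        // A reader group over the stream; readers join it and share its segments
        try (ReaderGroupManager rgm = ReaderGroupManager.withScope("examples", config)) {
            rgm.createReaderGroup("group", ReaderGroupConfig.builder()
                    .stream("examples/mystream")
                    .build());
        }

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", config);
             EventStreamWriter<String> writer = factory.createEventWriter(
                     "mystream", new UTF8StringSerializer(),
                     EventWriterConfig.builder().build());
             EventStreamReader<String> reader = factory.createReader(
                     "reader-1", "group", new UTF8StringSerializer(),
                     ReaderConfig.builder().build())) {

            // Events with the same routing key land in the same segment,
            // so order per key is preserved
            writer.writeEvent("sensor-42", "temperature=21.5").join();

            EventRead<String> event = reader.readNextEvent(2000);
            System.out.println("Read: " + event.getEvent());
        }
    }
}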
30. Segments in Pravega
[Diagram: a stream as a composition of segments]
• Stream: a composition of segments
• Segment:
  • Stream unit
  • Append only
  • Sequence of bytes
34. Segments in Pravega
[Diagram: event writers append to segments; event readers read from segments]
• Once sealed, a segment can't be appended to any longer
35. How is sealing segments useful?
40. Scaling a stream
[Diagram, step 1: stream has one segment; step 2: seal the current segment and create new ones]
• Follows the write workload
• Say the input load has increased: need more parallelism
• Auto or manual scaling (toy sketch of the split below)
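The mechanics can be illustrated with a toy model (not Pravega code): each open segment owns a sub-range of the routing-key hash space [0, 1); to scale up, a segment is sealed and replaced by successors that split its range, so every key still maps to exactly one open segment:

import java.util.ArrayList;
import java.util.List;

public class KeyRangeSplitSketch {
    // Toy stand-in for a segment: owns the key-hash range [low, high)
    record Segment(String name, double low, double high) {}

    static final List<Segment> openSegments =
            new ArrayList<>(List.of(new Segment("segment-0", 0.0, 1.0)));

    // Route a key to the unique open segment owning its hash
    static Segment route(String key) {
        double h = (key.hashCode() & 0x7fffffff) / (double) Integer.MAX_VALUE;
        return openSegments.stream()
                .filter(s -> h >= s.low() && h < s.high())
                .findFirst().orElseThrow();
    }

    // Scale up: seal a segment and create two successors over its range
    static void split(Segment s) {
        double mid = (s.low() + s.high()) / 2;
        openSegments.remove(s); // sealed: no further appends
        openSegments.add(new Segment(s.name() + "-a", s.low(), mid));
        openSegments.add(new Segment(s.name() + "-b", mid, s.high()));
    }

    public static void main(String[] args) {
        System.out.println(route("sensor-42").name()); // segment-0
        split(openSegments.get(0));
        System.out.println(route("sensor-42").name()); // one of the successors
    }
}

Because a sealed segment is consumed to its end before its successors are read, order is preserved across the scaling event, echoing the "orders segments to prevent inconsistencies" point from slide 21.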
48. Daily cycles
[Chart: trip rate over the day; lowest around 4:00 AM, peak around 9:00 AM]
• Peak rate is 10x higher than the lowest rate
• Data: NYC Yellow Taxi Trip Records, March 2015
  http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
50. How do I control scaling?
51. Scaling policies
• Configured on a per-stream basis
• Specifies a policy for the stream
• Policies (configuration sketch below)
  • Fixed: the set of segments is fixed
  • Bytes per second: scales up and down according to the volume of data (target data rate)
  • Events per second: scales up and down according to the volume of events (target event rate)
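These policies map to ScalingPolicy in Pravega's Java client. A sketch of creating a stream with each policy; the scope/stream names and numeric targets are illustrative (byEventRate/byDataRate take a per-segment target rate, a scale factor, and a minimum number of segments):

import java.net.URI;
import io.pravega.client.ClientConfig;
import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;

public class ScalingPolicySketch {
    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")) // placeholder
                .build();
        try (StreamManager streamManager = StreamManager.create(config)) {
            streamManager.createScope("examples");

            // Fixed: the set of segments never changes
            streamManager.createStream("examples", "fixed-stream",
                    StreamConfiguration.builder()
                            .scalingPolicy(ScalingPolicy.fixed(3))
                            .build());

            // Event rate: target 1000 events/s per segment,
            // scale factor 2, never below 1 segment
            streamManager.createStream("examples", "event-rate-stream",
                    StreamConfiguration.builder()
                            .scalingPolicy(ScalingPolicy.byEventRate(1000, 2, 1))
                            .build());

            // Data rate: target 100 KB/s per segment
            streamManager.createStream("examples", "data-rate-stream",
                    StreamConfiguration.builder()
                            .scalingPolicy(ScalingPolicy.byDataRate(100, 2, 1))
                            .build());
        }
    }
}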
52. Auto-scaling: triggering a scaling event
• Triggered by byte and event rates
• Target rate T per segment
• Rates reported every 2 minutes
  ✓ 2-minute rate (2M)
  ✓ 5-minute rate (5M)
  ✓ 10-minute rate (10M)
  ✓ 20-minute rate (20M)
• Scaling up when any of:
  ∨ 2M > 5 × T
  ∨ 5M > 2 × T
  ∨ 10M > T
• Scaling down when all of:
  ∧ 2M, 5M, 10M < T
  ∧ 20M < T / 2

Scale-up example (T = 50; triggers at x + 6 min, when 10M exceeds T):

  time   x     x+2min   x+4min   x+6min
  2M     60    60       60       60
  5M     56    60       60       60
  10M    46    48       50       52

Scale-down example (T = 50; triggers at x + 6 min, when 20M drops below T/2 = 25):

  time   x     x+2min   x+4min   x+6min
  2M     20    20       20       20
  5M     20    20       20       20
  10M    20    20       20       20
  20M    27    26       25       24

(Decision logic sketched below.)
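A direct transcription of these rules into code, purely as an illustration of the decision (the actual evaluation lives in the segment-store/Controller pipeline shown on the next slide):

public class ScaleDecisionSketch {
    enum Decision { SCALE_UP, SCALE_DOWN, NONE }

    // rate2m..rate20m are the 2-, 5-, 10-, and 20-minute rates for a
    // segment; target is the per-segment target rate T
    static Decision decide(double rate2m, double rate5m, double rate10m,
                           double rate20m, double target) {
        if (rate2m > 5 * target || rate5m > 2 * target || rate10m > target) {
            return Decision.SCALE_UP;
        }
        if (rate2m < target && rate5m < target && rate10m < target
                && rate20m < target / 2) {
            return Decision.SCALE_DOWN;
        }
        return Decision.NONE;
    }

    public static void main(String[] args) {
        // The two timelines from the slide, evaluated at x + 6 min:
        System.out.println(decide(60, 60, 52, 60, 50)); // SCALE_UP (10M > T)
        System.out.println(decide(20, 20, 20, 24, 50)); // SCALE_DOWN (20M < T/2)
    }
}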
53. Auto-scaling: internals
[Diagram: appends arrive at a segment in the segment store; a stats recorder feeds an auto-scale processor, which writes auto-scale events read by the Controller]
55. Reader groups
• Group of event readers (sketch below)
  • Read events from a set of streams
  • Load distributed across the readers of the group
• Segments
  • A given reader reads from a set of segments
  • Coordination of segment assignment done via a state synchronizer
• State synchronizer
  • General facility for synchronizing state across processes
  • Uses a revisioned Pravega segment
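A sketch of creating a reader group that spans two streams (the scope, group, and stream names are placeholders); readers then join the group through the client factory, and segment assignment is coordinated via the state synchronizer:

import java.net.URI;
import io.pravega.client.ClientConfig;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.ReaderGroupConfig;

public class ReaderGroupSketch {
    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")) // placeholder
                .build();
        try (ReaderGroupManager rgm = ReaderGroupManager.withScope("examples", config)) {
            // One group reading from two streams; the segments of both
            // streams are distributed across the group's readers
            rgm.createReaderGroup("analytics", ReaderGroupConfig.builder()
                    .stream("examples/user-events")
                    .stream("examples/server-events")
                    .build());
            // Each reader joins with factory.createReader("reader-N", "analytics", ...)
        }
    }
}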
59. Scaling pipelines
[Diagram: Source → Stage 1 → Stage 2]
• All stages can handle the load induced by the source
60. Scaling pipelines
[Diagram: Big source → Stage 1 (scaled) → Stage 2]
• Load coming from the source increases
• Stage 1 scales and adapts to the load change
• Stage 2 can't cope with the load change
61. Scaling signals
[Diagram: Big source → Pravega → App]
• Pravega won't scale the application
62. Scaling signals
[Diagram: Big source → Pravega → App, with signals flowing from Pravega to the app]
• Pravega won't scale the application downstream
• ... but it can signal
  • E.g., more segments
  • E.g., number of unread bytes is growing
64. When to scale
1. Input rate has changed
  • Higher volume of data coming in
2. Application needs more capacity
  • Processing rate is lower than the input rate
  • Not necessarily a change to the input rate
65. Changes to input rate
[Diagram: Pravega → two readers feeding two source tasks → two map tasks → two reduce tasks]
• Stream processing job
• Say an Apache Flink job
66. Changes to input rate
[Diagram: a third Pravega reader and additional source and map tasks added to the job]
• More capacity to avoid lagging behind
  • Additional tasks
  • Additional Pravega readers
67. Processing rate not sufficient
[Diagram: same two Pravega readers; an additional map task added to the job]
• Application lags behind
• Additional task
68. Reader group: listener and metrics
• Listener API
  • Register a listener to react to changes
  • E.g., changes to the number of segments
• Metrics
  • Report specific values of interest
  • E.g., number of unread bytes in a stream (sketch of both below)
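A sketch of both mechanisms using the Java client's notification and metrics interfaces (getSegmentNotifier, SegmentNotification, and unreadBytes; the scope and group names are placeholders, and the group is assumed to already exist):

import java.net.URI;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import io.pravega.client.ClientConfig;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.ReaderGroup;
import io.pravega.client.stream.notifications.SegmentNotification;

public class ScalingSignalsSketch {
    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")) // placeholder
                .build();
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);

        try (ReaderGroupManager rgm = ReaderGroupManager.withScope("examples", config)) {
            ReaderGroup group = rgm.getReaderGroup("analytics"); // created earlier

            // Listener API: react to changes in the number of segments,
            // e.g., by rescaling the downstream job
            group.getSegmentNotifier(executor).registerListener(
                    (SegmentNotification n) -> System.out.println(
                            "Segments: " + n.getNumOfSegments()
                            + ", readers: " + n.getNumOfReaders()));

            // Metrics: a growing backlog also signals the need for capacity
            System.out.println("Unread bytes: " + group.unreadBytes());
        }
        executor.shutdown();
    }
}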
69. Example: Pravega Flink connector
• Connects Flink and Pravega
• Pravega can be source and sink
• On Apache Flink
  • Signals from the source can trigger dynamic scaling
  • E.g., increase the number of readers

public class ReaderOperatorRescalingPolicy implements OperatorRescalingPolicy {
    // Kept up to date by the segment notification listener below
    private volatile int currentNumberOfSegments;
    …
    @Override
    public int rescaleTo(OperatorRescalingContext operatorRescalingContext) {
        // Match the operator parallelism to the current number of segments
        return currentNumberOfSegments;
    }

    private class ListenerImpl implements Listener<SegmentNotification> {
        @Override
        public void onNotification(SegmentNotification notification) {
            currentNumberOfSegments = notification.getNumOfSegments();
        }
    }
    …
}
73. Wrap up
• Pravega
  • Stream store
  • Scalable ingestion of continuously generated data
  • Stream scaling
• Stream data pipelines
  • Signaling for dynamic scaling downstream
  • Proof of concept with Apache Flink
74. Questions?
• Pravega's web site: http://pravega.io
• Pravega's code: http://github.com/pravega/pravega
• Apache Flink's site: http://flink.apache.org
• Pravega-Flink connector: http://github.com/pravega/flink-connectors
• Flink dynamic scaling PoC: https://github.com/tillrohrmann/flink/tree/rescalingPolicy
• E-mail: fpj@pravega.io
• Twitter: @fpjunqueira