Stream Scaling in Pravega

Stream scaling in Pravega
Flavio Junqueira, Pravega - Dell EMC

Profile: Flavio Junqueira
• Director at Dell EMC
• Lead the Pravega team
• Background
• Distributed computing
• Research: Microsoft, Yahoo!
• Worked on various Apache
projects
• E.g., Apache ZooKeeper, Apache
BookKeeper
Data Works - Barcelona, 2019 2

Pravega
• Pravega is
• A stream store: stream is the storage primitive
• The foundation is segments
• Segments enable a flexible composition of streams
• Segments enable stream scaling
• Stream scaling
• Streams adapt to changes to incoming workload
• Changes the number of segments dynamically
• Respects order

Motivating stream scaling

Data Works - Barcelona, 2019
Social networks
Online shopping
Streams ahoy!
Stream of user events
• Status updates
• Online transactions
5

Social networks
Online shopping
Server monitoring
• Status updates
Stream of server events
• CPU, memory, disk utilization
Streams ahoy!
6

Social networks
Online shopping
Server monitoring
Sensors (IoT)
• Status updates
Stream of server events
• CPU, memory, disk utilization
Stream of sensor events
• Temperature samples
• Samples from radar and image sensors in cars
Streams ahoy!
7

Changes to workload
Events
Servers,
Sensors,
etc.

Changes to the source
Events
Servers,
Sensors,
etc.

Workload cycles and spikes
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Seasonal spikes
0:00
2:00
4:00
6:00
8:00
10:00
12:00
14:00
16:00
18:00
20:00
22:00
1:00
3:00
5:00
7:00
9:00
11:00
13:00
15:00
17:00
19:00
21:00
23:00
Daily cycles
0
2
4
6
8
10
12
14
Weekly cycles
Unplanned

Overprovisioning… We want to do better.

Event processing
Processor 1Source
Source emits 2
events/second
Processor processes
3 events/second
Append-only Log
(segment in Pravega)
Colors represent event keys

Event processing
Source
Processor processes
3 events/second
Processor 1
Source emits 2
events/second
Append-only Log

Event processing
Source
✓ Source rate
increases
✓ New rate: 4
events/second
✓ Processor still processes 3
events/second
✓ Can’t keep up with the
source rate
Processor 1
Append-only Log

Event processing
Source
✓ Source rate
increases
✓ New rate: 4
events/second
Processor 1
Processor 2
✓ Add a second processor
✓ Each processor processes 3
events/second
✓ Can keep up with the rate
Append-only Log

Event processing
Source
✓ Source rate
increases
✓ New rate: 4
events/second
Processor 1
Append-only Log
Processor 2
events/second
Problem: Key order

Event processing
Source
✓ Source rate
increases
✓ New rate: 4
events/second
Processor 1
Processor 2
events/second
Problem: Key order
e2 e1
✓ e1 can be processed aftere2
Append-only Log

Event processing
Source
Processor 1
Processor 2
✓ Source rate
increases
✓ New rate: 4
events/second
events/second
Split the input and
add processors
Append-only Log

Event processing
Source
Processor 1
Processor 2
✓ Source rate
increases
✓ New rate: 4
events/second
events/second
Split the input and
add processors
Problem: Key order
e1
e2
✓ e1 can be processed after e2
Append-only Log

Event processing
Source
Processor 1
Processor 2
✓ Source rate
increases
✓ New rate: 4
events/second
events/second
Split the input and
add processors
Processor 2 only starts once earlier
events have been processed

Scaling in Pravega
- Changes the number of segments dynamically
- Triggered according to incoming traffic
- Orders segments to prevent inconsistencies

Scaling in Pravega

Pravega
• Storing data streams
• Open source
• Under active development
http://pravega.io
http://github.com/pravega/pravega
25Data Works - Barcelona, 2019

Time
PresentRecent
Past
Distant
Past
Anatomy of a stream
26

Messaging
Pub-sub
Bulk store
Time
PresentRecent
Past
Distant
Past
Anatomy of a stream
27

Time
PresentRecent
Past
Distant
Past
Anatomy of a stream
28
Pravega

Time
PresentRecent
Past
Distant
Past
Anatomy of a stream
Unbounded
amount of data
Ingestion rate
might vary
29
Pravega

Pravega and Streams
….. 01110110 01100001 01101100
….. 01001010 01101111 01101001
Pravega
01000110
01110110
Append Read
01000110
01110110
Ingest stream data Process stream data
31

Pravega and Streams
01000110
01110110
Append Read
Event writer
Event writer
Event reader
Event reader
Group
• Load balance
• Grow and shrink
Pravega
Ingest stream data Process stream data

Segments in Pravega
01000111
01110110
11000110
01000111
01110110
11000110
Pravega
Stream Composition of
Segment:
• Stream unit
• Append only
• Sequence of bytes
33

Parallelism

Segments in Pravega
Pravega
01000110
01110110
Segments
Append Read
01101111
01101001
Segments
• Segments are sequences of bytes
• Use routing keys to determine segment
〈key, 01101001 〉
Routing
key
35
Event writer
Event writer
Event reader
Event reader
Event reader

Segments can be sealed

Segments in Pravega
Pravega
01000110
01110110
Segments
Append Read
01101111
01101001
Segments
Once sealed, a segment can’t be
appended to any longer.
Event writer
Event writer
Event reader
Event reader
Event reader

How is sealing segments useful?

Segments in Pravega
Pravega
01000110
Segments
Segments
01101111
01000110
01000110
01000110
01101111
01101111
01101111
01101111
01000110
01000110
0110111101101111
01000110
01101111
Stream
Compose to form a stream

Segments in Pravega
01000110
Segments
Segments
01101111
01000110
01000110
01000110
01101111
01101111
01101111
01101111
01000110
01000110
0110111101101111
01000110
01101111
Stream
• Each segment can live in a different server
• Not limited by the capacity of a single server
• Unbounded streams
00101111 01101001
40
Pravega

Segments in Pravega
01000110
Segments
Segments
01101111
01000110
01000110
01000110
01101111
01101111
01101111
01101111
01000110
01000110
01101111
01000110
01101111
Stream
01101111
Pravega

Stream scaling

01000110
Scaling a stream
….. 01110110 01100001 01101100 01000110
• Stream has one
segment
1
….. 01110110 01100001 01101100
• Seal current
segment
• Create new ones
2
01000110
01000110
• Follows write workload
• Say input load has increased
• Need more parallelism
• Auto or manual scaling

Routing
key space
0.0
1.0
Time
Segment 1
t0

Routing
key space
0.0
1.0
Time
Split
0.5
Segment 1 Segment 2
Segment 3
t0
Hot
keys
t1

Routing
key space
0.0
1.0
Time
Split
0.5
Segment 1 Segment 2
Segment 3
t0
Location 1
t1
Location 2
- Keys are coordinates in a geo application
- E.g., taxi rides

Routing
key space
0.0
1.0
Time
Split Split
0.5
0.75
Segment 1 Segment 2
Segment 3
Segment 4
Segment 5
t0 t1
t2
Location 1
Location 2
- Keys are coordinates in a geo application
- E.g., taxi rides

Routing
key space
0.0
1.0
Time
Split Split Merge
0.5
0.75
Segment 1 Segment 2
Segment 3
Segment 4
Segment 5
Segment 6
t0 t1
t2
Back to
cold

Routing
key space
0.0
1.0
Time
0.5
0.75
Segment 1 Segment 2
Segment 3
Segment 4
Segment 5
Segment 6
t0 t1
t2
Key ranges are not statically
assigned to segments
Split Split Merge
0.9 maps to
Segment 1
0.9 maps to
Segment 2
0.9 maps to
Segment 4
0.9 maps to
Segment 6

Daily Cycles
Peak rate is 10x higher than lowest rate
4:00 AM
9:00 AM
NYC Yellow Taxi Trip Records, March 2015
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

Pravega Auto Scaling
Merge Split

How do I control scaling?

Scaling policies
• Configured on a per stream basis
• Specifies a policy for the stream
• Policies
• Fixed
• Set of segments is fixed
• Bytes per second
• Scales up and down according to volume of data
• Target data rate
• Events per second
• Scales up and down according to volume of events
• Target event rate

Auto-Scaling: Triggering a scaling event
• By byte and event rates
• Target T per segment
• Reports every 2 minutes
✓ 2-min rate (2M)
✓ 5-min rate (5M)
✓ 10-min rate (10M)
✓ 20-min rate (20M)
Scale up
x x + 2 min x + 4 min x + 6 min time
• Scaling down
∧ 2M, 5M, 10M < T
∧ 20M < T / 2
2M = 60
5M = 56
10M = 46
T = 50
2M = 60
5M = 60
10M = 48
T = 50
2M = 60
5M = 60
10M = 5
T = 50
2M = 60
5M = 60
10M = 52
T = 50
Scale down
x x + 2 min x + 4 min x + 6 min time
2M = 20
5M = 20
10M = 20
20M = 27
T = 50
• Scaling up
∨ 2M > 5 x T
∨ 5M > 2 x T
∨ 10M > T
2M = 20
5M = 20
10M = 20
20M = 26
T = 50
2M = 20
5M = 20
10M = 20
20M = 25
T = 50
2M = 20
5M = 20
10M = 20
20M = 24
T = 50

Auto-scaling: Internals
Segment store
010101010
Segment
Stats recorder
Auto-scale processor
Auto-scale events
Controller
Append
Event reader

Read order

Reader groups
• Group of event readers
• Read events from a set of streams
• Load distributed across readers of the group
• Segments
• A given reader reads from a set of segments
• Coordination of segment assignment done via a state synchronizer
• State synchronizer
• General facility for synchronizing state across processes
• Uses a revisioned Pravega segment

Reader groups + Scaling
Pravega
Segment 2
Segment 1
Reader
Reader
1
Pravega
Segment 2
Segment 1
Reader
Reader
2
Segment 3
Segment 4
Scale up!

Reader groups + Scaling
Pravega
Segment 2
Segment 1
Reader
Reader
3
Segment 3
Segment 4
• Hit end of segment
• Get successors
• Update reader group state
Pravega
Reader
Reader
4
Segment 4
Segment 2
Segment 3
Pravega
Reader {3}
Reader {2, 4}
5
Segment 4
Segment 2
Segment 3

Building pipelines –
Scaling downstream

Scaling pipelines
Stage 1 Stage 2Source
All stages can handle the load induced by the source

Scaling pipelines
Scaled
Stage 1 Stage 2Big source
Stage 2 can’t cope with
the load change
Load coming from
source increases
Stage 1 scales and
adapts to the load
change

Scaling signals
Pravega AppBig source
• Pravega won’t scale
the application

Scaling signals
Pravega AppBig source
• Pravega won’t scale the
application downstream
• … but it can signal
• E.g., more segments
• E.g., number of unread
bytes is growing
Signals from Pravega

When to scale…

When to scale
1. Input rate has changed
• Higher volume of data coming in
2. Application needs more capacity
• Processing rate is lower compared to input rate
• No change to input rate necessarily

Changes to input rate
Pravega
Source
Source
Map
Map
• Stream processing job
• Say an Apache Flink job
Reader
Reader
Reduce
Reduce

Changes to input rate
Pravega
Source
Source
Task
Task
Source Map
• More capacity to avoid
lagging behind
• Additional tasks
• Additional Pravega readers
Reader
Reader
Reader
Map
Map
Reduce
Reduce

Processing rate not sufficient
Pravega
Source
Source
Task
Task
Map
• Application lags behind
• Additional task
Reader
Reader
Map
Map
Reduce
Reduce

Reader group: listener and metrics
• Listener API
• Register a listener to react to changes
• E.g., changes to the number of segments
• Metrics
• Reports specific values of interest
• E.g., number of unread bytes in a stream

Example: Pravega Flink connector
public class ReaderOperatorRescalingPolicy implements OperatorRescalingPolicy
{
…
@Override
public int rescaleTo(OperatorRescalingContext operatorRescalingContext) {
return currentNumberOfSegments;
}
private class ListenerImpl implements Listener<SegmentNotification> {
@Override
public void onNotification(SegmentNotification notification) {
currentNumberOfSegments = notification.getNumOfSegments();
}
}
…
• Connects Flink and Pravega
• Pravega can be source and sink
• On Apache Flink
• Signals from the source can
trigger dynamic scaling
• E.g., increase the number of
readers

Demo
(Thanks to Till Rohrmann)

Demo Topology
77
Pravega
Source Sink
FILE.out
• Executed on Yarn to support dynamic resource allocation
time
Event rate

Wrap Up

Wrap up
• Pravega
• Stream store
• Scalable ingestion of continuously generated data
• Stream scaling
• Stream data pipelines
• Signaling for dynamic scaling downstream
• Proof of concept with Apache Flink

Questions?
http://pravega.io
http://github.com/pravega/pravega
http://flink.apache.org
http://github.com/pravega/flink-connectors
https://github.com/tillrohrmann/flink/tree/rescalingPolicy
E-mail: fpj@pravega.io
Twitter: @fpjunqueira
Pravega’s Web site
Pravega’s code
Apache Flink’s site
Pravega-Flink connector
Flink dynamic scaling PoC

Stream Scaling in Pravega

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Stream Scaling in Pravega

Similar to Stream Scaling in Pravega (20)

More from DataWorks Summit

More from DataWorks Summit (20)

Recently uploaded

Recently uploaded (20)

Stream Scaling in Pravega