Stream scaling in Pravega
Flavio Junqueira, Pravega - Dell EMC
Profile: Flavio Junqueira
• Director at Dell EMC
• Lead the Pravega team
• Background
• Distributed computing
• Research: Microsoft, Yahoo!
• Worked on various Apache projects
• E.g., Apache ZooKeeper, Apache BookKeeper
Data Works - Barcelona, 2019 2
Pravega
• Pravega is
• A stream store: stream is the storage primitive
• The foundation is segments
• Segments enable a flexible composition of streams
• Segments enable stream scaling
• Stream scaling
• Streams adapt to changes to incoming workload
• Changes the number of segments dynamically
• Respects order
Motivating stream scaling
Streams ahoy!
Social networks, online shopping: stream of user events
• Status updates
• Online transactions
Server monitoring: stream of server events
• CPU, memory, disk utilization
Sensors (IoT): stream of sensor events
• Temperature samples
• Samples from radar and image sensors in cars
Changes to workload
Changes to the source
Events from servers, sensors, etc.
Workload cycles and spikes
[Charts: seasonal spikes (by month, Jan to Dec), daily cycles (by hour of day, 0:00 to 23:00), weekly cycles, unplanned spikes]
Overprovisioning… We want to do better.
Event processing
[Diagram: Source, append-only log (segment in Pravega), Processor 1. Colors represent event keys]
✓ Source emits 2 events/second
✓ Processor processes 3 events/second
Event processing
✓ Source rate increases
✓ New rate: 4 events/second
✓ Processor still processes 3 events/second
✓ Can’t keep up with the source rate
Event processing
✓ Add a second processor
✓ Each processor processes 3 events/second
✓ Can keep up with the rate
Problem: Key order
✓ e1 can be processed after e2
Event processing
Split the input and add processors
✓ Each processor processes 3 events/second
✓ Can keep up with the rate
Problem: Key order
✓ e1 can be processed after e2
Event processing
Split the input and add processors
Processor 2 only starts once earlier events have been processed
Scaling in Pravega
- Changes the number of segments dynamically
- Triggered according to incoming traffic
- Orders segments to prevent inconsistencies
Pravega
• Storing data streams
• Open source
• Under active development
http://pravega.io
http://github.com/pravega/pravega
Anatomy of a stream
Time: Present, Recent Past, Distant Past
Messaging, Pub-sub, Bulk store
Pravega
Unbounded amount of data
Ingestion rate might vary
Pravega and Streams
[Diagram: bytes are appended to and read from a stream in Pravega]
Ingest stream data: Append
Process stream data: Read
Pravega and Streams
Ingest stream data: event writers append to Pravega
Process stream data: event readers read from Pravega
Group (of event readers)
• Load balance
• Grow and shrink
Segments in Pravega
Stream: Composition of segments
Segment:
• Stream unit
• Append only
• Sequence of bytes
Parallelism
Segments in Pravega
• Segments are sequences of bytes
• Use routing keys to determine segment
• E.g., 〈key, 01101001〉, where key is the routing key
[Diagram: event writers append to segments; event readers read from segments]
Segments can be sealed
Segments in Pravega
Once sealed, a segment can’t be appended to any longer.
How is sealing segments useful?
Segments in Pravega
Compose to form a stream
• Each segment can live in a different server
• Not limited by the capacity of a single server
• Unbounded streams
Stream scaling
Scaling a stream
1. Stream has one segment
2. Seal current segment and create new ones
• Follows write workload
• Say input load has increased
• Need more parallelism
• Auto or manual scaling
Routing key space (0.0 to 1.0) over time
• Start: Segment 1 covers the whole key space
• t0: Split at 0.5 creates Segment 2 and Segment 3
• t1: Hot keys. Split at 0.75 creates Segment 4 and Segment 5
• t2: Back to cold. Merge creates Segment 6
- Keys are coordinates in a geo application (Location 1, Location 2)
- E.g., taxi rides
Key ranges are not statically assigned to segments
• Over time, 0.9 maps to Segment 1, then Segment 2, then Segment 4, then Segment 6
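The split and merge sequence on these slides can be replayed with a small model. This is only a sketch: `KeySpaceSketch` and its methods are hypothetical names, not the Pravega API, and it assumes half-open key ranges and that the upper half of a split receives the lower segment number, which is what the "0.9 maps to" labels suggest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a stream's active segments; not the Pravega API.
final class KeySpaceSketch {
    static final class Segment {
        final int id;
        final double low;   // inclusive lower bound of the routing-key range
        final double high;  // exclusive upper bound (assumption of this sketch)
        Segment(int id, double low, double high) {
            this.id = id;
            this.low = low;
            this.high = high;
        }
    }

    private final List<Segment> active = new ArrayList<>();
    private int nextId = 1;

    KeySpaceSketch() {
        // Segment 1 initially covers the whole key space.
        active.add(new Segment(nextId++, 0.0, 1.0));
    }

    // Seal a segment and replace it with two successors split at `at`.
    void split(int id, double at) {
        Segment s = find(id);
        if (at <= s.low || at >= s.high) {
            throw new IllegalArgumentException("split point outside segment range");
        }
        active.remove(s);
        // Upper half is numbered first so the ids match the slide labels.
        active.add(new Segment(nextId++, at, s.high));
        active.add(new Segment(nextId++, s.low, at));
    }

    // Seal two adjacent segments and replace them with one successor.
    void merge(int lowerId, int upperId) {
        Segment lower = find(lowerId);
        Segment upper = find(upperId);
        if (lower.high != upper.low) {
            throw new IllegalArgumentException("segments are not adjacent");
        }
        active.remove(lower);
        active.remove(upper);
        active.add(new Segment(nextId++, lower.low, upper.high));
    }

    // Active segment that currently owns the given routing key.
    int segmentFor(double key) {
        for (Segment s : active) {
            if (key >= s.low && key < s.high) {
                return s.id;
            }
        }
        throw new IllegalArgumentException("key outside [0.0, 1.0)");
    }

    private Segment find(int id) {
        for (Segment s : active) {
            if (s.id == id) {
                return s;
            }
        }
        throw new IllegalArgumentException("no active segment " + id);
    }

    public static void main(String[] args) {
        KeySpaceSketch stream = new KeySpaceSketch();
        System.out.println(stream.segmentFor(0.9)); // 1
        stream.split(1, 0.5);                       // t0
        System.out.println(stream.segmentFor(0.9)); // 2
        stream.split(2, 0.75);                      // t1
        System.out.println(stream.segmentFor(0.9)); // 4
        stream.merge(5, 4);                         // t2
        System.out.println(stream.segmentFor(0.9)); // 6
    }
}
```

The same key resolves to a different segment after each scaling event, which is the point of the slide: ranges belong to whichever segments are active at the time of the write.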
Daily Cycles
Peak rate is 10x higher than lowest rate
4:00 AM
9:00 AM
NYC Yellow Taxi Trip Records, March 2015
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
Pravega Auto Scaling
Merge Split
How do I control scaling?
Scaling policies
• Configured on a per stream basis
• Specifies a policy for the stream
• Policies
• Fixed
• Set of segments is fixed
• Bytes per second
• Scales up and down according to volume of data
• Target data rate
• Events per second
• Scales up and down according to volume of events
• Target event rate
Auto-Scaling: Triggering a scaling event
• By byte and event rates
• Target T per segment
• Reports every 2 minutes
✓ 2-min rate (2M)
✓ 5-min rate (5M)
✓ 10-min rate (10M)
✓ 20-min rate (20M)
Scale up
• Scaling up (any condition holds)
∨ 2M > 5 x T
∨ 5M > 2 x T
∨ 10M > T
x: 2M = 60, 5M = 56, 10M = 46, T = 50
x + 2 min: 2M = 60, 5M = 60, 10M = 48, T = 50
x + 4 min: 2M = 60, 5M = 60, 10M = 5, T = 50
x + 6 min: 2M = 60, 5M = 60, 10M = 52, T = 50 (10M > T: scale up)
Scale down
• Scaling down (all conditions hold)
∧ 2M, 5M, 10M < T
∧ 20M < T / 2
x: 2M = 20, 5M = 20, 10M = 20, 20M = 27, T = 50
x + 2 min: 2M = 20, 5M = 20, 10M = 20, 20M = 26, T = 50
x + 4 min: 2M = 20, 5M = 20, 10M = 20, 20M = 25, T = 50
x + 6 min: 2M = 20, 5M = 20, 10M = 20, 20M = 24, T = 50 (20M < T / 2: scale down)
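The trigger conditions above can be written as a single decision function. The sketch below is an illustration, not Pravega's implementation: `AutoScaleSketch`, `Decision`, and `decide` are hypothetical names, and checking the scale-up conditions before the scale-down conditions is an assumption.

```java
// Hypothetical sketch of the per-segment trigger rules from the slide.
final class AutoScaleSketch {
    enum Decision { SCALE_UP, SCALE_DOWN, NONE }

    // r2, r5, r10, r20 are the 2-, 5-, 10-, and 20-minute rates; target is T.
    static Decision decide(double r2, double r5, double r10, double r20, double target) {
        // Scale up if ANY of the three conditions holds.
        boolean up = r2 > 5 * target || r5 > 2 * target || r10 > target;
        if (up) {
            return Decision.SCALE_UP;
        }
        // Scale down only if ALL conditions hold.
        boolean down = r2 < target && r5 < target && r10 < target && r20 < target / 2;
        return down ? Decision.SCALE_DOWN : Decision.NONE;
    }

    public static void main(String[] args) {
        // The scale-up slide shows no 20-minute rate; 40 is a placeholder.
        System.out.println(decide(60, 56, 46, 40, 50)); // NONE
        System.out.println(decide(60, 60, 52, 40, 50)); // SCALE_UP
        System.out.println(decide(20, 20, 20, 27, 50)); // NONE
        System.out.println(decide(20, 20, 20, 24, 50)); // SCALE_DOWN
    }
}
```

Short windows need a large excess over the target to trigger a scale-up, while a scale-down waits for the slowest window to fall below half the target, so brief dips do not cause merges.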
Auto-scaling: Internals
[Diagram: Append, Segment store (Segment, Stats recorder, Auto-scale processor), Auto-scale events, Controller (Event reader)]
Read order
Reader groups
• Group of event readers
• Read events from a set of streams
• Load distributed across readers of the group
• Segments
• A given reader reads from a set of segments
• Coordination of segment assignment done via a state synchronizer
• State synchronizer
• General facility for synchronizing state across processes
• Uses a revisioned Pravega segment
Reader groups + Scaling
1. Two readers read from Segment 1 and Segment 2
2. Scale up! Segment 3 and Segment 4 are added
3. A reader reaches the end of a sealed segment
• Hit end of segment
• Get successors
• Update reader group state
4. Segment 2, Segment 3, and Segment 4 remain to be read
5. New assignment: Reader {3}, Reader {2, 4}
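The hand-off at the end of a sealed segment can be sketched as follows. This is a simplified model under stated assumptions, not the Pravega reader group implementation: `ReaderGroupSketch` and its methods are hypothetical, and the shared state is a plain in-memory map rather than a state synchronizer. The rule it illustrates is that a successor becomes readable only after all of its predecessors have been read to the end, which preserves per-key order across splits and merges.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of reader group state during scaling.
final class ReaderGroupSketch {
    // Successor segment -> predecessors that still have unread data.
    private final Map<Integer, Set<Integer>> pending = new TreeMap<>();
    // Segments that are ready to read but not yet owned by a reader.
    private final Deque<Integer> unassigned = new ArrayDeque<>();

    // Record that `successor` replaces the given sealed predecessors.
    void addSuccessor(int successor, int... predecessors) {
        Set<Integer> waitingOn = new HashSet<>();
        for (int p : predecessors) {
            waitingOn.add(p);
        }
        pending.put(successor, waitingOn);
    }

    // A reader hit the end of a sealed segment: release successors
    // whose predecessors have now all been fully read.
    void segmentCompleted(int segment) {
        Iterator<Map.Entry<Integer, Set<Integer>>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Set<Integer>> entry = it.next();
            Set<Integer> waitingOn = entry.getValue();
            if (waitingOn.remove(Integer.valueOf(segment)) && waitingOn.isEmpty()) {
                unassigned.add(entry.getKey());
                it.remove();
            }
        }
    }

    // Next segment a reader may take over, or null if none is ready.
    Integer acquire() {
        return unassigned.poll();
    }

    public static void main(String[] args) {
        ReaderGroupSketch group = new ReaderGroupSketch();
        // Scale up: Segment 1 is sealed and succeeded by Segments 3 and 4.
        group.addSuccessor(3, 1);
        group.addSuccessor(4, 1);
        System.out.println(group.acquire()); // null: Segment 1 not finished yet
        group.segmentCompleted(1);
        System.out.println(group.acquire()); // 3
        System.out.println(group.acquire()); // 4
    }
}
```

For a merge, the successor lists two predecessors and is released only when both have been completed.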
Building pipelines –
Scaling downstream
Scaling pipelines
[Diagram: Source, Stage 1, Stage 2]
All stages can handle the load induced by the source
Scaling pipelines
[Diagram: Big source, Stage 1 (scaled), Stage 2]
• Load coming from source increases
• Stage 1 scales and adapts to the load change
• Stage 2 can’t cope with the load change
Scaling signals
[Diagram: Big source, Pravega, App, with signals from Pravega to the App]
• Pravega won’t scale the application downstream
• … but it can signal
• E.g., more segments
• E.g., number of unread bytes is growing
When to scale…
When to scale
1. Input rate has changed
• Higher volume of data coming in
2. Application needs more capacity
• Processing rate is lower compared to input rate
• No change to input rate necessarily
Changes to input rate
[Diagram: Pravega readers feed the Source, Map, and Reduce tasks of the job]
• Stream processing job
• Say an Apache Flink job
• More capacity to avoid lagging behind
• Additional tasks
• Additional Pravega readers
Processing rate not sufficient
[Diagram: same job, with an additional Map task]
• Stream processing job
• Say an Apache Flink job
• Application lags behind
• Additional task
Reader group: listener and metrics
• Listener API
• Register a listener to react to changes
• E.g., changes to the number of segments
• Metrics
• Reports specific values of interest
• E.g., number of unread bytes in a stream
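One way an application might combine the two signals is sketched below. Everything here is hypothetical: `DownstreamSignalSketch`, its method names, the "strictly growing backlog" rule, and the one-task-at-a-time step are illustrative assumptions, not part of Pravega or Apache Flink. The segment count drives the number of readers, and a growing number of unread bytes suggests adding processing capacity.

```java
// Hypothetical helper that turns Pravega's signals into parallelism hints.
final class DownstreamSignalSketch {
    // True when every unread-bytes sample is larger than the one before it.
    static boolean backlogGrowing(long[] unreadBytesSamples) {
        if (unreadBytesSamples.length < 2) {
            return false;
        }
        for (int i = 1; i < unreadBytesSamples.length; i++) {
            if (unreadBytesSamples[i] <= unreadBytesSamples[i - 1]) {
                return false;
            }
        }
        return true;
    }

    // Signal 1: follow the segment count, within [1, maxParallelism].
    static int readerParallelism(int segments, int maxParallelism) {
        return Math.max(1, Math.min(segments, maxParallelism));
    }

    // Signal 2: add one task while the backlog keeps growing.
    static int taskParallelism(int currentTasks, long[] unreadBytesSamples, int maxParallelism) {
        if (backlogGrowing(unreadBytesSamples)) {
            return Math.min(currentTasks + 1, maxParallelism);
        }
        return currentTasks;
    }

    public static void main(String[] args) {
        System.out.println(readerParallelism(3, 8));                           // 3
        System.out.println(taskParallelism(2, new long[] {100, 250, 400}, 8)); // 3
        System.out.println(taskParallelism(2, new long[] {400, 250, 300}, 8)); // 2
    }
}
```

Capping the readers at the segment count reflects that a segment is read by one reader of the group at a time, so extra readers would sit idle.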
Example: Pravega Flink connector
public class ReaderOperatorRescalingPolicy implements OperatorRescalingPolicy
{
    …
    // Target parallelism tracks the current number of segments.
    @Override
    public int rescaleTo(OperatorRescalingContext operatorRescalingContext) {
        return currentNumberOfSegments;
    }

    // Invoked when Pravega reports a change in the number of segments.
    private class ListenerImpl implements Listener<SegmentNotification> {
        @Override
        public void onNotification(SegmentNotification notification) {
            currentNumberOfSegments = notification.getNumOfSegments();
        }
    }
    …
}
• Connects Flink and Pravega
• Pravega can be source and sink
• On Apache Flink
• Signals from the source can trigger dynamic scaling
• E.g., increase the number of readers
Demo
(Thanks to Till Rohrmann)
Demo Topology
[Diagram: Pravega, Source, Sink, FILE.out]
• Executed on Yarn to support dynamic resource allocation
[Chart: event rate over time]
Wrap up
• Pravega
• Stream store
• Scalable ingestion of continuously generated data
• Stream scaling
• Stream data pipelines
• Signaling for dynamic scaling downstream
• Proof of concept with Apache Flink
Questions?
Pravega’s Web site: http://pravega.io
Pravega’s code: http://github.com/pravega/pravega
Apache Flink’s site: http://flink.apache.org
Pravega-Flink connector: http://github.com/pravega/flink-connectors
Flink dynamic scaling PoC: https://github.com/tillrohrmann/flink/tree/rescalingPolicy
E-mail: fpj@pravega.io
Twitter: @fpjunqueira

Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 

Stream Scaling in Pravega

  • 1. Stream scaling in Pravega Flavio Junqueira, Pravega - Dell EMC
  • 2. Profile: Flavio Junqueira • Director at Dell EMC • Lead the Pravega team • Background • Distributed computing • Research: Microsoft, Yahoo! • Worked on various Apache projects • E.g., Apache ZooKeeper, Apache BookKeeper Data Works - Barcelona, 2019 2
  • 3. Pravega • Pravega is • A stream store: stream is the storage primitive • The foundation is segments • Segments enable a flexible composition of streams • Segments enable stream scaling • Stream scaling • Streams adapt to changes to incoming workload • Changes the number of segments dynamically • Respects order
  • 4. Motivating stream scaling
  • 5. Streams ahoy! • Social networks, online shopping • Stream of user events • Status updates • Online transactions
  • 6. Streams ahoy! • Social networks, online shopping, server monitoring • Stream of user events • Status updates • Online transactions • Stream of server events • CPU, memory, disk utilization
  • 7. Streams ahoy! • Social networks, online shopping, server monitoring, sensors (IoT) • Stream of user events • Status updates • Online transactions • Stream of server events • CPU, memory, disk utilization • Stream of sensor events • Temperature samples • Samples from radar and image sensors in cars
  • 8. Changes to workload • Events • Servers, sensors, etc.
  • 9. Changes to the source • Events • Servers, sensors, etc.
  • 10. Workload cycles and spikes • [Charts: Seasonal spikes (by month), Daily cycles (by hour), Weekly cycles, Unplanned]
  • 11. Overprovisioning… We want to do better.
  • 12. Event processing • Source emits 2 events/second • Processor 1 processes 3 events/second • Append-only log (segment in Pravega) • Colors represent event keys
  • 13. Event processing • Source emits 2 events/second • Processor 1 processes 3 events/second • Append-only log (segment in Pravega) • Colors represent event keys
  • 14. Event processing ✓ Source rate increases ✓ New rate: 4 events/second ✓ Processor 1 still processes 3 events/second ✓ Can’t keep up with the source rate • Append-only log (segment in Pravega) • Colors represent event keys
  • 15. Event processing ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor (Processor 2) ✓ Each processor processes 3 events/second ✓ Can keep up with the rate • Append-only log (segment in Pravega) • Colors represent event keys
  • 16. Event processing ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate • Problem: Key order
  • 17. Event processing ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate • Problem: Key order ✓ e1 can be processed after e2 • Append-only log (segment in Pravega)
  • 18. Event processing ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate • Split the input and add processors • Append-only log (segment in Pravega)
  • 19. Event processing ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate • Split the input and add processors • Problem: Key order ✓ e1 can be processed after e2 • Append-only log (segment in Pravega)
  • 20. Event processing ✓ Source rate increases ✓ New rate: 4 events/second ✓ Add a second processor ✓ Each processor processes 3 events/second ✓ Can keep up with the rate • Split the input and add processors • Processor 2 only starts once earlier events have been processed
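
The rate arithmetic in the event-processing slides above can be written down as a minimal sketch. The class and method names are hypothetical and chosen only for illustration; the numbers (2 and 4 events/second from the source, 3 events/second per processor) are the ones on the slides.

```java
final class ProcessorCapacity {
    /** Smallest number of processors whose combined rate covers the source rate. */
    static int processorsNeeded(int sourceRate, int perProcessorRate) {
        if (sourceRate < 0 || perProcessorRate <= 0) {
            throw new IllegalArgumentException("rates must be non-negative and capacity positive");
        }
        // Integer ceiling of sourceRate / perProcessorRate.
        return (sourceRate + perProcessorRate - 1) / perProcessorRate;
    }

    /** Events added to the backlog each second; zero when the processors keep up. */
    static int backlogGrowthPerSecond(int sourceRate, int processors, int perProcessorRate) {
        return Math.max(0, sourceRate - processors * perProcessorRate);
    }

    public static void main(String[] args) {
        System.out.println(processorsNeeded(2, 3));          // one processor is enough at 2 events/s
        System.out.println(backlogGrowthPerSecond(4, 1, 3)); // backlog grows once the rate reaches 4 events/s
        System.out.println(processorsNeeded(4, 3));          // a second processor restores headroom
    }
}
```

Adding capacity this way says nothing about key order, which is the problem the remaining slides address.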
  • 21. Scaling in Pravega - Changes the number of segments dynamically - Triggered according to incoming traffic - Orders segments to prevent inconsistencies
  • 22. Scaling in Pravega
  • 23. Pravega • Storing data streams • Open source • Under active development • http://pravega.io • http://github.com/pravega/pravega
  • 24. Anatomy of a stream • Time axis: Present, Recent Past, Distant Past
  • 25. Anatomy of a stream • Time axis: Present, Recent Past, Distant Past • Messaging • Pub-sub • Bulk store
  • 26. Anatomy of a stream • Time axis: Present, Recent Past, Distant Past • Pravega
  • 27. Anatomy of a stream • Time axis: Present, Recent Past, Distant Past • Unbounded amount of data • Ingestion rate might vary • Pravega
  • 28. Pravega and Streams • Append: ingest stream data • Read: process stream data
  • 29. Pravega and Streams • Event writers append: ingest stream data • Event readers read: process stream data • Group • Load balance • Grow and shrink
  • 30. Segments in Pravega • Stream: composition of segments • Segment: • Stream unit • Append only • Sequence of bytes
  • 31. Parallelism
  • 32. Segments in Pravega • Segments are sequences of bytes • Use routing keys to determine segment • Routing key: 〈key, 01101001〉 • Event writers append • Event readers read
  • 33. Segments can be sealed
  • 34. Segments in Pravega • Once sealed, a segment can’t be appended to any longer. • Event writers append • Event readers read
  • 35. How is sealing segments useful?
  • 37. Segments in Pravega • Segments compose to form a stream • Each segment can live in a different server • Not limited by the capacity of a single server • Unbounded streams
  • 39. Stream scaling
  • 40. Scaling a stream • 1: Stream has one segment • 2: Seal current segment • Create new ones • Follows write workload • Say input load has increased • Need more parallelism • Auto or manual scaling
  • 42. Routing key space (0.0 to 1.0) over time • Split at 0.5 • Segment 1, Segment 2, Segment 3 • Times t0, t1 • Hot keys
  • 43. Routing key space (0.0 to 1.0) over time • Split at 0.5 • Segment 1, Segment 2, Segment 3 • Times t0, t1 • Location 1, Location 2 - Keys are coordinates in a geo application - E.g., taxi rides
  • 44. Routing key space (0.0 to 1.0) over time • Splits at 0.5 and 0.75 • Segment 1, Segment 2, Segment 3, Segment 4, Segment 5 • Times t0, t1, t2 • Location 1, Location 2 - Keys are coordinates in a geo application - E.g., taxi rides
  • 45. Routing key space (0.0 to 1.0) over time • Splits at 0.5 and 0.75, then a merge • Segment 1, Segment 2, Segment 3, Segment 4, Segment 5, Segment 6 • Times t0, t1, t2 • Back to cold
  • 46. Routing key space (0.0 to 1.0) over time • Split, Split, Merge at 0.5 and 0.75 • Segment 1 to Segment 6 • Times t0, t1, t2 • Key ranges are not statically assigned to segments • 0.9 maps to Segment 1 • 0.9 maps to Segment 2 • 0.9 maps to Segment 4 • 0.9 maps to Segment 6
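
The routing key space slides can be illustrated with a small sketch in which each scaling step produces a new set of segments, and a key position is looked up in whichever set is active. The key ranges below are assumptions picked to be consistent with the slide annotations ("0.9 maps to Segment 1", then 2, 4, and 6); the slides do not state the exact ranges. `keyPosition` is a hypothetical stand-in hash, not the hash Pravega uses.

```java
import java.util.List;

final class KeySpaceRouting {
    /** A segment owns the half-open key range [low, high) while it is active. */
    static final class Segment {
        final int id;
        final double low;
        final double high;

        Segment(int id, double low, double high) {
            this.id = id;
            this.low = low;
            this.high = high;
        }

        boolean covers(double key) {
            return key >= low && key < high;
        }
    }

    // Assumed history: one segment, split at 0.5, split at 0.75, then a merge.
    static final List<List<Segment>> EPOCHS = List.of(
            List.of(new Segment(1, 0.0, 1.0)),
            List.of(new Segment(3, 0.0, 0.5), new Segment(2, 0.5, 1.0)),
            List.of(new Segment(3, 0.0, 0.5), new Segment(5, 0.5, 0.75), new Segment(4, 0.75, 1.0)),
            List.of(new Segment(3, 0.0, 0.5), new Segment(6, 0.5, 1.0)));

    /** Hypothetical mapping from a routing key to a position in [0.0, 1.0). */
    static double keyPosition(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), 1_000_000) / 1_000_000.0;
    }

    /** Returns the id of the segment that owns the key position in the given epoch. */
    static int segmentFor(int epoch, double key) {
        for (Segment s : EPOCHS.get(epoch)) {
            if (s.covers(key)) {
                return s.id;
            }
        }
        throw new IllegalArgumentException("key outside [0.0, 1.0): " + key);
    }

    public static void main(String[] args) {
        for (int epoch = 0; epoch < EPOCHS.size(); epoch++) {
            System.out.println("epoch " + epoch + ": 0.9 maps to Segment " + segmentFor(epoch, 0.9));
        }
    }
}
```

The same key position lands in a different segment after each split or merge, which is why readers need segment ordering to preserve per-key order across scaling events.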
  • 48. Daily Cycles Peak rate is 10x higher than lowest rate 4:00 AM 9:00 AM NYC Yellow Taxi Trip Records, March 2015 http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
  • 50. How do I control scaling?
  • 51. Scaling policies • Configured on a per-stream basis • Specifies a policy for the stream • Policies • Fixed • Set of segments is fixed • Bytes per second • Scales up and down according to volume of data • Target data rate • Events per second • Scales up and down according to volume of events • Target event rate
  • 52. Auto-Scaling: Triggering a scaling event • By byte and event rates • Target T per segment • Reports every 2 minutes ✓ 2-min rate (2M) ✓ 5-min rate (5M) ✓ 10-min rate (10M) ✓ 20-min rate (20M) • Scaling up: 2M > 5 x T ∨ 5M > 2 x T ∨ 10M > T • Scaling down: 2M, 5M, 10M < T ∧ 20M < T / 2
      Scale up (T = 50):
        x:          2M = 60, 5M = 56, 10M = 46
        x + 2 min:  2M = 60, 5M = 60, 10M = 48
        x + 4 min:  2M = 60, 5M = 60, 10M = 5
        x + 6 min:  2M = 60, 5M = 60, 10M = 52
      Scale down (T = 50):
        x:          2M = 20, 5M = 20, 10M = 20, 20M = 27
        x + 2 min:  2M = 20, 5M = 20, 10M = 20, 20M = 26
        x + 4 min:  2M = 20, 5M = 20, 10M = 20, 20M = 25
        x + 6 min:  2M = 20, 5M = 20, 10M = 20, 20M = 24
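
The two trigger conditions on this slide translate directly into predicates. The following is a minimal sketch with hypothetical class and method names; it encodes only the inequalities shown on the slide and is not taken from the Pravega code base.

```java
final class AutoScaleTrigger {
    /** Scale up when any of the short, medium, or long windows exceeds its threshold. */
    static boolean shouldScaleUp(double rate2m, double rate5m, double rate10m, double target) {
        return rate2m > 5 * target || rate5m > 2 * target || rate10m > target;
    }

    /** Scale down only when all windows are below target and the 20-min rate is below half of it. */
    static boolean shouldScaleDown(double rate2m, double rate5m, double rate10m,
                                   double rate20m, double target) {
        return rate2m < target && rate5m < target && rate10m < target && rate20m < target / 2;
    }

    public static void main(String[] args) {
        double target = 50;
        // First and last samples of the slide's scale-up timeline.
        System.out.println(shouldScaleUp(60, 56, 46, target));
        System.out.println(shouldScaleUp(60, 60, 52, target));
        // Last two samples of the slide's scale-down timeline.
        System.out.println(shouldScaleDown(20, 20, 20, 25, target));
        System.out.println(shouldScaleDown(20, 20, 20, 24, target));
    }
}
```

With T = 50, the scale-up timeline only fires once the 10-minute rate passes 50, and the scale-down timeline only fires once the 20-minute rate drops below 25. The asymmetry (any window for scaling up, all windows for scaling down) makes the stream quick to add segments and slow to remove them.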
  • 53. Auto-scaling: Internals • Components: Segment store, Segment, Stats recorder, Auto-scale processor, Controller, Event reader • Append • Auto-scale events
  • 54. Read order
  • 55. Reader groups • Group of event readers • Read events from a set of streams • Load distributed across readers of the group • Segments • A given reader reads from a set of segments • Coordination of segment assignment done via a state synchronizer • State synchronizer • General facility for synchronizing state across processes • Uses a revisioned Pravega segment
  • 56. Reader groups + Scaling • 1: Two readers; Segment 1, Segment 2 • 2: Scale up! Two readers; Segment 1, Segment 2, Segment 3, Segment 4
  • 57. Reader groups + Scaling • 3: Segment 1, Segment 2, Segment 3, Segment 4 • Hit end of segment • Get successors • Update reader group state • 4: Segment 2, Segment 3, Segment 4 • 5: Reader {3}, Reader {2, 4}; Segment 2, Segment 3, Segment 4
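
The three steps on this slide (hit end of segment, get successors, update reader group state) can be sketched as follows. This is a simplified model under stated assumptions: the class, its methods, and the shape of the successor map are hypothetical, and the real reader group keeps this state in a state synchronizer rather than in local fields. The rule it illustrates is that a successor segment becomes readable only after all of its predecessors have been read to the end, which preserves key order across splits and merges.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ReaderGroupSketch {
    private final Set<Integer> completed = new HashSet<>();
    private final Set<Integer> readable = new TreeSet<>();

    ReaderGroupSketch(Collection<Integer> initialSegments) {
        readable.addAll(initialSegments);
    }

    /**
     * Called when a reader reaches the end of a sealed segment.
     * The map is assumed to hold each successor id with the ids of all its predecessors.
     */
    void onEndOfSegment(int segment, Map<Integer, List<Integer>> successors) {
        readable.remove(segment);
        completed.add(segment);
        for (Map.Entry<Integer, List<Integer>> entry : successors.entrySet()) {
            // Release a successor only when every predecessor has been fully read.
            if (completed.containsAll(entry.getValue())) {
                readable.add(entry.getKey());
            }
        }
    }

    Set<Integer> readableSegments() {
        return Collections.unmodifiableSet(readable);
    }

    public static void main(String[] args) {
        // Split: Segment 1 is sealed and replaced by Segments 3 and 4.
        ReaderGroupSketch group = new ReaderGroupSketch(List.of(1, 2));
        group.onEndOfSegment(1, Map.of(3, List.of(1), 4, List.of(1)));
        System.out.println(group.readableSegments());
    }
}
```

After the split, the readable set is Segments 2, 3, and 4, matching the final assignment on the slide (one reader holds {3}, the other {2, 4}). For a merge, the successor stays unavailable until the readers have finished both predecessors.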
  • 58. Building pipelines – Scaling downstream
  • 59. Scaling pipelines • Source, Stage 1, Stage 2 • All stages can handle the load induced by the source
  • 60. Scaling pipelines • Big source, Scaled Stage 1, Stage 2 • Load coming from source increases • Stage 1 scales and adapts to the load change • Stage 2 can’t cope with the load change
  • 61. Scaling signals • Big source, Pravega, App • Pravega won’t scale the application
  • 62. Scaling signals • Big source, Pravega, App • Pravega won’t scale the application downstream • … but it can signal • E.g., more segments • E.g., number of unread bytes is growing • Signals from Pravega
  • 63. When to scale…
  • 64. When to scale 1. Input rate has changed • Higher volume of data coming in 2. Application needs more capacity • Processing rate is lower than the input rate • Not necessarily a change to the input rate
  • 65. Changes to input rate • Stream processing job • Say an Apache Flink job • Diagram labels: Pravega, Reader, Source, Map, Reduce
  • 66. Changes to input rate • Stream processing job • Say an Apache Flink job • More capacity to avoid lagging behind • Additional tasks • Additional Pravega readers • Diagram labels: Pravega, Reader, Source, Task, Map, Reduce
  • 67. Processing rate not sufficient • Stream processing job • Say an Apache Flink job • Application lags behind • Additional task • Diagram labels: Pravega, Reader, Source, Task, Map, Reduce
  • 68. Reader group: listener and metrics • Listener API • Register a listener to react to changes • E.g., changes to the number of segments • Metrics • Reports specific values of interest • E.g., number of unread bytes in a stream
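
One way an application might combine the two signals named above (segment count from the listener, unread bytes from the metrics) is sketched below. This policy is an assumption made for illustration, not something the slides prescribe: the class, the method names, and the "add one task when the backlog grows" rule are all hypothetical.

```java
import java.util.List;

final class DownstreamScalingSketch {
    /** True when every unread-bytes sample is larger than the one before it. */
    static boolean isGrowing(List<Long> unreadBytesSamples) {
        if (unreadBytesSamples.size() < 2) {
            return false;
        }
        for (int i = 1; i < unreadBytesSamples.size(); i++) {
            if (unreadBytesSamples.get(i) <= unreadBytesSamples.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    /** Suggests a parallelism for the downstream stage from the two signals. */
    static int desiredParallelism(int current, int numSegments, List<Long> unreadBytesSamples) {
        if (numSegments > current) {
            // Case 1: the input rate changed and the stream scaled up, so match the segment count.
            return numSegments;
        }
        if (isGrowing(unreadBytesSamples)) {
            // Case 2: same input, but the application lags behind, so add one task.
            return current + 1;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(desiredParallelism(2, 4, List.of(10L, 10L)));
        System.out.println(desiredParallelism(2, 2, List.of(100L, 200L, 300L)));
        System.out.println(desiredParallelism(2, 2, List.of(300L, 200L)));
    }
}
```

The two branches correspond to the two cases on the "When to scale" slide: a change in input rate shows up as more segments, while insufficient processing capacity shows up as a growing backlog with no change in segment count.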
  • 69. Example: Pravega Flink connector • Connects Flink and Pravega • Pravega can be source and sink • On Apache Flink • Signals from the source can trigger dynamic scaling • E.g., increase the number of readers

        public class ReaderOperatorRescalingPolicy implements OperatorRescalingPolicy {
            …
            // Rescale the reader operator to the most recently reported segment count.
            @Override
            public int rescaleTo(OperatorRescalingContext operatorRescalingContext) {
                return currentNumberOfSegments;
            }

            // Listener that records the segment count from each segment notification.
            private class ListenerImpl implements Listener<SegmentNotification> {
                @Override
                public void onNotification(SegmentNotification notification) {
                    currentNumberOfSegments = notification.getNumOfSegments();
                }
            }
            …
  • 70. Demo (Thanks to Till Rohrmann)
  • 71. Demo Topology • Pravega, Source, Sink, FILE.out • Executed on Yarn to support dynamic resource allocation • [Chart: event rate over time]
  • 72. Wrap Up
  • 73. Wrap up • Pravega • Stream store • Scalable ingestion of continuously generated data • Stream scaling • Stream data pipelines • Signaling for dynamic scaling downstream • Proof of concept with Apache Flink
  • 74. Questions? • Pravega’s Web site: http://pravega.io • Pravega’s code: http://github.com/pravega/pravega • Apache Flink’s site: http://flink.apache.org • Pravega-Flink connector: http://github.com/pravega/flink-connectors • Flink dynamic scaling PoC: https://github.com/tillrohrmann/flink/tree/rescalingPolicy • E-mail: fpj@pravega.io • Twitter: @fpjunqueira