This document discusses auto-scaling in Google Cloud Dataflow. It describes how Dataflow pipelines automatically adjust their degree of parallelism based on throughput, backlog-growth, and CPU-utilization signals. The scaling policy aims to keep pipelines abreast of their input rates and to reduce any backlog quickly. The mechanism for changing parallelism involves splitting computation key ranges across additional machines or migrating ranges between machines. Future work may include finer-grained range splitting and approximate throughput modeling.
Splunk for JMX App overview (configuration, deployment, tips and tricks). Developing JMX logic in your application. Splunking other JVM logs and profiling traces. The JVM application landscape and why it's such a rich source of Splunkable machine data. Developing new Splunkbase apps to leverage Splunk for JMX.
Security is one of the fundamental features for enterprise adoption. Specifically, for SQL users, row/column-level access control is important. However, when a cluster is used as a data warehouse accessed by various user groups in different ways, it is difficult to guarantee data governance in a consistent way. In this talk, we focus on SQL users and discuss how to provide row/column-level access controls with common access control rules throughout the whole cluster across various SQL engines, e.g., Apache Spark 2.1, Apache Spark 1.6, and Apache Hive 2.1. If some of the rules change, all engines are controlled consistently in near real-time. Technically, we enable Spark Thrift Server to work with the identity given by the JDBC connection and take advantage of the Hive LLAP daemon as a shared and secured processing engine. We demonstrate row-level filtering, column-level filtering, and various column maskings in Apache Spark with Apache Ranger. We use Apache Ranger as the single point of security control.
Organizations are struggling to manually classify and inventory distributed, heterogeneous data assets in order to deliver value. However, the new Azure service for enterprises, Azure Synapse Analytics, is poised to help organizations fill the gap between data warehouses and data lakes.
Amazon Web Services gives you fast access to flexible and low-cost IT resources, so you can rapidly scale and build virtually any big data and analytics application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and internet-of-things processing, regardless of the volume, velocity, and variety of your data.
In this one-hour webinar, we will look at the portfolio of AWS Big Data services and how they can be used to build a modern data architecture.
We will cover:
Using different SQL engines to analyze large amounts of structured data
Analysing streaming data in near real-time
Architectures for batch processing
Best practices for Data Lake architectures
This session is suited for:
Solution and enterprise architects
Data architects/ Data warehouse owners
IT & Innovation team members
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics in a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub... (DataWorks Summit)
TMW Systems, a Trimble company, makes the industry-leading transportation management software. 3PLs, brokers, distribution and supply operations, dedicated and private fleets, commercial carriers, and energy service providers rely on our transportation management systems, our fleet maintenance management software, or our routing and scheduling software to make them more efficient and profitable. Billions of data points exist in the trucking industry, and we at TMW Systems are pioneers in tracking millions of trucks, freight loads, and assets.
The architecture team at TMW leverages NiFi and SAM to deliver this immense volume of data in real-time. In this session, you will get a thorough understanding of all the streaming components. We have utilized Apache Kafka, Apache NiFi, and Streaming Analytics Manager to build our real-time data pipeline. We will also discuss real-time event processing using SAM and Schema Registry. Lastly, we will show the custom processors in NiFi and SAM that helped us with complex event processing.
Speaker
Krishna Potluri, TMW Systems, A Trimble Company, Big Data Architect
Donnie Wheat, Trimble, Senior Big Data Architect
Training for AWS Solutions Architect at http://zekelabs.com/courses/amazon-web-services-training-bangalore/. This slide deck describes the features of EC2, EC2 options, instance family types, storage, EBS Volumes, EC2 Instance Store, Security Groups, Volumes and Snapshots, Amazon Machine Images (AMI), Elastic Load Balancer, Classic Load Balancer, Application Load Balancer, Network Load Balancer, the AWS CLI, and EC2 Metadata.
___________________________________________________
zekeLabs is a technology training platform. We provide instructor-led corporate training and classroom training for professionals on industry-relevant cutting-edge technologies like Big Data, Machine Learning, Natural Language Processing, Artificial Intelligence, Data Science, Amazon Web Services, DevOps, and Cloud Computing, and frameworks like Django, Spring, Ruby on Rails, Angular 2, and many more.
Reach out to us at www.zekelabs.com or call us at +91 8095465880 or drop a mail at info@zekelabs.com
I gave this presentation on 5/17 to the New Mexico VMUG in Santa Fe. The presentation provides an overview of OpenStack, what it is (and isn't), and some ways to get started with OpenStack.
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021 (StreamNative)
You may be familiar with the Presto plugin used to run fast interactive queries over Pulsar using ANSI SQL, with the ability to join against other data sources. This plugin will soon get a rename to align with the rename of the PrestoSQL project to Trino. What is the purpose of this rename, and what does it mean for those using the Presto plugin? We cover the history of the community shift from PrestoDB to PrestoSQL, as well as the future plans for the Pulsar community to donate this plugin to the Trino project. One of the connector maintainers will then demo the connector and show what is possible when using Trino and Pulsar!
Mainframe Integration, Offloading and Replacement with Apache Kafka (Kai Wähner)
Video recording of this presentation:
https://youtu.be/upWzamacOVQ
Blog post with more details:
https://www.kai-waehner.de/blog/2020/04/24/mainframe-offloading-replacement-apache-kafka-connect-ibm-db2-mq-cdc-cobol/
Mainframes are still hard at work, processing over 70 percent of the world’s most essential computing transactions every day. Very high cost, monolithic architectures, and missing experts are the key challenges for mainframe applications. Time to get more innovative, even with the mainframe!
Mainframe offloading with Apache Kafka and its ecosystem can be used to keep a more modern data store in real-time sync with the mainframe. At the same time, it persists the event data on the bus to enable microservices and delivers the data to other systems such as data warehouses and search indexes.
But the final goal and ultimate vision is to replace the mainframe with new applications using modern and less costly technologies. Stand up to the dinosaur, but keep in mind that legacy migration is a journey! Kai will guide you to the next step of your company's evolution!
You will learn:
- how to not only reduce operational expenses but provide a path for architecture modernization, agility and eventually mainframe replacement
- what steps some of Confluent’s customers already took, leveraging technologies like Change Data Capture (CDC) or MQ for mainframe offloading
- how an event streaming platform enables cost reduction, architecture modernization, and a combination of a mainframe with new technologies
Introduction to Microsoft Azure SQL Data Warehouse (Joseph Lopez)
The new Microsoft Azure SQL Data Warehouse (SQL DW) is a versatile data warehouse service that provides a Massively Parallel Processing (MPP) solution for big data with true enterprise-grade infrastructure. The SQL DW service is built for workloads ranging from a few hundred gigabytes up to petabytes of data, with unique features such as disaggregated compute that let customers use the service to meet their storage needs. In this presentation I will take an in-depth look at this new Azure service, covering deployment, elastic scaling (grow, shrink, and pause), and hybrid data clouds with Hadoop integration through PolyBase, enabling a true SQL experience across structured and unstructured data.
YouTube Link: https://youtu.be/0djPrlaxx_U
Edureka AWS Architect Certification Training - https://www.edureka.co/aws-certification-training
This Edureka PPT on AWS Cloud Practitioner provides a complete guide to the AWS Cloud Practitioner Certification exam. It explains the exam details and objectives, why you should get certified, and how AWS certification can help your career.
Follow us to never miss an update in the future.
YouTube: https://www.youtube.com/user/edurekaIN
Instagram: https://www.instagram.com/edureka_learning/
Facebook: https://www.facebook.com/edurekaIN/
Twitter: https://twitter.com/edurekain
LinkedIn: https://www.linkedin.com/company/edureka
Castbox: https://castbox.fm/networks/505?country=in
The Rise of Data in Motion in the Healthcare Industry - Use Cases, Architectures and Examples powered by Apache Kafka.
Use Cases for Data in Motion in the Healthcare Industry:
- Know Your Patient (= “Customer 360”)
- Operations (Healthcare 4.0 including Drug R&D, Patient Care, etc.)
- IT Perspective (Cybersecurity, Mainframe Offload, Hybrid Cloud, Streaming ETL, etc)
Real-world examples include Covid-19 Electronic Lab Reporting, Cerner, Optum, Centene, Humana, Invitae, Bayer, Celmatix, Care.com.
OpenStack - An Introduction/Installation - Presented at Dr Dobb's conference... (Rahul Krishna Upadhyaya)
These slides were presented at Dr. Dobb's Conference in Bangalore.
The talk covers:
An introduction to OpenStack in general
Projects under OpenStack
Contributing to OpenStack
This was presented jointly by CB Ananth and Rahul at Dr. Dobb's Conference Bangalore on 12th Apr 2014.
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ... (confluent)
MQ, ETL, and ESB middleware are often used as the integration backbone between legacy applications, modern microservices, and cloud services. This introduces several challenges and complexities, such as point-to-point integration and non-scalable architectures. This session discusses how to build a completely event-driven streaming platform leveraging Apache Kafka's open source messaging, integration, and streaming components to gain distributed processing, fault tolerance, rolling upgrades, and the ability to reprocess events. Learn the differences between an event-driven streaming platform leveraging Apache Kafka and middleware like MQ, ETL, and ESBs, including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ... (Alluxio, Inc.)
Alluxio Tech Talk
Dec 10, 2019
Chris Crosbie and Roderick Yao from the Google Dataproc team and Dipti Borkar of Alluxio will demo how to set up Google Cloud Dataproc with Alluxio so jobs can seamlessly read from and write to Cloud Storage. They’ll also show how to run Dataproc Spark against a remote HDFS cluster.
For more Alluxio events: https://www.alluxio.io/events/
Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam (Flink Forward)
http://flink-forward.org/kb_sessions/no-shard-left-behind-dynamic-work-rebalancing-in-apache-beam/
The Apache Beam (incubating) programming model is designed to support several advanced data processing features such as autoscaling and dynamic work rebalancing. In this talk, we will first explain how dynamic work rebalancing not only provides a general and robust solution to the problem of stragglers in traditional data processing pipelines, but also how it allows autoscaling to be truly effective. We will then present how dynamic work rebalancing works as implemented in Google Cloud Dataflow and which path other Apache Beam runners like Apache Flink can follow to benefit from it.
Apache Beam (formerly the Google Cloud Dataflow SDK) is a unified model and set of language-specific SDKs for defining and executing data processing workflows. You design pipelines that simplify the mechanics of large-scale batch and streaming data processing and that can run on a number of runtimes such as Apache Flink, Apache Spark, and Google Cloud Dataflow (a cloud service).
This presentation introduces the Beam programming model and how you can use it to design your pipelines, transporting PCollections and applying PTransforms. You will see how the same code is "translated" to a target runtime thanks to a specific runner. You will also get an overview of the current roadmap, with the interesting new features.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lGNybu.
Stefan Krawczyk discusses how his team at StitchFix uses the cloud to enable over 80 data scientists to be productive. He also talks about prototyping ideas, algorithms, and analyses; how they set up and keep schemas in sync between Hive, Presto, Redshift, and Spark; and how they make access easy for their data scientists. Filmed at qconsf.com.
Stefan Krawczyk is Algo Dev Platform Lead at StitchFix, where he’s leading development of the algorithm development platform. He spent formative years at Stanford, LinkedIn, Nextdoor & Idibon, working on everything from growth engineering, product engineering, data engineering, to recommendation systems, NLP, data science and business intelligence.
Have Your Cake and Eat It Too -- Further Dispelling the Myths of the Lambda A... (C4Media)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/15ACXCw.
Tyler Akidau from Google demonstrates Google's MillWheel, a streaming system that promises low latency, strong consistency, and flexibility without relying on the Lambda Architecture. Filmed at qconsf.com.
Tyler Akidau is a Senior Software Engineer at Google. The current Tech Lead for the MillWheel team, he’s spent five years working on massive-scale streaming data processing systems.
The Netflix Way to deal with Big Data Problems (Monal Daxini)
Netflix is a data-driven company with a unique culture. Come take a holistic tour of the Big Data ecosystem and see how Netflix culture catalyzes the development of systems. Then marvel at how we quickly evolved and scaled the event pipeline to 1 trillion events per day and over 1.4 PB of event data, without service disruption and with a small team.
Go GC: Prioritizing Low Latency and Simplicity (C4Media)
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1IPjAIS.
Rick Hudson discusses the motivation, performance, and technical challenges of Go's low latency concurrent GC and why the approach fits Go well. Filmed at qconsf.com.
Rick Hudson is a member of Google’s Go team. Rick has published papers on language runtimes, memory management, concurrency, synchronization, memory models, and transactional memory.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2l2Rr6L.
Doug Daniels discusses the cloud-based platform they have built at DataDog and how it differs from a traditional datacenter-based analytics stack. He walks through the decisions they have made at each layer, covers the pros and cons of these decisions and discusses the tooling they have built. Filmed at qconsf.com.
Doug Daniels is a Director of Engineering at Datadog, where he works on high-scale data systems for monitoring, data science, and analytics. Prior to joining Datadog, he was CTO at Mortar Data and an architect and developer at Wireless Generation, where he designed data systems to serve more than 4 million students in 49 states.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/20SN0dP.
Tammer Saleh talks about the mistakes people make when building a microservices architecture. He also covers when microservices are appropriate and where to draw the lines between services, dealing with performance issues, testing and debugging techniques, managing a polyglot landscape and the explosion of platforms, and managing failure and graceful degradation. Filmed at qconlondon.com.
Tammer Saleh is a long time developer, leader, and author of the acclaimed book *Rails AntiPatterns*. Saleh is currently building the Cloud Foundry platform at Pivotal.
Google Cloud Next '22 Recap: Serverless & Data edition (Daniel Zivkovic)
See what's new in #Serverless and #Data at GCP. Our guest, Guillaume Blaquiere - Stack Overflow contributor & #GCP #Developer Expert from France, covered the best #GoogleCloudNext announcements, practically demoed how to benefit from #BigQuery Remote Functions and answered many questions.
The meetup recording with TOC for easy navigation is at https://youtu.be/AuZZTwHIcdY
P.S. For more interactive lectures like this, go to http://youtube.serverlesstoronto.org/ or sign up for our upcoming live events at https://www.meetup.com/Serverless-Toronto/events/
Splunk Ninjas: New Features, Pivot, and Search Dojo (Splunk)
Besides seeing the newest features in Splunk Enterprise and learning the best practices for data models and pivot, we will show you how to use a handful of search commands that will solve most search needs. Learn these well and become a ninja.
Let's Learn DAX - José Ahias López Portillo (SpanishPASSVC)
In this session we will cover the basics of writing DAX queries and some of the most useful functions that could save us in an apocalypse of complex query generation against tabular models.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1OKo5FN.
Danny Yuan discusses how stream processing is used in Uber's real-time system to solve a wide range of problems, including but not limited to real-time aggregation and prediction on geospatial time series, data migration, monitoring and alerting, and extracting patterns from data streams. Yuan also presents the architecture of the stream processing pipeline. Filmed at qconsf.com.
Danny Yuan is a software engineer at Uber. He's currently working on streaming systems for Uber's logistics platform. Prior to joining Uber, he worked on building Netflix's cloud platform. His work includes predictive autoscaling, a distributed tracing service, a real-time data pipeline that scaled to process hundreds of billions of events every day, and Netflix's low-latency crypto services.
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow (Lucas Arruda)
Nowadays more and more companies are searching for insights with the potential to grow their business by analyzing large amounts of data from many different systems. However, in order to reach this level of big data analysis, it's necessary to build an ETL pipeline that processes raw data coming from different sources into a format that can be used with visualization tools such as Tableau.
This kind of data processing can be done by a variety of tools, and in this presentation I show how to do it using a unified programming model created by Google and open-sourced under the name Apache Beam. We will build a simple pipeline that will be executed in the cloud by a fully-managed service called Google Cloud Dataflow.
What's New in Apache Spark 2.3 & Why Should You Care (Databricks)
The Apache Spark 2.3 release marks a big step forward in speed, unification, and API support.
This talk will quickly walk through what’s new and how you can benefit from the upcoming improvements:
* Continuous Processing in Structured Streaming.
* PySpark support for vectorization, giving Python developers the ability to run native Python code fast.
* Native Kubernetes support, marrying the best of container orchestration and distributed data processing.
Streaming a Million Likes/Second: Real-Time Interactions on Live Video (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/39NIjLV.
Akhilesh Gupta does a technical deep-dive into how LinkedIn uses the Play/Akka Framework and a scalable distributed system to enable live interactions like likes/comments at massive scale and at extremely low cost across multiple data centers. Filmed at qconlondon.com.
Akhilesh Gupta is the technical lead for LinkedIn's real-time delivery infrastructure and LinkedIn Messaging. He has been working on revamping LinkedIn's offerings into instant, real-time experiences. Before this, he was the head of engineering for the Ride Experience program at Uber Technologies in San Francisco.
Next Generation Client APIs in Envoy Mobile (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2x0Fav8.
Jose Nino guides the audience through the journey of Mobile APIs at Lyft. He focuses on how the team has reaped the benefits of API generation to experiment with the network transport layer. He also discusses recent developments the team has made with Envoy Mobile and the roadmap ahead. Filmed at qconlondon.com.
Jose Nino works as a Software Engineer at Lyft.
Software Teams and Teamwork Trends Report Q1 2020 (C4Media)
How do we cope with an environment that has been radically disrupted, where people are suddenly thrust into remote work in a chaotic state? What are the emerging good practices and new ideas that are shaping the way in which software development teams work? What can we do to make the workplace a more secure and diverse one while increasing the productivity of our teams? This report aims to assist technical leaders in making mid- to long-term decisions that will have a positive impact on their organisations and teams and help individual contributors find the practices, approaches, tools, techniques, and frameworks that can help them get a better experience at work - irrespective of where they are working from.
Understand the Trade-offs Using Compilers for Java Applications (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2QCmmJ0.
Mark Stoodley examines some of the strengths and weaknesses of the different Java compilation technologies, if one were to apply them in isolation. Stoodley discusses how production JVMs assemble a combination of these tools that work together to provide excellent performance across the large spectrum of applications written in Java and JVM-based languages. Filmed at qconsf.com.
Mark Stoodley joined IBM Canada to build Java JIT compilers for production use and led the team that delivered AOT compilation in the IBM SDK for Java 6. He spent the last five years leading the effort to open source nearly 4.3 million lines of source code from the IBM J9 Java Virtual Machine to create the two open source projects Eclipse OMR and Eclipse OpenJ9, and now co-leads both projects.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2y2yPiS.
Colin McCabe talks about the ongoing effort to replace the use of Zookeeper in Kafka: why they want to do it and how it will work. He discusses the limitations they have found and how Kafka benefits both in terms of stability and scalability by bringing consensus in house. He talks about their progress, what work is remaining, and how contributors can help. Filmed at qconsf.com.
Colin McCabe is a Kafka committer at Confluent, working on the scalability and extensibility of Kafka. Previously, he worked on the Hadoop Distributed Filesystem and the Ceph Filesystem.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2SXXXiD.
Katharina Probst talks about what it means to act like an owner and why teams need ownership to be high-performing. When team members, regardless of whether they have a formal leadership role or not, act like owners, magical things can happen. She shares ideas that we can apply to our own work, and talks about how to recognize when we don’t live up to our own expectations of acting like an owner. Filmed at qconsf.com.
Katharina Probst is a Senior Engineering Leader, Kubernetes & SaaS at Google. Before this, she was leading engineering teams at Netflix, being responsible for the Netflix API, which helps bring Netflix streaming to millions of people around the world. Prior to joining Netflix, she was in the cloud computing team at Google, where she saw cloud computing from the provider side.
Does Java Need Inline Types? What Project Valhalla Can Bring to Java (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2T04Lw4.
Sergey Kuksenko talks about the performance benefits inline types bring to Java and how to exploit them. Inline/value types are the key part of the experimental Project Valhalla, which should bring new abilities to the Java language. Filmed at qconsf.com.
Sergey Kuksenko is a Java Performance Engineer at Oracle working on a variety of Java and JVM performance enhancements. He started working as Java Engineer in 1996 and as Java Performance Engineer in 2005. He has had a passion for exploring how Java works on modern hardware.
Do you need service meshes in your tech stack?
This online guide aims to answer pertinent questions for software architects and technical leaders, such as: What is a service mesh? Do I need a service mesh? How do I evaluate the different service mesh offerings? In software architecture, a service mesh is a dedicated infrastructure layer for facilitating service-to-service communications between microservices, often using a sidecar proxy.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2UgQ3BU.
Christie Wilson describes what to expect from CI/CD in 2019, and how Tekton is helping bring that to as many tools as possible, such as Jenkins X and Prow. Wilson talks about Tekton itself and performs a live demo that shows how cloud native CI/CD can help debug, surface and fix mistakes faster. Filmed at qconsf.com.
Christie Wilson is a software engineer at Google, currently leading the Tekton project. Over the past decade, she has worked in the mobile, financial and video game industries. Prior to working at Google she led a team of software developers to build load testing tools for AAA video game titles, and founded the Vancouver chapter of PyLadies.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S7lDiS.
Sasha Rosenbaum shows how a CI/CD pipeline for Machine Learning can greatly improve both productivity and reliability. Filmed at qconsf.com.
Sasha Rosenbaum is a Program Manager on the Azure DevOps engineering team, focused on improving the alignment of the product with open source software. She is a co-organizer of the DevOps Days Chicago and the DeliveryConf conferences, and recently published a book on Serverless computing in Azure with .NET.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/36epVKg.
Todd Montgomery discusses the techniques and lessons learned from implementing Aeron Cluster. His focus is on how Raft can be implemented on Aeron, minimizing the network round trip overhead, and comparing single process to a fully distributed cluster. Filmed at qconsf.com.
Todd Montgomery is a networking hacker who has researched, designed, and built numerous protocols, messaging-oriented middleware systems, and real-time data systems, done research for NASA, contributed to the IETF and IEEE, and co-founded two startups. He currently works as an independent consultant and is active in several open source projects.
Architectures That Scale Deep - Regaining Control in Deep Systems (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2FWc5Sk.
Ben Sigelman talks about "Deep Systems", their common properties and re-introduces the fundamentals of control theory from the 1960s, including the original conceptualizations of Observability & Controllability. He uses examples from Google & other companies to illustrate how deep systems have damaged people's ability to observe software, and what needs to be done in order to regain control. Filmed at qconsf.com.
Ben Sigelman is a co-founder and the CEO at LightStep, a co-creator of Dapper (Google’s distributed tracing system), and co-creator of the OpenTracing and OpenTelemetry projects (both part of the CNCF). His work and interests gravitate towards observability, especially where microservices, high transaction volumes, and large engineering organizations are involved.
ML in the Browser: Interactive Experiences with Tensorflow.js (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/39SddUL.
Victor Dibia provides a friendly introduction to machine learning and covers concrete steps on how front-end developers can create their own ML models and deploy them as part of web applications. He discusses his experience building Handtrack.js - a library for prototyping real-time hand-tracking interactions in the browser. Filmed at qconsf.com.
Victor Dibia is a Research Engineer with Cloudera’s Fast Forward Labs. Prior to this, he was a Research Staff Member at the IBM TJ Watson Research Center, New York. His research interests are at the intersection of human computer interaction, computational social science, and applied AI.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2s9T3Vl.
Colin Eberhardt looks at some of the internals of WebAssembly, explores how it works “under the hood”, and looks at how to create a (simple) compiler that targets this runtime. Filmed at qconsf.com.
Colin Eberhardt is the Technology Director at Scott Logic, a UK-based software consultancy where they create complex application for their financial services clients. He is an avid technology enthusiast, spending his evenings contributing to open source projects, writing blog posts and learning as much as he can.
User & Device Identity for Microservices @ Netflix Scale (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2S9tOgy.
Satyajit Thadeshwar provides useful insights on how Netflix implemented a secure, token-agnostic, identity solution that works with services operating at a massive scale. He shares some of the lessons learned from this process, both from architectural diagrams and code. Filmed at qconsf.com.
Satyajit Thadeshwar is an engineer on the Product Edge Access Services team at Netflix, where he works on some of the most critical services focusing on user and device authentication. He has more than a decade of experience building fault-tolerant and highly scalable, distributed systems.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Ezs08q.
Justin Ryan talks about Netflix's scalability issues and some of the ways they addressed them. He shares successes they've had from unintuitively partitioning computation into multiple services to get better runtime characteristics. He introduces useful probabilistic data structures, innovative bi-directional data passing, and open-source projects available from Netflix that make this all possible. Filmed at qconsf.com.
Justin Ryan is Playback Edge Engineering at Netflix. He works on some of the most critical services at Netflix, specifically focusing on user and device authentication. Years of building developer tools has also given him a healthy set of opinions on developer productivity.
Make Your Electron App Feel at Home Everywhere (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2Z4ZJjn.
Kilian Valkhof discusses the process of making an Electron app feel at home on all three platforms: Windows, MacOS and Linux, making devs aware of the pitfalls and how to avoid them. Filmed at qconsf.com.
Kilian Valkhof is a Front-end Developer & User-experience Designer at Firstversionist. He writes about various topics, from design to machine learning, on his personal website, kilianvalkhof.com, and is a frequent contributor to open source software. He is part of the Electron governance team that oversees the development of the Electron framework.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/344PnB1.
Steve Klabnik goes over the deep details of how async/await works in Rust, covering concepts like coroutines, generators, stack-less vs stack-ful, "pinning", and more. Filmed at qconsf.com.
Steve Klabnik is on the core team of Rust, leads the documentation team, and is an author of "The Rust Programming Language." He is a frequent speaker at conferences and is a prolific open source contributor, previously working on projects such as Ruby and Ruby on Rails.
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2OUz6dt.
Chris Riccomini talks about the current state-of-the-art in data pipelines and data warehousing, and shares some of the solutions to current problems dealing with data streaming and warehousing. Filmed at qconsf.com.
Chris Riccomini works as a Software Engineer at WePay.
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More (C4Media)
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2rm4hFD.
Yevgeniy Brikman talks about how to write automated tests for infrastructure code, including the code written for use with tools such as Terraform, Docker, Packer, and Kubernetes. Topics covered include: unit tests, integration tests, end-to-end tests, dependency injection, test parallelism, retries and error handling, static analysis, property testing and CI / CD for infrastructure code. Filmed at qconsf.com.
Yevgeniy Brikman is the co-founder of Gruntwork, a company that provides DevOps as a Service. He is the author of two books published by O'Reilly Media: Hello, Startup and Terraform: Up & Running. Previously, he worked as a software engineer at LinkedIn, TripAdvisor, Cisco Systems, and Thomson Financial.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But if the “Reject” button is pushed, colleagues will be alerted via a Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas struggle to keep up with the competition. However, fostering a culture of innovation takes real work. It takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on countries – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/google-cloud-dataflow
3. Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon London
www.qconlondon.com
15. Google Dataflow SDK
Open Source SDK used to construct a Dataflow pipeline.
(Now Incubating as Apache Beam)
16. Computing Team Scores
// Collection of raw log lines
PCollection<String> raw = ...;

// Element-wise transformation into team/score pairs
PCollection<KV<String, Integer>> input =
    raw.apply(ParDo.of(new ParseFn()));

// Composite transformation containing an aggregation
PCollection<KV<String, Integer>> output = input
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(60))))
    .apply(Sum.integersPerKey());
17. Google Cloud Dataflow
● Given code in the Dataflow (incubating as Apache Beam) SDK...
● Pipelines can run…
○ On your development machine
○ On the Dataflow Service on Google Cloud Platform
○ On third party environments like Spark or Flink.
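The runner choice is just configuration. Below is a minimal, illustrative sketch (not from the slides) of how a Beam-era pipeline picks its execution environment from a command-line flag; exact runner class names vary by SDK version, e.g. the original Dataflow SDK 1.x used DataflowPipelineRunner instead.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunnerSelection {
  public static void main(String[] args) {
    // Pass e.g. --runner=DirectRunner   (your development machine),
    //           --runner=DataflowRunner (the Dataflow Service on GCP), or
    //           --runner=FlinkRunner / --runner=SparkRunner (third-party engines).
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);
    // ... attach the same transforms as in the team-scores example above ...
    p.run();
  }
}

The same pipeline graph runs unchanged; only the options differ.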
18. Google Cloud Dataflow
A fully-managed cloud service and programming model for batch and streaming big data processing.
37. Policy: Making Decisions
Goals:
1. No backlog growth
2. Short backlog time
3. Reasonable CPU utilization
38. Upscaling Policy: Keeping Up
Given M machines
For a stage, given:
average stage throughput T
average positive backlog growth G of stage
Machines needed for the stage to keep up:
M' = M × (T + G) / T
39. Upscaling Policy: Catching Up
Given M machines
Given R (time to reduce backlog)
For a stage, given:
average backlog time B
Extra machines to remove the backlog:
Extra = M × B / R
40. Upscaling Policy: All Stages
Want all stages to:
1. keep up
2. have low backlog time
Pick the maximum over all stages of M' + Extra
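Putting slides 38-40 together, here is a toy sketch of the upscaling decision (my own illustration with hypothetical signal names, not code from the talk): compute the keep-up and catch-up machine counts per stage and take the maximum.

import java.util.List;

public class UpscalingPolicy {
  // Hypothetical per-stage signals: average throughput T, average positive
  // backlog growth G, and average backlog time B (seconds).
  record Stage(double throughput, double backlogGrowth, double backlogTimeSec) {}

  // M' = M * (T + G) / T machines to keep up, plus Extra = M * B / R machines
  // to drain the backlog within R seconds; pick the maximum over all stages.
  static int desiredMachines(int m, double r, List<Stage> stages) {
    double want = m;
    for (Stage s : stages) {
      double keepUp = m * (s.throughput() + s.backlogGrowth()) / s.throughput();
      double extra = m * s.backlogTimeSec() / r;
      want = Math.max(want, keepUp + extra);
    }
    return (int) Math.ceil(want);
  }

  public static void main(String[] args) {
    // One healthy stage and one that is 25% short on throughput with 60s of
    // backlog: with M = 8 and R = 30s, 8 * (800+200)/800 = 10 to keep up,
    // plus 8 * 60/30 = 16 extra => 26 machines.
    List<Stage> stages = List.of(
        new Stage(1000, 0, 1),
        new Stage(800, 200, 60));
    System.out.println(desiredMachines(8, 30, stages)); // 26
  }
}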
65. Downscaling from 4 to 2 Machines
[Diagram: the key-range slices of stages S0, S1, and S2 migrate from four machines down onto Machine 1 and Machine 2]
Upsizing = Steps in Reverse
66. Granularity of Parallelism
As of March 2016, Google Cloud Dataflow:
• Splits Key Ranges initially Based on Max Machines
• At Max: 1 Logical Persistent Disk per Machine
Each disk has a slice of the key ranges from all stages
• Only (relatively) even Disk Distributions
• Results in Scaling Quanta
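The scaling quanta follow from the fixed disk pool. As a toy illustration (my assumption of what "relatively even" means, not the documented rule): if D persistent disks were created for the maximum pool size and each machine must carry the same whole number of disks, the reachable pool sizes are ceil(D / k) for k disks per machine.

import java.util.TreeSet;

public class ScalingQuanta {
  // Enumerate reachable machine counts for D disks, assuming each machine
  // carries the same whole number k of disks: M = ceil(D / k), k = 1..D.
  static TreeSet<Integer> quanta(int disks) {
    TreeSet<Integer> sizes = new TreeSet<>();
    for (int perMachine = 1; perMachine <= disks; perMachine++) {
      sizes.add((disks + perMachine - 1) / perMachine); // ceil(D / k)
    }
    return sizes;
  }

  public static void main(String[] args) {
    // With 16 disks the pool can only sit at these sizes:
    System.out.println(quanta(16)); // [1, 2, 3, 4, 6, 8, 16]
  }
}

Under this reading, a pool sized for 16 machines cannot scale to, say, 5 or 7 machines; it must jump between quanta.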
70. Downscaling Policy
Next lower scaling quantum => M' machines
Estimate future CPU_M' per machine:
CPU_M' = CPU_M × (M / M')
If the new CPU_M' < threshold (say 90%), downscale to M'
73. Auto-Scaling Summary
Signals: throughput, backlog time, backlog growth, CPU utilization
Policy: keep up, reduce backlog, use CPUs
Mechanism: split key ranges, migrate computations
74. Future Work
• Experiment with non-uniform disk distributions to address hot ranges
• Dynamically split ranges finer than initially done
• Approximate model of the relation between #VMs and throughput
75. Questions?
Further reading on streaming model:
The world beyond batch: Streaming 101
The world beyond batch: Streaming 102
76. Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/google-cloud-dataflow