Autopiloting Realtime Processing in Heron
Karthik Ramasamy (@karthikz), Streamlio

Joint work with Avrilia Floratou (Microsoft), Ashvin Agrawal (Microsoft), Bill Graham (Twitter), and Sriram Rao (Microsoft)
What is autopiloting?
According to Wikipedia, an autopilot is a system used to control the trajectory of an “aircraft” without constant ‘hands-on’ control by a human operator being required.

Tweaking it a little bit…
Autopiloting a real-time system refers to its ability to adapt itself as its environmental conditions change, without constant ‘hands-on’ control by a human operator, and to continue producing results.
Why?
[Figure: Value of Data to Decision-Making over time (seconds, minutes, hours, days, months). Along the information half-life in decision-making, data is most valuable in real time for time-critical decisions (preventive/predictive, actionable), and its value declines through reactive uses to historical analysis in traditional "batch" business intelligence.]
Why?
Loss of revenue: impact of downtime during popular events such as the Super Bowl, the Oscars, etc.
SLA violations: impact of not honoring an SLA, leading to penalty payments
Quality of life: engineers and SREs burn out attending to incidents
Increased productivity: with fewer incidents, engineers can focus on actual development
Twitter Heron
A streaming platform for processing real-time data as it arrives, so you can react to data as it happens.
Guaranteed message passing
Horizontal scalability
Robust fault tolerance
Concise code: focus on logic
Heron Terminology
Topology: a directed acyclic graph where vertices = computation and edges = streams of data tuples
Spouts: sources of data tuples for the topology. Examples: Pulsar, Kafka, MySQL, Postgres
Bolts: process incoming tuples and emit outgoing tuples. Examples: filtering, aggregation, join, any function
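To make the terminology concrete, here is a minimal sketch of a topology as a DAG: spouts and bolts are vertices, and each edge is a stream from an upstream component to the bolt that consumes it. The `Topology` class and its methods are hypothetical illustrations, not Heron's `TopologyBuilder` API; the topological sort (Kahn's algorithm) simply shows why the graph must be acyclic.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set


class Topology:
    """Minimal model of a topology as a DAG (hypothetical, not Heron's API).

    Vertices are components (spouts or bolts); an edge A -> B means that
    bolt B consumes the stream of tuples emitted by component A.
    """

    def __init__(self) -> None:
        self.spouts: Set[str] = set()
        self.bolts: Set[str] = set()
        self.edges: Dict[str, List[str]] = defaultdict(list)

    def add_spout(self, name: str) -> None:
        self.spouts.add(name)

    def add_bolt(self, name: str, inputs: Iterable[str]) -> None:
        # A bolt subscribes to the streams of already-declared components.
        self.bolts.add(name)
        for upstream in inputs:
            if upstream not in self.spouts and upstream not in self.bolts:
                raise ValueError(f"unknown upstream component: {upstream}")
            self.edges[upstream].append(name)

    def execution_order(self) -> List[str]:
        """Components in topological order (Kahn's algorithm); rejects cycles."""
        components = self.spouts | self.bolts
        indegree = {c: 0 for c in components}
        for downstream in self.edges.values():
            for d in downstream:
                indegree[d] += 1
        ready = deque(sorted(c for c in components if indegree[c] == 0))
        order: List[str] = []
        while ready:
            c = ready.popleft()
            order.append(c)
            for d in self.edges.get(c, []):
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
        if len(order) != len(components):
            raise ValueError("topology graph contains a cycle")
        return order


# Word-count shaped example: one spout feeding two chained bolts.
t = Topology()
t.add_spout("tweet-spout")
t.add_bolt("splitter", inputs=["tweet-spout"])
t.add_bolt("counter", inputs=["splitter"])
print(t.execution_order())  # ['tweet-spout', 'splitter', 'counter']
```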
Heron Topology
[Figure: a logical topology in which Spout 1 and Spout 2 feed a graph of five bolts (Bolt 1 to Bolt 5).]
Heron Topology - Physical Execution
[Figure: the same topology at run time, with each spout and bolt (Spout 1, Spout 2, Bolt 1 to Bolt 5) executed as multiple parallel instances.]
Heron Groupings
1. Shuffle grouping: random distribution of tuples
2. Fields grouping: group tuples by a field or multiple fields
3. All grouping: replicates tuples to all tasks
4. Global grouping: sends the entire stream to one task
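The four groupings can be read as routing functions that map a tuple to the set of downstream task indices. This is a minimal sketch under stated assumptions: tuples are modeled as dicts, fields grouping uses `zlib.crc32` modulo the task count (Heron's actual hash function may differ), and global grouping sends to task 0 purely as an illustrative choice.

```python
import random
import zlib
from typing import Dict, List, Optional, Sequence

Record = Dict[str, object]  # a tuple modeled as field name -> value


def shuffle_grouping(tup: Record, num_tasks: int,
                     rng: Optional[random.Random] = None) -> List[int]:
    # Random distribution: one task chosen uniformly at random per tuple.
    return [(rng or random).randrange(num_tasks)]


def fields_grouping(tup: Record, num_tasks: int, fields: Sequence[str]) -> List[int]:
    # Tuples with equal values in `fields` always reach the same task.
    # crc32 is used because Python's built-in hash() of str is salted per process.
    key = "|".join(str(tup[f]) for f in fields).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]


def all_grouping(tup: Record, num_tasks: int) -> List[int]:
    # Replicate the tuple to every task.
    return list(range(num_tasks))


def global_grouping(tup: Record, num_tasks: int) -> List[int]:
    # The entire stream goes to a single task (task 0 is an assumption here).
    return [0]


word = {"word": "heron", "count": 1}
print(fields_grouping(word, 4, ["word"]))  # same task index on every call
print(all_grouping(word, 3))               # [0, 1, 2]
```

Fields grouping is what makes per-key state (such as a word counter) correct, and it is also why a single hot key cannot be spread across instances.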
Heron Topology - Physical Execution
[Figure: the physical execution of the topology with a grouping on each stream: two streams use shuffle grouping and four use fields grouping.]
Writing Heron Topologies
Procedural (low-level API): directly write your spouts and bolts
Functional (mid-level API): use maps, flat maps, transforms, and windows
Declarative (SQL, coming): use a declarative language; specify what you want and the system will figure out how
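The functional style composes operators instead of hand-writing spouts and bolts. The sketch below is a plain-Python analogue over an in-memory list, assuming a hypothetical `Pipeline` class with `map`, `flat_map`, `filter`, and a count-based tumbling window; it mirrors the vocabulary of the slide but is not Heron's functional (Streamlet) API, whose names and windowing options differ.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator


class Pipeline:
    """Hypothetical functional-style pipeline over an in-memory iterable."""

    def __init__(self, source: Iterable) -> None:
        self._it = iter(source)

    def map(self, fn: Callable) -> "Pipeline":
        return Pipeline(fn(x) for x in self._it)

    def flat_map(self, fn: Callable) -> "Pipeline":
        return Pipeline(y for x in self._it for y in fn(x))

    def filter(self, pred: Callable) -> "Pipeline":
        return Pipeline(x for x in self._it if pred(x))

    def count_by_window(self, size: int) -> Iterator[Counter]:
        # Count-based tumbling window: emit element counts every `size` elements.
        window, n = Counter(), 0
        for x in self._it:
            window[x] += 1
            n += 1
            if n == size:
                yield window
                window, n = Counter(), 0
        if n:
            yield window  # flush the final partial window


tweets = ["heron is fast", "Heron scales", "pulsar and heron"]
windows = list(Pipeline(tweets).flat_map(str.split).map(str.lower).count_by_window(4))
print(windows[0]["heron"])  # 2
```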
Heron Architecture
[Figure: topology submission. Topologies (Topology 1, Topology 2, ..., Topology N) are submitted to a scheduler.]
Heron Topology Components
[Figure: a master container runs the Topology Master, which keeps the logical plan, physical plan, and execution state in a ZooKeeper cluster and syncs the physical plan with the data containers. Each data container runs a Stream Manager, a Metrics Manager, and Heron Instances I1 to I4.]
Heron Backpressure
[Figure: a topology with spout S1 and bolts B2, B3, and B4.]
Stream Manager
[Figure: four Stream Managers, one per container, with instances of S1, B2, B3, and B4 distributed across the containers.]
Spout Backpressure
[Figure: under backpressure, the Stream Managers stop reading from the spout instances (S1), while the bolt instances (B2, B3, B4) keep running.]
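Spout backpressure can be sketched as a buffer with two thresholds: once the data queued toward slow downstream instances exceeds a high watermark, the Stream Manager stops reading from its local spouts, and it resumes only after the queue drains below a low watermark. The watermark scheme, the byte-based accounting, and the `StreamManagerBuffer` class are assumptions for illustration, not a description of Heron's internals.

```python
class StreamManagerBuffer:
    """Hypothetical sketch of watermark-based spout backpressure.

    The gap between the two watermarks provides hysteresis, so the
    stream manager does not toggle spouts on and off for every tuple.
    """

    def __init__(self, high_watermark: int, low_watermark: int) -> None:
        if low_watermark >= high_watermark:
            raise ValueError("low watermark must be below high watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.queued = 0
        self.spouts_paused = False

    def enqueue(self, nbytes: int) -> None:
        self.queued += nbytes
        if not self.spouts_paused and self.queued > self.high:
            self.spouts_paused = True   # start backpressure: stop reading spouts

    def drain(self, nbytes: int) -> None:
        self.queued = max(0, self.queued - nbytes)
        if self.spouts_paused and self.queued < self.low:
            self.spouts_paused = False  # relieve backpressure: resume spouts


buf = StreamManagerBuffer(high_watermark=100, low_watermark=50)
for size in (80, 30):
    buf.enqueue(size)
print(buf.spouts_paused)  # True: 110 bytes queued exceeds the high watermark
buf.drain(40)
print(buf.spouts_paused)  # True: 70 bytes is still above the low watermark
buf.drain(30)
print(buf.spouts_paused)  # False: 40 bytes is below the low watermark
```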
Heron @Twitter
> 500 real-time jobs
500 billion events/day processed
10-50 ms latency
Heron Sample Topologies
Heron Visualization
Solving Common Issues
Developer Issues
1. Container resource allocation
2. Parallelism tuning
Operational Issues
1. Slow hosts
2. Network issues
3. Data skew
4. Load variations
5. SLA violations
Slow Hosts
Memory parity errors
Impending disk failures
Lower clock speed (GHz)
Network
Network slowness
Network partitioning
Network Slowness
Processing is delayed, data accumulates, and the timeliness of results is affected.
Network Partitioning
[Figure: a network partition among the Stream Managers, the Topology Master, and the Scheduler.]
Network Partitioning (Topology Master and Scheduler)
The scheduler starts a new master container, its attempt to acquire mastership in ZooKeeper fails, and the master container dies.
Network Partitioning (Stream Manager and Topology Master)
The TMaster thinks the data container has failed and waits for the scheduler to reschedule a new data container, which never happens.
Network Partitioning (between Stream Managers)
The Stream Managers cannot exchange data, data accumulates, and chaos ensues!
Network Partitioning (Scheduler and Stream Manager)
The scheduler spawns a new data container. The TMaster realizes that two data containers report as the same container, does not accept the new one, and the new container eventually dies.
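The last scenario follows from a simple registration rule: the Topology Master accepts at most one live Stream Manager per container id, so a replacement spawned while the partitioned original is still registered gets refused. The sketch below models only that rule; the `TopologyMasterRegistry` class and its methods are hypothetical names, not Heron's implementation.

```python
from typing import Dict


class TopologyMasterRegistry:
    """Hypothetical sketch: at most one Stream Manager per container id."""

    def __init__(self) -> None:
        self.registered: Dict[int, str] = {}  # container id -> stream manager id

    def register(self, container_id: int, stmgr_id: str) -> bool:
        current = self.registered.get(container_id)
        if current is not None and current != stmgr_id:
            return False  # another Stream Manager already claims this container
        self.registered[container_id] = stmgr_id
        return True

    def deregister(self, container_id: int, stmgr_id: str) -> None:
        # Only the current owner may release the slot.
        if self.registered.get(container_id) == stmgr_id:
            del self.registered[container_id]


tm = TopologyMasterRegistry()
print(tm.register(1, "stmgr-1-original"))     # True
print(tm.register(1, "stmgr-1-replacement"))  # False: duplicate is refused
```

Without a way to fence off the unreachable original, the replacement never gets in, which is the failure mode described on the slide.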
Data Skew
Multiple keys: several keys map into a single instance and their combined count is high
Single key: a single key maps into an instance and its count is high
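Both kinds of skew show up as uneven per-instance load under fields grouping. This sketch routes hypothetical key counts to instances with `zlib.crc32` modulo the instance count (an assumption standing in for Heron's hashing) and reports the ratio of the busiest instance's load to the mean. With one hot key, adding instances only makes the ratio worse, because that key always lands on exactly one instance.

```python
import zlib
from collections import Counter
from typing import Dict


def instance_loads(key_counts: Dict[str, int], num_instances: int) -> Counter:
    """Tuples per instance when keys are routed by a fields grouping."""
    loads = Counter({i: 0 for i in range(num_instances)})
    for key, count in key_counts.items():
        loads[zlib.crc32(key.encode("utf-8")) % num_instances] += count
    return loads


def skew_ratio(loads: Counter) -> float:
    # Busiest instance load divided by mean load; 1.0 means perfectly balanced.
    total = sum(loads.values())
    if total == 0:
        return 1.0
    return max(loads.values()) / (total / len(loads))


# Hypothetical counts with a single hot key.
counts = {"#superbowl": 1000, "#heron": 10, "#pulsar": 10, "#kafka": 10}
for n in (4, 8):
    print(n, round(skew_ratio(instance_loads(counts, n)), 2))
```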
Data Skew - Multiple Keys
[Figure: the example topology (Spout 1, Spout 2, Bolt 1 to Bolt 5) illustrating skew caused by multiple keys.]
Data Skew - Single Key
[Figure: the example topology (Spout 1, Spout 2, Bolt 1 to Bolt 5) illustrating skew caused by a single key.]
What happens if the skew is temporary?
Load Variations
Spikes: a sudden surge of data, either short-lived or lasting several minutes
Daily patterns: predictable changes in traffic
Autopiloting
Auto Piloting Heron
Maintaining SLOs in the face of unpredictable load variations and hardware or software performance degradation
The manual, time-consuming, and error-prone task of tuning various system knobs to achieve SLOs
Autopiloting Streaming Systems
Self tuning: there are several tuning knobs and a time-consuming tuning phase. The system should take an SLO as input and automatically configure the knobs.
Self stabilizing: stream jobs are long running and load variations are common. The system should react to external shocks and automatically reconfigure itself.
Self healing: system performance is affected by hardware or software delivering degraded quality of service. The system should identify internal faults and attempt to recover from them.
Enter Dhalion
Dhalion is a policy-based framework integrated into Heron. It periodically executes well-specified policies that optimize execution based on some objective.
We created policies that dynamically provision resources in the presence of load variations and auto-tune streaming applications so that a throughput SLO is met.
Dhalion Policy Phases
[Figure: a policy runs in three phases. Symptom detection: Symptom Detectors 1 to N read metrics and produce Symptoms 1 to N. Diagnosis generation: Diagnosers 1 to M turn the symptoms into Diagnoses 1 to M. Resolution: resolver selection picks among Resolvers 1 to M, and the selected resolver is invoked.]
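The three phases can be sketched as one function that chains detectors, diagnosers, and a selected resolver. The signatures, the dict-shaped metrics, and the toy `backpressure_ms` metric are assumptions for illustration; they are not Dhalion's actual interfaces, and the toy diagnoser naively blames any backpressure on under-provisioning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Metrics = Dict[str, Dict[str, float]]  # component -> metric name -> value


@dataclass(frozen=True)
class Symptom:
    name: str
    component: str


@dataclass(frozen=True)
class Diagnosis:
    name: str
    component: str


def run_policy(
    metrics: Metrics,
    detectors: List[Callable[[Metrics], List[Symptom]]],
    diagnosers: List[Callable[[List[Symptom]], Optional[Diagnosis]]],
    resolvers: Dict[str, Callable[[Diagnosis], str]],
    select: Callable[[List[Diagnosis]], Diagnosis],
) -> Optional[str]:
    """One invocation of a Dhalion-style policy (hypothetical signatures)."""
    # Phase 1, symptom detection: every detector inspects the latest metrics.
    symptoms = [s for detect in detectors for s in detect(metrics)]
    # Phase 2, diagnosis generation: each diagnoser may explain the symptoms.
    candidates = (diagnose(symptoms) for diagnose in diagnosers)
    diagnoses = [d for d in candidates if d is not None]
    if not diagnoses:
        return None
    # Phase 3, resolution: select one diagnosis and invoke its resolver.
    chosen = select(diagnoses)
    return resolvers[chosen.name](chosen)


def backpressure_detector(metrics: Metrics) -> List[Symptom]:
    return [Symptom("backpressure", c) for c, m in metrics.items()
            if m.get("backpressure_ms", 0) > 0]


def underprovisioning_diagnoser(symptoms: List[Symptom]) -> Optional[Diagnosis]:
    bp = [s for s in symptoms if s.name == "backpressure"]
    return Diagnosis("under_provisioned", bp[0].component) if bp else None


resolvers = {"under_provisioned": lambda d: f"scale up {d.component}"}
metrics = {"splitter": {"backpressure_ms": 600.0}, "counter": {"backpressure_ms": 0.0}}
action = run_policy(metrics, [backpressure_detector], [underprovisioning_diagnoser],
                    resolvers, select=lambda ds: ds[0])
print(action)  # scale up splitter
```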
Incorporating Dhalion into Heron
[Figure: containers with Stream Managers, instances (S1, B2, B3, B4), and Metrics Managers, plus the Topology Master and a Health Manager that keeps an Action Log and an Action Blacklist.]
The Health Manager periodically executes Dhalion policies that maintain the health of the topology.
The Action Log maintains a list of actions taken by the policy and the corresponding diagnosis.
The Action Blacklist contains a list of diagnosis descriptions and the corresponding actions taken that did not produce the expected outcome.
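The Action Log and Action Blacklist can be modeled as a list and a set keyed by (diagnosis description, action). In this minimal sketch, assuming a hypothetical `HealthManagerState` class, an action is logged when taken, evaluated on a later policy invocation, and blacklisted for that diagnosis if it did not fix the problem, so the policy will not try it again for the same diagnosis.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class HealthManagerState:
    """Hypothetical sketch of the Action Log and Action Blacklist."""
    action_log: List[Tuple[str, str]] = field(default_factory=list)
    blacklist: Set[Tuple[str, str]] = field(default_factory=set)

    def allowed(self, diagnosis: str, action: str) -> bool:
        # Skip actions that previously failed to fix this diagnosis.
        return (diagnosis, action) not in self.blacklist

    def record(self, diagnosis: str, action: str) -> None:
        self.action_log.append((diagnosis, action))

    def evaluate(self, diagnosis: str, action: str, problem_resolved: bool) -> None:
        # Called on a later invocation, once the action has had time to take effect.
        if not problem_resolved:
            self.blacklist.add((diagnosis, action))


state = HealthManagerState()
state.record("slow instance in splitter", "restart instance")
state.evaluate("slow instance in splitter", "restart instance", problem_resolved=False)
print(state.allowed("slow instance in splitter", "restart instance"))  # False
```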
Dynamic Resource Provisioning
Policy: this policy reacts to unexpected load variations (workload spikes).
Goal: scale the topology resources up and down as needed, while keeping the topology in a steady state where backpressure is not observed.
Dynamic Resource Provisioning
Symptom detection (from metrics): Pending Tuples Detector, Backpressure Detector, Processing Rate Skew Detector
Diagnosis generation: Resource Overprovisioning Diagnoser, Resource Underprovisioning Diagnoser, Data Skew Diagnoser, Slow Instances Diagnoser
Resolution: Bolt Scale Down Resolver, Bolt Scale Up Resolver, Data Skew Resolver, Restart Instances Resolver
Dynamic Resource Provisioning - Steady State
[Figure: three Tweet Spout instances feed two Splitter Bolt instances, which feed two Counter Bolt instances. Two annotated bolt instances report processing rate (tps) | queue size (#tuples) of 100 | 20 and 100 | 20.]
Dynamic Resource Provisioning - Under Provisioned
[Figure: three Tweet Spout instances feed two Splitter Bolt instances, which feed two Counter Bolt instances. Two annotated bolt instances report processing rate (tps) | queue size (#tuples) of 150 | 80 and 150 | 80.]
Dynamic Resource Provisioning - Steady State
[Figure: three Tweet Spout instances feed two Splitter Bolt instances, which feed two Counter Bolt instances. Two annotated bolt instances report processing rate (tps) | queue size (#tuples) of 100 | 20 and 100 | 20.]
Dynamic Resource Provisioning - Slow Instance
[Figure: three Tweet Spout instances feed two Splitter Bolt instances, which feed two Counter Bolt instances. Two annotated bolt instances report processing rate (tps) | queue size (#tuples) of 50 | 5 and 50 | 80.]
Dynamic Resource Provisioning - Steady State
[Figure: three Tweet Spout instances feed two Splitter Bolt instances, which feed two Counter Bolt instances. Two annotated bolt instances report processing rate (tps) | queue size (#tuples) of 100 | 20 and 100 | 20.]
Dynamic Resource Provisioning - Data Skew
[Figure: three Tweet Spout instances feed two Splitter Bolt instances, which feed two Counter Bolt instances. Two annotated bolt instances report processing rate (tps) | queue size (#tuples) of 50 | 5 and 150 | 80.]
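The four examples suggest how the diagnosers can tell the cases apart from per-instance processing rate and queue size: under-provisioning means every instance is saturated; a slow instance shows similar rates (backpressure throttles everyone) but one much longer queue; data skew shows one instance with both a higher rate and a longer queue. This is a minimal sketch of that reasoning; the `queue_threshold` and `rate_tolerance` values are illustrative assumptions, not Dhalion's defaults, and queue length stands in for an explicit backpressure signal.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceStats:
    processing_rate: float  # tuples per second
    queue_size: int         # pending tuples


def diagnose(instances: List[InstanceStats], rate_tolerance: float = 0.2,
             queue_threshold: int = 50) -> str:
    """Classify a bolt's instances using the slide examples (thresholds assumed)."""
    rates = [i.processing_rate for i in instances]
    queues = [i.queue_size for i in instances]
    if max(queues) < queue_threshold:
        return "healthy"
    similar_rates = max(rates) - min(rates) <= rate_tolerance * max(rates)
    if similar_rates and min(queues) >= queue_threshold:
        return "under-provisioned"  # every instance is saturated
    if similar_rates:
        return "slow instance"      # same rate, but one queue backs up
    busiest = max(instances, key=lambda i: i.queue_size)
    if busiest.processing_rate == max(rates):
        return "data skew"          # the overloaded instance also processes more
    return "unknown"


examples = {
    "steady state": [InstanceStats(100, 20), InstanceStats(100, 20)],
    "under provisioned": [InstanceStats(150, 80), InstanceStats(150, 80)],
    "slow instance": [InstanceStats(50, 5), InstanceStats(50, 80)],
    "data skew": [InstanceStats(50, 5), InstanceStats(150, 80)],
}
for name, stats in examples.items():
    print(name, "->", diagnose(stats))
```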
Satisfying Throughput SLOs
Policy: automatically tunes the number of spouts and bolts so that the SLO is met.
Goal: eliminate the tuning phase. The user submits a topology with parallelism 1 at all stages and a throughput SLO.
Mechanisms: the policy reuses the basic mechanism of the dynamic resource provisioning policy and employs additional mechanisms to …
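Starting from parallelism 1, the policy has to grow the topology until the observed throughput reaches the SLO. One simple heuristic, sketched here as an assumption rather than as Dhalion's actual rule, scales parallelism by the ratio of the SLO to the observed throughput (presuming roughly linear scaling), always grows by at least one, and caps the result. The function name, the cap, and the numbers in the example are hypothetical.

```python
import math


def next_parallelism(parallelism: int, observed_tps: float, slo_tps: float,
                     max_parallelism: int = 64) -> int:
    """Hypothetical scale-up heuristic for a throughput SLO.

    Assumes throughput grows roughly linearly with parallelism.
    """
    if observed_tps <= 0:
        raise ValueError("observed throughput must be positive")
    if observed_tps >= slo_tps:
        return parallelism  # SLO met: leave the topology alone
    wanted = math.ceil(parallelism * slo_tps / observed_tps)
    return min(max(wanted, parallelism + 1), max_parallelism)


# Hypothetical run toward a 50,000 tuples/s SLO, starting from parallelism 1.
p = next_parallelism(1, observed_tps=12_000, slo_tps=50_000)  # -> 5
p = next_parallelism(p, observed_tps=48_000, slo_tps=50_000)  # -> 6
print(p)
```

Because real throughput rarely scales perfectly linearly, the heuristic is applied repeatedly, one policy invocation at a time, until the SLO is met.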
Experimental Setup
Topology: Spout to Splitter Bolt via shuffle grouping; Splitter Bolt to Counter Bolt via fields grouping
Hardware and software configuration: Microsoft HDInsight; Intel Xeon E5-2673 CPU @ 2.40 GHz; 28 GB of memory
Evaluation metrics: throughput of spouts (number of tuples emitted over 1 min); throughput of bolts (number of tuples emitted over 1 min); number of Heron Instances provisioned
Dynamic Resource Provisioning
[Figure: normalized throughput (0.00 to 1.40) of the Spout, Splitter Bolt, and Counter Bolt over time (0 to 120 minutes), with scale-up and scale-down actions and phases S1, S2, and S3 marked.]
The Dynamic Resource Provisioning Policy is able to adjust the topology resources on the fly when workload spikes occur.
The policy can correctly detect and resolve bottlenecks even on multi-stage topologies where backpressure is gradually propagated from one stage of the topology to another.
Dynamic Resource Provisioning
[Figure: number of Splitter Bolt and Counter Bolt instances (0 to 15) over time (0 to 120 minutes).]
Heron Instances are gradually scaled up and down according to the input load.
Conclusion
Autopiloting is important in streaming systems.
Key issues: tuning, slow hosts, network, and data skew.
Dhalion provides a framework to tackle these using specific policies.
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 

Autopiloting Realtime Processing in Heron

  • 1. Autopiloting #realtime processing in Heron. Karthik Ramasamy (@karthikz), Streamlio. Joint work with Avrilia Floratou (Microsoft), Ashvin Agrawal (Microsoft), Bill Graham (Twitter) and Sriram Rao (Microsoft).
  • 2. What is auto piloting? According to Wikipedia, an autopilot is a system used to control the trajectory of an “aircraft” without constant ‘hands-on’ control by a human operator being required. Tweaking it a little bit: auto piloting a real-time system refers to its ability to adapt itself as its environmental conditions change, without constant ‘hands-on’ control by a human operator, and to continue to produce results.
  • 3. Why? [Figure: “Information Half-Life in Decision-Making”. Value of data to decision-making plotted against time (seconds, minutes, hours, days, months). The range runs from real-time, time-critical decisions (preventive/predictive, actionable) to reactive and historical analysis with traditional “batch” business intelligence.]
  • 4. Why? Loss of revenue: impact of downtime during popular events such as the Super Bowl, the Oscars, etc. SLA violations: impact of not honoring an SLA, leading to penalty payments. Quality of life: engineers and SREs burn out attending to incidents. Increased productivity: with fewer incidents, engineers can focus on actual development.
  • 6. Twitter Heron: a streaming platform for processing real-time data as it arrives, so you can react to data as it happens. Guaranteed message passing; horizontal scalability; robust fault tolerance; concise code, focus on logic.
  • 7. Heron Terminology. Topology: a directed acyclic graph whose vertices are computations and whose edges are streams of data tuples. Spouts: sources of data tuples for the topology (examples: Pulsar, Kafka, MySQL, Postgres). Bolts: process incoming tuples and emit outgoing tuples (examples: filtering, aggregation, join, any function).
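The terminology above can be made concrete with a small model that treats a topology as a DAG of spouts and bolts and checks acyclicity using Kahn's algorithm. The `Topology` class and its methods are hypothetical and are not Heron's topology-builder API. The component names are illustrative only.

```python
from collections import defaultdict, deque


class Topology:
    """Hypothetical DAG model: vertices are spouts/bolts, edges are tuple streams."""

    def __init__(self):
        self.spouts = set()
        self.bolts = set()
        self.edges = defaultdict(list)  # upstream component -> downstream components

    def add_spout(self, name):
        self.spouts.add(name)

    def add_bolt(self, name, inputs):
        self.bolts.add(name)
        for upstream in inputs:
            self.edges[upstream].append(name)

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
        nodes = self.spouts | self.bolts
        indegree = {n: 0 for n in nodes}
        for downstreams in self.edges.values():
            for d in downstreams:
                indegree[d] += 1
        ready = deque(sorted(n for n in nodes if indegree[n] == 0))
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for d in self.edges.get(node, []):
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
        if len(order) != len(nodes):
            raise ValueError("a topology must be a directed acyclic graph")
        return order


topo = Topology()
topo.add_spout("tweets")
topo.add_bolt("splitter", inputs=["tweets"])
topo.add_bolt("counter", inputs=["splitter"])
print(topo.topological_order())  # ['tweets', 'splitter', 'counter']
```

A cycle, such as two bolts that consume each other's output, is rejected. This mirrors the "acyclic" requirement in the definition.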
  • 8. Heron Topology. [Diagram: a topology DAG with Spout 1, Spout 2 and Bolt 1 to Bolt 5.]
  • 9. Heron Topology - Physical Execution. [Diagram: the same topology (Spout 1, Spout 2, Bolt 1 to Bolt 5), with each component running as multiple parallel instances.]
  • 10. Heron Groupings. Shuffle grouping: random distribution of tuples. Fields grouping: group tuples by a field or multiple fields. All grouping: replicates tuples to all tasks. Global grouping: sends the entire stream to one task.
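Each grouping decides which downstream task(s) receive a tuple. The sketch below expresses the four groupings as routing functions. The function names, the use of `zlib.crc32` as the fields hash, and routing global grouping to task 0 are assumptions made for illustration. None of this is Heron's internal code.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Random distribution: each tuple goes to one randomly chosen task."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, fields):
    """Tuples with equal values in `fields` always go to the same task."""
    key = "|".join(str(tup[f]) for f in fields).encode("utf-8")
    # crc32 is stable across processes, unlike Python's salted hash() of str
    return [zlib.crc32(key) % num_tasks]


def all_grouping(tup, num_tasks):
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Send the entire stream to a single task (task 0 in this sketch)."""
    return [0]


tweet = {"user": "karthikz", "word": "heron"}
print(fields_grouping(tweet, 4, ["word"]) == fields_grouping({"word": "heron"}, 4, ["word"]))  # True
print(all_grouping(tweet, 3))  # [0, 1, 2]
```

Fields grouping is what makes per-key state, such as word counts, correct: every occurrence of a key reaches the same task. The same property is the root of the data skew problems discussed later in the deck.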
  • 11. Heron Topology - Physical Execution. [Diagram: Spout 1, Spout 2 and Bolt 1 to Bolt 5 running as parallel instances, with edges labeled Shuffle Grouping (twice) and Fields Grouping (four times).]
  • 12. Writing Heron Topologies. Procedural, low-level API: directly write your spouts and bolts. Functional, mid-level API: use maps, flat maps, transforms and windows. Declarative SQL (coming): use a declarative language; specify what you want, and the system will figure it out.
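The difference between the procedural and functional styles can be illustrated with plain Python generators. This is an analogy only, not Heron's Streamlet API. In the functional style, the developer chains flat-map, map and windowing steps, and the framework is responsible for turning them into spouts and bolts. The count-based tumbling window here stands in for the time-based windows a real engine would also offer. `flat_map`, `count_windows` and `word_counts` are hypothetical names.

```python
from collections import Counter
from itertools import islice


def flat_map(fn, stream):
    """Apply fn to each item and flatten the resulting iterables into one stream."""
    for item in stream:
        yield from fn(item)


def count_windows(stream, size):
    """Split a stream into consecutive, non-overlapping windows of `size` items."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def word_counts(sentences, window_size):
    words = flat_map(str.split, sentences)   # flat map: sentence -> words
    lowered = map(str.lower, words)          # map: normalize case
    for window in count_windows(lowered, window_size):
        yield Counter(window)                # per-window aggregation


print(list(word_counts(["Heron streams", "heron scales"], 2)))
```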
  • 14. Heron Topology Components. [Diagram: a master container runs the Topology Master, which keeps the logical plan, physical plan and execution state in a ZooKeeper cluster and syncs the physical plan to the data containers. Each data container runs a Stream Manager, a Metrics Manager and Heron Instances I1 to I4.]
  • 16. Stream Manager. [Diagram: four Stream Managers connected to one another, each serving local spout (S1) and bolt (B2, B3, B4) instances.]
  • 17. Spout Backpressure. [Diagram: the same four Stream Managers, with the spout instances (S1) in every container highlighted.]
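Spout backpressure can be sketched as follows. A stream manager whose buffered tuples grow too large starts backpressure, and while any stream manager is in that state, all spouts stop emitting. Using a high and a low watermark (hysteresis) avoids rapid on/off flapping. The class, the function and the threshold values are assumptions for illustration, not Heron's implementation or configuration.

```python
class StreamManagerBuffer:
    """Hypothetical per-stream-manager buffer with watermark-based backpressure."""

    def __init__(self, high_watermark, low_watermark):
        if low_watermark >= high_watermark:
            raise ValueError("low watermark must be below the high watermark")
        self.high = high_watermark
        self.low = low_watermark
        self.pending = 0
        self.backpressure = False

    def enqueue(self, n=1):
        self.pending += n
        if not self.backpressure and self.pending >= self.high:
            self.backpressure = True   # start backpressure: ask spouts to pause

    def drain(self, n=1):
        self.pending = max(0, self.pending - n)
        if self.backpressure and self.pending <= self.low:
            self.backpressure = False  # stop backpressure: spouts may resume


def spouts_may_emit(buffers):
    """Topology-wide: one congested stream manager pauses all spouts."""
    return not any(b.backpressure for b in buffers)


congested, healthy = StreamManagerBuffer(100, 50), StreamManagerBuffer(100, 50)
congested.enqueue(100)
print(spouts_may_emit([congested, healthy]))  # False
congested.drain(50)
print(spouts_may_emit([congested, healthy]))  # True
```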
  • 18. Heron @Twitter: more than 500 real-time jobs; 500 billion events/day processed; 10-50 ms latency.
  • 22. Developer Issues: container resource allocation; parallelism tuning.
  • 23. Operational Issues: slow hosts; network issues; data skew; load variations; SLA violations.
  • 24. Slow Hosts: memory parity errors; impending disk failures; lower GHz.
  • 26. Network Slowness: processing is delayed; data accumulates; the timeliness of results is affected.
  • 28. Network Partitioning (Topology Master and Scheduler): the master container dies, and the new master container fails to acquire mastership in ZooKeeper.
  • 29. Network Partitioning (Stream Manager and Topology Master): the TMaster thinks a data container has failed and waits for the scheduler to reschedule a new data container, which never happens.
  • 30. Network Partitioning (Stream Manager to Stream Manager): the stream managers cannot exchange data, data accumulates, and chaos ensues!
  • 31. Network Partitioning (Scheduler and Stream Manager): a new data container is spawned, but the TMaster sees two data containers reporting as the same one. It does not accept the new one, which eventually dies.
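The last step of this scenario comes down to one rule: the TMaster refuses a second stream manager that claims an already-registered container. The sketch below models only that rule. The class, the method and the endpoint strings are hypothetical and are not Heron's implementation.

```python
class TopologyMasterRegistry:
    """Hypothetical sketch: one live stream manager registration per container id."""

    def __init__(self):
        self.registered = {}  # container id -> stream manager endpoint

    def register(self, container_id, endpoint):
        current = self.registered.get(container_id)
        if current is not None and current != endpoint:
            # The container id is already claimed by a (possibly partitioned)
            # stream manager, so the newcomer is refused and will eventually exit.
            return False
        self.registered[container_id] = endpoint
        return True


registry = TopologyMasterRegistry()
print(registry.register(1, "host-a:6001"))  # True: original data container
print(registry.register(1, "host-b:6001"))  # False: replacement claims the same container
```

Because the partitioned original never deregisters, the replacement can never take its place. This is why the topology stays stuck until an operator intervenes.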
  • 32. Data Skew. Multiple keys: several keys map to a single instance, and their combined count is high. Single key: a single key maps to an instance, and its count is high.
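Both kinds of skew follow from hash-based fields grouping. Routing depends only on the key, so a hot key overloads its instance no matter how many instances exist. The same happens when several keys collide on one instance. The sketch below routes per-key counts to instances and flags overloaded ones. The crc32 hash, the key names and the "more than 2x the mean load" threshold are illustrative assumptions.

```python
import zlib
from collections import Counter


def instance_for(key, num_instances):
    """Fields-grouping style routing: a given key always maps to the same instance."""
    return zlib.crc32(key.encode("utf-8")) % num_instances


def load_per_instance(key_counts, num_instances):
    load = Counter()
    for key, count in key_counts.items():
        load[instance_for(key, num_instances)] += count
    return load


def skewed_instances(load, num_instances, factor=2.0):
    """Instances receiving more than `factor` times the mean load (threshold is assumed)."""
    mean = sum(load.values()) / num_instances
    return sorted(i for i, n in load.items() if n > factor * mean)


counts = {"#superbowl": 1000, "heron": 10, "storm": 10, "kafka": 10}
load = load_per_instance(counts, num_instances=4)
print(skewed_instances(load, num_instances=4) == [instance_for("#superbowl", 4)])  # True
```

Adding instances does not help with single-key skew: the hot key still maps to exactly one of them.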
  • 33. Data Skew - Multiple Keys. [Diagram: physical execution of Spout 1, Spout 2 and Bolt 1 to Bolt 5, illustrating multiple-key skew.]
  • 34. Data Skew - Single Key. [Diagram: physical execution of Spout 1, Spout 2 and Bolt 1 to Bolt 5, illustrating single-key skew.] What happens if the skew is temporary?
  • 35. Load Variations. Spikes: sudden surges of data, either short-lived or lasting for several minutes. Daily patterns: predictable changes in traffic.
  • 37. Auto Piloting Heron. Maintaining SLOs in the face of unpredictable load variations and hardware or software performance degradation; the manual, time-consuming and error-prone task of tuning various system knobs to achieve SLOs. Hence: auto piloting streaming systems.
  • 38. Autopiloting Streaming Systems. Self tuning: there are several tuning knobs and a time-consuming tuning phase; the system should take an SLO as input and automatically configure the knobs. Self stabilizing: stream jobs are long running, and load variations are common; the system should react to external shocks and automatically reconfigure itself. Self healing: system performance is affected by hardware or software delivering degraded quality of service; the system should identify internal faults and attempt to recover from them.
  • 39. Enter Dhalion. Dhalion is a policy-based framework integrated into Heron. It periodically executes well-specified policies that optimize execution based on some objective. We created policies that dynamically provision resources in the presence of load variations and auto-tune streaming applications so that a throughput SLO is met.
  • 40. Dhalion Policy Phases. [Diagram: metrics feed three phases. Symptom detection: Symptom Detectors 1 to N produce Symptoms 1 to N. Diagnosis generation: Diagnosers 1 to M produce Diagnoses 1 to M. Resolution: resolver selection chooses among Resolvers 1 to M, followed by resolver invocation.]
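The three phases can be sketched as a single function that turns metrics into symptoms, symptoms into diagnoses, and then invokes one resolver. The data classes, the detector, the diagnoser and the metric name `backpressure_ms` are hypothetical. Selecting the first diagnosis that has a resolver is a simplification, not Dhalion's actual selection logic.

```python
from dataclasses import dataclass, field


@dataclass
class Symptom:
    name: str
    component: str


@dataclass
class Diagnosis:
    name: str
    component: str
    symptoms: list = field(default_factory=list)


def run_policy(metrics, detectors, diagnosers, resolvers):
    """One policy execution: metrics -> symptoms -> diagnoses -> one resolver."""
    symptoms = [s for detect in detectors for s in detect(metrics)]         # symptom detection
    diagnoses = [d for diagnose in diagnosers for d in diagnose(symptoms)]  # diagnosis generation
    for diagnosis in diagnoses:                     # resolver selection (first match here)
        resolver = resolvers.get(diagnosis.name)
        if resolver is not None:
            return resolver(diagnosis)              # resolver invocation
    return None


def backpressure_detector(metrics):
    return [Symptom("backpressure", c) for c, m in metrics.items() if m["backpressure_ms"] > 0]


def underprovisioning_diagnoser(symptoms):
    return [Diagnosis("under_provisioned", s.component, [s])
            for s in symptoms if s.name == "backpressure"]


resolvers = {"under_provisioned": lambda d: f"scale up {d.component}"}
metrics = {"splitter": {"backpressure_ms": 1200}, "counter": {"backpressure_ms": 0}}
print(run_policy(metrics, [backpressure_detector], [underprovisioning_diagnoser], resolvers))
# scale up splitter
```

Keeping detectors, diagnosers and resolvers as independent pluggable pieces is what lets a policy combine, for example, three detectors with four diagnosers, as on slide 43.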
  • 41. Incorporating Dhalion into Heron. [Diagram: data containers (Stream Managers with S1, B2, B3 and B4 instances, plus Metrics Managers), the Topology Master, and a Health Manager with its Action Log and Action Blacklist.] The Health Manager periodically executes Dhalion policies that maintain the health of the topology. The Action Log maintains a list of actions taken by the policy and the corresponding diagnoses. The Action Blacklist contains a list of diagnosis descriptions and the corresponding actions taken that did not produce the expected outcome.
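The Action Log and Action Blacklist can be sketched as simple bookkeeping. Every (diagnosis, action) pair is logged. Pairs that did not produce the expected outcome are blacklisted, so the next policy run tries something else. The class, method names and action strings are hypothetical. A real health manager would judge the outcome on a later policy execution rather than immediately.

```python
class HealthManager:
    """Hypothetical bookkeeping for the Action Log and Action Blacklist."""

    def __init__(self):
        self.action_log = []    # (diagnosis, action) pairs, in the order they were taken
        self.blacklist = set()  # (diagnosis, action) pairs that did not help

    def choose_action(self, diagnosis, candidates):
        """Return the first candidate action not blacklisted for this diagnosis."""
        for action in candidates:
            if (diagnosis, action) not in self.blacklist:
                return action
        return None

    def record(self, diagnosis, action, produced_expected_outcome):
        self.action_log.append((diagnosis, action))
        if not produced_expected_outcome:
            self.blacklist.add((diagnosis, action))


hm = HealthManager()
first = hm.choose_action("slow instance: splitter-1", ["restart", "scale up"])
hm.record("slow instance: splitter-1", first, produced_expected_outcome=False)
print(hm.choose_action("slow instance: splitter-1", ["restart", "scale up"]))  # scale up
```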
  • 42. Dynamic Resource Provisioning. Policy: reacts to unexpected load variations (workload spikes). Goal: scale the topology resources up and down as needed, while keeping the topology in a steady state where backpressure is not observed.
  • 43. Dynamic Resource Provisioning. Symptom detection (from metrics): Pending Tuples Detector, Backpressure Detector, Processing Rate Skew Detector. Diagnosis generation: Resource Overprovisioning Diagnoser, Resource Underprovisioning Diagnoser, Data Skew Diagnoser, Slow Instances Diagnoser. Resolution: Bolt Scale Down Resolver, Bolt Scale Up Resolver, Data Skew Resolver, Restart Instances Resolver.
  • 44. Dynamic Resource Provisioning - Steady State. [Diagram: three Tweet Spouts feeding two Splitter Bolts and two Counter Bolts.] Processing rate (tps) | queue size (#tuples) of the two annotated instances: 100 | 20 and 100 | 20.
  • 45. Dynamic Resource Provisioning - Under Provisioned. [Same diagram.] Processing rate (tps) | queue size (#tuples): 150 | 80 and 150 | 80.
  • 46. Dynamic Resource Provisioning - Steady State. [Same diagram.] Processing rate (tps) | queue size (#tuples): 100 | 20 and 100 | 20.
  • 47. Dynamic Resource Provisioning - Slow Instance. [Same diagram.] Processing rate (tps) | queue size (#tuples): 50 | 5 and 50 | 80.
  • 48. Dynamic Resource Provisioning - Steady State. [Same diagram.] Processing rate (tps) | queue size (#tuples): 100 | 20 and 100 | 20.
  • 49. Dynamic Resource Provisioning - Data Skew. [Same diagram.] Processing rate (tps) | queue size (#tuples): 50 | 5 and 150 | 80.
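Slides 44 to 49 suggest how the diagnosers can tell the situations apart using two metrics per instance. If every instance is backed up, the bolt is under-provisioned. If processing rates are similar but one queue grows, that instance is slow. If the backed-up instance also processes more tuples, the input is skewed. The sketch below applies that reasoning to the annotated values. The thresholds, the order of the checks and the instance names `b1`/`b2` are assumptions, not Dhalion's exact rules.

```python
def diagnose(instances, queue_threshold=50, rate_tolerance=0.2):
    """Classify one bolt's instances from (processing rate in tps, queue size in tuples).

    queue_threshold and rate_tolerance are illustrative assumptions.
    """
    rates = {name: rate for name, (rate, _) in instances.items()}
    congested = sorted(name for name, (_, queue) in instances.items() if queue >= queue_threshold)
    if not congested:
        return "steady state", []
    if len(congested) == len(instances):
        return "under provisioned", congested       # every instance is backed up
    mean_rate = sum(rates.values()) / len(rates)
    if all(abs(r - mean_rate) <= rate_tolerance * mean_rate for r in rates.values()):
        return "slow instance", congested           # same rate, but one queue grows
    return "data skew", congested                   # backed-up instance also gets more tuples


slides = {
    "44 steady state": {"b1": (100, 20), "b2": (100, 20)},
    "45 under provisioned": {"b1": (150, 80), "b2": (150, 80)},
    "47 slow instance": {"b1": (50, 5), "b2": (50, 80)},
    "49 data skew": {"b1": (50, 5), "b2": (150, 80)},
}
for slide, instances in slides.items():
    print(slide, "->", diagnose(instances))
```

The rates are similar in the slow-instance case because backpressure throttles every instance to the pace of the slowest one. Only the queue reveals which instance is the culprit.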
  • 50. Satisfying Throughput SLOs. Policy: automatically tunes the number of spouts and bolts so that the SLO is met. Goal: eliminate the tuning phase; the user submits a topology with parallelism 1 at all stages and a throughput SLO. Mechanisms: the policy reuses the basic mechanisms of the dynamic resource provisioning policy and employs additional mechanisms to …
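The idea of starting at parallelism 1 and scaling bottlenecks until the SLO is met can be illustrated with a small loop. This is not Dhalion's algorithm: Dhalion observes backpressure and metrics rather than knowing stage capacities up front. Here, per-instance capacities and selectivities (tuples reaching a stage per spout tuple) are assumed inputs, and the numbers are illustrative, not measurements. `tune_for_slo` is a hypothetical name.

```python
import math


def tune_for_slo(per_instance_tps, selectivity, slo_tps, max_rounds=20):
    """Start every stage at parallelism 1 and scale the bottleneck until the SLO holds.

    per_instance_tps: tuples/s one instance of each stage can process (assumed known).
    selectivity: tuples reaching each stage per spout tuple (assumed known).
    """
    parallelism = {stage: 1 for stage in per_instance_tps}
    for _ in range(max_rounds):
        # Spout rate each stage can sustain at its current parallelism.
        sustainable = {
            s: parallelism[s] * per_instance_tps[s] / selectivity[s] for s in parallelism
        }
        bottleneck = min(sustainable, key=sustainable.get)
        if sustainable[bottleneck] >= slo_tps:
            return parallelism
        # Give the bottleneck enough instances to sustain the SLO rate on its own.
        needed = math.ceil(slo_tps * selectivity[bottleneck] / per_instance_tps[bottleneck])
        parallelism[bottleneck] = max(parallelism[bottleneck] + 1, needed)
    raise RuntimeError("SLO not reached within max_rounds")


print(tune_for_slo(
    per_instance_tps={"spout": 1000, "splitter": 2000, "counter": 3000},
    selectivity={"spout": 1, "splitter": 1, "counter": 10},  # ~10 words per sentence
    slo_tps=2500,
))  # {'spout': 3, 'splitter': 2, 'counter': 9}
```

Each loop iteration resembles one policy round that removes the current bottleneck. This explains why multi-stage topologies converge over several rounds rather than in a single step.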
  • 51. Experimental Setup. Topology: Spout → Splitter Bolt (shuffle grouping) → Counter Bolt (fields grouping). Hardware and software configuration: Microsoft HDInsight; Intel Xeon E5-2673 CPU @ 2.40 GHz; 28 GB of memory. Evaluation metrics: throughput of spouts (no. of tuples emitted over 1 min); throughput of bolts (no. of tuples emitted over 1 min); number of Heron instances provisioned.
  • 52. Dynamic Resource Provisioning. [Figure: normalized throughput (0 to 1.4) of the Spout, Splitter Bolt and Counter Bolt over time (0 to 120 minutes), with scale-down and scale-up events and phases S1, S2 and S3 marked.] The Dynamic Resource Provisioning Policy is able to adjust the topology resources on the fly when workload spikes occur. The policy can correctly detect and resolve bottlenecks even in multi-stage topologies, where backpressure is gradually propagated from one stage of the topology to another.
  • 53. Dynamic Resource Provisioning. [Figure: number of Splitter Bolt and Counter Bolt instances (0 to 15) over time (0 to 120 minutes).] Heron instances are gradually scaled up and down according to the input load.
  • 54. Conclusion. Auto piloting is important in streaming systems. Key issues: tuning, slow hosts, network and data skew. Dhalion provides a framework to tackle these using specific policies.