Resource Aware Scheduling in Apache Storm

DataWorks Summit/Hadoop Summit
RESOURCE AWARE SCHEDULING IN APACHE STORM
Presented by Boyang Jerry Peng
2
ABOUT ME
• Apache Storm Committer and PMC member
• Member of Yahoo’s low-latency team
 Data processing solutions with low latency
• Graduate student @ University of Illinois, Urbana-Champaign
 Research emphasis in distributed systems and stream processing
• Contact:
 jerrypeng@yahoo-inc.com
3
AGENDA
• Overview of Apache Storm
• Problems and Challenges
• Introduction of Resource Aware Scheduler
• Results
4
OVERVIEW
• Apache Storm is an open source distributed real-time data stream processing
platform
 Real-time analytics
 Online machine learning
 Continuous computation
 Distributed RPC
 ETL
5
STORM TOPOLOGY
• Processing can be represented as a directed graph
• Spouts are sources of information
• Bolts are operators that process data
6
DEFINITIONS OF STORM TERMS
• Stream
 an unbounded sequence of tuples.
• Component
 A processing operator in a Storm
topology that is either a Bolt or Spout
• Executors
 Threads that are spawned in worker
processes that execute the logic of
components
• Worker Process
 A process spawned by Storm that may
run one or more executors.
7
STORM ARCHITECTURE
[Diagram: the master node runs Nimbus; a ZooKeeper cluster handles cluster coordination; each Supervisor launches worker processes.]
8
LOGICAL VS PHYSICAL CONNECTION IN STORM
9
OVERVIEW OF SCHEDULING IN STORM
• Default Scheduling Strategy
 Naïve round robin scheduler
 Naïve load limiter (Worker Slots)
• Multitenant Scheduler
 Default Scheduler with multitenant capabilities (supported by
security)
 Can allocate a set of isolated nodes for topology (Soft
Partitioning)
Resource Aware
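As a rough illustration of the naive round-robin idea above, the sketch below deals executors out to worker slots in turn, ignoring how much CPU or memory each executor needs. The class and method names are hypothetical and do not come from Storm's scheduler code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of naive round-robin placement; not Storm's actual scheduler. */
final class RoundRobinSketch {

    /** Returns, for each worker slot, the list of executor ids assigned to it. */
    static List<List<Integer>> assign(int numExecutors, int numSlots) {
        if (numSlots <= 0) {
            throw new IllegalArgumentException("numSlots must be positive");
        }
        List<List<Integer>> slots = new ArrayList<>();
        for (int s = 0; s < numSlots; s++) {
            slots.add(new ArrayList<>());
        }
        // Deal executors to slots in turn, with no regard for resource demand.
        for (int e = 0; e < numExecutors; e++) {
            slots.get(e % numSlots).add(e);
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(assign(7, 3)); // prints [[0, 3, 6], [1, 4], [2, 5]]
    }
}
```

Because the slot count is the only limit, a slot holding three heavy executors looks the same to this scheduler as one holding three light ones.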
10
RUNNING STORM AT YAHOO - CHALLENGES
• Increasingly heterogeneous clusters
 Isolation Scheduler – handing out dedicated machines
• Low overall cluster resource utilization
 Users not utilizing their isolated allocation very well
• Unbalanced resource usage
 Some machines not used, others overused
• Per topology scheduling strategy
 Different topologies have different scheduling needs (e.g. constraint based
scheduling)
11
RUNNING STORM AT YAHOO – SCALE
[Figure: "Total Nodes Running Storm at Yahoo". X axis: Year (2012 to 2016). Y axis: Nodes. Series: Total Nodes, Largest Cluster Size. Data labels: 600, 2300, 3500 and 120, 300, 680.]
12
RESOURCE AWARE SCHEDULING IN STORM
• Scheduling in Storm that takes into account resource availability on
machines and resource requirement of workloads when scheduling
the topology
 Fine-grained resource control
 Resource Aware Scheduler (RAS) implements this function
- Includes many nice multi-tenant features
• Built on top of:
 Peng, Boyang, Mohammad Hosseini, Zhihao Hong, Reza Farivar,
and Roy Campbell. "R-storm: Resource-aware scheduling in
storm." In Proceedings of the 16th Annual Middleware Conference,
pp. 149-161. ACM, 2015
13
RAS API
• Fine-grained resource control
 Allows users to specify resource requirements for each component (Spout or Bolt) in a Storm topology
API to set component memory requirement:
public T setMemoryLoad(Number onHeap, Number offHeap)
API to set component CPU requirement:
public T setCPULoad(Number amount)
Example of usage:
SpoutDeclarer s1 = builder.setSpout("word", new TestWordSpout(), 10);
s1.setMemoryLoad(1024.0, 512.0);
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
.shuffleGrouping("word").setCPULoad(100.0);
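To make the units concrete, the plain-Java sketch below totals what the usage example would request. It assumes that the loads apply to each executor of a component, that memory is given in megabytes, and that a CPU load of 100 corresponds to one core. The `ComponentDemand` class is hypothetical and is not part of Storm's API.

```java
/** Hypothetical helper for totalling per-component requests; not part of Storm. */
final class ComponentDemand {
    final String name;
    final int parallelism;   // number of executors
    final double onHeapMb;   // per-executor on-heap memory (assumed MB)
    final double offHeapMb;  // per-executor off-heap memory (assumed MB)
    final double cpu;        // per-executor CPU load (assumed: 100 = one core)

    ComponentDemand(String name, int parallelism,
                    double onHeapMb, double offHeapMb, double cpu) {
        this.name = name;
        this.parallelism = parallelism;
        this.onHeapMb = onHeapMb;
        this.offHeapMb = offHeapMb;
        this.cpu = cpu;
    }

    double totalMemoryMb() {
        return parallelism * (onHeapMb + offHeapMb);
    }

    double totalCpu() {
        return parallelism * cpu;
    }

    public static void main(String[] args) {
        // Values from the usage example; loads it does not set are left at 0 here,
        // whereas a real cluster would apply its configured defaults.
        ComponentDemand word = new ComponentDemand("word", 10, 1024.0, 512.0, 0.0);
        ComponentDemand exclaim1 = new ComponentDemand("exclaim1", 3, 0.0, 0.0, 100.0);
        System.out.println(word.totalMemoryMb()); // prints 15360.0
        System.out.println(exclaim1.totalCpu());  // prints 300.0
    }
}
```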
14
CLUSTER CONFIGURATIONS
conf/storm.yaml
.
.
.
supervisor.memory.capacity.mb: 20480.0
supervisor.cpu.capacity: 400.0
.
.
.
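One way to read these two settings: a supervisor advertising 20480 MB and 400 CPU units can host another executor only while both budgets still have room. The sketch below counts how many identical executors fit. `SupervisorCapacity` is a hypothetical name, and the executor demand used in `main` is an illustrative assumption, not a value from the talk.

```java
/** Hypothetical capacity check; not Storm's actual scheduler code. */
final class SupervisorCapacity {
    final double memoryMb; // supervisor.memory.capacity.mb
    final double cpu;      // supervisor.cpu.capacity

    SupervisorCapacity(double memoryMb, double cpu) {
        this.memoryMb = memoryMb;
        this.cpu = cpu;
    }

    /** Number of identical executors that fit before either budget runs out. */
    int maxExecutors(double execMemoryMb, double execCpu) {
        if (execMemoryMb <= 0 || execCpu <= 0) {
            throw new IllegalArgumentException("demands must be positive");
        }
        int byMemory = (int) Math.floor(memoryMb / execMemoryMb);
        int byCpu = (int) Math.floor(cpu / execCpu);
        return Math.min(byMemory, byCpu);
    }

    public static void main(String[] args) {
        SupervisorCapacity node = new SupervisorCapacity(20480.0, 400.0);
        // Assumed demand: 1536 MB and 100 CPU units per executor.
        System.out.println(node.maxExecutors(1536.0, 100.0)); // prints 4
    }
}
```

In this example memory alone would allow 13 executors, so CPU is the limiting resource.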
15
RAS FEATURES – PLUGGABLE PER TOPOLOGY
SCHEDULING STRATEGIES
• Allows users to specify which scheduling strategy to use
• Default Strategy
- Based on:
• Peng, Boyang, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. "R-storm: Resource-aware scheduling in storm." In Proceedings of the 16th Annual Middleware Conference, pp. 149-161. ACM, 2015.
- Enhancements have been made (e.g. limiting max heap size per worker, a better rack selection algorithm, etc.)
- Aims to pack the topology as tightly as possible on machines to reduce communication latency and increase utilization
- Collocates components that communicate with each other (operator chaining)
• Constraint Based Scheduling Strategy
 CSP problem solver
conf.setTopologyStrategy(DefaultResourceAwareStrategy.class);
16
RAS FEATURES – RESOURCE ISOLATION VIA
CGROUPS (LINUX PLATFORMS ONLY*)
• Replaces resource isolation via isolated nodes
• Resource quotas enforced on a per worker basis
• Each worker should not go over its allocated resource quota
• Guarantees QoS and topology isolation
• Documentation:
https://storm.apache.org/releases/2.0.0-SNAPSHOT/cgroups_in_storm.html
*RHEL 7 or higher. Potential critical bugs in older RHEL versions.
17
RAS FEATURES – PER USER RESOURCE
GUARANTEES
• Configurable per user resource guarantees
18
RAS FEATURE – TOPOLOGY PRIORITY
• Users can set the priority of a topology to indicate its importance
• Topology priorities range from 0 to 29. They are partitioned into several priority levels, each covering a range of priorities
conf.setTopologyPriority(int priority)
PRODUCTION => 0 – 9
STAGING => 10 – 19
DEV => 20 – 29
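The three ranges above amount to a simple lookup. A minimal sketch, assuming the 0 to 29 bounds are enforced (the slide does not say what happens outside them); `PriorityLevels` is a hypothetical class, not part of Storm:

```java
/** Hypothetical mapping from a topology priority to its level; not part of Storm. */
final class PriorityLevels {

    enum Level { PRODUCTION, STAGING, DEV }

    static Level levelOf(int priority) {
        // Assumption: values outside 0-29 are rejected.
        if (priority < 0 || priority > 29) {
            throw new IllegalArgumentException("priority must be in 0-29: " + priority);
        }
        if (priority <= 9) {
            return Level.PRODUCTION;
        }
        if (priority <= 19) {
            return Level.STAGING;
        }
        return Level.DEV;
    }

    public static void main(String[] args) {
        System.out.println(levelOf(12)); // prints STAGING
    }
}
```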
19
RAS FEATURES – PLUGGABLE TOPOLOGY
PRIORITY
• Topology Priority Strategy
 Which topology should be scheduled first?
 Cluster wide configuration set in storm.yaml
 Default Topology Priority Strategy
- Takes into account resource guarantees and topology priority
- Schedules topologies from the user who is furthest under his or her resource guarantee
- Each user's topologies are sorted by priority
- More details:
https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html
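The ordering described above can be sketched as two nested sorts. This is a simplification under stated assumptions: "how far under the guarantee" is measured here as the mean fraction of the CPU and memory guarantee already in use, guarantees are assumed positive, and a lower priority number is taken to mean a more important topology. All class names are hypothetical; Storm's real strategy may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a priority ordering; not Storm's actual strategy class. */
final class PriorityOrderSketch {

    static final class Topology {
        final String name;
        final int priority; // assumed: lower number = more important

        Topology(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    static final class User {
        final String name;
        final double cpuGuarantee;
        final double memGuarantee;
        final double cpuUsed;
        final double memUsed;
        final List<Topology> pending;

        User(String name, double cpuGuarantee, double memGuarantee,
             double cpuUsed, double memUsed, List<Topology> pending) {
            this.name = name;
            this.cpuGuarantee = cpuGuarantee;
            this.memGuarantee = memGuarantee;
            this.cpuUsed = cpuUsed;
            this.memUsed = memUsed;
            this.pending = pending;
        }

        /** Assumed metric: mean fraction of the guarantee already in use. */
        double guaranteeUsedFraction() {
            return (cpuUsed / cpuGuarantee + memUsed / memGuarantee) / 2.0;
        }
    }

    /** Users furthest under their guarantee first; within a user, by priority. */
    static List<String> schedulingOrder(List<User> users) {
        List<User> byNeed = new ArrayList<>(users);
        byNeed.sort(Comparator.comparingDouble(User::guaranteeUsedFraction));
        List<String> order = new ArrayList<>();
        for (User u : byNeed) {
            List<Topology> topologies = new ArrayList<>(u.pending);
            topologies.sort(Comparator.comparingInt((Topology t) -> t.priority));
            for (Topology t : topologies) {
                order.add(u.name + "/" + t.name);
            }
        }
        return order;
    }

    public static void main(String[] args) {
        User alice = new User("alice", 100, 1000, 80, 800,
                Arrays.asList(new Topology("a1", 5)));
        User bob = new User("bob", 100, 1000, 20, 300,
                Arrays.asList(new Topology("t-dev", 25), new Topology("t-prod", 3)));
        // bob uses 25% of his guarantee, alice 80%, so bob's topologies come first.
        System.out.println(schedulingOrder(Arrays.asList(alice, bob)));
        // prints [bob/t-prod, bob/t-dev, alice/a1]
    }
}
```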
20
RAS FEATURES – PLUGGABLE TOPOLOGY
EVICTION STRATEGIES
• Topology Eviction Strategy
 When there are not enough resources, which topology from which user should be evicted?
 Cluster wide configuration set in storm.yaml
 Default Eviction Strategy
- Based on how much a user’s guarantee has been satisfied
- Priority of the topology
 FIFO Eviction Strategy
- Used on our staging clusters.
- Ad hoc use
 More details:
https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html
21
SELECTED RESULTS (THROUGHPUT) FROM PAPER [1] – YAHOO
TOPOLOGIES
47% improvement!
50% improvement!
* Figures from [1]
22
SELECTED RESULTS (THROUGHPUT) FROM PAPER [1] – YAHOO
TOPOLOGIES
23
PRELIMINARY RESULTS IN YAHOO STORM CLUSTERS
24
PRELIMINARY RESULTS IN YAHOO STORM CLUSTERS
25
CONCLUDING REMARKS AND FUTURE WORK
• In Summary
 Built resource aware scheduler
• Migration Process
 In the process of migrating from MultitenantScheduler to RAS
 Working through bugs with Cgroups, Java, and Linux kernel
• Future Work
 Improved Scheduling Strategies
 Real-time resource monitoring
 Elasticity
26
QUESTIONS
27
REFERENCES
• [1] Peng, Boyang, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. "R-storm:
Resource-aware scheduling in Storm." In Proceedings of the 16th Annual Middleware Conference,
pp. 149-161. ACM, 2015.
 http://web.engr.illinois.edu/~bpeng/files/r-storm.pdf
• [2] Official Resource Aware Scheduler Documentation
 https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html
• [3] Umbrella Jira for Resource Aware Scheduling in Storm
 https://issues.apache.org/jira/browse/STORM-893
28
EXTRA SLIDES
29
PROBLEM FORMULATION
• Targeting 3 types of resources
 CPU, Memory, and Network
• Limited resource budget for each node
• Specific resource needs for each task
Goal:
Improve throughput by maximizing
utilization and minimizing network
latency
30
PROBLEM FORMULATION
• Set of all tasks $\mathcal{T} = \{\tau_1, \tau_2, \tau_3, \ldots\}$; each task $\tau_i$ has resource demands
 CPU requirement of $c_{\tau_i}$
 Network bandwidth requirement of $b_{\tau_i}$
 Memory requirement of $m_{\tau_i}$
• Set of all nodes $N = \{\theta_1, \theta_2, \theta_3, \ldots\}$
 Total available CPU budget of $W_1$
 Total available bandwidth budget of $W_2$
 Total available memory budget of $W_3$
31
PROBLEM FORMULATION
• $Q_i$: Throughput contribution of each node
• Assign tasks to a subset of nodes $N' \subseteq N$ that minimizes the total resource waste:
32
PROBLEM FORMULATION
 Quadratic Multiple 3D Knapsack Problem
 We call it QM3DKP!
 NP-Hard!
• Computing optimal or approximate solutions may be hard and time-consuming
• Real time systems need fast scheduling
 Re-compute scheduling when failures occur
33
SOFT CONSTRAINTS VS HARD CONSTRAINTS
• Soft Constraints
 CPU and Network Resources
 Graceful performance degradation with over subscription
• Hard Constraints
 Memory
 Oversubscribe -> Game over
34
OBSERVATIONS ON NETWORK LATENCY
1. Inter-rack communication is the slowest
2. Inter-node communication is slow
3. Inter-process communication is faster
4. Intra-process communication is the fastest
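The four observations form a ranking that a scheduler can compute from where two executors are placed. A minimal sketch, assuming each placement is described by a rack name, a node name that is unique across racks, and a worker port; all names here are hypothetical:

```java
/** Hypothetical classification of the communication path between two executors. */
final class NetworkProximity {

    /** Ordered from fastest to slowest, following the observations above. */
    enum Level { INTRA_PROCESS, INTER_PROCESS, INTER_NODE, INTER_RACK }

    static final class Placement {
        final String rack;
        final String node;    // assumed unique across racks
        final int workerPort; // identifies the worker process on the node

        Placement(String rack, String node, int workerPort) {
            this.rack = rack;
            this.node = node;
            this.workerPort = workerPort;
        }
    }

    static Level between(Placement a, Placement b) {
        if (!a.rack.equals(b.rack)) {
            return Level.INTER_RACK;
        }
        if (!a.node.equals(b.node)) {
            return Level.INTER_NODE;
        }
        if (a.workerPort != b.workerPort) {
            return Level.INTER_PROCESS;
        }
        return Level.INTRA_PROCESS;
    }

    public static void main(String[] args) {
        Placement p1 = new Placement("rack-1", "node-a", 6700);
        Placement p2 = new Placement("rack-1", "node-a", 6701);
        System.out.println(between(p1, p2)); // prints INTER_PROCESS
    }
}
```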
35
HEURISTIC ALGORITHM
• Greedy approach
• Designing a 3D resource space
 Each resource maps to an axis
 Can be generalized to nD resource space
 Trivial overhead!
• Based on:
 min (Euclidean distance)
 Satisfy hard constraints
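One greedy step of this idea can be sketched as follows: among the nodes that satisfy the hard (memory) constraint, pick the one whose available resources are closest, by Euclidean distance, to the task's demand. The slide does not give the exact distance definition or any normalization, so the unweighted distance over raw values below is an assumption, and the class names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of one greedy placement step in a 3D resource space. */
final class NodeSelectionSketch {

    static final class Node {
        final String id;
        final double cpu;       // available CPU
        final double bandwidth; // available network bandwidth
        final double memory;    // available memory

        Node(String id, double cpu, double bandwidth, double memory) {
            this.id = id;
            this.cpu = cpu;
            this.bandwidth = bandwidth;
            this.memory = memory;
        }
    }

    /** Returns the chosen node, or null if no node satisfies the memory constraint. */
    static Node select(List<Node> nodes, double cpu, double bandwidth, double memory) {
        Node best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Node n : nodes) {
            if (n.memory < memory) {
                continue; // hard constraint: never oversubscribe memory
            }
            // CPU and bandwidth are soft: they only influence the distance.
            double dc = n.cpu - cpu;
            double db = n.bandwidth - bandwidth;
            double dm = n.memory - memory;
            double distance = Math.sqrt(dc * dc + db * db + dm * dm);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("A", 400, 100, 512),
                new Node("B", 200, 100, 4096),
                new Node("C", 100, 100, 2048));
        // A lacks memory; C is closer to the demand than B.
        System.out.println(select(nodes, 100, 50, 1024).id); // prints C
    }
}
```

With raw units the memory axis dominates the distance, so a practical version would likely normalize each axis first.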
36
HEURISTIC ALGORITHM
37
HEURISTIC ALGORITHM
[Figure: network diagram showing a switch and elements labeled 1 to 6]
38
HEURISTIC ALGORITHM
• Our proposed heuristic algorithm has the following properties:
1) Tasks of components that communicate with each other have the highest priority to be scheduled in close network proximity to each other.
2) No hard resource constraint is violated.
3) Resource waste on nodes is minimized.
1 of 38

Recommended

Neutron packet logging framework by
Neutron packet logging frameworkNeutron packet logging framework
Neutron packet logging frameworkVietnam Open Infrastructure User Group
351 views24 slides
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L... by
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...confluent
5K views23 slides
Apache Beam: A unified model for batch and stream processing data by
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
22.5K views73 slides
Autoscaling Flink with Reactive Mode by
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
922 views17 slides
Ceph issue 해결 사례 by
Ceph issue 해결 사례Ceph issue 해결 사례
Ceph issue 해결 사례Open Source Consulting
1.8K views26 slides
Deep Dive into Building Streaming Applications with Apache Pulsar by
Deep Dive into Building Streaming Applications with Apache Pulsar Deep Dive into Building Streaming Applications with Apache Pulsar
Deep Dive into Building Streaming Applications with Apache Pulsar Timothy Spann
298 views61 slides

More Related Content

What's hot

Introduction to Apache ZooKeeper by
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
128.5K views30 slides
eBPF - Observability In Deep by
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In DeepMydbops
638 views25 slides
Loki - like prometheus, but for logs by
Loki - like prometheus, but for logsLoki - like prometheus, but for logs
Loki - like prometheus, but for logsJuraj Hantak
529 views38 slides
Apache kafka performance(latency)_benchmark_v0.3 by
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3SANG WON PARK
1.8K views13 slides
Apache Pinot Meetup Sept02, 2020 by
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Mayank Shrivastava
925 views74 slides
Drive into calico architecture by
Drive into calico architectureDrive into calico architecture
Drive into calico architectureAnirban Sen Chowdhary
1.6K views10 slides

What's hot(20)

Introduction to Apache ZooKeeper by Saurav Haloi
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
Saurav Haloi128.5K views
eBPF - Observability In Deep by Mydbops
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In Deep
Mydbops638 views
Loki - like prometheus, but for logs by Juraj Hantak
Loki - like prometheus, but for logsLoki - like prometheus, but for logs
Loki - like prometheus, but for logs
Juraj Hantak529 views
Apache kafka performance(latency)_benchmark_v0.3 by SANG WON PARK
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3
SANG WON PARK1.8K views
Flink powered stream processing platform at Pinterest by Flink Forward
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward224 views
Streaming all over the world Real life use cases with Kafka Streams by confluent
Streaming all over the world  Real life use cases with Kafka StreamsStreaming all over the world  Real life use cases with Kafka Streams
Streaming all over the world Real life use cases with Kafka Streams
confluent566 views
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ... by confluent
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
confluent12.1K views
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka by Kai Wähner
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Kai Wähner16.1K views
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent by Henning Jacobs
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHow Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
Henning Jacobs2.7K views
Handing Failure With Grace in Kafka Streams With Walker Carlson | Current 2022 by HostedbyConfluent
Handing Failure With Grace in Kafka Streams With Walker Carlson | Current 2022Handing Failure With Grace in Kafka Streams With Walker Carlson | Current 2022
Handing Failure With Grace in Kafka Streams With Walker Carlson | Current 2022
HostedbyConfluent438 views
From cache to in-memory data grid. Introduction to Hazelcast. by Taras Matyashovsky
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
Taras Matyashovsky42.6K views
Using the New Apache Flink Kubernetes Operator in a Production Deployment by Flink Forward
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward655 views
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli... by Flink Forward
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward266 views
Replacing iptables with eBPF in Kubernetes with Cilium by Michal Rostecki
Replacing iptables with eBPF in Kubernetes with CiliumReplacing iptables with eBPF in Kubernetes with Cilium
Replacing iptables with eBPF in Kubernetes with Cilium
Michal Rostecki469 views
Scalability, Availability & Stability Patterns by Jonas Bonér
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
Jonas Bonér516K views
Déploiement ELK en conditions réelles by Geoffroy Arnoud
Déploiement ELK en conditions réellesDéploiement ELK en conditions réelles
Déploiement ELK en conditions réelles
Geoffroy Arnoud4.9K views
How OpenShift SDN helps to automate by Ilkka Tengvall
How OpenShift SDN helps to automateHow OpenShift SDN helps to automate
How OpenShift SDN helps to automate
Ilkka Tengvall3.9K views

Viewers also liked

Storm: distributed and fault-tolerant realtime computation by
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
232.6K views75 slides
Scaling Apache Storm - Strata + Hadoop World 2014 by
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
167.6K views80 slides
Realtime Analytics with Storm and Hadoop by
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
238.1K views83 slides
Hadoop Summit Europe 2014: Apache Storm Architecture by
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
188K views113 slides
Apache Storm 0.9 basic training - Verisign by
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
233.9K views129 slides
Yahoo compares Storm and Spark by
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and SparkChicago Hadoop Users Group
198.4K views27 slides

Viewers also liked(7)

Storm: distributed and fault-tolerant realtime computation by nathanmarz
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
nathanmarz232.6K views
Scaling Apache Storm - Strata + Hadoop World 2014 by P. Taylor Goetz
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
P. Taylor Goetz167.6K views
Realtime Analytics with Storm and Hadoop by DataWorks Summit
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
DataWorks Summit238.1K views
Hadoop Summit Europe 2014: Apache Storm Architecture by P. Taylor Goetz
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
P. Taylor Goetz188K views
Apache Storm 0.9 basic training - Verisign by Michael Noll
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
Michael Noll233.9K views
Kafka Tutorial Advanced Kafka Consumers by Jean-Paul Azar
Kafka Tutorial Advanced Kafka ConsumersKafka Tutorial Advanced Kafka Consumers
Kafka Tutorial Advanced Kafka Consumers
Jean-Paul Azar16.5K views

Similar to Resource Aware Scheduling in Apache Storm

Resource Aware Scheduling in Storm (Hadoop Summit 2016) by
Resource Aware Scheduling in Storm (Hadoop Summit 2016)Resource Aware Scheduling in Storm (Hadoop Summit 2016)
Resource Aware Scheduling in Storm (Hadoop Summit 2016)Boyang Jerry Peng
797 views28 slides
참여기관_발표자료-국민대학교 201301 정기회의 by
참여기관_발표자료-국민대학교 201301 정기회의참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의DzH QWuynh
224 views95 slides
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks by
Dynamic Provisioning of Data Intensive Computing Middleware FrameworksDynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware FrameworksLinh Ngo
243 views17 slides
A sdn based application aware and network provisioning by
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioningStanley Wang
559 views42 slides
Real Time Operating Systems by
Real Time Operating SystemsReal Time Operating Systems
Real Time Operating SystemsPawandeep Kaur
3K views22 slides
Mastering Real-time Linux by
Mastering Real-time LinuxMastering Real-time Linux
Mastering Real-time LinuxJean-François Deverge
4.4K views73 slides

Similar to Resource Aware Scheduling in Apache Storm(20)

Resource Aware Scheduling in Storm (Hadoop Summit 2016) by Boyang Jerry Peng
Resource Aware Scheduling in Storm (Hadoop Summit 2016)Resource Aware Scheduling in Storm (Hadoop Summit 2016)
Resource Aware Scheduling in Storm (Hadoop Summit 2016)
Boyang Jerry Peng797 views
참여기관_발표자료-국민대학교 201301 정기회의 by DzH QWuynh
참여기관_발표자료-국민대학교 201301 정기회의참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의
DzH QWuynh224 views
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks by Linh Ngo
Dynamic Provisioning of Data Intensive Computing Middleware FrameworksDynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
Linh Ngo243 views
A sdn based application aware and network provisioning by Stanley Wang
A sdn based application aware and network provisioningA sdn based application aware and network provisioning
A sdn based application aware and network provisioning
Stanley Wang559 views
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra... by IRJET Journal
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET-Framework for Dynamic Resource Allocation and Efficient Scheduling Stra...
IRJET Journal26 views
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS by Maurvi04
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDSFAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
FAULT TOLERANCE OF RESOURCES IN COMPUTATIONAL GRIDS
Maurvi04526 views
Apache Apex: Stream Processing Architecture and Applications by Thomas Weise
Apache Apex: Stream Processing Architecture and ApplicationsApache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
Thomas Weise1.6K views
Apache Apex: Stream Processing Architecture and Applications by Comsysto Reply GmbH
Apache Apex: Stream Processing Architecture and Applications Apache Apex: Stream Processing Architecture and Applications
Apache Apex: Stream Processing Architecture and Applications
High availability and disaster recovery in IBM PureApplication System by Scott Moonen
High availability and disaster recovery in IBM PureApplication SystemHigh availability and disaster recovery in IBM PureApplication System
High availability and disaster recovery in IBM PureApplication System
Scott Moonen1.6K views
load-balancing-method-for-embedded-rt-system-20120711-0940 by Samsung Electronics
load-balancing-method-for-embedded-rt-system-20120711-0940load-balancing-method-for-embedded-rt-system-20120711-0940
load-balancing-method-for-embedded-rt-system-20120711-0940
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar by Karthik Ramasamy
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Karthik Ramasamy699 views
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters by Sumeet Singh
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters
Sumeet Singh716 views
Real time Operating System by Tech_MX
Real time Operating SystemReal time Operating System
Real time Operating System
Tech_MX106.5K views
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe... by Matteo Ferroni
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
[EWiLi2016] Towards a performance-aware power capping orchestrator for the Xe...
Matteo Ferroni116 views
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments by LEGATO project
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
LEGATO project38 views
HPC Controls Future by rcastain
HPC Controls FutureHPC Controls Future
HPC Controls Future
rcastain1K views
Crash course on data streaming (with examples using Apache Flink) by Vincenzo Gulisano
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
Vincenzo Gulisano255 views

More from DataWorks Summit/Hadoop Summit

Running Apache Spark & Apache Zeppelin in Production by
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionDataWorks Summit/Hadoop Summit
9.6K views28 slides
State of Security: Apache Spark & Apache Zeppelin by
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinDataWorks Summit/Hadoop Summit
3.2K views25 slides
Unleashing the Power of Apache Atlas with Apache Ranger by
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
6.8K views33 slides
Enabling Digital Diagnostics with a Data Science Platform by
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
1.4K views10 slides
Revolutionize Text Mining with Spark and Zeppelin by
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinDataWorks Summit/Hadoop Summit
2.1K views28 slides
Double Your Hadoop Performance with Hortonworks SmartSense by
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
1K views28 slides

More from DataWorks Summit/Hadoop Summit(20)

Recently uploaded

Mini-Track: Challenges to Network Automation Adoption by
Mini-Track: Challenges to Network Automation AdoptionMini-Track: Challenges to Network Automation Adoption
Mini-Track: Challenges to Network Automation AdoptionNetwork Automation Forum
17 views27 slides
Ransomware is Knocking your Door_Final.pdf by
Ransomware is Knocking your Door_Final.pdfRansomware is Knocking your Door_Final.pdf
Ransomware is Knocking your Door_Final.pdfSecurity Bootcamp
66 views46 slides
"Surviving highload with Node.js", Andrii Shumada by
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada Fwdays
33 views29 slides
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensorssugiuralab
23 views15 slides
2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe by
2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe
2024: A Travel Odyssey The Role of Generative AI in the Tourism UniverseSimone Puorto
13 views61 slides
Democratising digital commerce in India-Report by
Democratising digital commerce in India-ReportDemocratising digital commerce in India-Report
Democratising digital commerce in India-ReportKapil Khandelwal (KK)
20 views161 slides

Recently uploaded(20)

"Surviving highload with Node.js", Andrii Shumada by Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays33 views
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab23 views
2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe by Simone Puorto
2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe
2024: A Travel Odyssey The Role of Generative AI in the Tourism Universe
Simone Puorto13 views
Five Things You SHOULD Know About Postman by Postman
Five Things You SHOULD Know About PostmanFive Things You SHOULD Know About Postman
Five Things You SHOULD Know About Postman
Postman38 views
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software317 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2218 views
The Forbidden VPN Secrets.pdf by Mariam Shaba
The Forbidden VPN Secrets.pdfThe Forbidden VPN Secrets.pdf
The Forbidden VPN Secrets.pdf
Mariam Shaba20 views
"Running students' code in isolation. The hard way", Yurii Holiuk by Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays24 views

Resource Aware Scheduling in Apache Storm

  • 2. 2 ABOUT ME • Apache Storm Committer and PMC member • Member of the Yahoo’s low latency Team  Data processing solutions with low latency • Graduate student @ University of Illinois, Urbana-Champaign  Research emphasis in distributed systems and stream processing • Contact:  jerrypeng@yahoo-inc.com
  • 3. 3 AGENDA •Overview of Apache Storm •Problems and Challenges •Introduction of Resource Aware Scheduler •Results
  • 4. 4 OVERVIEW • Apache Storm is an open source distributed real-time data stream processing platform  Real-time analytics  Online machine learning  Continuous computation  Distributed RPC  ETL
  • 5. 5 STORM TOPOLOGY • Processing can be represented as a directed graph • Spouts are sources of information • Bolts are operators that process data
  • 6. 6 DEFINITIONS OF STORM TERMS • Stream  an unbounded sequence of tuples. • Component  A processing operator in a Storm topology that is either a Bolt or Spout • Executors  Threads that are spawned in worker processes that execute the logic of components • Worker Process  A process spawned by Storm that may run one or more executors.
  • 8. 8 LOGICAL VS PHYSICAL CONNECTION IN STORM
  • 9. 9 OVERVIEW OF SCHEDULING IN STORM • Default Scheduling Strategy  Naïve round robin scheduler  Naïve load limiter (Worker Slots) • Multitenant Scheduler  Default Scheduler with multitenant capabilities (supported by security)  Can allocate a set of isolated nodes for topology (Soft Partitioning) Resource Aware
  • 10. 10 RUNNING STORM AT YAHOO - CHALLENGES • Increasingly heterogeneous clusters  Isolation Scheduler – handing out dedicated machines • Low overall cluster resource utilization  Users not utilizing their isolated allocation very well • Unbalanced resource usage  Some machines not used, others overused • Per topology scheduling strategy  Different topologies have different scheduling needs (e.g. constraint based scheduling)
  • 11. 11 RUNNING STORM AT YAHOO – SCALE [Chart: Total Nodes Running Storm at Yahoo. X axis: Year (2012–2016); Y axes: Nodes. Series: Total Nodes, Largest Cluster Size. Data labels: 600, 2300, 3500 and 120, 300, 680.]
  • 12. 12 RESOURCE AWARE SCHEDULING IN STORM • Scheduling in Storm that takes into account resource availability on machines and resource requirement of workloads when scheduling the topology  Fine grain resource control  Resource Aware Scheduler (RAS) implements this function - Includes many nice multi-tenant features • Built on top of:  Peng, Boyang, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. "R-storm: Resource-aware scheduling in storm." In Proceedings of the 16th Annual Middleware Conference, pp. 149-161. ACM, 2015
  • 13. 13 RAS API • Fine grain resource control  Allows users to specify resource requirements for each component (Spout or Bolt) in a Storm Topology. API to set component memory requirement: public T setMemoryLoad(Number onHeap, Number offHeap) API to set component CPU requirement: public T setCPULoad(Number amount) Example of usage: SpoutDeclarer s1 = builder.setSpout("word", new TestWordSpout(), 10); s1.setMemoryLoad(1024.0, 512.0); builder.setBolt("exclaim1", new ExclamationBolt(), 3).shuffleGrouping("word").setCPULoad(100.0);
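As a rough illustration of what the per-component requests in the usage example add up to, the sketch below totals memory and CPU across executors. It is plain Java with no Storm dependency. The class `ComponentRequest` and its fields are hypothetical names, and it assumes that the values passed to `setMemoryLoad` and `setCPULoad` apply to each executor of a component, with memory in MB. Values that the example leaves unset are treated as 0 here, whereas a real cluster would apply its configured defaults.

```java
import java.util.List;

// Hypothetical helper, not part of Storm: models one component's resource request.
final class ComponentRequest {
    final String name;
    final int parallelism;   // number of executors requested for the component
    final double onHeapMb;   // per-executor on-heap memory (assumed MB)
    final double offHeapMb;  // per-executor off-heap memory (assumed MB)
    final double cpu;        // per-executor CPU load, in the units passed to setCPULoad

    ComponentRequest(String name, int parallelism,
                     double onHeapMb, double offHeapMb, double cpu) {
        this.name = name;
        this.parallelism = parallelism;
        this.onHeapMb = onHeapMb;
        this.offHeapMb = offHeapMb;
        this.cpu = cpu;
    }

    // Memory requested by all executors of this component.
    double totalMemoryMb() {
        return parallelism * (onHeapMb + offHeapMb);
    }

    // CPU requested by all executors of this component.
    double totalCpu() {
        return parallelism * cpu;
    }

    // Memory requested by a whole set of components.
    static double sumMemoryMb(List<ComponentRequest> components) {
        double total = 0.0;
        for (ComponentRequest c : components) {
            total += c.totalMemoryMb();
        }
        return total;
    }

    // CPU requested by a whole set of components.
    static double sumCpu(List<ComponentRequest> components) {
        double total = 0.0;
        for (ComponentRequest c : components) {
            total += c.totalCpu();
        }
        return total;
    }
}
```

Under these assumptions, the "word" spout (10 executors at 1024 + 512 MB each) asks for 15360 MB in total, and the "exclaim1" bolt (3 executors at 100.0 each) asks for 300 CPU units.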
  • 15. 15 RAS FEATURES – PLUGGABLE PER TOPOLOGY SCHEDULING STRATEGIES • Allows users to specify which scheduling strategy to use • Default Strategy - Based on: • Peng, Boyang, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. "R-storm: Resource-aware scheduling in storm." In Proceedings of the 16th Annual Middleware Conference, pp. 149-161. ACM, 2015. - Enhancements have been made (e.g. limiting max heap size per worker, better rack selection algorithm, etc.) - Aims to pack the topology as tightly as possible on machines to reduce communication latency and increase utilization - Collocates components that communicate with each other (operator chaining) • Constraint Based Scheduling Strategy  CSP problem solver conf.setTopologyStrategy(DefaultResourceAwareStrategy.class);
  • 16. 16 RAS FEATURES – RESOURCE ISOLATION VIA CGROUPS (LINUX PLATFORMS ONLY*) • Replaces resource isolation via isolated nodes • Resource quotas enforced on a per worker basis • Each worker should not go over its allocated resource quota • Guarantee QoS and topology isolation • Documentation: https://storm.apache.org/releases/2.0.0-SNAPSHOT/cgroups_in_storm.html *RHEL 7 or higher. Potential critical bugs in older RHEL versions.
  • 17. 17 RAS FEATURES – PER USER RESOURCE GUARANTEES • Configurable per user resource guarantees
  • 18. 18 RAS FEATURE – TOPOLOGY PRIORITY • Users can set the priority of a topology to indicate its importance • Topology priorities range from 0 to 29 and are partitioned into several priority levels, each covering a range of priorities conf.setTopologyPriority(int priority) PRODUCTION => 0 – 9 STAGING => 10 – 19 DEV => 20 – 29
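The mapping from a numeric priority to the three levels listed on this slide can be sketched as below. This is only an illustration of the ranges shown (0–9, 10–19, 20–29); the class `TopologyPriorityLevels` and its `levelOf` method are hypothetical names and are not part of the Storm API.

```java
// Hypothetical helper, not part of Storm: maps a topology priority to its level.
final class TopologyPriorityLevels {
    enum Level { PRODUCTION, STAGING, DEV }

    // Returns the level whose range contains the given priority.
    // Ranges follow the slide: 0-9 PRODUCTION, 10-19 STAGING, 20-29 DEV.
    static Level levelOf(int priority) {
        if (priority < 0 || priority > 29) {
            throw new IllegalArgumentException(
                "priority must be between 0 and 29, got " + priority);
        }
        if (priority <= 9) {
            return Level.PRODUCTION;
        }
        if (priority <= 19) {
            return Level.STAGING;
        }
        return Level.DEV;
    }
}
```

For instance, a topology submitted with `conf.setTopologyPriority(12)` would fall into the STAGING level under this mapping.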
  • 19. 19 RAS FEATURES – PLUGGABLE TOPOLOGY PRIORITY • Topology Priority Strategy  Which topology should be scheduled first?  Cluster wide configuration set in storm.yaml  Default Topology Priority Strategy - Takes into account resource guarantees and topology priority - Schedules topologies from the user who is the most under his or her resource guarantee - Topologies of each user are sorted by priority - More details: https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html
  • 20. 20 RAS FEATURES – PLUGGABLE TOPOLOGY EVICTION STRATEGIES • Topology Eviction Strategy  When there are not enough resources, which topology from which user should be evicted?  Cluster wide configuration set in storm.yaml  Default Eviction Strategy - Based on how much a user’s guarantee has been satisfied - Priority of the topology  FIFO Eviction Strategy - Used on our staging clusters. - Ad hoc use  More details: https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html
  • 21. 21 SELECTED RESULTS (THROUGHPUT) FROM PAPER [1] – YAHOO TOPOLOGIES 47% improvement! 50% improvement! * Figures used [1]
  • 22. 22 SELECTED RESULTS (THROUGHPUT) FROM PAPER [1] – YAHOO TOPOLOGIES
  • 23. 23 PRELIMINARY RESULTS IN YAHOO STORM CLUSTERS
  • 24. 24 PRELIMINARY RESULTS IN YAHOO STORM CLUSTERS
  • 25. 25 CONCLUDING REMARKS AND FUTURE WORK • In Summary  Built resource aware scheduler • Migration Process  In the process of migrating from MultitenantScheduler to RAS  Working through bugs with Cgroups, Java, and Linux kernel • Future Work  Improved Scheduling Strategies  Real-time resource monitoring  Elasticity
  • 27. 27 REFERENCES • [1] Peng, Boyang, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. "R-storm: Resource-aware scheduling in Storm." In Proceedings of the 16th Annual Middleware Conference, pp. 149-161. ACM, 2015.  http://web.engr.illinois.edu/~bpeng/files/r-storm.pdf • [2] Official Resource Aware Scheduler Documentation  https://storm.apache.org/releases/2.0.0-SNAPSHOT/Resource_Aware_Scheduler_overview.html • [3] Umbrella Jira for Resource Aware Scheduling in Storm  https://issues.apache.org/jira/browse/STORM-893
  • 29. 29 PROBLEM FORMULATION • Targeting 3 types of resources  CPU, Memory, and Network • Limited resource budget for each node • Specific resource needs for each task Goal: Improve throughput by maximizing utilization and minimizing network latency
  • 30. 30 PROBLEM FORMULATION • Set of all tasks $\mathcal{T} = \{\tau_1, \tau_2, \tau_3, \ldots\}$, each task $\tau_i$ has resource demands  CPU requirement of $c_{\tau_i}$  Network bandwidth requirement of $b_{\tau_i}$  Memory requirement of $m_{\tau_i}$ • Set of all nodes $N = \{\theta_1, \theta_2, \theta_3, \ldots\}$  Total available CPU budget of $W_1$  Total available Bandwidth budget of $W_2$  Total available Memory budget of $W_3$
  • 31. 31 PROBLEM FORMULATION • $Q_i$: Throughput contribution of each node • Assign tasks to a subset of nodes $N' \subseteq N$ that minimizes the total resource waste [equation not captured in the transcript]
  • 32. 32 PROBLEM FORMULATION  Quadratic Multiple 3D Knapsack Problem  We call it QM3DKP!  NP-Hard! • Computing optimal or approximate solutions may be hard and time consuming • Real time systems need fast scheduling  Re-compute scheduling when failures occur
  • 33. 33 SOFT CONSTRAINTS VS HARD CONSTRAINTS • Soft Constraints  CPU and Network Resources  Graceful performance degradation with over subscription • Hard Constraints  Memory  Oversubscribe -> Game over
  • 34. 34 OBSERVATIONS ON NETWORK LATENCY 1. Inter-rack communication is the slowest 2. Inter-node communication is slow 3. Inter-process communication is faster 4. Intra-process communication is the fastest
  • 35. 35 HEURISTIC ALGORITHM • Greedy approach • Designing a 3D resource space  Each resource maps to an axis  Can be generalized to nD resource space  Trivial overhead! • Based on:  min (Euclidean distance)  Satisfy hard constraints
  • 36. 36 HEURISTIC ALGORITHM
  • 37. 37 HEURISTIC ALGORITHM [Diagram: a switch connected to nodes 1–6]
  • 38. 38 HEURISTIC ALGORITHM • Our proposed heuristic algorithm has the following properties: 1) Tasks of components that communicate with each other will have the highest priority to be scheduled in close network proximity to each other. 2) No hard resource constraint is violated. 3) Resource waste on nodes is minimized.
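The greedy selection step described on the heuristic slides (pick the node at minimum Euclidean distance while satisfying hard constraints) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the R-Storm or RAS implementation: the class `ClosestNodeSketch`, the vector layout (CPU, bandwidth, memory), and the use of raw, unnormalized units are all assumptions made here, and memory is treated as the only hard constraint, following the soft versus hard constraints slide.

```java
// Hypothetical sketch, not Storm code: greedy node choice in a 3D resource space.
final class ClosestNodeSketch {
    // Assumed layout of every resource vector: {cpu, bandwidth, memory}.
    static final int CPU = 0;
    static final int BANDWIDTH = 1;
    static final int MEMORY = 2;

    // Euclidean distance between a task's demand and a node's available resources.
    static double distance(double[] demand, double[] available) {
        double sum = 0.0;
        for (int i = 0; i < demand.length; i++) {
            double diff = available[i] - demand[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the closest node that satisfies the hard (memory)
    // constraint, or -1 if no node has enough memory for the task.
    static int selectNode(double[] demand, double[][] nodes) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int n = 0; n < nodes.length; n++) {
            // Hard constraint: never oversubscribe memory.
            if (nodes[n][MEMORY] < demand[MEMORY]) {
                continue;
            }
            // CPU and bandwidth are soft constraints, so they only affect distance.
            double d = distance(demand, nodes[n]);
            if (d < bestDistance) {
                bestDistance = d;
                best = n;
            }
        }
        return best;
    }
}
```

A scheduler built on this idea would call `selectNode` once per task and subtract the task's demand from the chosen node before placing the next task. The network proximity preference from property 1 is not modeled in this sketch.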

Editor's Notes

  1. Good afternoon, my name is Boyang Jerry Peng and I am here to present Resource Aware Scheduling in Apache Storm.
  2. A little about me: I am an Apache Storm committer and PMC member. I am currently a part of the low latency team at Yahoo. Our team primarily works on projects that provide data processing solutions with low latency to Yahoo, and Apache Storm is one of the platforms we work on. Prior to joining Yahoo, I was a graduate student at the University of Illinois, Urbana-Champaign with a research emphasis in distributed systems.
  3. First, I am going to provide a brief overview of Apache Storm. Then, I will discuss the problems and challenges of running Apache Storm at Yahoo. Next, I will get to the core of this presentation and talk about resource aware scheduling in Storm: define what it is, how to use it, and how it helps us overcome the problems and challenges I have mentioned. Lastly, I will present some results.
  4. Apache Storm is a popular open source distributed data stream processing platform used by many companies in industry. There are many use cases for Apache Storm, such as real-time analytics, online machine learning, continuous computation, distributed RPC, and ETL operations.
  5. In Apache Storm, an application or workload is called a Storm topology. A Storm topology, like applications in other stream processing systems, can be represented as a directed graph in which each edge represents a flow of data and each vertex a location where data processing occurs. In Storm, there are two types of operators or components. The first type is called a spout. Spouts are sources of information and are responsible for injecting data into the Storm topology. The second type is called a bolt. Bolts consume streams of data, conduct any user defined processing, and potentially emit new streams of data downstream to be processed by other bolts.
  6. Briefly go over some definitions in Storm
  7. There are two types of nodes in a Storm cluster. A master node runs a daemon called Nimbus. The master node and the Nimbus daemon are responsible (with the help of Apache Zookeeper) for maintaining the active membership of the Storm cluster. The Nimbus node is also responsible for computing schedulings of topologies in the Storm cluster. A worker node in Storm is a node that runs a daemon called supervisor, which is responsible for retrieving schedulings from Nimbus via Zookeeper and launching the necessary processes according to the scheduling to realize the computation of the topology.
  8. Let me also talk about the difference between logical and physical connections in Storm. The diagram on the left is an example of a Storm topology where executors are organized by component, and each line connecting two executors represents a logical connection. In the diagram on your right, executors are organized by the physical machines they are scheduled on, and each line represents a physical connection. As you can see, logical connections can vary quite a bit from the physical connections that need to be made in a topology. This is where the scheduler can play an important part. How the topology is scheduled can have major impacts on the performance of the topology.
  9. Let me talk about how scheduling is done in Storm. The default scheduler schedules executors in a round robin fashion. It uses the concept of worker slots to limit the computation load on a single machine: it can only launch as many worker processes as there are worker slots. Each worker can run any number of executors that require any amount of resources to run. Because it is not resource aware, customers want isolated nodes. Not very effective. Not resource aware. Executors use any arbitrary amount of resources. We see some nodes overloaded and some nodes empty.
  10. Let me talk about some challenges of running Storm at Yahoo. Our clusters have become increasingly heterogeneous, made up of older nodes and newer nodes that have different hardware specs. When handing out dedicated nodes in a heterogeneous cluster, sometimes the nodes are one size, sometimes another. Not utilizing resources well: customers use more nodes than they need, because they don’t think about resource requirements as well. Nothing else can run on those isolated nodes.
  11. Fine grain resource control. Deprecates the notion of using worker slots to limit load and removes the need to use isolated nodes. Resource isolation via cgroups.
  12. Let me go over some of the core APIs for scheduling with the Resource Aware Scheduler. It allows users to specify the resource requirements for each component…
  13. Cluster admins can specify how much of each resource is available for use on each worker machine.
  14. Let me talk about some features the Resource Aware Scheduler provides. One of them is pluggable per topology scheduling strategies. We have identified that different topologies might have different scheduling needs. Constraint based scheduling strategy: an internal user has some scheduling requirements, so users can describe these constraints and the strategy will attempt to find a scheduling that satisfies them.
  15. One neat feature we developed to support RAS is resource isolation via cgroups. Get rid of delegating isolated nodes, which was killing our utilization. RHEL 7 cgroups and Java memory do play well. Bugs in kernel.
  16. Taken into account in the scheduling priority and eviction strategies I will mention later.
  17. Taken into account in scheduling priority and eviction strategies
  18. Pluggable. In what order should the topologies be scheduled?
  19. Pluggable. Different clusters should have different eviction policies (Production vs Staging). How much over his or her resource guarantee a user is. Not enough resources or sudden failure.
  20. Still in the process of migration. The average amount of assigned memory has decreased, which implies that topologies are becoming more resource efficient to run. Using less memory to run. Run more topologies.
  21. Working out the kinks. Cgroup and memory. Complete migration, beta quality.
  22. For each task with a certain resource vector that represents its resource requirement, we attempt to find the node whose resource vector, representing its resource availability, is closest. Based on min (Euclidean distance) while not violating hard constraints.
  23. Based on min (Euclidean distance) while not violating hard constraints