45. Stream Groupings
Fields Grouping!
!
Ensures all Tuples with with the same field value(s)
are always routed to the same task.
!
(this is a simple hash of the field values,
modulo the number of tasks)
54. Fault Tolerance
Workers heartbeat back to Supervisors and Nimbus via ZooKeeper, !
as well as locally.
ZooKeeperNimbus
Supervisor Supervisor Supervisor Supervisor
Topology
Submitter
Worker Worker Worker Worker
55. Fault Tolerance
If a worker dies (fails to heartbeat), the Supervisor will restart it
ZooKeeperNimbus
Supervisor Supervisor Supervisor Supervisor
Topology
Submitter
Worker Worker Worker Worker
X
56. Fault Tolerance
If a worker dies repeatedly, Nimbus will reassign the work to other!
nodes in the cluster.
ZooKeeperNimbus
Supervisor Supervisor Supervisor Supervisor
Topology
Submitter
Worker Worker Worker Worker
X
57. Fault Tolerance
If a supervisor node dies, Nimbus will reassign the work to other nodes.
ZooKeeperNimbus
Supervisor Supervisor Supervisor Supervisor
Topology
Submitter
Worker Worker Worker Worker
X
X
58. Fault Tolerance
If Nimbus dies, topologies will continue to function normally,!
but won’t be able to perform reassignments.
ZooKeeperNimbus
Supervisor Supervisor Supervisor Supervisor
Topology
Submitter
Worker Worker Worker Worker
X
104. Storm on YARN
HDFS2
(redundant,
reliable
storage)
YARN
(cluster
resource
management)
MapReduce
(batch)
Apache
STORM
(streaming)
HADOOP 2.0
Tez
(interactive)
Multi Use Data Platform
Batch, Interactive, Online, Streaming, …
105. Storm on YARN
HDFS2
(redundant,
reliable
storage)
YARN
(cluster
resource
management)
MapReduce
(batch)
Apache
STORM
(streaming)
HADOOP 2.0
Tez
(interactive)
Multi Use Data Platform
Batch, Interactive, Online, Streaming, …
Batch and real-time on the same cluster
106. Storm on YARN
HDFS2
(redundant,
reliable
storage)
YARN
(cluster
resource
management)
MapReduce
(batch)
Apache
STORM
(streaming)
HADOOP 2.0
Tez
(interactive)
Multi Use Data Platform
Batch, Interactive, Online, Streaming, …
Security and Multi-tenancy
107. Storm on YARN
HDFS2
(redundant,
reliable
storage)
YARN
(cluster
resource
management)
MapReduce
(batch)
Apache
STORM
(streaming)
HADOOP 2.0
Tez
(interactive)
Multi Use Data Platform
Batch, Interactive, Online, Streaming, …
Elasticity
108. Storm on YARN
Nimbus
Resource Management, Scheduling
Supervisor
Node and Process management
Workers
Runs topology tasks
YARN RM
Resource Management
Storm AM
Manage Topology
Containers
Runs topology tasks
YARN NM
Process Management
Storm’s resource management system
maps very naturally to the YARN model.
109. Storm on YARN
Nimbus
Resource Management, Scheduling
Supervisor
Node and Process management
Workers
Runs topology tasks
YARN RM
Resource Management
Storm AM
Manage Topology
Containers
Runs topology tasks
YARN NM
Process Management
High Availability
110. Storm on YARN
Nimbus
Resource Management, Scheduling
Supervisor
Node and Process management
Workers
Runs topology tasks
YARN RM
Resource Management
Storm AM
Manage Topology
Containers
Runs topology tasks
YARN NM
Process Management
Detect and scale around bottlenecks
111. Storm on YARN
Nimbus
Resource Management, Scheduling
Supervisor
Node and Process management
Workers
Runs topology tasks
YARN RM
Resource Management
Storm AM
Manage Topology
Containers
Runs topology tasks
YARN NM
Process Management
Optimize for available resources