© Hortonworks Inc. 2011
P. Taylor Goetz
Apache Storm Committer
tgoetz@hortonworks.com
@ptgoetz
Apache Storm Architecture a...
Shedding Light on Data
Shedding Light on Big Data
Shedding Light on Big Data
In Real Time
What is Storm?
Storm is Streaming
Storm is Streaming
Key enabler of the Lambda Architecture
Storm is Fast
Storm is Fast
Clocked at 1M+ messages per second per node
Storm is Scalable
Storm is Scalable
Thousands of workers per cluster
Storm is Fault Tolerant
Storm is Fault Tolerant
Failure is expected, and embraced
Storm is Reliable
Storm is Reliable
Guaranteed message delivery
Storm is Reliable
Exactly-once semantics
Conceptual Model
Tuple
{…}
Tuple
{…} • Core Unit of Data
• Immutable Set of Key/Value
Pairs
Streams
{…} {…} {…} {…} {…} {…} {…}
Unbounded Sequence of Tuples
Spouts
Spouts
• Source of Streams
• Wraps a streaming data source
and emits Tuples
{…}
{…}
{…}
{…}
{…}
{…}
{…}
{…}
{…}
{…}
{…}
{…...
Spout API
public interface ISpout extends Serializable {
    void open(Map conf,
              TopologyContext context,
              Spout...
Spout API
public interface ISpout extends Serializable {
    void open(Map conf,
              TopologyContext context,
              Spout...
Spout API
public interface ISpout extends Serializable {
    void open(Map conf,
              TopologyContext context,
              Spout...
Bolts
Bolts
• Core functions of a
streaming computation
• Receive tuples and do stuff
• Optionally emit additional
tuples
Bolts
• Write to a data store
Bolts
• Read from a data store
Bolts
• Perform arbitrary
computation
Compute
{…}
{…}
{…}
{…}
{…}
{…}
{…}
Bolts
• (Optionally) Emit additional
streams
{…}
{…}
{…}
{…}
{…}
{…}
{…}
Bolt API
public interface IBolt extends Serializable {
    void prepare(Map stormConf,
                 TopologyContext context,
                 OutputC...
Bolt API
public interface IBolt extends Serializable {
    void prepare(Map stormConf,
                 TopologyContext context,
                 OutputC...
Bolt Output API
public interface IOutputCollector extends IErrorReporter {
    List<Integer> emit(String streamId,
                       Collec...
Bolt Output API
public interface IOutputCollector extends IErrorReporter {
    List<Integer> emit(String streamId,
                       Collec...
Topologies
Topologies
Topologies
• DAG of Spouts and Bolts
• Data Flow Representation
• Streaming Computation
Topologies
• Storm executes spouts
and bolts as individual
Tasks that run in parallel
on multiple machines.
Stream Groupings
Stream Groupings
Stream Groupings determine how Storm routes
Tuples between tasks in a topology
Stream Groupings
Shuffle
Randomized round-robin.
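One way to read "randomized round-robin" is: shuffle the list of target tasks, walk through it once, then reshuffle and repeat, so every task receives the same share of tuples while the order stays unpredictable. The sketch below illustrates that reading only. ShuffleGroupingSketch and chooseTask are hypothetical names, not part of the Storm API, and Storm's own implementation may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of a randomized round-robin chooser (not Storm code).
class ShuffleGroupingSketch {
    private final List<Integer> tasks;
    private final Random random;
    private int next = 0;

    ShuffleGroupingSketch(List<Integer> taskIds, Random random) {
        this.tasks = new ArrayList<>(taskIds);
        this.random = random;
        Collections.shuffle(this.tasks, random);
    }

    // Returns the task id that should receive the next tuple.
    int chooseTask() {
        if (next >= tasks.size()) {
            // One full pass is done: reshuffle so the next pass uses a new order.
            Collections.shuffle(tasks, random);
            next = 0;
        }
        return tasks.get(next++);
    }

    public static void main(String[] args) {
        ShuffleGroupingSketch grouping =
                new ShuffleGroupingSketch(Arrays.asList(0, 1, 2, 3), new Random());
        for (int i = 0; i < 8; i++) {
            System.out.print(grouping.chooseTask() + " ");
        }
        System.out.println();
    }
}
```

Over any two full passes each of the four tasks is picked exactly twice, which is the even load distribution that shuffle grouping is after.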
Stream Groupings
LocalOrShuffle
Randomized round-robin.
(With a preference for intra-worker Tasks)
Stream Groupings
Fields Grouping
Ensures all Tuples with the same field value(s)
are always routed to the same task.
Stream Groupings
Fields Grouping
Ensures all Tuples with the same field value(s)
are always routed to the same task...
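The full version of this slide describes fields grouping as a simple hash of the field values, modulo the number of tasks. A minimal sketch of that idea follows. FieldsGroupingSketch and chooseTask are hypothetical names, and using List.hashCode() as the hash is an assumption made for illustration, not necessarily what Storm does internally.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of hash-modulo routing (not Storm code).
class FieldsGroupingSketch {
    // Maps the grouping field values to a task index in [0, numTasks).
    static int chooseTask(List<Object> fieldValues, int numTasks) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(fieldValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        int first = chooseTask(Arrays.<Object>asList("storm"), numTasks);
        int second = chooseTask(Arrays.<Object>asList("storm"), numTasks);
        // Equal field values always land on the same task.
        System.out.println(first == second);
    }
}
```

Because the result depends on the task count, changing the number of tasks changes where a given key lands, which matters for bolts that keep per-key state.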
Physical View
Physical View
ZooKeeper Nimbus
Supervisor Supervisor Supervisor Supervisor
Worker* Worker* Worker* Worker*
Topology Deployment
ZooKeeper Nimbus
Supervisor Supervisor Supervisor Supervisor
Topology
Submitter
Topology Submitter uplo...
Topology Deployment
Nimbus calculates assignments and sends to ZooKeeper
ZooKeeper Nimbus
Supervisor Supervisor Supervisor ...
Topology Deployment
Supervisor nodes receive assignment information
via ZooKeeper watches.
ZooKeeper Nimbus
Supervisor Su...
Topology Deployment
Supervisor nodes download topology from Nimbus:
• topology.jar
• topology.ser
• conf.ser
ZooKeeperN...
Topology Deployment
Supervisors spawn workers (JVM processes) to start the topology
ZooKeeper Nimbus
Supervisor Supervisor ...
Fault Tolerance
Fault Tolerance
Workers heartbeat back to Supervisors and Nimbus via ZooKeeper,
as well as locally.
ZooKeeper Nimbus
Supe...
Fault Tolerance
If a worker dies (fails to heartbeat), the Supervisor will restart it
ZooKeeper Nimbus
Supervisor Superviso...
Fault Tolerance
If a worker dies repeatedly, Nimbus will reassign the work to other
nodes in the cluster.
ZooKeeper Nimbus...
Fault Tolerance
If a supervisor node dies, Nimbus will reassign the work to other nodes.
ZooKeeper Nimbus
Supervisor Superv...
Fault Tolerance
If Nimbus dies, topologies will continue to function normally,
but won’t be able to perform reassignments...
Parallelism
Scaling a Distributed Computation
Parallelism
Worker (JVM)
Executor (Thread) Executor (Thread) Executor (Thread)
Task Task Task
1 Worker,
Parallelism = 1
Parallelism
Worker (JVM)
Executor (Thread) Executor (Thread) Executor (Thread)
Task Task Task
Executor (Thread)
Task
1 Wor...
Parallelism
Worker (JVM)
Executor (Thread) Executor (Thread)
Task Task
Executor (Thread)
Task
Task
1 Worker,
Parallelism =...
Parallelism
3 Workers,
Parallelism = 1, NumTasks = 1
Worker (JVM) Worker (JVM) Worker (JVM)
Executor (Thread) Executor (Thre...
Internal Messaging
Internal Messaging
Worker Mechanics
Worker Internal Messaging
Worker Receive Thread
Worker Port
List<List<Tuple>>
Receive Buffer
Executor Thread *
Inbound Queu...
Reliable Processing
At Least Once
Reliable Processing
Bolts may emit Tuples Anchored to one received.
Tuple “B” is a descendant of Tuple “A”
{A} {B}
Reliable Processing
Multiple Anchorings form a Tuple tree
(bolts not shown)
{A} {B}
{C}
{D}
{E}
{F}
{G}
{H}
Reliable Processing
Bolts can Acknowledge that a tuple
has been processed successfully.
{A} {B}
ACK
Reliable Processing
Acks are delivered via a system-level bolt
ACK
{A} {B}
Acker Bolt
ack ack
Reliable Processing
Bolts can also Fail a tuple to trigger a spout to
replay the original.
FAIL
{A} {B}
Acker Bolt
fail fail
Reliable Processing
Any failure in the Tuple tree will trigger a
replay of the original tuple
{A} {B}
{C}
{D}
{E}
{F}
{G}
...
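On the spout side, replay amounts to remembering every emitted message until it is acked, and re-queuing it when it is failed. The following is a minimal sketch under that assumption. ReplayingSourceSketch is a hypothetical stand-in, not the Storm ISpout interface: the method names echo nextTuple, ack and fail from the Spout API slides, but the signatures are simplified so the example runs without Storm on the classpath.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration of at-least-once replay bookkeeping (not Storm code).
class ReplayingSourceSketch {
    private final Queue<String> toEmit = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 0L;

    void offer(String message) {
        toEmit.add(message);
    }

    // Emits the next message and returns its message id, or -1 if nothing is queued.
    long nextTuple() {
        String message = toEmit.poll();
        if (message == null) {
            return -1L;
        }
        long msgId = nextId++;
        pending.put(msgId, message); // remember it until it is acked or failed
        return msgId;
    }

    // The tuple tree completed: the message can be forgotten.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Something in the tuple tree failed: queue the original message again.
    void fail(long msgId) {
        String message = pending.remove(msgId);
        if (message != null) {
            toEmit.add(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayingSourceSketch source = new ReplayingSourceSketch();
        source.offer("event");
        long first = source.nextTuple();
        source.fail(first);               // downstream failure triggers a replay
        long replayed = source.nextTuple();
        source.ack(replayed);             // second attempt succeeds
        System.out.println(source.pendingCount());
    }
}
```

A replayed message may already have been partially processed, which is why this guarantee is at-least-once rather than exactly-once.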
Reliable Processing
How to track a large-scale tuple tree efficiently?
Reliable Processing
A single 64-bit integer.
XOR Magic
Long a, b, c = Random.nextLong();
XOR Magic
Long a, b, c = Random.nextLong();
a ^ a == 0
XOR Magic
Long a, b, c = Random.nextLong();
a ^ a == 0
a ^ a ^ b != 0
XOR Magic
Long a, b, c = Random.nextLong();
a ^ a == 0
a ^ a ^ b != 0
a ^ a ^ b ^ b == 0
XOR Magic
Long a, b, c = Random.nextLong();
a ^ (a ^ b) ^ c ^ (b ^ c) == 0
XOR Magic
Long a, b, c = Random.nextLong();
a ^ (a ^ b) ^ c ^ (b ^ c) == 0
Acks can arrive asynchronously, in any order
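The identity above can be turned into a small tracker: keep one 64-bit value per tuple tree, XOR every update into it, and treat zero as "fully processed". The sketch below is a simplified illustration. XorAckSketch and its methods are hypothetical names, and the per-update protocol shown (the spout reports its tuple id, each bolt reports the id it acked XORed with the ids it emitted) is an assumption drawn from the slide's expression, not a copy of Storm's acker.

```java
import java.util.Random;

// Hypothetical illustration of XOR-based tuple tree tracking (not Storm code).
class XorAckSketch {
    private long ackVal = 0L;

    // XOR is commutative and associative, so updates may arrive in any order.
    void update(long value) {
        ackVal ^= value;
    }

    // Zero means every id has been seen exactly twice: once emitted, once acked.
    boolean isFullyProcessed() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        long a = random.nextLong();
        long b = random.nextLong();
        long c = random.nextLong();

        XorAckSketch tracker = new XorAckSketch();
        tracker.update(b ^ c); // bolt 2 acks b and emits c
        tracker.update(a);     // spout emitted a
        tracker.update(c);     // bolt 3 acks c and emits nothing
        tracker.update(a ^ b); // bolt 1 acks a and emits b
        System.out.println(tracker.isFullyProcessed()); // prints true
    }
}
```

With random 64-bit ids, the chance that the value hits zero before the tree is really complete is negligible but not strictly zero, which is the trade-off for constant memory per tuple tree.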
Trident
Trident
High-level abstraction built on Storm’s core primitives.
Trident
Built-in support for:
• Merges and Joins
• Aggregations
• Groupings
• Functions
• Filters
Trident
Stateful, incremental processing on top
of any persistence store.
Trident
Trident is Storm
Trident
Fluent, Stream-oriented API
Trident
Fluent, Stream-Oriented API
TridentTopology topology = new TridentTopology();
FixedBatchSpout spout = new FixedBa...
Trident
Micro-Batch Oriented
Tuple Micro-Batch
{…} {…} {…} {…}
{…} {…} {…} {…}
{…} {…} {…} {…}
{…} {…} {…} {…}
Trident
Trident Batches are Ordered
Tuple Micro-Batch
{…} {…} {…} {…}
{…} {…} {…} {…}
{…} {…} {…} {…}
{…} {…} {…} {…}
Tupl...
Trident
Trident Batches can be Partitioned
Tuple Micro-Batch
{…} {…} {…} {…}
{…} {…} {…} {…}
{…} {…} {…} {…}
{…} {…} {…} {...
Trident
Trident Batches can be Partitioned
Tuple Micro-Batch
{…} {…} {…} {…}
{…} {…} {…} {…}
{…} {…} {…} {…}
{…} {…} {…} {...
Trident Operation Types
1. Local Operations (Functions/Filters)
2. Repartitioning Operations (Stream Groupings,
etc.)
3. A...
Trident Topologies
each
each
shuffle
Function
Filter
partition
persist
Trident Topologies
Partitioning operations define the boundaries
between bolts, and thus network transfer
and parallelism
Trident Topologies
each
each
shuffle
Function
Filter
partition
persist
Bolt 1
Bolt 2
shuffleGrouping()
Partitioning
Operation
Trident Batch
Coordination
Trident Batch Coordination
Trident Spout Master Batch Coordinator User Logic
next
batch
{…} {…} {…} {…}
{…} {…} {…} {…}
{…}...
Controlling
Deployment
Controlling Deployment
How do you control where spouts
and bolts get deployed in a cluster?
Controlling Deployment
How do you control where spouts
and bolts get deployed in a cluster?
Pluggable Schedulers
Controlling Deployment
How do you control where spouts
and bolts get deployed in a cluster?
Isolation Scheduler
Wait… Nimbus, Supervisor, Schedulers…
Doesn’t that sound kind of like
resource negotiation?
Storm on YARN
HDFS2 (redundant, reliable storage)
YARN (cluster resource management)
MapReduce
(batch)...
Storm on YARN
HDFS2 (redundant, reliable storage)
YARN (cluster resource management)
MapReduce
(batch)...
Storm on YARN
HDFS2 (redundant, reliable storage)
YARN (cluster resource management)
MapReduce
(batch)...
Storm on YARN
HDFS2 (redundant, reliable storage)
YARN (cluster resource management)
MapReduce
(batch)...
Storm on YARN
Nimbus
Resource Management, Scheduling
Supervisor
Node and Process management
Workers
Runs topology tasks
YA...
Storm on YARN
Nimbus
Resource Management, Scheduling
Supervisor
Node and Process management
Workers
Runs topology tasks
YA...
Storm on YARN
Nimbus
Resource Management, Scheduling
Supervisor
Node and Process management
Workers
Runs topology tasks
YA...
Storm on YARN
Nimbus
Resource Management, Scheduling
Supervisor
Node and Process management
Workers
Runs topology tasks
YA...
Shameless
Plug
https://www.packtpub.com/storm-distributed-real-time-computation-blueprints/book
Thank You!
Contributions welcome.
Join the storm community at:
http://storm.incubator.apache.org
P. Taylor Goetz
tgoetz@ho...

Hadoop Summit Europe 2014: Apache Storm Architecture

The slides from my session on Apache Storm architecture at Hadoop Summit Europe 2014.

Published in: Software

Transcript of "Hadoop Summit Europe 2014: Apache Storm Architecture"

  1. 1. © Hortonworks Inc. 2011 P. Taylor Goetz Apache Storm Committer tgoetz@hortonworks.com @ptgoetz Apache Storm Architecture and Integration Real-Time Big Data
  2. 2. Shedding Light on Data
  3. 3. Shedding Light on Big Data
  4. 4. Shedding Light on Big Data In Real Time
  5. 5. What is Storm?
  6. 6. Storm is Streaming
  7. 7. Storm is Streaming Key enabler of the Lambda Architecture
  8. 8. Storm is Fast
  9. 9. Storm is Fast Clocked at 1M+ messages per second per node
  10. 10. Storm is Scalable
  11. 11. Storm is Scalable Thousands of workers per cluster
  12. 12. Storm is Fault Tolerant
  13. 13. Storm is Fault Tolerant Failure is expected, and embraced
  14. 14. Storm is Reliable
  15. 15. Storm is Reliable Guaranteed message delivery
  16. 16. Storm is Reliable Exactly-once semantics
  17. 17. Conceptual Model
  18. 18. Tuple {…}
  19. 19. Tuple {…} • Core Unit of Data • Immutable Set of Key/Value Pairs
  20. 20. Streams {…} {…} {…} {…} {…} {…} {…} Unbounded Sequence of Tuples
  21. 21. Spouts
  22. 22. Spouts • Source of Streams • Wraps a streaming data source and emits Tuples {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…}
  23. 23. Spout API public interface ISpout extends Serializable { void open(Map conf, TopologyContext context, SpoutOutputCollector collector); void close(); void activate(); void deactivate(); void nextTuple(); void ack(Object msgId); void fail(Object msgId); } Lifecycle API
  24. 24. Spout API public interface ISpout extends Serializable { void open(Map conf, TopologyContext context, SpoutOutputCollector collector); void close(); void activate(); void deactivate(); void nextTuple(); void ack(Object msgId); void fail(Object msgId); } Core API
  25. 25. Spout API public interface ISpout extends Serializable { void open(Map conf, TopologyContext context, SpoutOutputCollector collector); void close(); void activate(); void deactivate(); void nextTuple(); void ack(Object msgId); void fail(Object msgId); } Reliability API
  26. 26. Bolts
  27. 27. Bolts • Core functions of a streaming computation • Receive tuples and do stuff • Optionally emit additional tuples
  28. 28. Bolts • Write to a data store
  29. 29. Bolts • Read from a data store
  30. 30. Bolts • Perform arbitrary computation Compute
  31. 31. {…} {…} {…} {…} {…} {…} {…} Bolts • (Optionally) Emit additional streams {…} {…} {…} {…} {…} {…} {…}
  32. 32. Bolt API public interface IBolt extends Serializable { void prepare(Map stormConf, TopologyContext context, OutputCollector collector); void cleanup(); void execute(Tuple input); } Lifecycle API
  33. 33. Bolt API public interface IBolt extends Serializable { void prepare(Map stormConf, TopologyContext context, OutputCollector collector); void cleanup(); void execute(Tuple input); } Core API
  34. 34. Bolt Output API public interface IOutputCollector extends IErrorReporter { List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> tuple); void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> tuple); void ack(Tuple input); void fail(Tuple input); } Core API
  35. 35. Bolt Output API public interface IOutputCollector extends IErrorReporter { List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> tuple); void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> tuple); void ack(Tuple input); void fail(Tuple input); } Reliability API
  36. 36. Topologies
  37. 37. Topologies
  38. 38. Topologies • DAG of Spouts and Bolts • Data Flow Representation • Streaming Computation
  39. 39. Topologies • Storm executes spouts and bolts as individual Tasks that run in parallel on multiple machines.
  40. 40. Stream Groupings
  41. 41. Stream Groupings Stream Groupings determine how Storm routes Tuples between tasks in a topology
  42. 42. Stream Groupings Shuffle: Randomized round-robin.
  43. 43. Stream Groupings LocalOrShuffle: Randomized round-robin. (With a preference for intra-worker Tasks)
  44. 44. Stream Groupings Fields Grouping: Ensures all Tuples with the same field value(s) are always routed to the same task.
  45. 45. Stream Groupings Fields Grouping: Ensures all Tuples with the same field value(s) are always routed to the same task. (This is a simple hash of the field values, modulo the number of tasks.)
  46. 46. Physical View
  47. 47. Physical View ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Worker* Worker* Worker* Worker*
  48. 48. Topology Deployment ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter Topology Submitter uploads topology: • topology.jar • topology.ser • conf.ser $ bin/storm jar
  49. 49. Topology Deployment Nimbus calculates assignments and sends to ZooKeeper ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter
  50. 50. Topology Deployment Supervisor nodes receive assignment information via ZooKeeper watches. ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter
  51. 51. Topology Deployment Supervisor nodes download topology from Nimbus: • topology.jar • topology.ser • conf.ser ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter
  52. 52. Topology Deployment Supervisors spawn workers (JVM processes) to start the topology ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter Worker Worker Worker Worker
  53. 53. Fault Tolerance
  54. 54. Fault Tolerance Workers heartbeat back to Supervisors and Nimbus via ZooKeeper, as well as locally. ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter Worker Worker Worker Worker
  55. 55. Fault Tolerance If a worker dies (fails to heartbeat), the Supervisor will restart it ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter Worker Worker Worker Worker X
  56. 56. Fault Tolerance If a worker dies repeatedly, Nimbus will reassign the work to other nodes in the cluster. ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter Worker Worker Worker Worker X
  57. 57. Fault Tolerance If a supervisor node dies, Nimbus will reassign the work to other nodes. ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter Worker Worker Worker Worker X X
  58. 58. Fault Tolerance If Nimbus dies, topologies will continue to function normally, but won’t be able to perform reassignments. ZooKeeper Nimbus Supervisor Supervisor Supervisor Supervisor Topology Submitter Worker Worker Worker Worker X
  59. 59. Parallelism Scaling a Distributed Computation
  60. 60. Parallelism Worker (JVM) Executor (Thread) Executor (Thread) Executor (Thread) Task Task Task 1 Worker, Parallelism = 1
  61. 61. Parallelism Worker (JVM) Executor (Thread) Executor (Thread) Executor (Thread) Task Task Task Executor (Thread) Task 1 Worker, Parallelism = 2
  62. 62. Parallelism Worker (JVM) Executor (Thread) Executor (Thread) Task Task Executor (Thread) Task Task 1 Worker, Parallelism = 2, NumTasks = 2
  63. 63. Parallelism 3 Workers, Parallelism = 1, NumTasks = 1 Worker (JVM) Worker (JVM) Worker (JVM) Executor (Thread) Executor (Thread) Executor (Thread) Task Task Task
  64. 64. Internal Messaging
  65. 65. Internal Messaging Worker Mechanics
  66. 66. Worker Internal Messaging Worker Receive Thread Worker Port List<List<Tuple>> Receive Buffer Executor Thread * Inbound Queue Outbound Queue Router Send Thread Worker Transfer Thread List<List<Tuple>> Transfer Buffer To Other Workers Task (Spout/Bolt) Task (Spout/Bolt) Task(s) (Spout/Bolt)
  67. 67. Reliable Processing At Least Once
  68. 68. Reliable Processing Bolts may emit Tuples Anchored to one received. Tuple “B” is a descendant of Tuple “A” {A} {B}
  69. 69. Reliable Processing Multiple Anchorings form a Tuple tree (bolts not shown) {A} {B} {C} {D} {E} {F} {G} {H}
  70. 70. Reliable Processing Bolts can Acknowledge that a tuple has been processed successfully. {A} {B} ACK
  71. 71. Reliable Processing Acks are delivered via a system-level bolt ACK {A} {B} Acker Bolt ack ack
  72. 72. Reliable Processing Bolts can also Fail a tuple to trigger a spout to replay the original. FAIL {A} {B} Acker Bolt fail fail
  73. 73. Reliable Processing Any failure in the Tuple tree will trigger a replay of the original tuple {A} {B} {C} {D} {E} {F} {G} {H} X X
  74. 74. Reliable Processing How to track a large-scale tuple tree efficiently?
  75. 75. Reliable Processing A single 64-bit integer.
  76. 76. XOR Magic Long a, b, c = Random.nextLong();
  77. 77. XOR Magic Long a, b, c = Random.nextLong(); a ^ a == 0
  78. 78. XOR Magic Long a, b, c = Random.nextLong(); a ^ a == 0 a ^ a ^ b != 0
  79. 79. XOR Magic Long a, b, c = Random.nextLong(); a ^ a == 0 a ^ a ^ b != 0 a ^ a ^ b ^ b == 0
  80. 80. XOR Magic Long a, b, c = Random.nextLong(); a ^ (a ^ b) ^ c ^ (b ^ c) == 0
  81. 81. XOR Magic Long a, b, c = Random.nextLong(); a ^ (a ^ b) ^ c ^ (b ^ c) == 0 Acks can arrive asynchronously, in any order
  82. 82. Trident
  83. 83. Trident High-level abstraction built on Storm’s core primitives.
  84. 84. Trident Built-in support for: • Merges and Joins • Aggregations • Groupings • Functions • Filters
  85. 85. Trident Stateful, incremental processing on top of any persistence store.
  86. 86. Trident Trident is Storm
  87. 87. Trident Fluent, Stream-oriented API
  88. 88. Trident Fluent, Stream-Oriented API TridentTopology topology = new TridentTopology(); FixedBatchSpout spout = new FixedBatchSpout(…); Stream stream = topology.newStream("words", spout); stream.each(…, new MyFunction()) .groupBy() .each(…, new MyFilter()) .persistentAggregate(…); User-defined functions
  89. 89. Trident Micro-Batch Oriented Tuple Micro-Batch {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…}
  90. 90. Trident Trident Batches are Ordered Tuple Micro-Batch {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} Tuple Micro-Batch {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} Batch #1 Batch #2
  91. 91. Trident Trident Batches can be Partitioned Tuple Micro-Batch {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…}
  92. 92. Trident Trident Batches can be Partitioned Tuple Micro-Batch {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} Partition Operation Partition A {…} {…} {…}{…} Partition B {…} {…} {…}{…} Partition C {…} {…} {…}{…} Partition D {…} {…} {…}{…}
  93. 93. Trident Operation Types 1. Local Operations (Functions/Filters) 2. Repartitioning Operations (Stream Groupings, etc.) 3. Aggregations 4. Merges/Joins
  94. 94. Trident Topologies each each shuffle Function Filter partition persist
  95. 95. Trident Topologies Partitioning operations define the boundaries between bolts, and thus network transfer and parallelism
  96. 96. Trident Topologies each each shuffle Function Filter partition persist Bolt 1 Bolt 2 shuffleGrouping() Partitioning Operation
  97. 97. Trident Batch Coordination
  98. 98. Trident Batch Coordination Trident Spout Master Batch Coordinator User Logic next batch {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} {…} commit
  99. 99. Controlling Deployment
  100. 100. Controlling Deployment How do you control where spouts and bolts get deployed in a cluster?
  101. 101. Controlling Deployment How do you control where spouts and bolts get deployed in a cluster? Pluggable Schedulers
  102. 102. Controlling Deployment How do you control where spouts and bolts get deployed in a cluster? Isolation Scheduler
  103. 103. Wait… Nimbus, Supervisor, Schedulers… Doesn’t that sound kind of like resource negotiation?
  104. 104. Storm on YARN HDFS2 (redundant, reliable storage) YARN (cluster resource management) MapReduce (batch) Apache STORM (streaming) HADOOP 2.0 Tez (interactive) Multi Use Data Platform Batch, Interactive, Online, Streaming, …
  105. 105. Storm on YARN HDFS2 (redundant, reliable storage) YARN (cluster resource management) MapReduce (batch) Apache STORM (streaming) HADOOP 2.0 Tez (interactive) Multi Use Data Platform Batch, Interactive, Online, Streaming, … Batch and real-time on the same cluster
  106. 106. Storm on YARN HDFS2 (redundant, reliable storage) YARN (cluster resource management) MapReduce (batch) Apache STORM (streaming) HADOOP 2.0 Tez (interactive) Multi Use Data Platform Batch, Interactive, Online, Streaming, … Security and Multi-tenancy
  107. 107. Storm on YARN HDFS2 (redundant, reliable storage) YARN (cluster resource management) MapReduce (batch) Apache STORM (streaming) HADOOP 2.0 Tez (interactive) Multi Use Data Platform Batch, Interactive, Online, Streaming, … Elasticity
  108. 108. Storm on YARN Nimbus Resource Management, Scheduling Supervisor Node and Process management Workers Runs topology tasks YARN RM Resource Management Storm AM Manage Topology Containers Runs topology tasks YARN NM Process Management Storm’s resource management system maps very naturally to the YARN model.
  109. 109. Storm on YARN Nimbus Resource Management, Scheduling Supervisor Node and Process management Workers Runs topology tasks YARN RM Resource Management Storm AM Manage Topology Containers Runs topology tasks YARN NM Process Management High Availability
  110. 110. Storm on YARN Nimbus Resource Management, Scheduling Supervisor Node and Process management Workers Runs topology tasks YARN RM Resource Management Storm AM Manage Topology Containers Runs topology tasks YARN NM Process Management Detect and scale around bottlenecks
  111. 111. Storm on YARN Nimbus Resource Management, Scheduling Supervisor Node and Process management Workers Runs topology tasks YARN RM Resource Management Storm AM Manage Topology Containers Runs topology tasks YARN NM Process Management Optimize for available resources
  112. 112. Shameless Plug https://www.packtpub.com/storm-distributed-real-time-computation-blueprints/book
  113. 113. Thank You! Contributions welcome. Join the storm community at: http://storm.incubator.apache.org P. Taylor Goetz tgoetz@hortonworks.com @ptgoetz