Published on

These are the slides used for a presentation at the University of Southern Denmark.

March 2012


  2. HADOOP VS STORM

     Hadoop                     Storm
     -------------------------  ---------------------------
     Batch processing           Real-time processing
     Jobs run to completion     Topologies run forever
     JobTracker is a SPOF*      No single point of failure
     Stateful nodes             Stateless nodes
     Scalable                   Scalable
     Guarantees no data loss    Guarantees no data loss
     Open source                Open source

     * Hadoop 0.21 added some checkpointing.
     SPOF: Single Point Of Failure
  3. COMPONENTS
     • Nimbus daemon is comparable to the Hadoop JobTracker. It is the master.
     • Supervisor daemon spawns workers. It is comparable to the Hadoop TaskTracker.
     • Worker is spawned by a supervisor, one per port defined in the storm.yaml configuration.
     • Task is run as a thread in a worker.
     • Zookeeper* is a distributed system used to store metadata.
     Nimbus and Supervisor daemons are fail-fast and stateless. All state is kept in Zookeeper.
     Note that all communication between Nimbus and the Supervisors goes through Zookeeper.
     On a cluster with 2k+1 Zookeeper nodes, the system can recover when at most k nodes fail.
     * Zookeeper is an Apache top-level project.
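
The "2k+1 nodes tolerate k failures" rule can be checked with a few lines of arithmetic. The sketch below assumes a plain majority quorum; the class and method names are made up for illustration and are not part of Zookeeper or Storm.

```java
// Sketch only: majority-quorum arithmetic, assuming the ensemble stays
// available as long as a strict majority of its nodes is alive.
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of nodes that forms a strict majority. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of nodes that may fail while a majority is still alive. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " nodes: quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Under this assumption an even-sized ensemble tolerates no more failures than the next smaller odd-sized one (4 nodes tolerate 1 failure, just like 3), which is why the slide states the rule in terms of 2k+1 nodes.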
  4. STREAMS
     Stream is an unbounded sequence of tuples.
     Topology is a graph where each node is a spout or bolt, and the edges indicate which bolts are subscribing to which streams.
     • A spout is a source of a stream
     • A bolt consumes a stream (and possibly emits a new one)
     • An edge represents a grouping
     [Figure: example topology. Node labels: Source of stream A; Source of stream B; Subscribes: A (two bolts); Subscribes: A & B; Subscribes: C & D; Emits: C; Emits: D]
  5. GROUPINGS
     Each spout or bolt runs X instances in parallel (called tasks).
     Groupings decide which task in the subscribing bolt a tuple is sent to.
     • Shuffle grouping is a random grouping.
     • Fields grouping groups by value, so that equal values go to the same task.
     • All grouping replicates each tuple to all tasks.
     • Global grouping makes all tuples go to one task.
     • None grouping means you do not care how the stream is grouped; the intent is to let the bolt run in the same thread as the bolt/spout it subscribes to.
     • Direct grouping lets the producer (the task that emits) control which consumer task receives the tuple.
     [Figure: example topology whose components run 4, 3, 2 and 2 tasks]
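
To make the grouping rules concrete, here is a minimal plain-Java sketch of how a target task could be chosen for four of the groupings. It does not use the Storm API. The class `GroupingSketch`, the hash-modulo rule for fields grouping and the choice of task 0 for global grouping are assumptions made for illustration, and Storm's real implementation may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Sketch only: task selection for some groupings, with tasks numbered 0..numTasks-1.
final class GroupingSketch {
    private GroupingSketch() {}

    /** Shuffle grouping: any task, chosen at random. */
    static int shuffle(Random rand, int numTasks) {
        return rand.nextInt(numTasks);
    }

    /** Fields grouping: equal grouping values always map to the same task. */
    static int fields(List<?> groupingValues, int numTasks) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    /** Global grouping: every tuple goes to one fixed task (task 0 in this sketch). */
    static int global() {
        return 0;
    }

    /** All grouping: the tuple is replicated to every task. */
    static int[] all(int numTasks) {
        int[] targets = new int[numTasks];
        for (int i = 0; i < numTasks; i++) {
            targets[i] = i;
        }
        return targets;
    }

    public static void main(String[] args) {
        int first = fields(Arrays.asList("nathan"), 3);
        int second = fields(Arrays.asList("nathan"), 3);
        System.out.println("fields grouping is stable: " + (first == second));
        System.out.println("all grouping targets: " + Arrays.toString(all(4)));
    }
}
```

The property that matters for fields grouping is determinism: the same value must always reach the same task, which is what makes per-key state (for example a word count) possible inside a bolt.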
  6. EXAMPLE
     Topology: TestWordSpout -> ExclamationBolt -> ExclamationBolt

         TopologyBuilder builder = new TopologyBuilder();

         // Create stream called "words"; run 10 tasks
         builder.setSpout("words", new TestWordSpout(), 10);

         // Create stream called "exclaim1"; run 3 tasks;
         // subscribe to stream "words" using shuffle grouping
         builder.setBolt("exclaim1", new ExclamationBolt(), 3)
                .shuffleGrouping("words");

         // Create stream called "exclaim2"; run 2 tasks;
         // subscribe to stream "exclaim1" using shuffle grouping
         builder.setBolt("exclaim2", new ExclamationBolt(), 2)
                .shuffleGrouping("exclaim1");

     A bolt can subscribe to an unlimited number of streams by chaining groupings.
     The source code for this example is part of the storm-starter project on GitHub.
  7. EXAMPLE – 1: TestWordSpout

         public void nextTuple() {
             Utils.sleep(100);
             final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
             final Random rand = new Random();
             final String word = words[rand.nextInt(words.length)];
             _collector.emit(new Values(word));
         }

     The TestWordSpout emits a random string from the array words every 100 milliseconds.
  8. EXAMPLE – 2: ExclamationBolt

         OutputCollector _collector;

         // prepare is called when the bolt is created
         public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
             _collector = collector;
         }

         // execute is called for each tuple
         public void execute(Tuple tuple) {
             _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
             _collector.ack(tuple);
         }

         // declareOutputFields is called when the bolt is created
         public void declareOutputFields(OutputFieldsDeclarer declarer) {
             declarer.declare(new Fields("word"));
         }

     declareOutputFields is used to declare streams and their schemas. It is possible to declare several streams and to specify which stream to use when emitting tuples in the emit call.
  9. FAULT TOLERANCE
     Zookeeper stores metadata in a very robust way.
     Nimbus and Supervisor are stateless and only need metadata from ZK to work/restart.
     When a node dies
     • The tasks will time out and be reassigned to other workers by Nimbus.
     When a worker dies
     • The supervisor will restart the worker.
     • Nimbus will reassign the worker to another supervisor if no heartbeats are sent.
     • If that is not possible (no free ports), the tasks will be run on other workers in the topology. If more capacity is added to the cluster later, STORM will automatically initialize a new worker and spread out the tasks.
     When Nimbus or a Supervisor dies
     • Workers will continue to run
     • Workers cannot be reassigned without Nimbus
     • Nimbus and Supervisor should be run under a process monitoring tool that restarts them automatically if they fail.
  10. AT-LEAST-ONCE PROCESSING
     STORM guarantees at-least-once processing of tuples.
     • Message id is assigned to a tuple when it is emitted from a spout or bolt. It is 64 bits long.
     • Tree of tuples is the set of tuples generated (directly and indirectly) from a spout tuple.
     • Ack is called on the spout when the tree of tuples for a spout tuple is fully processed.
     • Fail is called on the spout if one of the tuples in the tree of tuples fails, or if the tree of tuples is not fully processed within a specified timeout (default is 30 seconds).
     It is possible to specify the message id when emitting a tuple. This might be useful for replaying tuples from a queue.
     [Figure: the ack/fail method is called on the spout when the tree of tuples has been fully processed or has failed / timed out]
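
The remark about specifying the message id to replay tuples from a queue can be illustrated with a small bookkeeping sketch. This is plain Java, not the Storm spout interface. The class `ReplayBookkeeping` and all of its methods are hypothetical, and the in-memory queue stands in for an external queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch only: what a spout-like source could remember so that failed
// or timed-out messages are emitted again.
final class ReplayBookkeeping {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>();
    private long nextMsgId = 1L;

    void enqueue(String message) {
        queue.addLast(message);
    }

    /** Takes the next message, remembers it under a fresh id, and returns that id (-1 if the queue is empty). */
    long emitNext() {
        String message = queue.pollFirst();
        if (message == null) {
            return -1L;
        }
        long msgId = nextMsgId++;
        pending.put(msgId, message);
        return msgId;
    }

    /** The tree of tuples was fully processed: forget the message. */
    void ack(long msgId) {
        pending.remove(msgId);
    }

    /** The tree failed or timed out: put the message back so it is emitted again. */
    void fail(long msgId) {
        String message = pending.remove(msgId);
        if (message != null) {
            queue.addFirst(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int queuedCount() {
        return queue.size();
    }
}
```

A failed message is re-emitted under a new id in this sketch. Reusing the original id would work as well, as long as the source can map the id handed to ack or fail back to the message.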
  11. AT-LEAST-ONCE PROCESSING – 2
     • Anchoring is used to copy the spout tuple message id(s) to the newly generated tuples. In this way, every tuple knows the message id(s) of all spout tuples it descends from.
     • Multi-anchoring is when a new tuple is anchored to multiple tuples. If the tuple tree fails, then multiple spout tuples will be replayed. Useful for doing streaming joins and more.
     • Ack called from a bolt indicates that the tuple has been processed as intended.
     • Fail called from a bolt replays the spout tuple(s).
     Every tuple must be acked or failed, or the task will run out of memory at some point.

         _collector.emit(tuple, new Values(word));   // uses anchoring
         _collector.emit(new Values(word));          // does NOT use anchoring
  12. AT-LEAST-ONCE PROCESSING – 3
     Acker tasks track the tree of tuples for every spout tuple.
     • The acker task responsible for a given spout tuple is determined by modulo on the message id. Since all tuples carry all spout tuple message ids, it is easy to call the correct acker task.
     • The acker task stores a map; the format is {spoutMsgId, {spoutTaskId, "ack val"}}.
     • "ack val" represents the state of the entire tree of tuples. It is the XOR of all tuple message ids created and acked in the tree of tuples.
     • When "ack val" is 0, the tuple tree is fully processed.
     • Since message ids are random 64-bit numbers, the chance of "ack val" becoming 0 by accident is extremely small.
     It is important to set the number of acker tasks in the topology when processing large amounts of tuples (defaults to 1).
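
The acker bookkeeping described above can be sketched in plain Java. `AckerSketch` is a hypothetical stand-in, not Storm's acker implementation: it only shows the modulo rule for picking an acker task and the map {spoutMsgId, {spoutTaskId, "ack val"}} with its XOR update.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the state one acker task keeps per spout tuple.
final class AckerSketch {
    /** One map entry: the spout task to notify and the running "ack val". */
    static final class Entry {
        final int spoutTaskId;
        long ackVal;

        Entry(int spoutTaskId, long ackVal) {
            this.spoutTaskId = spoutTaskId;
            this.ackVal = ackVal;
        }
    }

    private final Map<Long, Entry> pending = new HashMap<>();

    /** Picks the acker task responsible for a spout tuple by modulo on its message id. */
    static int ackerFor(long spoutMsgId, int numAckers) {
        return (int) Math.floorMod(spoutMsgId, (long) numAckers);
    }

    /** The spout emitted a tuple: start tracking its tree, seeded with the tuple's own id. */
    void init(long spoutMsgId, int spoutTaskId) {
        pending.put(spoutMsgId, new Entry(spoutTaskId, spoutMsgId));
    }

    /**
     * XORs a tuple id into the tree's ack val. Every id is XORed in once when
     * the tuple is created and once when it is acked, so the ids cancel out.
     * Returns true when the ack val reaches 0, i.e. the tree is fully processed.
     */
    boolean update(long spoutMsgId, long tupleMsgId) {
        Entry entry = pending.get(spoutMsgId);
        if (entry == null) {
            return false; // unknown or already completed tree
        }
        entry.ackVal ^= tupleMsgId;
        if (entry.ackVal == 0L) {
            pending.remove(spoutMsgId);
            return true;
        }
        return false;
    }
}
```

The memory needed per spout tuple is constant (one task id and one 64-bit value) no matter how large the tree of tuples grows, which is the point of the XOR trick.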
  13. AT-LEAST-ONCE PROCESSING – 4
     Example
     [Figure: the spout (task 1) emits "hey" with msgId 10 to a bolt (task 2). That bolt emits "h" (spoutIds: 10, msgId: 2) to a bolt (task 3) and "ey" (spoutIds: 10, msgId: 3) to a bolt (task 4).]
     The steps show what happens in the acker task for one spout tuple. The format is {spoutMsgId, {spoutTaskId, "ack val"}}. The example uses 4-bit ids; in reality 64-bit ids are used.
     1. After emit "hey": {10, {1, 0000 XOR 1010 = 1010}}
     2. After emit "h":   {10, {1, 1010 XOR 0010 = 1000}}
     3. After emit "ey":  {10, {1, 1000 XOR 0011 = 1011}}
     4. After ack "hey":  {10, {1, 1011 XOR 1010 = 0001}}
     5. After ack "h":    {10, {1, 0001 XOR 0010 = 0011}}
     6. After ack "ey":   {10, {1, 0011 XOR 0011 = 0000}}
     7. Since "ack val" is 0, the spout tuple with id 10 must be fully processed. Call ack on the spout (task 1).
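
The six XOR steps of this example can be replayed with a short plain-Java sketch. The class `AckValTrace` and its methods are hypothetical helpers written for this trace only.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: recomputes the "ack val" column of the worked example.
final class AckValTrace {
    private AckValTrace() {}

    /** Formats the low 4 bits of a value as binary, matching the slide's notation. */
    static String bits4(long value) {
        String s = Long.toBinaryString(value & 0xF);
        return "0000".substring(s.length()) + s;
    }

    /** XORs the ids in order and returns the ack val after each step. */
    static List<String> trace(long... ids) {
        List<String> steps = new ArrayList<>();
        long ackVal = 0L;
        for (long id : ids) {
            ackVal ^= id;
            steps.add(bits4(ackVal));
        }
        return steps;
    }

    public static void main(String[] args) {
        // emit "hey" (10), emit "h" (2), emit "ey" (3), ack "hey", ack "h", ack "ey"
        System.out.println(trace(10, 2, 3, 10, 2, 3));
        // prints [1010, 1000, 1011, 0001, 0011, 0000]
    }
}
```

Because XOR is commutative and every id appears exactly twice, the final value is 0 regardless of the order in which the emits and acks reach the acker task.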
  14. AT-LEAST-ONCE PROCESSING – 5
     A tuple isn't acked because the task died:
     • The spout tuple(s) at the root of the tree of tuples will time out and be replayed.
     Acker task dies:
     • All the spout tuples the acker was tracking will time out and be replayed.
     Spout task dies:
     • In this case the source that the spout talks to is responsible for replaying the messages. For example, queues like Kestrel and RabbitMQ will place all pending messages back on the queue when a client disconnects.
  15. AT-LEAST-ONCE PROCESSING – 6
     At-least-once processing might process a tuple more than once.
     Example
     [Figure: a spout (task 1) connected by an all grouping to a bolt running as task 2 and task 3]
     1. A spout tuple is emitted to task 2 and 3
     2. The worker responsible for task 3 fails
     3. The supervisor restarts the worker
     4. The spout tuple is replayed and emitted to task 2 and 3
     5. Task 2 will now have executed the bolt on the same tuple twice
     Consider why the all grouping is not important in this example.
  16. EXACTLY-ONCE-PROCESSING
     Transactional topologies (TT) are an abstraction built on STORM primitives.
     TT guarantees exactly-once processing of tuples.
     Acking is optimized in TT; there is no need to do anchoring or acking manually.
     Bolts execute as new instances for each attempt at processing a batch.
     Example
     [Figure: a spout (task 1) connected by an all grouping to a bolt running as task 2 and task 3]
     1. A spout tuple is emitted to task 2 and 3
     2. The worker responsible for task 3 fails
     3. The supervisor restarts the worker
     4. The spout tuple is replayed and emitted to task 2 and 3
     5. Task 2 and 3 initiate new bolts because of the new attempt
     6. Now there is no problem
  17. EXACTLY-ONCE-PROCESSING – 2
     For efficiency, batch processing of tuples is introduced in TT.
     A batch has two states: processing or committing.
     • Many batches can be in the processing state concurrently.
     • Only one batch can be in the committing state, and a strong ordering is imposed. That means batch 1 will always be committed before batch 2, and so on.
     Types of bolts for TT: BasicBolt, BatchBolt, BatchBolt marked as committer.
     • BasicBolt processes one tuple at a time.
     • BatchBolt processes batches. finishBatch is called when all tuples of the batch have been executed.
     • BatchBolt marked as committer has finishBatch called only when the batch is in the committing state.
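
The ordering rule (many batches processing concurrently, commits strictly in order) can be sketched in plain Java. `CommitOrderSketch` is hypothetical, and it simplifies by treating a commit as instantaneous, whereas a real commit phase takes time and has to be acked before the next batch may commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch only: batches may finish processing in any order, but they
// are released for commit strictly by ascending transaction id.
final class CommitOrderSketch {
    private final TreeSet<Long> processed = new TreeSet<>();
    private long nextToCommit = 1L;

    /**
     * Records that the processing phase of a batch is done and returns the
     * batches that may now commit, in commit order (possibly none).
     */
    List<Long> processingDone(long txid) {
        processed.add(txid);
        List<Long> committed = new ArrayList<>();
        // Commit only while the next expected batch has finished processing.
        while (processed.remove(nextToCommit)) {
            committed.add(nextToCommit);
            nextToCommit++;
        }
        return committed;
    }

    public static void main(String[] args) {
        CommitOrderSketch sequencer = new CommitOrderSketch();
        System.out.println(sequencer.processingDone(2)); // batch 1 not done yet
        System.out.println(sequencer.processingDone(3)); // still waiting for batch 1
        System.out.println(sequencer.processingDone(1)); // releases 1, 2 and 3 in order
    }
}
```

Batches 2 and 3 wait even though their processing is finished, because batch 1 has not been committed yet.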
  18. EXACTLY-ONCE-PROCESSING – 3
     The transactional spout has the capability to replay exact batches of tuples.
     [Figure: a transactional spout feeding four BatchBolts, A to D; two of them are marked "Committer"]
     BATCH IS IN PROCESSING STATE
     • Bolt A: execute method is called for all tuples received from the spout. finishBatch is called once the batch has been fully received.
     • Bolt B: execute method is called for all tuples received from bolt A. finishBatch is NOT called, because the batch is in the processing state.
     • Bolt C: execute method is called for all tuples received from bolt A (and B). finishBatch is NOT called, because bolt B has not called finishBatch.
     • Bolt D: execute method is called for all tuples received from bolt C. finishBatch is NOT called, because the batch is in the processing state.
     BATCH CHANGES TO COMMITTING STATE
     • Bolt B: finishBatch is called.
     • Bolt C: finishBatch is called, because we know we got all tuples from bolt B now.
     • Bolt D: finishBatch is called, because we know we got all tuples from bolt C now.
  19. EXACTLY-ONCE-PROCESSING – 4
     [Figure: the transactional spout, built from a regular spout with parallelism of 1 (defined streams: batch & commit) connected by an all grouping on the batch stream to a regular bolt with parallelism of P]
     When a batch should enter the processing state:
     • Coordinator emits a tuple with the TransactionAttempt and the metadata for that transaction to the "batch" stream.
     • All emitter tasks receive the tuple and begin to emit their portion of tuples for the given batch.
     When the processing phase of the batch is done (determined by the acker task):
     • Ack gets called on the coordinator.
     When ack gets called on the coordinator and all prior transactions have committed:
     • Coordinator emits a tuple with the TransactionAttempt to the commit stream.
     • All bolts which are marked as committers subscribe to the commit stream of the coordinator using an all grouping.
     • Bolts marked as committers now know the batch is in the committing phase.
     When the batch is fully processed again (determined by the acker task):
     • Ack gets called on the coordinator.
     • Coordinator knows the batch is now committed.
  20. STORM LIBRARIES
     STORM uses a lot of libraries. The most prominent are:
     • Clojure: a new Lisp programming language. A crash course follows.
     • Jetty: an embedded web server. Used to host the UI of Nimbus.
     • Kryo: a fast serializer, used when sending tuples.
     • Thrift: a framework to build services. Nimbus is a Thrift daemon.
     • ZeroMQ: a very fast transport layer.
     • Zookeeper: a distributed system for storing metadata.
  21. LEARN MORE
     Wiki ( ( list ( room on freenode from: