Storm Real Time Computation


Given at Supercomputer Education Research Centre, IISc, Bangalore


  1. 1. SERC – CADL, Indian Institute of Science, Bangalore, India • TWITTER STORM: Real Time, Fault Tolerant Distributed Framework • Created: 25th May, 2013 • SONAL RAJ, National Institute of Technology, Jamshedpur, India
  2. 2. Background • Created by Nathan Marz @ BackType/Twitter • Analyzes tweets, links, and users on Twitter • Open-sourced in September 2011 • Eclipse Public License 1.0 • Storm 0.5.2 • 16k Java and 7k Clojure LOC • Current stable release: 0.8.2 • 0.9.0 brings major core improvements
  3. 3. Background • Active user group • Most watched Java repo on GitHub (>4k watchers) • Used by over 30 companies • Twitter, Groupon, Alibaba, GumGum, ..
  4. 4. What led to storm . .
  5. 5. Problems . . . • Scale is painful • Poor fault-tolerance • Hadoop is stateful • Coding is tedious • Batch processing: long latency, no real-time
  6. 6. Storm . . . Problems Solved !! • Scalable and robust • No persistence layer • Guarantees no data loss • Fault-tolerant • Programming-language agnostic • Use cases: stream processing, distributed RPC, continuous computation
  7. 7. STORM FEATURES • Guaranteed data processing • Horizontal scalability • Fault-tolerance • No intermediate message brokers! • Higher-level abstraction than message passing • "Just works"
  8. 8. Storm’s edge over Hadoop
      HADOOP                       STORM
      Batch processing             Real-time processing
      Jobs run to completion       Topologies run forever
      JobTracker is a SPOF*        No single point of failure
      Stateful nodes               Stateless nodes
      Scalable                     Scalable
      Guarantees no data loss      Guarantees no data loss
      Open source                  Open source
      * Hadoop 0.21 added some checkpointing. SPOF: Single Point Of Failure
  9. 9. Streaming Computation
  10. 10. Paradigm of stream computation Queues /Workers
  11. 11. General method Messages Queue
  12. 12. general method Message routing can be complex Messages Queue
  13. 13. storm use cases
  14. 14. COMPONENTS • The Nimbus daemon is comparable to the Hadoop JobTracker; it is the master • The Supervisor daemon spawns workers; it is comparable to the Hadoop TaskTracker • A Worker is spawned by the supervisor, one per port defined in the storm.yaml configuration • A Task runs as a thread inside a worker • Zookeeper is a distributed system used to store metadata. The Nimbus and Supervisor daemons are fail-fast and stateless; all state is kept in Zookeeper. Note that all communication between Nimbus and the Supervisors goes through Zookeeper. On a cluster with 2k+1 Zookeeper nodes, the system can recover when at most k nodes fail.
  15. 15. STORM ARCHITECTURE (cluster diagram)
  16. 16. Storm architecture • Master Node (similar to the Hadoop JobTracker)
  17. 17. STORM ARCHITECTURE • Used for cluster co-ordination
  18. 18. STORM ARCHITECTURE • Runs worker nodes / processes
  19. 19. CONCEPTS • Streams • Topology • A spout • A bolt • An edge represents a grouping
  20. 20. streams
  21. 21. spouts • Example • Read from logs, API calls, event data, queues, …
  22. 22. SPOUTS • Interface ISpout, method summary:
      void ack(java.lang.Object msgId): Storm has determined that the tuple emitted by this spout with the msgId identifier has been fully processed.
      void activate(): called when a spout has been activated out of a deactivated mode.
      void close(): called when an ISpout is going to be shut down.
      void deactivate(): called when a spout has been deactivated.
      void fail(java.lang.Object msgId): the tuple emitted by this spout with the msgId identifier has failed to be fully processed.
      void nextTuple(): when this method is called, Storm is requesting that the spout emit tuples to the output collector.
      void open(java.util.Map conf, TopologyContext context, SpoutOutputCollector collector): called when a task for this component is initialized within a worker on the cluster.
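A minimal spout sketch to make the interface concrete. It extends BaseRichSpout (which stubs out the optional ISpout methods) and emits a hard-coded sentence stream; the SentenceSpout class name and its sentences are illustrative, not part of the original deck.

    import java.util.Map;
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import backtype.storm.utils.Utils;

    public class SentenceSpout extends BaseRichSpout {
        private SpoutOutputCollector _collector;
        private final String[] _sentences = {
            "the cow jumped over the moon",
            "an apple a day keeps the doctor away"
        };
        private int _index = 0;

        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;               // keep the collector for nextTuple()
        }

        public void nextTuple() {
            Utils.sleep(100);                     // avoid busy-looping between emits
            _collector.emit(new Values(_sentences[_index]));
            _index = (_index + 1) % _sentences.length;
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("sentence"));
        }
    }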
  23. 23. BOLTS • Process input streams and produce new streams • Examples: stream joins, DBs, APIs, filters, aggregation, …
  24. 24. BOLTS • Interface IBolt
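For reference, the IBolt contract (backtype.storm.task.IBolt in the 0.8.x line) reduces to three methods; the sketch below restates it with comments as orientation, not as a copy of the library source.

    import java.io.Serializable;
    import java.util.Map;
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.tuple.Tuple;

    public interface IBolt extends Serializable {
        // called once when the bolt's task is initialized on a worker
        void prepare(Map stormConf, TopologyContext context, OutputCollector collector);
        // called for every incoming tuple; emit, ack or fail via the collector
        void execute(Tuple input);
        // called when the bolt is being shut down (best effort, e.g. in local mode)
        void cleanup();
    }

Most bolts implement the higher-level IRichBolt or IBasicBolt instead, as the word-count example later in the deck does.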
  25. 25. TOPOLOGY • A topology is a graph where each node is a spout or bolt, and the edges indicate which bolts subscribe to which streams.
  26. 26. TASKS • Parallelism is implemented using multiple instances of each spout and bolt performing similar work simultaneously. Each spout or bolt executes as many tasks across the cluster. • Managed by the supervisor daemon
  27. 27. Stream groupings When a tuple is emitted, which task does it go to?
  28. 28. Stream grouping • Shuffle grouping: pick a random task • Fields grouping: consistent hashing on a subset of tuple fields • All grouping: send to all tasks • Global grouping: pick the task with the lowest id (see the sketch below)
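A sketch of how these groupings are declared when wiring a topology; the MetricsBolt and ReportBolt names and the parallelism numbers are placeholders, not from the original deck.

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sentences", new SentenceSpout(), 2);

    builder.setBolt("split", new SplitSentence(), 4)
           .shuffleGrouping("sentences");                    // shuffle: pick a random task
    builder.setBolt("count", new WordCount(), 4)
           .fieldsGrouping("split", new Fields("word"));     // fields: same word, same task
    builder.setBolt("metrics", new MetricsBolt(), 2)
           .allGrouping("count");                            // all: every task gets a copy
    builder.setBolt("report", new ReportBolt(), 1)
           .globalGrouping("count");                         // global: the task with the lowest id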
  29. 29. example : streaming word count • TopologyBuilder is used to construct topologies in Java. • Define a Spout in the Topology with parallelism of 5 tasks.
  30. 30. example : streaming word count • The consumer decides what data it receives and how it gets grouped • Split sentences into words with a parallelism of 8 tasks • Create a word count stream (a sketch of the full wiring follows)
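A sketch of the wiring these two slides describe, reusing the SentenceSpout sketch above and the SplitSentence / WordCount bolts shown on the next slides; only the parallelism hints 5 and 8 come from the slides, the rest is illustrative.

    TopologyBuilder builder = new TopologyBuilder();

    // spout with a parallelism of 5 tasks
    builder.setSpout("sentences", new SentenceSpout(), 5);

    // split sentences into words with a parallelism of 8 tasks;
    // shuffle grouping: sentences are spread randomly across the 8 tasks
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("sentences");

    // word count stream: fields grouping on "word" so a given word
    // is always counted by the same task
    builder.setBolt("count", new WordCount(), 8)
           .fieldsGrouping("split", new Fields("word"));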
  31. 31. example : streaming word count (multi-language bolt via ShellBolt)

    public static class SplitSentence extends ShellBolt implements IRichBolt {
        public SplitSentence() {
            // delegate processing to a Python script via the multilang protocol
            super("python", "splitsentence.py");
        }
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    # splitsentence.py
    import storm

    class SplitSentenceBolt(storm.BasicBolt):
        def process(self, tup):
            words = tup.values[0].split(" ")
            for word in words:
                storm.emit([word])

    SplitSentenceBolt().run()
  32. 32. INSIDE A BOLT

    public static class WordCount implements IBasicBolt {
        Map<String, Integer> counts = new HashMap<String, Integer>();

        public void prepare(Map conf, TopologyContext context) { }

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.get(word);
            if (count == null) count = 0;
            count++;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        public void cleanup() { }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }
  33. 33. abstraction : DRPC • Submitting Topologies to the cluster
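A minimal sketch of submitting the topology built above to the cluster; the topology name and worker count are arbitrary choices, not from the deck.

    Config conf = new Config();
    conf.setNumWorkers(4);        // worker processes to spread across the cluster
    conf.setDebug(false);

    StormSubmitter.submitTopology("word-count", conf, builder.createTopology());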
  34. 34. abstraction : DRPC • Running the Topology in Local Mode
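The same topology run in local mode for development and debugging, again as a sketch with arbitrary names and timings.

    Config conf = new Config();
    conf.setDebug(true);          // log emitted tuples while developing

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", conf, builder.createTopology());

    Utils.sleep(10000);           // let the topology run for a while
    cluster.killTopology("word-count");
    cluster.shutdown();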
  35. 35. Fault-Tolerance • Zookeeper stores metadata in a very robust way • Nimbus and Supervisor are stateless and only need metadata from ZK to work/restart • When a node dies • The tasks will time out and be reassigned to other workers by Nimbus • When a worker dies • The supervisor will restart the worker • Nimbus will reassign the worker to another supervisor if no heartbeats are sent • If that is not possible (no free ports), the tasks will be run on other workers in the topology. If more capacity is added to the cluster later, Storm will automatically initialize a new worker and spread out the tasks • When Nimbus or a Supervisor dies • Workers will continue to run • Workers cannot be reassigned without Nimbus • Nimbus and Supervisor should be run under a process-monitoring tool that restarts them automatically if they fail
  36. 36. AT LEAST ONCE PROCESSING • Storm guarantees at-least-once processing of tuples. • A message id is assigned to a tuple when it is emitted from a spout or bolt; it is 64 bits long. • The tree of tuples is the set of tuples generated (directly and indirectly) from a spout tuple. • Ack is called on the spout when the tree of tuples for a spout tuple has been fully processed. • Fail is called on the spout if one of the tuples in the tree fails, or if the tree is not fully processed within a specified timeout (default is 30 seconds). • It is possible to specify the message id when emitting a tuple; this is useful for replaying tuples from a queue (see the sketch below).
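As an illustration of spout-side message ids, here is a sketch of a reliable spout reading from a queue. The Queue and Message classes are hypothetical stand-ins for whatever queue client is used; the Storm calls (emit with a message id, ack, fail) are the relevant part.

    import java.util.Map;
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class QueueSpout extends BaseRichSpout {
        private SpoutOutputCollector _collector;
        private Queue _queue;   // hypothetical queue client

        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;
            _queue = Queue.connect((String) conf.get("queue.url"));   // hypothetical
        }

        public void nextTuple() {
            Message m = _queue.next();    // hypothetical
            if (m != null) {
                // the second argument is the message id: Storm now tracks this tuple tree
                _collector.emit(new Values(m.body()), m.id());
            }
        }

        public void ack(Object msgId)  { _queue.delete(msgId); }    // tree fully processed
        public void fail(Object msgId) { _queue.requeue(msgId); }   // failed or timed out, replay

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("body"));
        }
    }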
  37. 37. AT LEAST ONCE PROCESSING • Anchoring copies the spout tuple message id(s) to the new tuples generated; in this way, every tuple knows the message id(s) of all its spout tuples. • Multi-anchoring is when multiple tuples are anchored; if the tuple tree fails, multiple spout tuples will be replayed. Useful for streaming joins and more. • Ack called from a bolt indicates the tuple has been processed as intended. • Fail called from a bolt replays the spout tuple(s). • Every tuple must be acked or failed, or the task will run out of memory at some point. • _collector.emit(tuple, new Values(word)); uses anchoring • _collector.emit(new Values(word)); does NOT use anchoring
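A sketch of the anchoring pattern inside a bolt's execute method (for an IRichBolt, where acking is manual; IBasicBolt handles this automatically):

    public void execute(Tuple tuple) {
        for (String word : tuple.getString(0).split(" ")) {
            // anchored emit: the new tuple becomes a child of 'tuple' in the tuple tree
            _collector.emit(tuple, new Values(word));
        }
        // mark this node of the tree as complete (call _collector.fail(tuple) on error instead)
        _collector.ack(tuple);
    }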
  38. 38. EXACTLY ONCE PROCESSING • Transactional topologies (TT) are an abstraction built on Storm primitives. • TT guarantees exactly-once processing of tuples. • Acking is optimized in TT; there is no need to do anchoring or acking manually. • Bolts execute as new instances per attempt at processing a batch. • Example (all grouping; spout task 1, bolt tasks 2 and 3): 1. A spout tuple is emitted to tasks 2 and 3. 2. The worker responsible for task 3 fails. 3. The supervisor restarts the worker. 4. The spout tuple is replayed and emitted to tasks 2 and 3. 5. Tasks 2 and 3 initiate new bolt instances because of the new attempt. Now there is no problem.
  39. 39. ABSTRACTION : DRPC (diagram: Distributed RPC architecture. The client sends "args" to the DRPC server, which forwards ["request-id", "args", "return-info"] to a topology; the topology returns ["request-id", "result"], and the server hands "result" back to the client.)
  40. 40. WHY DRPC ? • Before Distributed RPC, time-sensitive queries relied on a pre-computed index • Storm does away with the indexing !!
  41. 41. abstraction : DRPC example • Calculating the “reach” of a URL on the fly (in real time!) • Written by Nathan Marz, the creator of Storm • A real-world application of Storm, open source, available at • Reach is the number of unique people exposed to a URL (tweet) on Twitter at any given time.
  42. 42. abstraction : DRPC >> computing reach
  43. 43. ABSTRACTION : DRPC >> REACH TOPOLOGY (diagram: spout feeding bolts via a shuffle grouping on ["follower-id"] and a global grouping into the final count)
  44. 44. abstraction : DRPC >> Reach topology Create the Topology for the DRPC Implementation of Reach Computation
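As a sketch of what that topology construction looks like, here is the shape of the storm-starter-style reach topology built with LinearDRPCTopologyBuilder; the bolt class names and parallelism numbers are assumptions taken from the well-known example, not from this deck.

    LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");
    builder.addBolt(new GetTweeters(), 4);                       // url -> tweeters of that url
    builder.addBolt(new GetFollowers(), 12).shuffleGrouping();   // tweeter -> followers
    builder.addBolt(new PartialUniquer(), 6)
           .fieldsGrouping(new Fields("id", "follower"));        // de-duplicate followers per request
    builder.addBolt(new CountAggregator(), 3)
           .fieldsGrouping(new Fields("id"));                    // sum the partial counts

    // local-mode test drive
    LocalDRPC drpc = new LocalDRPC();
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("reach-drpc", new Config(), builder.createLocalTopology(drpc));
    System.out.println("Reach: " + drpc.execute("reach", "http://example.com/"));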
  45. 45. ABSTRACTION : DRPC >> the PartialUniquer bolt

    public static class PartialUniquer implements IRichBolt, FinishedCallback {
        OutputCollector _collector;
        // keep a set of followers for each request id in memory
        Map<Object, Set<String>> _sets = new HashMap<Object, Set<String>>();

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        public void execute(Tuple tuple) {
            Object id = tuple.getValue(0);
            Set<String> curr = _sets.get(id);
            if (curr == null) {
                curr = new HashSet<String>();
                _sets.put(id, curr);
            }
            curr.add(tuple.getString(1));
            _collector.ack(tuple);
        }

        @Override
        public void finishedId(Object id) {
            // batch for this request id is done: emit the partial unique count
            Set<String> curr = _sets.remove(id);
            int count = 0;
            if (curr != null) count = curr.size();
            _collector.emit(new Values(id, count));
        }
    }
  50. 50. guaranteeing message processing Tuple Tree
  51. 51. Guaranteeing message processing • A spout tuple is not fully processed until all tuples in the tree have been completed • If the tuple tree is not completed within a specified timeout, the spout tuple is replayed • Storm exposes a Reliability API for this
  52. 52. Guaranteeing message processing • Acking marks a single node in the tree as complete • “Anchoring” creates a new edge in the tuple tree • Storm tracks tuple trees for you in an extremely efficient way
  53. 53. Running a Storm application • Local Mode • Runs on a single JVM • Used for development, testing and debugging • Remote Mode • Submits the topology to a Storm cluster, which has many processes running on different machines • Doesn’t show debugging info, hence it is considered Production Mode
  54. 54. STORM UI (screenshot: the Storm web UI with topology, component and bolt statistics, showing emitted/transferred tuple counts, process latency, and acked/failed tuples per host and port)
  55. 55. DOCUMENTATION (screenshot: the nathanmarz/storm GitHub wiki) • Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use! • Read these first: Rationale, Setting up a development environment, Creating a new Storm project, Tutorial • Getting help: ask questions on the storm-user mailing list, or drop into the #storm-user room on freenode, where you can usually find a Storm developer to help you out
  56. 56. STORM LIBRARIES . . Storm uses a lot of libraries. The most prominent are • Clojure, a modern Lisp dialect (crash-course follows) • Jetty, an embedded web server used to host the Storm UI • Kryo, a fast serializer used when sending tuples • Thrift, a framework for building services; Nimbus is a Thrift daemon • ZeroMQ, a very fast transport layer • Zookeeper, a distributed system for storing metadata
  57. 57. References • Twitter Storm, Nathan Marz • Storm, nathanmarz on GitHub • Realtime Analytics with Storm and Hadoop, Hadoop Summit