Background
• Created by Nathan Marz @ BackType/Twitter
  – Analyze tweets, links, and users on Twitter
• Open-sourced in Sep 2011
  – Eclipse Public License 1.0
  – Storm 0.5.2
  – 16k Java and 7k Clojure LOC
  – Current stable release: 0.8.2
    • 0.9.0: major core improvement
Background
• Active user group
  – https://groups.google.com/group/storm-user
  – https://github.com/nathanmarz/storm
  – Most watched Java repo on GitHub (>4k watchers)
  – Used by over 30 companies
    • Twitter, Groupon, Alibaba, GumGum, ...
Topology
• Topology
  – A directed graph of Spouts and Bolts
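
The "directed graph of Spouts and Bolts" idea can be sketched without Storm itself. The class below is a hypothetical, plain-Java model (none of its names come from the Storm API): spouts are source nodes, and each bolt declares which components it reads from. As a simplification, a bolt may only subscribe to components that are already declared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Storm-free model of a topology: nodes are spouts or bolts,
// edges point from a component to the bolts that consume its output.
class TopologySketch {
    enum Kind { SPOUT, BOLT }

    private final Map<String, Kind> components = new LinkedHashMap<>();
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();

    // A spout is a source node: it has no inputs.
    void addSpout(String id) {
        add(id, Kind.SPOUT);
    }

    // A bolt must subscribe to at least one already-declared component.
    void addBolt(String id, String... sources) {
        if (sources.length == 0) {
            throw new IllegalArgumentException("bolt needs at least one input");
        }
        for (String s : sources) {
            if (!components.containsKey(s)) {
                throw new IllegalArgumentException("unknown source: " + s);
            }
        }
        add(id, Kind.BOLT);
        for (String s : sources) {
            downstream.get(s).add(id);
        }
    }

    private void add(String id, Kind kind) {
        if (components.putIfAbsent(id, kind) != null) {
            throw new IllegalArgumentException("duplicate id: " + id);
        }
        downstream.put(id, new ArrayList<>());
    }

    // Bolts that read from the given component (null if the id is unknown).
    List<String> consumersOf(String id) {
        return downstream.get(id);
    }

    Kind kindOf(String id) {
        return components.get(id);
    }
}
```

For example, a word-count pipeline would be `addSpout("sentences")`, then `addBolt("split", "sentences")`, then `addBolt("count", "split")`.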
Tasks
• Instances of Spouts and Bolts
• Managed by Supervisor
  – http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
Stream grouping
• All grouping
  – Send to all tasks
• Global grouping
  – Pick task with lowest id
• Shuffle grouping
  – Pick a random task
• Fields grouping
  – Consistent hashing on a subset of tuple fields
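
The four groupings above can be read as four ways of choosing target task ids for a tuple. The following is a minimal sketch in plain Java, not Storm's implementation: the class and method names are hypothetical, and the fields grouping uses a simple hash-modulo as a stand-in for whatever hashing scheme Storm applies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical helpers that pick target tasks for one tuple.
// All methods assume taskIds is non-empty.
class GroupingSketch {
    // All grouping: every task receives a copy of the tuple.
    static int[] all(int[] taskIds) {
        return taskIds.clone();
    }

    // Global grouping: the whole stream goes to the task with the lowest id.
    static int global(int[] taskIds) {
        return Arrays.stream(taskIds).min().getAsInt();
    }

    // Shuffle grouping: pick a task at random.
    static int shuffle(int[] taskIds, Random rng) {
        return taskIds[rng.nextInt(taskIds.length)];
    }

    // Fields grouping: tuples with equal key fields always reach the same task.
    // Assumption: hash-modulo is used here purely for illustration.
    static int fields(int[] taskIds, List<Object> keyFields) {
        return taskIds[Math.floorMod(keyFields.hashCode(), taskIds.length)];
    }
}
```

The property that matters for fields grouping is determinism: two tuples with the same key fields land on the same task, which is what makes per-key state (such as word counts) possible.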
Storm fault-tolerance
• Reliability API
  – Spout tuple creation
    • collector.emit(values, msgID);
  – Child tuple creation (Bolts)
    • collector.emit(parentTuples, values);
  – Tuple end of processing
    • collector.ack(tuple);
  – Tuple failed to process
    • collector.fail(tuple);
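
To see how these four calls fit together, it may help to model the bookkeeping they imply. The sketch below is a simplified, Storm-free model with hypothetical names; it is not how Storm tracks tuples internally. It assumes non-null message ids and that a bolt emits its anchored children before acking the parent. A spout message counts as complete once every tuple anchored to it has been acked, and as failed as soon as any one of them fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of tuple-tree tracking (hypothetical names throughout).
class ReliabilitySketch {
    private final Map<Long, Object> rootOf = new HashMap<>();       // tuple id -> spout msgID
    private final Map<Object, Set<Long>> pending = new HashMap<>(); // msgID -> unacked tuple ids
    final List<Object> completed = new ArrayList<>();               // msgIDs fully processed
    final List<Object> failed = new ArrayList<>();                  // msgIDs the spout should replay
    private long nextId = 1;

    private long track(Object msgId) {
        long id = nextId++;
        rootOf.put(id, msgId);
        pending.computeIfAbsent(msgId, k -> new HashSet<>()).add(id);
        return id;
    }

    // Models collector.emit(values, msgID) in a spout: starts a new tuple tree.
    long emitFromSpout(Object msgId) {
        return track(msgId);
    }

    // Models collector.emit(parentTuple, values) in a bolt: anchors the child.
    long emitAnchored(long parentId) {
        Object msgId = rootOf.get(parentId);
        if (msgId == null || !pending.containsKey(msgId)) {
            throw new IllegalStateException("parent tuple is not being tracked");
        }
        return track(msgId);
    }

    // Models collector.ack(tuple): the tree completes when nothing is pending.
    void ack(long tupleId) {
        Object msgId = rootOf.remove(tupleId);
        Set<Long> open = (msgId == null) ? null : pending.get(msgId);
        if (open == null) {
            return; // unknown tuple, or its tree has already failed
        }
        open.remove(tupleId);
        if (open.isEmpty()) {
            pending.remove(msgId);
            completed.add(msgId);
        }
    }

    // Models collector.fail(tuple): the whole tree fails at once.
    void fail(long tupleId) {
        Object msgId = rootOf.remove(tupleId);
        if (msgId != null && pending.remove(msgId) != null) {
            failed.add(msgId);
        }
    }
}
```

In this model, acking a parent before its anchored child is acked does not complete the message; only the last outstanding ack does.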
Storm fault-tolerance
• Disable reliability API
  – Globally
    • Set Config.TOPOLOGY_ACKER_EXECUTORS to 0
  – On topology level
    • collector.emit(values); (spout emits without a msgID)
  – For a single tuple
    • collector.emit(values); (bolt emits without parentTuples, i.e. unanchored)
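
The three switches can be summarized as two small predicates. This is only an illustrative sketch with hypothetical names, assuming that tracking requires at least one acker executor, a message id on the spout tuple, and anchoring on every downstream emit.

```java
// Hypothetical summary of when a tuple is tracked for reliability.
class TrackingDecisionSketch {
    // A spout tuple is tracked only if ackers are running and a msgID was supplied.
    static boolean spoutTupleTracked(int ackerExecutors, boolean hasMsgId) {
        return ackerExecutors > 0 && hasMsgId;
    }

    // A child tuple is tracked only if its parent is tracked and the emit was anchored.
    static boolean childTupleTracked(boolean parentTracked, boolean anchored) {
        return parentTracked && anchored;
    }
}
```

Read this way, setting the acker count to 0 switches tracking off for everything, omitting the msgID switches it off for one spout tuple and its whole tree, and emitting unanchored switches it off only for that tuple and whatever is derived from it.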