Storm Anatomy
Introducing Storm's concept, programming model and internal architecture


Transcript

  • 1. Storm Anatomy Eiichiro Uchiumi http://www.eiichiro.org/
  • 2. About Me
    Eiichiro Uchiumi
    • A solutions architect working in emerging enterprise technologies
      - Cloud transformation
      - Enterprise mobility
      - Information optimization (big data)
    https://github.com/eiichiro
    @eiichirouchiumi
    http://www.facebook.com/eiichiro.uchiumi
  • 3. What is Stream Processing?
    Stream processing is a technical paradigm for processing a high-volume, unbounded sequence of tuples in real time.
    • Algorithmic trading
    • Sensor data monitoring
    • Continuous analytics
    Diagram: Stream Source, Stream Processor
  • 4. What is Storm?
    Storm is a distributed realtime computation system:
    • Fast & scalable
    • Fault-tolerant
    • Guarantees messages will be processed
    • Easy to set up & operate
    • Free & open source
    - Originally developed by Nathan Marz at BackType (acquired by Twitter)
    - Written in Java and Clojure
  • 5. Conceptual View
    Diagram: spouts and bolts connected by streams of tuples.
    • Tuple: List of name-value pairs
    • Stream: Unbounded sequence of tuples
    • Spout: Source of streams
    • Bolt: Consumer of streams; does some processing and possibly emits new tuples
    • Topology: Graph of computation with spouts/bolts as the nodes and streams as the edges
  • 6. Physical View
    Diagram: Nimbus, ZooKeeper nodes and Supervisors. Each Supervisor (worker node) runs Worker * N; each Worker (worker process) runs Executor * N; each Executor runs Task * N.
    • Nimbus: Master daemon process responsible for
      - distributing code
      - assigning tasks
      - monitoring failures
    • ZooKeeper: Stores cluster operational state
    • Supervisor: Worker daemon process that listens for work assigned to its node
    • Worker: Java process that executes a subset of a topology
    • Executor: Java thread spawned by a worker; runs one or more tasks of the same component
    • Task: Component (spout/bolt) instance that performs the actual data processing
  • 7. Spout
      import java.util.Map;
      import java.util.Random;

      import backtype.storm.spout.SpoutOutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichSpout;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Values;
      import backtype.storm.utils.Utils;

      public class RandomSentenceSpout extends BaseRichSpout {
          SpoutOutputCollector collector;
          Random random;

          // Called once when the spout task is initialized.
          @Override
          public void open(Map conf, TopologyContext context,
                  SpoutOutputCollector collector) {
              this.collector = collector;
              random = new Random();
          }

          // Called repeatedly; emits one randomly chosen sentence per call.
          @Override
          public void nextTuple() {
              String[] sentences = new String[] {
                      "the cow jumped over the moon",
                      "an apple a day keeps the doctor away",
                      "four score and seven years ago",
                      "snow white and the seven dwarfs",
                      "i am at two with nature"
              };
              String sentence = sentences[random.nextInt(sentences.length)];
              collector.emit(new Values(sentence));
          }
  • 8. Spout (continued)
          // Names the single field carried by each emitted tuple.
          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("sentence"));
          }

          @Override
          public void ack(Object msgId) {}

          @Override
          public void fail(Object msgId) {}
      }
  • 9. Bolt
      import java.util.Map;

      import backtype.storm.task.OutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichBolt;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Tuple;
      import backtype.storm.tuple.Values;

      public class SplitSentenceBolt extends BaseRichBolt {
          OutputCollector collector;

          // Called once when the bolt task is initialized.
          @Override
          public void prepare(Map stormConf, TopologyContext context,
                  OutputCollector collector) {
              this.collector = collector;
          }

          // Splits the incoming sentence on whitespace and emits one tuple per word.
          @Override
          public void execute(Tuple input) {
              for (String s : input.getString(0).split("\\s")) {
                  collector.emit(new Values(s));
              }
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word"));
          }
      }
  • 10. Topology
      import backtype.storm.Config;
      import backtype.storm.LocalCluster;
      import backtype.storm.StormSubmitter;
      import backtype.storm.topology.TopologyBuilder;
      import backtype.storm.tuple.Fields;

      public class WordCountTopology {
          public static void main(String[] args) throws Exception {
              TopologyBuilder builder = new TopologyBuilder();
              // Spout "sentence" with parallelism hint 2
              builder.setSpout("sentence", new RandomSentenceSpout(), 2);
              // Bolt "split" with parallelism hint 4 and 8 tasks
              builder.setBolt("split", new SplitSentenceBolt(), 4)
                      .shuffleGrouping("sentence")
                      .setNumTasks(8);
              // Bolt "count" with parallelism hint 6, grouped by the "word" field
              // (WordCountBolt is not shown in these slides)
              builder.setBolt("count", new WordCountBolt(), 6)
                      .fieldsGrouping("split", new Fields("word"));

              Config config = new Config();
              config.setNumWorkers(4);

              StormSubmitter.submitTopology("wordcount", config, builder.createTopology());

              // Local testing
      //        LocalCluster cluster = new LocalCluster();
      //        cluster.submitTopology("wordcount", config, builder.createTopology());
      //        Thread.sleep(10000);
      //        cluster.shutdown();
          }
      }
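The topology above routes "split" output to "count" with a fields grouping on "word", so every occurrence of a word reaches the same counting task. The following is a minimal sketch of that idea, assuming the target task is chosen by hashing the grouping values modulo the number of tasks. `GroupingSketch` and `fieldsGrouping` are hypothetical names for illustration, not Storm classes, and the hash Storm actually uses may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a fields grouping; not Storm's implementation.
final class GroupingSketch {
    /** Maps equal grouping values to the same task index in [0, numTasks). */
    static int fieldsGrouping(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 6; // the "count" bolt in the slide runs 6 tasks
        for (String word : Arrays.asList("the", "cow", "the", "moon")) {
            int task = fieldsGrouping(Arrays.<Object>asList(word), numTasks);
            System.out.println(word + " -> count task " + task);
        }
    }
}
```

Both occurrences of "the" print the same task index, which is what lets a counting bolt keep a correct per-word total in local state.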
  • 11. Starting Topology
    Participants: StormSubmitter (> bin/storm jar), Nimbus (Thrift server), ZooKeeper
    Steps, in order:
    - Uploads topology JAR to Nimbus' inbox with dependencies
    - Submits topology configuration as JSON and structure as Thrift
    - Copies topology JAR, configuration and structure into local file system
    - Sets up static information for topology
    - Makes assignment
    - Starts topology
  • 12. Starting Topology
    Participants: Nimbus (Thrift server), ZooKeeper, Supervisor, Worker, Executor, Task
    Steps, in order:
    - Downloads topology JAR, configuration and structure
    - Writes assignment on its node into local file system
    - Starts worker based on the assignment
    - Refreshes connections
    - Makes executors
    - Makes tasks
    - Starts processing
  • 14. Extremely Significant Performance
  • 15. Parallelism
    • RandomSentence Spout: Parallelism hint = 2; Number of tasks = Not specified = Same as parallelism hint = 2
    • SplitSentence Bolt: Parallelism hint = 4; Number of tasks = 8
    • WordCount Bolt: Parallelism hint = 6; Number of tasks = Not specified = 6
    - Number of topology workers = 4
    - Number of worker slots / node = 4
    - Number of worker nodes = 2
    - Number of executor threads = 2 + 4 + 6 = 12
    - Number of component instances = 2 + 8 + 6 = 16
    Diagram: two worker nodes, each running worker processes whose executor threads host RS Spout, SS Bolt and WC Bolt tasks.
    Topology can be spread out manually without downtime when a worker node is added
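The arithmetic on this slide can be reproduced with a short sketch. It assumes only what the slide states: one executor per unit of parallelism hint, and a task count that defaults to the hint when not specified. `ParallelismSketch` and `Component` are hypothetical helper names, not Storm classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that reproduces the slide's arithmetic; not part of Storm.
final class ParallelismSketch {
    static final class Component {
        final int executors; // equals the parallelism hint
        final int tasks;     // defaults to the hint when not specified

        Component(int parallelismHint, Integer numTasks) {
            this.executors = parallelismHint;
            this.tasks = (numTasks == null) ? parallelismHint : numTasks;
        }
    }

    /** The word-count topology from the slides, keyed by component id. */
    static Map<String, Component> wordCountTopology() {
        Map<String, Component> topology = new LinkedHashMap<>();
        topology.put("sentence", new Component(2, null));
        topology.put("split", new Component(4, 8));
        topology.put("count", new Component(6, null));
        return topology;
    }

    static int totalExecutors(Map<String, Component> topology) {
        int sum = 0;
        for (Component c : topology.values()) sum += c.executors;
        return sum;
    }

    static int totalTasks(Map<String, Component> topology) {
        int sum = 0;
        for (Component c : topology.values()) sum += c.tasks;
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Component> topology = wordCountTopology();
        System.out.println("executor threads    = " + totalExecutors(topology)); // 2 + 4 + 6
        System.out.println("component instances = " + totalTasks(topology));     // 2 + 8 + 6
        Component split = topology.get("split");
        System.out.println("tasks per split executor = " + split.tasks / split.executors);
    }
}
```

With 8 tasks on 4 executors, each "split" executor thread runs two task instances, which matches the definition of an executor as a thread running one or more tasks of the same component.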
  • 16. Message Passing
    Diagram: a worker process containing a receive thread (from other workers), executors, and a transfer thread (to other workers), connected by receiver queues, internal transfer queues and a transfer queue.
    • Interprocess communication is mediated by ZeroMQ
      - Outside transfer is done with Kryo serialization
    • Local communication is mediated by LMAX Disruptor
      - Inside transfer is done with no serialization
  • 17. LMAX Disruptor
    A large concurrent "magic" ring buffer that can be used like a blocking queue between a producer and a consumer
    • Consumer can easily keep up with producer by batching
    • CPU cache friendly
      - The ring is implemented as an array, so the entries can be preloaded
    • GC safe
      - The entries are preallocated up front and live forever
    6 million orders per second can be processed on a single thread at LMAX
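The three properties listed for the ring buffer (array-backed, preallocated entries, batched consumption) can be illustrated with a small sketch. This is not the LMAX Disruptor API: `RingBufferSketch` is a hypothetical, simplified single-producer, single-consumer buffer that returns `false` when full instead of waiting, and it omits the padding and wait strategies a real implementation relies on.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// Hypothetical single-producer/single-consumer ring buffer; not the Disruptor API.
final class RingBufferSketch {
    static final class Entry { long value; }

    private final Entry[] entries;                            // array-backed ring
    private final int mask;                                   // size - 1, size is a power of two
    private final AtomicLong published = new AtomicLong(-1);  // last sequence visible to the consumer
    private final AtomicLong consumed = new AtomicLong(-1);   // last sequence the consumer finished

    RingBufferSketch(int sizePowerOfTwo) {
        if (Integer.bitCount(sizePowerOfTwo) != 1) {
            throw new IllegalArgumentException("size must be a power of two");
        }
        entries = new Entry[sizePowerOfTwo];
        for (int i = 0; i < entries.length; i++) {
            entries[i] = new Entry(); // preallocated once, reused forever
        }
        mask = sizePowerOfTwo - 1;
    }

    /** Producer side: writes into the next slot, or returns false if the ring is full. */
    boolean tryPublish(long value) {
        long next = published.get() + 1;
        if (next - consumed.get() > entries.length) {
            return false; // would overwrite an entry the consumer has not read yet
        }
        entries[(int) (next & mask)].value = value;
        published.set(next); // volatile write makes the entry visible to the consumer
        return true;
    }

    /** Consumer side: handles everything published so far in one batch. */
    int drain(LongConsumer handler) {
        long available = published.get();
        int count = 0;
        for (long seq = consumed.get() + 1; seq <= available; seq++) {
            handler.accept(entries[(int) (seq & mask)].value);
            count++;
        }
        consumed.set(available); // frees the slots for the producer
        return count;
    }

    public static void main(String[] args) {
        RingBufferSketch ring = new RingBufferSketch(4);
        for (long v = 1; v <= 5; v++) {
            System.out.println("publish " + v + ": " + ring.tryPublish(v));
        }
        int n = ring.drain(v -> System.out.println("consumed " + v));
        System.out.println("batch size = " + n);
    }
}
```

Because the consumer reads up to the latest published sequence in one pass, a consumer that falls behind catches up with a single batch rather than one hand-off per entry.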
  • 19. Fault-tolerance: Cluster works normally
    Diagram (ZooKeeper, Nimbus, Supervisor, Worker), with these interactions:
    - Monitoring cluster state
    - Synchronizing assignment
    - Sending heartbeat
    - Reading worker heartbeat from local file system
    - Sending executor heartbeat
  • 20. Fault-tolerance: Nimbus goes down
    Processing will still continue, but topology lifecycle operations and the reassignment facility are lost.
  • 21. Fault-tolerance: Worker node goes down
    Nimbus will reassign the tasks to other machines and the processing will continue.
  • 22. Fault-tolerance: Supervisor goes down
    Processing will still continue, but assignment is never synchronized.
  • 23. Fault-tolerance: Worker process goes down
    Supervisor will restart the worker process and the processing will continue.
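The failure handling on these slides depends on heartbeats: a component is treated as down once its heartbeat stops arriving. A minimal sketch of that detection step follows, assuming a fixed timeout and caller-supplied timestamps. `HeartbeatMonitorSketch`, the worker ids and the timeout value are hypothetical and do not reflect Storm's actual configuration or storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical heartbeat bookkeeping; not Storm's supervisor or Nimbus code.
final class HeartbeatMonitorSketch {
    private final long timeoutMillis;
    private final Map<String, Long> lastBeat = new HashMap<>();

    HeartbeatMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that the given worker reported in at the given time. */
    void heartbeat(String workerId, long nowMillis) {
        lastBeat.put(workerId, nowMillis);
    }

    /** Returns the ids (sorted) whose last heartbeat is older than the timeout. */
    List<String> timedOut(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastBeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatMonitorSketch monitor = new HeartbeatMonitorSketch(30_000); // assumed timeout
        monitor.heartbeat("worker-1", 0);
        monitor.heartbeat("worker-2", 0);
        monitor.heartbeat("worker-1", 25_000); // worker-2 stays silent
        // A supervisor-like caller would restart whatever is reported here.
        System.out.println("timed out at t=40s: " + monitor.timedOut(40_000));
    }
}
```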
  • 25. Reliability API
      public class RandomSentenceSpout extends BaseRichSpout {
          public void nextTuple() {
              ...;
              UUID msgId = getMsgId();
              // Emitting tuple with message id
              collector.emit(new Values(sentence), msgId);
          }

          public void ack(Object msgId) {
              // Do something with acked message id.
          }

          public void fail(Object msgId) {
              // Do something with failed message id.
          }
      }

      public class SplitSentenceBolt extends BaseRichBolt {
          public void execute(Tuple input) {
              for (String s : input.getString(0).split("\\s")) {
                  // Anchoring incoming tuple to outgoing tuples
                  collector.emit(input, new Values(s));
              }
              // Sending ack
              collector.ack(input);
          }
      }
    Tuple tree: "the cow jumped over the moon" → "the", "cow", "jumped", "over", "the", "moon"
  • 26. Acking Framework
    Diagram: RandomSentence Spout, SplitSentence Bolt and WordCount Bolt exchange "Acker init", "Acker ack" and "Acker fail" messages with the Acker implicit bolt while tuples A, B and C flow through the topology.
    For each spout tuple id, the Acker implicit bolt tracks the spout task id and a 64 bit number called "Ack val":
    - Emitted tuple A, XOR tuple A id with ack val
    - Emitted tuple B, XOR tuple B id with ack val
    - Emitted tuple C, XOR tuple C id with ack val
    - Acked tuple A, XOR tuple A id with ack val
    - Acked tuple B, XOR tuple B id with ack val
    - Acked tuple C, XOR tuple C id with ack val
    When ack val has become 0, the Acker implicit bolt knows the tuple tree has been completed
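The `ack` and `fail` callbacks above are left as "do something" placeholders. One common use, sketched here as an assumption rather than as the author's implementation, is to remember each emitted sentence by message id so that it can be dropped on ack and queued for re-emission on fail. `PendingTrackerSketch` and its methods are hypothetical names; a spout would call them from `nextTuple`, `ack` and `fail`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical spout-side bookkeeping for replay; not part of the Storm API.
final class PendingTrackerSketch {
    private final Map<UUID, String> pending = new HashMap<>();
    private final Deque<String> replayQueue = new ArrayDeque<>();

    /** Call when emitting: remembers the sentence and returns its message id. */
    UUID track(String sentence) {
        UUID msgId = UUID.randomUUID();
        pending.put(msgId, sentence);
        return msgId;
    }

    /** Call from ack(Object msgId): the tuple tree completed, so forget it. */
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    /** Call from fail(Object msgId): queue the sentence to be emitted again. */
    void fail(Object msgId) {
        String sentence = pending.remove(msgId);
        if (sentence != null) {
            replayQueue.addLast(sentence);
        }
    }

    /** Next sentence to re-emit, or null if nothing failed. */
    String nextReplay() {
        return replayQueue.pollFirst();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingTrackerSketch tracker = new PendingTrackerSketch();
        UUID ok = tracker.track("the cow jumped over the moon");
        UUID lost = tracker.track("four score and seven years ago");
        tracker.ack(ok);
        tracker.fail(lost);
        System.out.println("replay: " + tracker.nextReplay());
    }
}
```

Replaying on failure gives at-least-once processing, so downstream bolts may see the same sentence more than once.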
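The XOR bookkeeping on this slide works because every tuple id is XORed into the ack val exactly twice, once when emitted and once when acked, and `x ^ x == 0`. A minimal sketch under that description follows. `AckerSketch` is a hypothetical class that only mimics the ack val arithmetic; it is not Storm's acker bolt, and the small fixed ids stand in for the random 64 bit ids a real system would use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the ack val arithmetic; not Storm's acker implementation.
final class AckerSketch {
    // spout tuple id -> current ack val
    private final Map<Long, Long> ackVals = new HashMap<>();

    /**
     * XORs a tuple id into the ack val of the given spout tuple.
     * Called once when the tuple is emitted and once when it is acked.
     * Returns true when the ack val becomes 0, i.e. the tuple tree is complete.
     */
    boolean update(long spoutTupleId, long tupleId) {
        long ackVal = ackVals.getOrDefault(spoutTupleId, 0L) ^ tupleId;
        if (ackVal == 0) {
            ackVals.remove(spoutTupleId);
            return true;
        }
        ackVals.put(spoutTupleId, ackVal);
        return false;
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        long root = 42L;                   // spout tuple id (illustrative)
        long a = 0x1L, b = 0x2L, c = 0x4L; // ids of tuples A, B, C (illustrative)

        acker.update(root, a); // emitted A
        acker.update(root, b); // emitted B
        acker.update(root, c); // emitted C
        acker.update(root, a); // acked A
        acker.update(root, b); // acked B
        boolean done = acker.update(root, c); // acked C
        System.out.println("tuple tree completed: " + done);
    }
}
```

The appeal of this design is constant memory per spout tuple: the acker stores one 64 bit value no matter how large the tuple tree grows.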
  • 28. Cluster Setup
    • Set up a ZooKeeper cluster
    • Install dependencies on Nimbus and worker machines
      - ZeroMQ 2.1.7 and JZMQ
      - Java 6 and Python 2.6.6
      - unzip
    • Download and extract a Storm release to Nimbus and worker machines
    • Fill in the mandatory configuration in storm.yaml
    • Launch daemons under supervision using the "storm" script
  • 29. Cluster Summary
  • 30. Topology Summary
  • 31. Component Summary
  • 33. Basic Resources
    • Storm is available under Eclipse Public License 1.0 at
      - http://storm-project.net/
      - https://github.com/nathanmarz/storm
    • Get help on
      - http://groups.google.com/group/storm-user
      - #storm-user freenode room
    • Follow
      - @stormprocessor and @nathanmarz for updates on the project
  • 34. Many Contributions
    • Community repository for modules to use with Storm, including integration with Redis, Kafka, MongoDB, HBase, JMS, Amazon SQS and so on
      - https://github.com/nathanmarz/storm-contrib
    • Good articles for understanding Storm internals
      - http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
      - http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
    • Good slides for understanding real-life examples
      - http://www.slideshare.net/DanLynn1/storm-as-deep-into-realtime-data-processing-as-you-can-get-in-30-minutes
      - http://www.slideshare.net/KrishnaGade2/storm-at-twitter
  • 35. Features on Deck
    • Current release: 0.8.2 as of 6/28/2013
    • Work in progress (older): 0.8.3-wip3
      - Some bug fixes
    • Work in progress (newest): 0.9.0-wip19
      - SLF4J and Logback
      - Pluggable tuple serialization and blowfish encryption
      - Pluggable interprocess messaging and Netty implementation
      - Some bug fixes
      - And more
  • 36. Advanced Topics
    • Distributed RPC
    • Transactional topologies
    • Trident
    • Using non-JVM languages with Storm
    • Unit testing
    • Patterns
    These are not described in this presentation, so check them out yourself, or in my upcoming session if I get the chance :)
  • 37. Thank You