Streams processing with Storm
Transcript

  • 1. Mariusz Gil Data streams processing with STORM
  • 2. data expire fast. very fast
  • 3. realtime processing?
  • 4. Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
  • 5. Storm is fast, a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
  • 6. concept architecture
  • 7. Stream: unbounded sequence of tuples, e.g. (val1, val2), (val3, val4), (val5, val6)
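A tuple is essentially a named list of values. The following is a minimal plain-Java sketch of that idea, with no Storm dependency; the class and method names are hypothetical (the real interface is backtype.storm.tuple.Tuple):

```java
import java.util.List;

// Hypothetical model of a Storm tuple: field names paired with values.
// Illustration only; Storm's real Tuple interface offers many more accessors.
class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        this.fields = fields;
        this.values = values;
    }

    // Look up a value by its declared field name.
    Object getValueByField(String field) {
        return values.get(fields.indexOf(field));
    }

    // Positional access, as in tuple.getString(0) on the later slides.
    String getString(int i) {
        return (String) values.get(i);
    }
}
```

The field names come from `declareOutputFields` on the emitting component, which is why downstream bolts can address values either by position or by name.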
  • 8. Spouts: source of streams
  • 9. Reliable and unreliable Spouts: replay or forget about the tuple
  • 10. Spouts: source of streams, Storm-Kafka
  • 11. Spouts: source of streams, Storm-Kestrel
  • 12. Spouts: source of streams, Storm-AMQP-Spout
  • 13. Spouts: source of streams, Storm-JMS
  • 14. Spouts: source of streams, Storm-PubSub*
  • 15. Spouts: source of streams, Storm-Beanstalkd-Spout
  • 16. Bolts: process input streams and produce new streams
  • 17. Bolts: process input streams and produce new streams
  • 18. Topologies: network of spouts and bolts, e.g. TextSpout [sentence] -> SplitSentenceBolt [word] -> WordCountBolt [word, count]
  • 19. Topologies: network of spouts and bolts, e.g. two TextSpouts [sentence] -> two SplitSentenceBolts [word] -> WordCountBolt [word, count] -> xyzBolt
  • 20. servers architecture
  • 21. Nimbus: process responsible for distributing processing across the cluster
  • 22. Supervisors: worker processes responsible for executing a subset of the topology
  • 23. Zookeeper: coordination layer between Nimbus and Supervisors
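These three roles are wired together through the cluster configuration file. A minimal storm.yaml sketch (host names and paths here are hypothetical placeholders):

```yaml
# storm.yaml - minimal cluster configuration sketch; hosts/paths are hypothetical
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"
nimbus.host: "nimbus.example.com"
storm.local.dir: "/var/storm"
supervisor.slots.ports:
  - 6700
  - 6701
```

Each port in supervisor.slots.ports corresponds to one worker slot the Supervisor can run.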
  • 24. fail fast: cluster state is stored locally or in Zookeeper
  • 25. sample code
  • 26. Spouts

    public class RandomSentenceSpout extends BaseRichSpout {
      SpoutOutputCollector _collector;
      Random _rand;

      @Override
      public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _rand = new Random();
      }

      @Override
      public void nextTuple() {
        Utils.sleep(100);
        String[] sentences = new String[] {
          "the cow jumped over the moon",
          "an apple a day keeps the doctor away",
          "four score and seven years ago",
          "snow white and the seven dwarfs",
          "i am at two with nature" };
        String sentence = sentences[_rand.nextInt(sentences.length)];
        _collector.emit(new Values(sentence));
      }

      @Override
      public void ack(Object id) { }

      @Override
      public void fail(Object id) { }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
      }
    }
  • 27. Bolts

    public static class WordCount extends BaseBasicBolt {
      Map<String, Integer> counts = new HashMap<String, Integer>();

      @Override
      public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
      }
    }
  • 28. Bolts

    public static class ExclamationBolt implements IRichBolt {
      OutputCollector _collector;

      public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
      }

      public void execute(Tuple tuple) {
        _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
        _collector.ack(tuple);
      }

      public void cleanup() { }

      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
      }

      public Map getComponentConfiguration() {
        return null;
      }
    }
  • 29. Topology

    public class WordCountTopology {
      public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8)
               .shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true);

        if (args != null && args.length > 0) {
          conf.setNumWorkers(3);
          StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
          conf.setMaxTaskParallelism(3);
          LocalCluster cluster = new LocalCluster();
          cluster.submitTopology("word-count", conf, builder.createTopology());
          Thread.sleep(10000);
          cluster.shutdown();
        }
      }
    }
  • 30. Bolts

    public static class SplitSentence extends ShellBolt implements IRichBolt {
      public SplitSentence() {
        super("python", "splitsentence.py");
      }

      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
      }
    }

    # splitsentence.py
    import storm

    class SplitSentenceBolt(storm.BasicBolt):
        def process(self, tup):
            words = tup.values[0].split(" ")
            for word in words:
                storm.emit([word])

    SplitSentenceBolt().run()
  • 31. github.com/nathanmarz/storm-starter
  • 32. streams grouping
  • 33. Topology

    public class WordCountTopology {
      public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8)
               .shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12)
               .fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true);

        if (args != null && args.length > 0) {
          conf.setNumWorkers(3);
          StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
          conf.setMaxTaskParallelism(3);
          LocalCluster cluster = new LocalCluster();
          cluster.submitTopology("word-count", conf, builder.createTopology());
          Thread.sleep(10000);
          cluster.shutdown();
        }
      }
    }
  • 34. Grouping: shuffle, fields, all, global, none, direct, local-or-shuffle
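The choice of grouping matters for correctness: shuffleGrouping spreads tuples evenly at random, while fieldsGrouping partitions by the value of the named fields, which is why the word-count bolt above groups on "word" so that all counts for a given word land in one task. A plain-Java sketch of that routing idea (illustration only; Storm's internal hash function may differ):

```java
// Sketch of the routing behind fieldsGrouping: hash the grouped field's
// value modulo the number of target tasks. Equal field values therefore
// always reach the same bolt task, so per-key state stays consistent.
class FieldsGroupingSketch {
    static int chooseTask(Object fieldValue, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hashes
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }
}
```

Under this scheme every tuple whose "word" field is, say, "storm" is routed to one and the same WordCount task, while shuffleGrouping would scatter them across all tasks.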
  • 35. distributed rpc
  • 36. Distributed RPC: client submits [request-id, arguments], topology returns [request-id, results]
  • 37. RPC

    public static class ExclaimBolt extends BaseBasicBolt {
      public void execute(Tuple tuple, BasicOutputCollector collector) {
        String input = tuple.getString(1);
        collector.emit(new Values(tuple.getValue(0), input + "!"));
      }

      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "result"));
      }
    }

    public static void main(String[] args) throws Exception {
      LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("exclamation");
      builder.addBolt(new ExclaimBolt(), 3);

      Config conf = new Config();
      LocalDRPC drpc = new LocalDRPC();
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("drpc-demo", conf, builder.createLocalTopology(drpc));

      System.out.println("Results for 'hello': " + drpc.execute("exclamation", "hello"));

      cluster.shutdown();
      drpc.shutdown();
    }
  • 38. realtime analytics, personalization, search, revenue optimization, monitoring
  • 39. content search, realtime analytics, generating feeds; integrated with Elasticsearch, HBase, Hadoop and HDFS
  • 40. realtime scoring, moments generation; integrated with Kafka queues and HDFS storage
  • 41. Storm-YARN enables Storm applications to utilize the computational resources of a Hadoop cluster along with accessing Hadoop storage resources such as HBase and HDFS
  • 42. thanks! mail: mariusz@mariuszgil.pl twitter: @mariuszgil
