Realtime Computation with Storm

Storm is a distributed realtime computation system. Just as Hadoop provides a set of general primitives for batch processing, Storm provides a set of general primitives for realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use. This talk covers how Storm is architected, how it interoperates with Hadoop, and a few real-world use cases.

  1. Realtime Computation with Storm
     Brad Anderson, banderson@maprtech.com, @boorad
  2. Definition & Overview; Interoperability; Use Cases
  3. Stream Processing; CEP; Distributed RPC
  4. Source Data
     • Social Media
     • Network Sensors
     • App/Web Logs
     • Stock Tick Data
     • Weather Data Feeds
     • Auctions of Ad Impressions
     • Payment Transactions
  5. Before Storm: Queues, Workers
  6. Example (simplified)
  7. Storm
     • Guaranteed data processing
     • Horizontal scalability
     • Fault-tolerance
     • No intermediate message brokers!
     • Higher level abstraction than message passing
     • “Just works”
  8. Concepts
  9. streams: Unbounded sequence of tuples
     [Diagram: Tuple, Tuple, Tuple, Tuple, Tuple, Tuple, Tuple]
  10. spouts: Source of streams
  11. spouts
      public interface ISpout extends Serializable {
          void open(Map conf,
                    TopologyContext context,
                    SpoutOutputCollector collector);
          void close();
          void nextTuple();
          void ack(Object msgId);
          void fail(Object msgId);
      }
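The `ack` and `fail` callbacks on the slide above are what make replay of unprocessed messages possible. The following is a minimal plain-Java sketch of that contract, not Storm code: the class `ReplayingSource`, its `long` message ids, and the in-memory queue are all assumptions made for illustration, and it deliberately does not implement `ISpout`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration: mirrors the nextTuple/ack/fail method names
// from the slide, but is not tied to Storm's ISpout or its collector.
final class ReplayingSource {
    private final Queue<String> ready = new ArrayDeque<>();      // messages waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>();   // emitted but not yet acked
    private long nextId = 0;

    ReplayingSource(Iterable<String> messages) {
        for (String m : messages) {
            ready.add(m);
        }
    }

    // Emits the next message and tracks it as pending.
    // Returns the message id, or -1 when nothing is ready.
    long nextTuple() {
        String msg = ready.poll();
        if (msg == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, msg);
        return id;
    }

    // Fully processed: forget the message.
    void ack(long msgId) {
        pending.remove(msgId);
    }

    // Processing failed: put the message back so it is emitted again.
    void fail(long msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            ready.add(msg);
        }
    }

    int pendingCount() { return pending.size(); }
    int readyCount() { return ready.size(); }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource(Arrays.asList("a", "b"));
        long first = source.nextTuple();
        long second = source.nextTuple();
        source.ack(first);    // "a" is done
        source.fail(second);  // "b" goes back on the queue
        System.out.println("ready=" + source.readyCount() + " pending=" + source.pendingCount());
    }
}
```

A failed message gets a fresh id when it is emitted again; whether a real spout reuses ids on replay depends on the spout implementation.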
  12. bolts: Processes input streams and produces new streams
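  13. bolts
      // Package names below are those of Storm 1.0 and later;
      // earlier releases used backtype.storm instead of org.apache.storm.
      import java.util.Map;
      import org.apache.storm.task.OutputCollector;
      import org.apache.storm.task.TopologyContext;
      import org.apache.storm.topology.OutputFieldsDeclarer;
      import org.apache.storm.topology.base.BaseRichBolt;
      import org.apache.storm.tuple.Fields;
      import org.apache.storm.tuple.Tuple;
      import org.apache.storm.tuple.Values;

      public class DoubleAndTripleBolt extends BaseRichBolt {
          private OutputCollector _collector;

          public void prepare(Map conf,
                              TopologyContext context,
                              OutputCollector collector) {
              _collector = collector;
          }

          public void execute(Tuple input) {
              int val = input.getInteger(0);
              // Anchor the new tuple to its input, then ack the input.
              _collector.emit(input, new Values(val * 2, val * 3));
              _collector.ack(input);
          }

          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("double", "triple"));
          }
      }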
  14. topologies: Network of spouts and bolts
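  15. topologies
      TopologyBuilder builder = new TopologyBuilder();

      // The last argument of setSpout/setBolt is the parallelism hint.
      builder.setSpout("spout", new RandomSentenceSpout(), 5);
      // Sentences are distributed randomly across the "split" tasks.
      builder.setBolt("split", new SplitSentence(), 8)
             .shuffleGrouping("spout");
      // Tuples with the same "word" value always go to the same "count" task.
      builder.setBolt("count", new WordCount(), 12)
             .fieldsGrouping("split", new Fields("word"));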
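The difference between the two groupings on the slide above can be sketched in plain Java. This is an illustration of the routing idea only, under the assumption that a fields grouping picks the target task from a hash of the grouping field; the class `GroupingSketch` and both method names are hypothetical, and the shuffle grouping is simplified to round-robin here.

```java
final class GroupingSketch {
    // Fields grouping idea: the target task depends only on the value of the
    // grouping field, so equal words always reach the same task.
    static int fieldsGroupingTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Shuffle grouping idea, simplified to round-robin for this sketch:
    // load is spread evenly, with no regard to tuple contents.
    static int roundRobinTask(int tupleIndex, int numTasks) {
        return tupleIndex % numTasks;
    }

    public static void main(String[] args) {
        int countTasks = 12; // parallelism of the "count" bolt on the slide
        String[] words = {"storm", "hadoop", "storm", "tuple", "storm"};
        for (String word : words) {
            System.out.println(word + " -> count task " + fieldsGroupingTask(word, countTasks));
        }
    }
}
```

Because every occurrence of a word lands on one task, that task can keep the running count for the word in local memory without coordinating with the other eleven.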
  16. Trident: Cascading for Storm
  17. Trident Facilities
      • Joins
      • Aggregations
      • Grouping
      • Functions
      • Filters
      • Consistent, Exactly-Once Semantics
  18. TridentTopology topology = new TridentTopology();
      TridentState wordCounts =
          topology.newStream("spout1", spout)
              .each(new Fields("sentence"), new Split(), new Fields("word"))
              .groupBy(new Fields("word"))
              .persistentAggregate(new MemoryMapState.Factory(),
                                   new Count(),
                                   new Fields("count"))
              .parallelismHint(6);
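To make clear what the Trident pipeline above computes, here is a plain-Java sketch of the same split, group, and count steps applied to a fixed list of sentences. It is not Trident code: the class `WordCountSketch`, the whitespace-based splitting, and the sample sentences are assumptions for illustration, and it has none of Trident's distribution or exactly-once guarantees.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WordCountSketch {
    // Splits each sentence into words and keeps one running count per word.
    static Map<String, Long> countWords(List<String> sentences) {
        Map<String, Long> counts = new TreeMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.split("\\s+")) {   // role of Split()
                if (word.isEmpty()) {
                    continue;                              // skip the empty token from leading whitespace
                }
                counts.merge(word, 1L, Long::sum);         // role of groupBy("word") + Count()
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "the cow jumped over the moon",
            "the man went to the store");
        System.out.println(countWords(sentences));
    }
}
```

In the Trident version the map lives in a state backend (`MemoryMapState` on the slide) and is updated continuously as the stream flows, rather than being built once from a finite list.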
  19. Interoperability
  20. spouts
      • Kafka (with transactions)
      • Kestrel
      • JMS
      • AMQP
      • Beanstalkd
  21. bolts
      • Functions
      • Filters
      • Aggregation
      • Joins
      • Talk to databases, Hadoop write-behind
  22. [Diagram. Labels: Raw Data, Queue, Storm (realtime processes), Hadoop (batch processes), Apps, Business Value]
  23. [Diagram. Labels: Raw Data, Queue, Storm (realtime processes), Hadoop (batch processes), Parallel Cluster Ingest, Apps, Business Value]
  24. [Diagram. Labels: Raw Data, Queue, Franz, TailSpout, Storm (realtime processes), Hadoop (batch processes), Apps, Business Value]
  25. [Diagram. Labels: Raw Data, Franz, TailSpout, Storm (realtime processes), Hadoop (batch processes), Apps, Business Value]
  26. Use Cases
  27. Twitter
      [Diagram. Labels: URL, Tweeter, Follower, Distinct follower, Reach]
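The labels on the Twitter slide above suggest a reach computation: for a URL, find who tweeted it, collect their followers, remove duplicates, and count. The following plain-Java sketch shows that computation on in-memory maps. It assumes this reading of the diagram; the class `ReachSketch`, the two lookup maps, and all sample names are hypothetical, and a Storm version would spread each step across many tasks.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReachSketch {
    // Reach of a URL = number of distinct followers of everyone who tweeted it.
    static int reach(String url,
                     Map<String, List<String>> tweetersByUrl,
                     Map<String, List<String>> followersByTweeter) {
        Set<String> distinctFollowers = new HashSet<>();
        for (String tweeter : tweetersByUrl.getOrDefault(url, Collections.<String>emptyList())) {
            distinctFollowers.addAll(
                followersByTweeter.getOrDefault(tweeter, Collections.<String>emptyList()));
        }
        return distinctFollowers.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample data.
        Map<String, List<String>> tweetersByUrl = new HashMap<>();
        tweetersByUrl.put("http://example.com/a", Arrays.asList("alice", "bob"));

        Map<String, List<String>> followersByTweeter = new HashMap<>();
        followersByTweeter.put("alice", Arrays.asList("carol", "dave"));
        followersByTweeter.put("bob", Arrays.asList("dave", "erin"));

        // "dave" follows both tweeters but is counted once.
        System.out.println("reach = "
            + reach("http://example.com/a", tweetersByUrl, followersByTweeter)); // prints "reach = 3"
    }
}
```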
  28. Heartbyte
  29. Fleet Logistics
  30. http://github.com/{tdunning | boorad}/mapr-spout
      Brad Anderson, banderson@maprtech.com, @boorad
  31. Thank you.
      http://github.com/{tdunning | boorad}/mapr-spout
      Brad Anderson, banderson@maprtech.com, @boorad
