Realtime Computation with Storm

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use!

  • C - Best accessible distributed realtime computation system going
    A - Learn about and start using Storm
    B - You will get a great new tool in your technology stack - interesting uses
  • CEP - continuous
    Not HFT-grade
  • scaling is painful
    poor fault tolerance
    coding is hard
  • tweets, stock ticks, manufacturing machine data, sensor messages
  • DAG
    runs continuously
  • abstractions like Cascading, Hive, Pig make MR approachable
    code size reduction
  • kestrel - via thrift
    kafka - transactional topologies, idempotency, process only once
    activemq
  • current architecture
    data ingest tool for hadoop (avoid Flume madness)
  • new architecture
  • Trending Topics (stream processing of the firehose)
    computing the ‘reach’ of a URL (Dist RPC)
  • Android devices, sampling geo every 5 seconds
    route optimization
    road tax reduction
    idle alerts
  • C - Exciting times, much like Hadoop/NoSQL beginning
    A - Start tinkering with Storm, integrate into your workflows
    B - be more responsive in turning data into information

Transcript

  • 1. Realtime Computation with Storm Brad Anderson banderson@maprtech.com @boorad
  • 2. Definition & Overview; Interoperability; Use Cases
  • 3. Stream Processing; CEP; Distributed RPC
  • 4. Before Storm: Queues, Workers
  • 5. Example (simplified)
  • 6. Storm
    Guaranteed data processing
    Horizontal scalability
    Fault-tolerance
    No intermediate message brokers!
    Higher level abstraction than message passing
  • 7. Concepts
  • 8. streams: Unbounded sequence of tuples
  • 9. spouts: Source of streams
  • 10. spouts
    public interface ISpout extends Serializable {
        void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
        void close();
        void nextTuple();
        void ack(Object msgId);
        void fail(Object msgId);
    }
  • 11. bolts: Process input streams and produce new streams
  • 12. bolts
    public class DoubleAndTripleBolt extends BaseRichBolt {
        private OutputCollector _collector;

        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }

        public void execute(Tuple input) {
            int val = input.getInteger(0);
            // Anchor the output to the input tuple, then ack the input.
            _collector.emit(input, new Values(val*2, val*3));
            _collector.ack(input);
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("double", "triple"));
        }
    }
  • 13. topologies: Network of spouts and bolts
  • 14. Trident: Cascading for Storm
  • 15. TridentTopology topology = new TridentTopology();
    TridentState wordCounts =
        topology.newStream("spout1", spout)
            .each(new Fields("sentence"), new Split(), new Fields("word"))
            .groupBy(new Fields("word"))
            .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
            .parallelismHint(6);
  • 16. Interoperability
  • 17. spouts: Kafka (with transactions), Kestrel, JMS, AMQP, Beanstalkd
  • 18. bolts: Functions, Filters, Aggregation, Joins, Talk to databases, Hadoop write-behind
  • 19. [Diagram: Raw Data; Queue; Storm (realtime processes); Hadoop (batch processes); Apps; Business Value]
  • 20. [Diagram: Raw Data; Queue; Storm (realtime processes); Hadoop (batch processes); Parallel Cluster Ingest; Apps; Business Value]
  • 21. [Diagram: Raw Data; Queue; Storm (realtime processes); Hadoop (batch processes); Apps; Business Value]
  • 22. [Diagram: Raw Data; Storm (realtime processes); Hadoop (batch processes); Apps; Business Value]
  • 23. Use Cases
  • 24. Twitter [Diagram of the reach computation: URL; Tweeter; Follower; Distinct; Reach]
  • 25. Heartbyte
  • 26. Fleet Logistics
  • 27. http://github.com/{tdunning | boorad}/mapr-spout Brad Anderson banderson@maprtech.com @boorad
  • 28. Thank you. http://github.com/{tdunning | boorad}/mapr-spout Brad Anderson banderson@maprtech.com @boorad
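
Sketch: how spouts, bolts, and topologies fit together

The concepts on slides 8 to 13 (streams of tuples, a spout as the source, a bolt that turns input tuples into output tuples, and a topology that wires them together) can be illustrated without a cluster. The following is a minimal single-threaded sketch in plain Java, not Storm code: SketchSpout, SketchBolt, NumberSpout, DoubleAndTripleSketchBolt, and SketchTopology are hypothetical names invented for this illustration, and the input is a short fixed list rather than an unbounded stream. It assumes nothing about Storm's scheduling, acking, or parallelism and only mirrors the doubling and tripling logic of the bolt on slide 12.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-ins for illustration only; these are NOT Storm's interfaces.
interface SketchSpout {
    // Returns the next tuple, or null when the source has nothing left to emit.
    List<Object> nextTuple();
}

interface SketchBolt {
    // Consumes one input tuple and returns zero or more output tuples.
    List<List<Object>> execute(List<Object> input);
}

// Spout: emits each integer from a fixed list as a one-field tuple.
class NumberSpout implements SketchSpout {
    private final Queue<Integer> pending = new ArrayDeque<>();

    NumberSpout(List<Integer> numbers) {
        pending.addAll(numbers);
    }

    @Override
    public List<Object> nextTuple() {
        Integer n = pending.poll();
        return n == null ? null : Collections.<Object>singletonList(n);
    }
}

// Bolt: for an input tuple (val), emits one tuple (val*2, val*3).
class DoubleAndTripleSketchBolt implements SketchBolt {
    @Override
    public List<List<Object>> execute(List<Object> input) {
        int val = (Integer) input.get(0);
        List<List<Object>> out = new ArrayList<>();
        out.add(Arrays.<Object>asList(val * 2, val * 3));
        return out;
    }
}

// Topology: drains the spout and pushes every tuple through the bolts in order.
class SketchTopology {
    static List<List<Object>> run(SketchSpout spout, List<SketchBolt> bolts) {
        List<List<Object>> results = new ArrayList<>();
        for (List<Object> tuple = spout.nextTuple(); tuple != null; tuple = spout.nextTuple()) {
            List<List<Object>> current = new ArrayList<>();
            current.add(tuple);
            for (SketchBolt bolt : bolts) {
                List<List<Object>> next = new ArrayList<>();
                for (List<Object> t : current) {
                    next.addAll(bolt.execute(t));
                }
                current = next;
            }
            // Whatever the last bolt emitted is the output of the chain.
            results.addAll(current);
        }
        return results;
    }

    public static void main(String[] args) {
        List<SketchBolt> bolts = new ArrayList<>();
        bolts.add(new DoubleAndTripleSketchBolt());
        // prints [[2, 3], [4, 6], [6, 9]]
        System.out.println(run(new NumberSpout(Arrays.asList(1, 2, 3)), bolts));
    }
}
```

A real topology differs in the ways the slides emphasize: it runs continuously, spreads spout and bolt tasks across machines, and tracks acks so that failed tuples can be replayed. This sketch only shows the data flow from spout to bolt.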