Realtime Computation with Storm

Storm is a distributed realtime computation system. Similar to how Hadoop provides a set of general primitives for doing batch processing, Storm provides a set of general primitives for doing realtime computation. Storm is simple, can be used with any programming language, and is a lot of fun to use!
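
To make "general primitives for doing realtime computation" concrete, here is a minimal plain-Java sketch of the two primitives the slides describe: a spout (source of a stream of tuples) and a bolt (processes an input stream and produces a new one). It does not use the Storm API. `MiniTopology`, `Spout`, `Bolt`, `numberSpout`, and `run` are hypothetical names chosen for illustration, a tuple is modelled as a `List<Integer>`, and the stream is bounded so the example terminates, whereas a real Storm stream is unbounded.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java illustration only; none of these types come from the Storm API.
public class MiniTopology {

    /** Hypothetical spout: pushes one tuple per call; returns false when it has nothing left. */
    interface Spout {
        boolean nextTuple(Consumer<List<Integer>> collector);
    }

    /** Hypothetical bolt: consumes one input tuple and may emit new tuples. */
    interface Bolt {
        void execute(List<Integer> input, Consumer<List<Integer>> collector);
    }

    /** Spout that replays a fixed list of numbers as single-field tuples. */
    static Spout numberSpout(List<Integer> numbers) {
        Iterator<Integer> it = numbers.iterator();
        return collector -> {
            if (!it.hasNext()) {
                return false;
            }
            collector.accept(List.of(it.next()));
            return true;
        };
    }

    /** Bolt that emits (2x, 3x) for each input tuple (x), like the deck's DoubleAndTripleBolt. */
    static Bolt doubleAndTriple() {
        return (input, collector) -> {
            int val = input.get(0);
            collector.accept(List.of(val * 2, val * 3));
        };
    }

    /** Wires spout -> bolt -> in-memory sink and drains the (bounded) spout. */
    public static List<List<Integer>> run(List<Integer> numbers) {
        Spout spout = numberSpout(numbers);
        Bolt bolt = doubleAndTriple();
        List<List<Integer>> sink = new ArrayList<>();
        while (spout.nextTuple(tuple -> bolt.execute(tuple, sink::add))) {
            // keep pulling until the spout is exhausted
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3))); // prints [[2, 3], [4, 6], [6, 9]]
    }
}
```

The sketch runs in a single thread and leaves out what Storm itself adds on top of this shape: distribution of spouts and bolts across workers, acking, and replay of failed tuples.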

Slide notes:
  • C - Best accessible distributed realtime computation system going
    A - Learn about and start using Storm
    B - You will get a great new tool in your technology stack - interesting uses
  • CEP - continuous
    Not HFT-grade
  • scaling is painful
    poor fault tolerance
    coding is hard
  • tweets, stock ticks, manufacturing machine data, sensor messages
  • DAG
    runs continuously
  • abstractions like Cascading, Hive, Pig make MR approachable
    code size reduction
  • kestrel - via thrift
    kafka - transactional topologies, idempotency, process only once
    activemq
  • current architecture
    data ingest tool for hadoop (avoid Flume madness)
  • new architecture
  • Trending Topics (stream processing of the firehose)
    computing the ‘reach’ of a URL (Dist RPC)
  • Android devices, sampling geo every 5 seconds
    route optimization
    road tax reduction
    idle alerts
  • C - Exciting times, much like Hadoop/NoSQL beginning
    A - Start tinkering with Storm, integrate into your workflows
    B - be more responsive in turning data into information
  • Realtime Computation with Storm

    1. Realtime Computation with Storm. Brad Anderson, banderson@maprtech.com, @boorad
    2. Definition & Overview; Interoperability; Use Cases
    3. Stream Processing; CEP; Distributed RPC
    4. Before Storm: Queues, Workers
    5. Example (simplified)
    6. Storm: Guaranteed data processing; Horizontal scalability; Fault-tolerance; No intermediate message brokers!; Higher level abstraction than message passing
    7. Concepts
    8. streams: Unbounded sequence of tuples
    9. spouts: Source of streams
    10. spouts:
        public interface ISpout extends Serializable {
            void open(Map conf,
                      TopologyContext context,
                      SpoutOutputCollector collector);
            void close();
            void nextTuple();
            void ack(Object msgId);
            void fail(Object msgId);
        }
    11. bolts: Processes input streams and produces new streams
    12. bolts:
        public class DoubleAndTripleBolt extends BaseRichBolt {
            private OutputCollector _collector;

            public void prepare(Map conf, TopologyContext context,
                                OutputCollector collector) {
                _collector = collector;
            }

            public void execute(Tuple input) {
                int val = input.getInteger(0);
                // Anchor the emitted tuple to the input, then ack the input.
                _collector.emit(input, new Values(val*2, val*3));
                _collector.ack(input);
            }

            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("double", "triple"));
            }
        }
    13. topologies: Network of spouts and bolts
    14. Trident: Cascading for Storm
    15. TridentTopology topology = new TridentTopology();
        TridentState wordCounts =
            topology.newStream("spout1", spout)
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
                .parallelismHint(6);
    16. Interoperability
    17. spouts: Kafka (with transactions), Kestrel, JMS, AMQP, Beanstalkd
    18. bolts: Functions, Filters, Aggregation, Joins, Talk to databases, Hadoop write-behind
    19. [Architecture diagram: Apps, Raw Data, Queue, Storm (realtime processes), Hadoop (batch processes), Business Value]
    20. [Architecture diagram: Apps, Raw Data, Queue, Storm (realtime processes), Parallel Cluster Ingest, Hadoop (batch processes), Business Value]
    21. [Architecture diagram: Apps, Raw Data, Queue, Storm (realtime processes), Hadoop (batch processes), Business Value]
    22. [Architecture diagram: Apps, Raw Data, Storm (realtime processes), Hadoop (batch processes), Business Value]
    23. Use Cases
    24. Twitter [Diagram labels: URL, Tweeter, Follower, Distinct follower, Reach]
    25. Heartbyte
    26. Fleet Logistics
    27. http://github.com/{tdunning | boorad}/mapr-spout Brad Anderson, banderson@maprtech.com, @boorad
    28. Thank you. http://github.com/{tdunning | boorad}/mapr-spout Brad Anderson, banderson@maprtech.com, @boorad
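
The Twitter use case (slide 24 and its note on computing the 'reach' of a URL) goes from a URL to the people who tweeted it, then to their followers, then to the distinct followers, whose count is the reach. The following is a minimal single-process sketch of that calculation in plain Java, not a Storm topology or a Distributed RPC call. `ReachSketch`, `reach`, the two lookup maps, and the sample data are all hypothetical stand-ins for whatever stores a real deployment would query.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration of the reach calculation; no Storm classes are used.
public class ReachSketch {

    /**
     * Counts the distinct followers of everyone who tweeted the given URL.
     * Both maps are hypothetical in-memory stand-ins for external lookups.
     */
    public static int reach(String url,
                            Map<String, List<String>> tweetersByUrl,
                            Map<String, List<String>> followersByTweeter) {
        Set<String> distinctFollowers = new HashSet<>();
        for (String tweeter : tweetersByUrl.getOrDefault(url, List.of())) {
            // A follower shared by several tweeters is only counted once.
            distinctFollowers.addAll(followersByTweeter.getOrDefault(tweeter, List.of()));
        }
        return distinctFollowers.size();
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration.
        Map<String, List<String>> tweeters =
                Map.of("http://example.com/a", List.of("alice", "bob"));
        Map<String, List<String>> followers = Map.of(
                "alice", List.of("carol", "dave"),
                "bob", List.of("dave", "erin"));
        System.out.println(reach("http://example.com/a", tweeters, followers)); // prints 3
    }
}
```

In this sketch every step runs sequentially in one loop. The appeal of running it on Storm is that the follower lookups and the distinct step can be spread across many parallel tasks, which matters when a URL has many tweeters with large follower lists.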