STORM: DISTRIBUTED AND FAULT-TOLERANT REALTIME COMPUTATION
Jimmy Zöger
CLC < FIB < UPC
2013-06-03

Short introduction to Storm

Presentation given in the Cloud Computing class at Universitat Politècnica de Catalunya

Transcript of "Short introduction to Storm"

  1. STORM: DISTRIBUTED AND FAULT-TOLERANT REALTIME COMPUTATION
     Jimmy Zöger, CLC < FIB < UPC, 2013-06-03
  2. INTRODUCTION
     • Like Hadoop, but for realtime processing instead of batch processing
     • Open source
     • Developed by BackType, which was later acquired by Twitter
     • Developed for analyzing Twitter data
     • Similar to S4
  3. STORM TOPOLOGY
  4. SPOUTS
  5. SPOUTS
     • The component responsible for feeding messages into the topology
     • Emits tuples
     • Can be reliable or unreliable (ack() and fail())
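To make the reliable/unreliable distinction concrete, here is a minimal Python sketch of the bookkeeping a reliable spout needs: it remembers every message it emitted until it hears back through ack() or fail(), and re-emits the failed ones. This is a standalone model, not Storm's API. The class `ReliableSpout`, the in-memory source list and the method names (which only loosely mirror the nextTuple/ack/fail callbacks of Storm's Java `ISpout` interface) are assumptions for illustration.

```python
from collections import deque
from itertools import count
from typing import Deque, Dict, Optional, Tuple


class ReliableSpout:
    """Hypothetical reliable spout: keeps each emitted message until it is acked."""

    def __init__(self, messages: list) -> None:
        self._source: Deque[str] = deque(messages)  # stand-in for an external queue
        self._pending: Dict[int, str] = {}          # msg_id -> message awaiting ack/fail
        self._ids = count(1)                        # increasing message ids

    def next_tuple(self) -> Optional[Tuple[int, Tuple[str]]]:
        """Emit the next message as (msg_id, tuple), or None if nothing is waiting."""
        if not self._source:
            return None
        msg_id = next(self._ids)
        message = self._source.popleft()
        self._pending[msg_id] = message
        return msg_id, (message,)

    def ack(self, msg_id: int) -> None:
        """The tuple was fully processed downstream: forget it."""
        self._pending.pop(msg_id, None)

    def fail(self, msg_id: int) -> None:
        """Processing failed or timed out: put the message back at the front."""
        message = self._pending.pop(msg_id, None)
        if message is not None:
            self._source.appendleft(message)

    def pending(self) -> Dict[int, str]:
        return dict(self._pending)


spout = ReliableSpout(["a", "b"])
first = spout.next_tuple()    # (1, ("a",))
second = spout.next_tuple()   # (2, ("b",))
spout.ack(first[0])
spout.fail(second[0])         # "b" returns to the source
retry = spout.next_tuple()    # (3, ("b",)): re-emitted under a new id
```

An unreliable spout would simply skip the `_pending` map, so a tuple lost downstream is never replayed. Giving the retried message a fresh id is a choice of this sketch; a spout could equally reuse the original id.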
  6. INTEGRATION
     • Kestrel
     • RabbitMQ
     • Kafka
     • JMS
     • Integration is easy with the simple Spout abstraction
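Integration is easy because a spout needs very little from its source: something it can poll for the next message. The hypothetical Python sketch below captures that idea. `MessageSource`, `InMemoryQueueSource` and `SourceSpout` are invented names. The standard-library `queue.Queue` stands in for a real Kestrel, RabbitMQ, Kafka or JMS client, and none of those clients' actual APIs are used here.

```python
import queue
from typing import Optional, Protocol, Tuple


class MessageSource(Protocol):
    """Hypothetical minimal interface a broker client must offer to back a spout."""

    def poll(self) -> Optional[str]:
        ...


class InMemoryQueueSource:
    """Stand-in for a broker consumer, backed by the standard library's queue.Queue."""

    def __init__(self) -> None:
        self._q: "queue.Queue[str]" = queue.Queue()

    def publish(self, message: str) -> None:
        self._q.put(message)

    def poll(self) -> Optional[str]:
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


class SourceSpout:
    """Spout that works with any MessageSource: it only ever calls poll()."""

    def __init__(self, source: MessageSource) -> None:
        self._source = source

    def next_tuple(self) -> Optional[Tuple[str]]:
        message = self._source.poll()
        return None if message is None else (message,)


broker = InMemoryQueueSource()
broker.publish("hello")
spout = SourceSpout(broker)
emitted = [spout.next_tuple(), spout.next_tuple()]  # [("hello",), None]
```

Supporting another broker in this model means writing one more class with a `poll()` method. The spout itself does not change.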
  7. BOLTS
  8. BOLTS
     • A component that takes tuples as input and produces tuples as output
     • Can do filtering, joining, functions, aggregations, etc.
     • Does not have to process a tuple immediately and may hold onto tuples to process later
     • Comparison with Hadoop: a bolt can be a mapper or a reducer (or anything else)
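Three of the bolt roles listed on this slide are a function, an aggregation, and a bolt that holds tuples before emitting. Each can be modelled in a few lines of Python. This is an illustrative sketch, not Storm code. The class names are hypothetical, and `execute` only borrows its name from the method on Storm's bolt interfaces. `SplitBolt` plays the mapper role and `CountBolt` the reducer role from the Hadoop comparison. `BatchBolt` shows a bolt deferring work until it has collected enough tuples.

```python
from collections import Counter
from typing import List, Tuple


class SplitBolt:
    """Function/mapper-like bolt: one sentence tuple in, one tuple per word out."""

    def execute(self, tup: Tuple[str]) -> List[Tuple[str]]:
        (sentence,) = tup
        return [(word,) for word in sentence.lower().split()]


class CountBolt:
    """Aggregation/reducer-like bolt: keeps running counts, emits (word, count)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, tup: Tuple[str]) -> List[Tuple[str, int]]:
        (word,) = tup
        self.counts[word] += 1
        return [(word, self.counts[word])]


class BatchBolt:
    """Holds on to tuples and emits them together once batch_size have arrived."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self.buffer: List[tuple] = []

    def execute(self, tup: tuple) -> List[tuple]:
        self.buffer.append(tup)
        if len(self.buffer) < self.batch_size:
            return []  # nothing emitted yet: the tuple is held for later
        batch, self.buffer = tuple(self.buffer), []
        return [batch]


split, counter, batcher = SplitBolt(), CountBolt(), BatchBolt(batch_size=2)
words = split.execute(("the cat saw the dog",))
counted = [out for w in words for out in counter.execute(w)]
batches = [b for c in counted for b in batcher.execute(c)]
# batches == [(("the", 1), ("cat", 1)), (("saw", 1), ("the", 2))]; ("dog", 1) is still held
```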
  9. STORM TOPOLOGY
  10. STORM TOPOLOGY
     • Spouts, bolts and streams
     • Distributed
     • Runs indefinitely until it is stopped
     • Arbitrary complexity
     • Streams requiring multiple steps also require multiple bolts
     • No intermediate queues for streams
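The following Python sketch shows how a multi-step stream can be split across several bolts and distributed over parallel tasks. It wires a sentence spout, a split bolt and several count-bolt tasks together in a single process. Each word is routed by a hash of the word, following the idea of Storm's fields grouping, so a given word always reaches the same counting task. Several details are assumptions of this sketch, not Storm's implementation: the function names, the use of `zlib.crc32` as the hash, and the direct in-process hand-off that stands in for the absence of intermediate queues.

```python
import zlib
from collections import Counter
from typing import Dict, List


def split_sentence(sentence: str) -> List[str]:
    """First bolt: split a sentence into words."""
    return sentence.lower().split()


class CountTask:
    """One parallel instance (task) of the counting bolt."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def execute(self, word: str) -> None:
        self.counts[word] += 1


def fields_grouping(word: str, num_tasks: int) -> int:
    """Pick a task by hashing the grouping field, so equal words hit the same task.

    zlib.crc32 is used because Python's built-in hash() of str is salted per process.
    """
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run_topology(sentences: List[str], num_count_tasks: int) -> Dict[str, int]:
    """Spout -> split bolt -> count bolt tasks, with tuples handed over directly."""
    tasks = [CountTask() for _ in range(num_count_tasks)]
    for sentence in sentences:                  # spout: one tuple per sentence
        for word in split_sentence(sentence):   # split bolt
            tasks[fields_grouping(word, num_count_tasks)].execute(word)
    # Each word lives in exactly one task, so merging the partial counts is safe.
    merged: Dict[str, int] = {}
    for task in tasks:
        merged.update(task.counts)
    return merged


totals = run_topology(["the cat", "the dog", "a cat"], num_count_tasks=3)
# totals == {"the": 2, "cat": 2, "dog": 1, "a": 1} (key order may differ)
```

Because a word never lands on two tasks, each task's counts are complete for its share of the words. That is why the final merge can be a plain dictionary union.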
  11. FAULT-TOLERANCE
     • The Nimbus daemon and the Supervisor daemons are fail-fast and stateless
     • Each worker sends heartbeats to Nimbus
     • Transactional topologies → guaranteed, exactly-once processing
     (Diagram: Nimbus, a ZooKeeper cluster and four Supervisor nodes)
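The heartbeat mechanism can be sketched as a timeout check on the master side, followed by a reassignment of the silent worker's tasks. In this hypothetical Python model, `HeartbeatMonitor`, `reassign`, the 30-second timeout and the round-robin placement are illustrative choices, not Storm's actual scheduler. State is kept in a plain dict here, whereas the diagram on the slide suggests a real cluster coordinates through ZooKeeper.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical master-side bookkeeping: which workers have gone silent?"""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, worker_id: str, now: float) -> None:
        """Record a heartbeat at time `now` (passed in to keep the sketch deterministic)."""
        self._last_seen[worker_id] = now

    def dead_workers(self, now: float) -> List[str]:
        """Workers whose latest heartbeat is older than the timeout."""
        return sorted(w for w, seen in self._last_seen.items()
                      if now - seen > self.timeout_s)


def reassign(assignments: Dict[str, List[str]], dead: List[str]) -> Dict[str, List[str]]:
    """Move the tasks of dead workers round-robin onto the remaining workers."""
    plan = {w: list(tasks) for w, tasks in assignments.items() if w not in dead}
    orphaned = [t for w in dead for t in assignments.get(w, [])]
    alive = sorted(plan)
    if orphaned and not alive:
        raise RuntimeError("no live workers left to take over tasks")
    for i, task in enumerate(orphaned):
        plan[alive[i % len(alive)]].append(task)
    return plan


monitor = HeartbeatMonitor(timeout_s=30)
monitor.heartbeat("w1", now=0)
monitor.heartbeat("w2", now=0)
monitor.heartbeat("w1", now=25)
dead = monitor.dead_workers(now=40)  # ["w2"]: silent for 40 s
plan = reassign({"w1": ["count-0"], "w2": ["count-1", "split-0"]}, dead)
# plan == {"w1": ["count-0", "count-1", "split-0"]}
```

Passing `now` explicitly keeps the sketch deterministic. A real monitor would read a clock.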
  12. USE CASES
     • Counting words!
     • Realtime analytics, e.g. trending topics on Twitter
     • Online machine learning
     • Continuous computation
     • Distributed RPC
     • Extract, Transform and Load (ETL)
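Trending topics amount to word counting over a sliding time window. The Python sketch below keeps one counter per time slot and drops the oldest slot as the window advances. `SlidingWindowCounter`, the slot-based design and the hashtag data are assumptions made for illustration and do not come from the slides.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowCounter:
    """Counts items over the last `num_slots` time slots (e.g. one slot per minute)."""

    def __init__(self, num_slots: int) -> None:
        # maxlen makes the deque discard the oldest slot automatically.
        self.slots: Deque[Counter] = deque([Counter()], maxlen=num_slots)

    def add(self, item: str) -> None:
        self.slots[-1][item] += 1  # count in the current (newest) slot

    def advance(self) -> None:
        """Start a new slot; the oldest slot drops out once the window is full."""
        self.slots.append(Counter())

    def top(self, n: int) -> List[Tuple[str, int]]:
        total: Counter = Counter()
        for slot in self.slots:
            total.update(slot)
        return total.most_common(n)


window = SlidingWindowCounter(num_slots=2)
for tag in ["#storm", "#hadoop", "#storm"]:
    window.add(tag)
window.advance()
for tag in ["#kafka", "#kafka", "#kafka"]:
    window.add(tag)
trending_now = window.top(2)    # [("#kafka", 3), ("#storm", 2)]
window.advance()                # the first slot (#storm, #hadoop) expires
trending_later = window.top(2)  # [("#kafka", 3)]
```

In a topology, a bolt could call `advance()` on a timer (for example once a minute) and periodically emit `top(n)` downstream.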
  13. FAST
     • One benchmark clocked it at over a million tuples processed per second per node
     (Graphic: a continuous stream of {x,y,z} tuples)