Short introduction to Storm

Presentation given in class for Cloud Computing at Universitat Politècnica de Catalunya

  1. STORM: DISTRIBUTED AND FAULT-TOLERANT REALTIME COMPUTATION
     Jimmy Zöger, CLC < FIB < UPC, 2013-06-03
  2. INTRODUCTION
     • Like Hadoop, but for realtime processing instead of batch
     • Open source
     • Developed by BackType, which was later acquired by Twitter
     • Developed for analyzing Twitter data
     • Similar to S4
  3. STORM TOPOLOGY
  4. SPOUTS
  5. SPOUTS
     • The component responsible for feeding messages into the topology
     • Emits tuples
     • Can be reliable or unreliable (ack() and fail())
  6. INTEGRATION
     • Kestrel
     • RabbitMQ
     • Kafka
     • JMS
     • Integration is easy with the simple Spout abstraction
  7. BOLTS
  8. BOLTS
     • A component that takes tuples as input and produces tuples as output
     • Can do filtering, joining, functions, aggregations etc.
     • Does not have to process a tuple immediately and may hold onto tuples to process later
     • Comparison with Hadoop: a bolt can be a mapper or a reducer (or anything)
  9. STORM TOPOLOGY
  10. STORM TOPOLOGY
      • Spouts, bolts and streams
      • Distributed
      • Runs indefinitely until it is stopped
      • Arbitrary complexity
      • Streams requiring multiple steps also require multiple bolts
      • No intermediate queues for streams
  11. FAULT-TOLERANCE
      • Nimbus daemon and Supervisor daemons are fail-fast and stateless
      • Each worker sends heartbeats to Nimbus
      • Transactional topologies → Guaranteed processing
      (Diagram labels: Nimbus, Zookeeper ×2, Supervisor ×4)
  12. USE CASES
      • Counting words!
      • Realtime analytics - trending topics on Twitter
      • Online machine learning
      • Continuous computation
      • Distributed RPC
      • Extract, Transform and Load (ETL)
  13. FAST
      One benchmark clocked it at over a million tuples processed per second per node
      {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠
  14. STORM: DISTRIBUTED AND FAULT-TOLERANT REALTIME COMPUTATION
      Jimmy Zöger, CLC < FIB < UPC, 2013-06-03
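
The spout, bolt and topology ideas from slides 5, 8 and 10 can be illustrated with the "counting words" use case from slide 12. What follows is a minimal single-process sketch in plain Python, not Storm's actual API: the names SentenceSpout, SplitBolt, CountBolt and run_topology are all hypothetical, and a real topology would be distributed and would run until stopped rather than draining a fixed list.

```python
from collections import Counter, deque


class SentenceSpout:
    """Hypothetical spout: feeds sentences into the topology as 1-tuples."""

    def __init__(self, sentences):
        self._pending = deque(sentences)

    def next_tuple(self):
        # Return the next tuple, or None once this toy source is drained.
        # (A real stream would normally never end.)
        return (self._pending.popleft(),) if self._pending else None


class SplitBolt:
    """Hypothetical bolt acting like a mapper: one sentence in, many words out."""

    def execute(self, tup):
        (sentence,) = tup
        return [(word.lower(),) for word in sentence.split()]


class CountBolt:
    """Hypothetical bolt acting like a reducer: keeps a running count per word."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        (word,) = tup
        self.counts[word] += 1
        return [(word, self.counts[word])]


def run_topology(spout, bolts):
    """Push every spout tuple through the bolts in order, with no queue in between."""
    emitted = []
    while (tup := spout.next_tuple()) is not None:
        stream = [tup]
        for bolt in bolts:
            # Each bolt consumes the previous step's tuples and emits new ones.
            stream = [out for t in stream for out in bolt.execute(t)]
        emitted.extend(stream)
    return emitted


spout = SentenceSpout(["the cow jumped", "the moon"])
counter = CountBolt()
updates = run_topology(spout, [SplitBolt(), counter])
print(counter.counts)
```

The two processing steps (split, then count) need two bolts, which mirrors the slide's point that a stream requiring multiple steps also requires multiple bolts.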

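
Slide 5 notes that a spout can be reliable or unreliable, with ack() and fail() as the hooks. A rough sketch of what "reliable" could mean is shown here in plain Python. It is an assumption-laden toy model and not Storm's implementation: ReliableSpout, process and flaky_upper are hypothetical names, and the handler stands in for whatever bolts would do downstream.

```python
import itertools


class ReliableSpout:
    """Hypothetical reliable spout: remembers each emitted message until it is acked."""

    def __init__(self, messages):
        self._queue = list(messages)
        self._in_flight = {}            # message id -> message awaiting ack/fail
        self._ids = itertools.count(1)

    def next_tuple(self):
        if not self._queue:
            return None
        msg = self._queue.pop(0)
        msg_id = next(self._ids)
        self._in_flight[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id):
        # Fully processed: safe to forget the message.
        self._in_flight.pop(msg_id)

    def fail(self, msg_id):
        # Processing failed: requeue the message so it is emitted again.
        self._queue.append(self._in_flight.pop(msg_id))


def process(spout, handler):
    """Drain the spout, acking tuples the handler accepts and failing the rest."""
    done = []
    while (item := spout.next_tuple()) is not None:
        msg_id, msg = item
        try:
            done.append(handler(msg))
            spout.ack(msg_id)
        except ValueError:
            spout.fail(msg_id)
    return done


attempts = {}


def flaky_upper(msg):
    # Hypothetical handler that fails the first time it sees "b".
    attempts[msg] = attempts.get(msg, 0) + 1
    if msg == "b" and attempts[msg] == 1:
        raise ValueError("transient failure")
    return msg.upper()


results = process(ReliableSpout(["a", "b", "c"]), flaky_upper)
print(results, attempts)
```

In this model a failed message is replayed, so it is processed at least once but may arrive out of order; an unreliable spout would simply skip the bookkeeping and never replay.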