Short introduction to Storm
 

Presentation given in class for Cloud Computing at Universitat Politècnica de Catalunya

    Presentation Transcript

    • STORM
      DISTRIBUTED AND FAULT-TOLERANT REALTIME COMPUTATION
      Jimmy Zöger
      CLC < FIB < UPC
      2013-06-03
    • INTRODUCTION
      • Like Hadoop for realtime processing instead of batch
      • Open Source
      • Developed by BackType, which was later acquired by Twitter
      • Developed for analyzing Twitter data
      • Similar to S4
    • STORM TOPOLOGY
    • SPOUTS
      • The component responsible for feeding messages into the topology
      • Emits tuples
      • Can be reliable or unreliable (ack() and fail())
    • INTEGRATION
      • Kestrel
      • RabbitMQ
      • Kafka
      • JMS
      • Integration is easy with the simple Spout abstraction
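The reliable/unreliable distinction can be illustrated with a toy model. This is a minimal sketch in plain Python, not Storm's actual API: the class name `ReliableQueueSpout`, the in-memory queue, and the integer message ids are all assumptions made for illustration. Only the `ack()` and `fail()` method names come from the slide.

```python
from collections import deque


class ReliableQueueSpout:
    """Toy spout (hypothetical): emits messages from an in-memory queue
    and replays any message whose processing is reported as failed."""

    def __init__(self, messages):
        self.queue = deque(messages)  # messages waiting to be emitted
        self.pending = {}             # msg_id -> message awaiting ack/fail
        self.next_id = 0

    def next_tuple(self):
        """Emit one (msg_id, tuple) pair, or None if nothing is queued."""
        if not self.queue:
            return None
        message = self.queue.popleft()
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = message
        return msg_id, (message,)

    def ack(self, msg_id):
        """The tuple was fully processed: forget it."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed: put the message back so it is emitted again."""
        message = self.pending.pop(msg_id, None)
        if message is not None:
            self.queue.append(message)
```

An unreliable spout in this model would simply skip the `pending` bookkeeping, so a failed tuple is lost rather than replayed.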
    • BOLTS
      • A component that takes tuples as input and produces tuples as output
      • Can do filtering, joining, functions, aggregations etc.
      • Does not have to process a tuple immediately and may hold onto tuples to process later
      • Comparison with Hadoop: A bolt can be a mapper or a reducer (or anything)
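Two of the bolt behaviours listed above, filtering and holding onto tuples before emitting, can be sketched as follows. This is a plain-Python illustration, not Storm's bolt interface: `FilterBolt`, `BatchCountBolt`, the `execute` method returning a list, and the `batch_size` parameter are hypothetical names chosen for the example.

```python
from collections import Counter


class FilterBolt:
    """Toy bolt (hypothetical): passes a tuple on only if the predicate holds."""

    def __init__(self, predicate):
        self.predicate = predicate

    def execute(self, tup):
        # Return the list of tuples emitted for this input (possibly empty).
        return [tup] if self.predicate(tup) else []


class BatchCountBolt:
    """Toy bolt (hypothetical): holds tuples until batch_size have arrived,
    then emits one (word, count) tuple per distinct word in the batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.held = []

    def execute(self, tup):
        self.held.append(tup)
        if len(self.held) < self.batch_size:
            return []  # nothing emitted yet, the tuple is kept for later
        counts = Counter(word for (word,) in self.held)
        self.held = []
        return sorted(counts.items())
```

In Hadoop terms, `FilterBolt` plays a mapper-like role and `BatchCountBolt` a reducer-like one, though a bolt is not limited to either.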
    • STORM TOPOLOGY
      • Spouts, bolts and streams
      • Distributed
      • Runs indefinitely until it is stopped
      • Arbitrary complexity
      • Streams requiring multiple steps also require multiple bolts
      • No intermediate queues for streams
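How a spout and several bolts chain into a multi-step stream can be sketched with Python generators. This is a single-process toy, not a Storm topology: the function names are hypothetical, nothing is distributed, and the input is finite so the example terminates, whereas a real topology runs until it is stopped. Each stage hands tuples directly to the next, which loosely mirrors the absence of intermediate queues.

```python
from collections import Counter


def sentence_spout(sentences):
    """Source of the stream: one single-field tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Step 1: split each sentence tuple into lower-cased word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream):
    """Step 2: keep a running count per word and emit (word, count)."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(sentences):
    """Wire spout -> split -> count and collect everything emitted."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))
```

For example, `run_topology(["the cow", "The moon"])` emits a running count for each word as it arrives, so "the" appears twice with counts 1 and 2.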
    • FAULT-TOLERANCE
      • Nimbus daemon and Supervisor daemons are fail-fast and stateless
      • Each worker sends heartbeats to Nimbus
      • Transactional topologies → Guaranteed processing
      (Diagram: Nimbus, Zookeeper and Supervisor nodes)
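Heartbeat-based failure detection, as mentioned above, can be sketched in a few lines. This is a generic illustration and not how Nimbus is implemented: the class `HeartbeatMonitor`, the timeout value, and the worker ids are assumptions, and timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Toy failure detector (hypothetical): a worker is considered dead
    if no heartbeat arrived within timeout_seconds."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # worker_id -> time of the latest heartbeat

    def heartbeat(self, worker_id, now):
        """Record that worker_id reported in at time now (in seconds)."""
        self.last_seen[worker_id] = now

    def dead_workers(self, now):
        """Return the sorted ids of workers whose last heartbeat is too old."""
        return sorted(
            worker_id
            for worker_id, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

A scheduler built on this would reassign the tasks of every worker returned by `dead_workers`.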
    • USE CASES
      • Counting words!
      • Realtime analytics - trending topics on Twitter
      • Online machine learning
      • Continuous computation
      • Distributed RPC
      • Extract, Transform and Load (ETL)
    • FAST
      One benchmark clocked it at over a million tuples processed per second per node
      {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠ {x,y,z} ↠