Storm is a distributed real-time computation system that provides guaranteed message processing, horizontal scalability, and fault tolerance. Users define data-processing topologies and submit them to a Storm cluster for distributed execution. Spouts emit streams of tuples that are processed by bolts; Storm tracks each tuple's processing to guarantee reliability and replays tuples that fail or time out. It also provides tools for deploying, monitoring, and tuning real-time data-processing applications.
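The spout-to-bolt flow can be sketched without the framework itself. This is a minimal illustration, not the Storm API: the class names (`WordSpout`, `CountBolt`) and the `run_topology` helper are hypothetical, standing in for Storm's spout/bolt interfaces and cluster scheduler.

```python
from collections import Counter

class WordSpout:
    """Source of the stream: emits a fixed sequence of one-field tuples."""
    def __init__(self, words):
        self.words = words

    def emit(self):
        for w in self.words:
            yield (w,)

class CountBolt:
    """Consumer of the stream: counts the words it receives."""
    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        self.counts[tup[0]] += 1

def run_topology(spout, bolt):
    """Drive every spout tuple through the bolt (single-process stand-in
    for what a Storm cluster does across many workers)."""
    for tup in spout.emit():
        bolt.execute(tup)
    return dict(bolt.counts)

counts = run_topology(WordSpout(["a", "b", "a"]), CountBolt())
print(counts)  # {'a': 2, 'b': 1}
```

In a real topology the spout and bolt run as many parallel tasks on different machines, and the cluster, not a loop, routes tuples between them.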
Stream grouping
• Shuffle grouping: pick a random task
• Fields grouping: consistent hashing on a subset of tuple fields
• All grouping: send to all tasks
• Global grouping: pick task with lowest id
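The four groupings above can be sketched as routing functions that map a tuple to the downstream task(s) that should receive it. The function names are illustrative, not Storm's API; fields grouping is shown here as hashing the chosen fields modulo the task count, which captures the "same key, same task" guarantee.

```python
import random

def shuffle_grouping(tasks, tup):
    """Pick one task at random (load balancing)."""
    return [random.choice(tasks)]

def fields_grouping(tasks, tup, field_idxs):
    """Hash the selected tuple fields; equal keys always land on the same task."""
    key = tuple(tup[i] for i in field_idxs)
    return [tasks[hash(key) % len(tasks)]]

def all_grouping(tasks, tup):
    """Replicate the tuple to every task."""
    return list(tasks)

def global_grouping(tasks, tup):
    """Send everything to the task with the lowest id."""
    return [min(tasks)]

tasks = [0, 1, 2, 3]
tup = ("alice", "click")
print(all_grouping(tasks, tup))     # [0, 1, 2, 3]
print(global_grouping(tasks, tup))  # [0]
```

Fields grouping is what makes stateful bolts (e.g. per-user counters) correct: all tuples sharing the grouping fields are processed by one task, so no state is split across tasks.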
What Storm does
• Distributes code and configurations
• Robust process management
• Monitors topologies and reassigns failed tasks
• Provides reliability by tracking tuple trees
• Routing and partitioning of streams
• Serialization
• Fine-grained performance stats of topologies
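The "tracking tuple trees" point deserves a sketch. Storm's acker keeps a single XOR value per spout tuple: each child tuple's random 64-bit id is XORed in when it is emitted and XORed in again when it is acked, so the value returns to zero exactly when the whole tree has been processed. The `Acker` class below is a simplified stand-in for that mechanism, not Storm's actual acker bolt.

```python
class Acker:
    """Per-spout-tuple XOR tracker: zero means the whole tuple tree is done."""
    def __init__(self):
        self.pending = {}  # spout tuple id -> running XOR of outstanding ids

    def emit(self, root, tup_id):
        """A new tuple anchored to `root` was emitted: XOR its id in."""
        self.pending[root] = self.pending.get(root, 0) ^ tup_id

    def ack(self, root, tup_id):
        """A tuple was acked: XOR its id back out. Returns True when
        the tree rooted at `root` is fully processed."""
        self.pending[root] ^= tup_id
        if self.pending[root] == 0:
            del self.pending[root]
            return True
        return False

acker = Acker()
acker.emit(1, 0xA)  # bolt emits child 0xA anchored to spout tuple 1
acker.emit(1, 0xB)  # and child 0xB
print(acker.ack(1, 0xA))  # False: 0xB still outstanding
print(acker.ack(1, 0xB))  # True: XOR is back to zero, tree complete
```

Because XOR is order-independent, the acker needs only constant memory per spout tuple no matter how large the tree grows; if the value has not reached zero within the tuple timeout, the spout tuple is replayed.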