STORM ANATOMY
Cloud Computing Course
Prof Hanku Lee
Social Media Cloud Computing lab
MS Akhmedov Khumoyun
What is stream processing?
• Stream processing is a paradigm for processing large volumes of data, arriving as an unbounded sequence of tuples (a stream), in real time
[Figure: Source → stream of tuples → Stream Processor]
• Continuous analytics
• Online machine learning
• Sensor data monitoring
• Financial trading …
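Unlike a batch job, a stream processor never sees the whole input: it updates a small amount of state as each tuple arrives and can report a result at any moment. The Java sketch below illustrates this for continuous analytics, assuming a numeric, sensor-like source; the class names `StreamProcessingDemo` and `RunningMean` are hypothetical and not part of Storm.

```java
import java.util.Iterator;
import java.util.stream.Stream;

class StreamProcessingDemo {
    public static void main(String[] args) {
        // An unbounded source: Stream.iterate never terminates on its own.
        Iterator<Double> source = Stream.iterate(1.0, x -> x + 1.0).iterator();
        RunningMean processor = new RunningMean();
        // A real processor would loop forever; the demo stops after 5 tuples.
        for (int i = 0; i < 5 && source.hasNext(); i++) {
            double current = processor.accept(source.next());
            System.out.println("after tuple " + (i + 1) + ": mean = " + current);
        }
    }
}

// Hypothetical stream processor: keeps a running count and mean, updated per tuple.
class RunningMean {
    private long count = 0;
    private double mean = 0.0;

    // Incremental mean update: O(1) state, the stream itself is never stored.
    double accept(double value) {
        count++;
        mean += (value - mean) / count;
        return mean;
    }

    long count() { return count; }

    double mean() { return mean; }
}
```

The point of the design is that memory use does not grow with the length of the stream, which is what makes an unbounded input tractable.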
Storm at Twitter
Twitter Web Analytics
What is Storm?
Storm is a distributed realtime computation system:
• Fast & scalable
• Fault-tolerant
• Guarantees that messages will be processed
• Easy to set up & operate
• Free & open source
- Originally developed by Nathan Marz at BackType (acquired by Twitter)
- Written in Java and Clojure
Conceptual View
Physical View
Concepts
• Streams
• Spouts
• Bolts
• Topologies
Streams
Unbounded sequence of tuples
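A tuple is a named list of values: each stream declares its field names once, and every tuple on it carries values in that order. Storm expresses this with its `Fields` and `Values` classes and `Tuple.getValueByField`; the stand-in below is a hypothetical plain-Java model of the same idea, not Storm's implementation.

```java
import java.util.ArrayList;
import java.util.List;

class TupleDemo {
    public static void main(String[] args) {
        List<String> schema = List.of("word", "count");   // declared once per stream
        SimpleTuple t = new SimpleTuple(schema, List.of("storm", 3));
        System.out.println(t.getValueByField("word") + " -> " + t.getValueByField("count"));
    }
}

// Hypothetical stand-in for a Storm tuple: a fixed field schema plus positional values.
class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<?> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("field/value count mismatch");
        }
        this.fields = List.copyOf(fields);
        this.values = new ArrayList<>(values);
    }

    // Positional access.
    Object getValue(int index) { return values.get(index); }

    // Access by declared field name.
    Object getValueByField(String field) {
        int index = fields.indexOf(field);
        if (index < 0) throw new IllegalArgumentException("unknown field: " + field);
        return values.get(index);
    }
}
```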
Spouts
Source of streams
• Read from a Kafka queue
• Read from the Twitter Streaming API
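Storm pulls data from a spout by calling its `nextTuple()` method repeatedly; each call may emit a tuple, or return without emitting when no data is available. A minimal sketch, assuming an in-memory queue in place of a Kafka topic or the Twitter Streaming API; the `Spout` interface and `SentenceSpout` class are hypothetical simplifications of Storm's `ISpout`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

class SpoutDemo {
    public static void main(String[] args) {
        Queue<String> fakeFeed = new ArrayDeque<>(List.of("hello storm", "stream processing"));
        SentenceSpout spout = new SentenceSpout(fakeFeed);
        List<List<Object>> emitted = new ArrayList<>();
        // The framework calls nextTuple() in a loop; the third call finds no data.
        for (int i = 0; i < 3; i++) {
            spout.nextTuple(emitted::add);
        }
        System.out.println(emitted); // prints [[hello storm], [stream processing]]
    }
}

// Hypothetical minimal spout contract, loosely modeled on ISpout.nextTuple().
interface Spout {
    void nextTuple(Consumer<List<Object>> emit);
}

// Stand-in for a spout reading an external feed: it drains an in-memory queue,
// emitting at most one single-field tuple per call.
class SentenceSpout implements Spout {
    private final Queue<String> feed;

    SentenceSpout(Queue<String> feed) { this.feed = feed; }

    @Override
    public void nextTuple(Consumer<List<Object>> emit) {
        String sentence = feed.poll();
        if (sentence != null) {
            emit.accept(List.<Object>of(sentence));
        }
        // With no data available, return without emitting rather than blocking.
    }
}
```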
Bolts
Processes input streams and produces new streams
• Functions
• Filters
• Aggregation
• Joins
• Talk to databases
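Three of the bolt roles listed can be illustrated with a word-count pipeline: a function bolt splits sentences into words, a filter bolt drops stop words, and an aggregation bolt keeps running counts. This is a plain-Java sketch with hypothetical class names; real Storm bolts implement interfaces such as `IRichBolt` and receive `Tuple` objects rather than strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

class BoltDemo {
    public static void main(String[] args) {
        SplitBolt split = new SplitBolt();
        StopWordFilterBolt filter = new StopWordFilterBolt(Set.of("the", "a"));
        CountBolt count = new CountBolt();
        // Wired by hand here: each bolt's output feeds the next bolt's execute().
        for (String sentence : List.of("the storm is a stream processor", "storm is fast")) {
            split.execute(sentence, word -> filter.execute(word, count::execute));
        }
        System.out.println("storm -> " + count.counts().get("storm")); // prints "storm -> 2"
    }
}

// Function: one input tuple (a sentence) yields zero or more output tuples (words).
class SplitBolt {
    void execute(String sentence, Consumer<String> emit) {
        for (String word : sentence.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) emit.accept(word);
        }
    }
}

// Filter: forwards a tuple only if it passes a predicate.
class StopWordFilterBolt {
    private final Set<String> stopWords;

    StopWordFilterBolt(Set<String> stopWords) { this.stopWords = stopWords; }

    void execute(String word, Consumer<String> emit) {
        if (!stopWords.contains(word)) emit.accept(word);
    }
}

// Aggregation: keeps state across tuples (a running count per word).
class CountBolt {
    private final Map<String, Integer> counts = new HashMap<>();

    void execute(String word) { counts.merge(word, 1, Integer::sum); }

    Map<String, Integer> counts() { return counts; }
}
```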
Topology
Network of spouts and bolts
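A topology can be viewed as a directed graph in which each bolt subscribes to the streams of one or more upstream components. The sketch below models only that wiring, using a hypothetical `MiniTopology` builder (in Storm the same wiring is declared through `TopologyBuilder`), and assumes the graph is acyclic so that a processing order exists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopologyDemo {
    public static void main(String[] args) {
        MiniTopology topology = new MiniTopology()
                .spout("sentences")
                .bolt("split", "sentences")
                .bolt("count", "split")
                .bolt("report", "count");
        System.out.println(topology.order()); // prints [sentences, split, count, report]
    }
}

// Hypothetical topology model: each component lists the components it subscribes to.
class MiniTopology {
    private final Map<String, List<String>> inputs = new LinkedHashMap<>();

    MiniTopology spout(String id) {
        inputs.put(id, List.of()); // spouts have no upstream components
        return this;
    }

    MiniTopology bolt(String id, String... subscribesTo) {
        inputs.put(id, List.of(subscribesTo));
        return this;
    }

    // Kahn's algorithm: every component appears after all of its upstream sources.
    List<String> order() {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> downstream = new LinkedHashMap<>();
        for (String id : inputs.keySet()) {
            pending.put(id, inputs.get(id).size());
            downstream.put(id, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : inputs.entrySet()) {
            for (String src : e.getValue()) {
                List<String> targets = downstream.get(src);
                if (targets == null) throw new IllegalStateException("unknown component: " + src);
                targets.add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((id, n) -> { if (n == 0) ready.add(id); });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String id = ready.poll();
            result.add(id);
            for (String next : downstream.get(id)) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (result.size() != inputs.size()) throw new IllegalStateException("topology has a cycle");
        return result;
    }
}
```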
Tasks
Spouts and bolts execute as many tasks across the cluster
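Each spout or bolt runs as a number of tasks set by its parallelism, and those tasks are spread over the worker processes of the cluster. The sketch below assumes a simple round-robin placement of tasks onto worker slots; Storm's own scheduler may place them differently, and the worker names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TaskDemo {
    public static void main(String[] args) {
        Map<String, Integer> parallelism = new LinkedHashMap<>();
        parallelism.put("spout", 2);
        parallelism.put("split", 3);
        parallelism.put("count", 3);
        List<String> workers = List.of("node1:6700", "node2:6700"); // hypothetical slots
        assign(parallelism, workers).forEach((w, tasks) -> System.out.println(w + " -> " + tasks));
    }

    // Expands each component into `parallelism` tasks with consecutive ids,
    // then deals the tasks out to workers round-robin so load stays balanced.
    static Map<String, List<String>> assign(Map<String, Integer> parallelism, List<String> workers) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String w : workers) plan.put(w, new ArrayList<>());
        int taskId = 1;
        for (Map.Entry<String, Integer> e : parallelism.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                String worker = workers.get((taskId - 1) % workers.size());
                plan.get(worker).add(e.getKey() + "#" + taskId);
                taskId++;
            }
        }
        return plan;
    }
}
```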
Stream grouping
When a tuple is emitted, which task does it go to?
• Shuffle grouping: send each tuple to a randomly chosen task, spreading tuples evenly
• Fields grouping: hash a subset of the tuple's fields, so tuples with equal values always go to the same task
• All grouping: send to all tasks
• Global grouping: send to the task with the lowest id
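Each grouping is a rule for choosing the target task ids of an emitted tuple. The Java sketch below expresses the four rules as plain functions over a list of task ids. For fields grouping it maps a hash of the grouping values onto a task with a modulo; the specific hash function (`Objects.hash`) is an assumption, and the task ids are hypothetical.

```java
import java.util.List;
import java.util.Objects;
import java.util.Random;

class GroupingDemo {
    public static void main(String[] args) {
        List<Integer> taskIds = List.of(7, 8, 9, 10); // hypothetical ids of one bolt's tasks
        Random random = new Random(42);
        System.out.println("shuffle: " + shuffle(taskIds, random));
        System.out.println("fields(alice): " + fields(taskIds, "alice"));
        System.out.println("fields(alice) again: " + fields(taskIds, "alice"));
        System.out.println("all: " + all(taskIds));
        System.out.println("global: " + global(taskIds));
    }

    // Shuffle grouping: any task; a random choice spreads load roughly evenly.
    static int shuffle(List<Integer> tasks, Random random) {
        return tasks.get(random.nextInt(tasks.size()));
    }

    // Fields grouping: equal grouping values always hash to the same task.
    static int fields(List<Integer> tasks, Object... groupingValues) {
        int hash = Objects.hash(groupingValues);
        return tasks.get(Math.floorMod(hash, tasks.size()));
    }

    // All grouping: the tuple is replicated to every task.
    static List<Integer> all(List<Integer> tasks) {
        return tasks;
    }

    // Global grouping: the whole stream goes to the task with the lowest id.
    static int global(List<Integer> tasks) {
        return tasks.stream().mapToInt(Integer::intValue).min().orElseThrow();
    }
}
```

Fields grouping is what makes stateful bolts such as per-word counters correct: every tuple for a given key reaches the one task that holds that key's state.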
Starting a topology
Storm: Fault tolerance
Guarantees messages will be processed
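Storm's reliability mechanism tracks the tree of tuples created from each spout tuple. Per the Storm documentation, an acker task keeps one 64-bit value per spout tuple and XORs into it each tuple id once when that tuple is created (anchored) and once when it is acked; the value returns to zero once the whole tree has been processed, and the spout's `ack` method is then called. The sketch below reproduces only the XOR bookkeeping; the `Acker` class and the ids are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class AckerDemo {
    public static void main(String[] args) {
        Acker acker = new Acker();
        Random ids = new Random(7);
        long root = ids.nextLong();   // id of the tuple the spout emitted
        acker.init(1L, root);         // 1L: hypothetical spout message id
        long child1 = ids.nextLong();
        long child2 = ids.nextLong();
        // A bolt acks the root and anchors two new tuples to it in one update.
        acker.update(1L, root ^ child1 ^ child2);
        System.out.println("complete after root ack? " + acker.isComplete(1L));
        acker.update(1L, child1);     // downstream bolt acks child1
        acker.update(1L, child2);     // downstream bolt acks child2
        System.out.println("complete after all acks? " + acker.isComplete(1L));
    }
}

// XOR-based tuple-tree tracking: every id is XORed in twice (created, acked),
// so the value is zero exactly when all tuples are acked, apart from a
// negligible chance of a false zero with random 64-bit ids.
class Acker {
    private final Map<Long, Long> ackVal = new HashMap<>();

    void init(long spoutMsgId, long rootTupleId) {
        ackVal.put(spoutMsgId, rootTupleId);
    }

    void update(long spoutMsgId, long xorOfIds) {
        ackVal.merge(spoutMsgId, xorOfIds, (a, b) -> a ^ b);
    }

    boolean isComplete(long spoutMsgId) {
        Long v = ackVal.get(spoutMsgId);
        return v != null && v == 0L;
    }
}
```

The appeal of the scheme is that the acker needs constant memory per spout tuple, no matter how large the tuple tree grows.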
Message Passing (ZeroMQ)
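Storm moves tuples between tasks by passing messages rather than sharing state; in the Storm version described here, ZeroMQ carries those messages between worker processes. As an in-process stand-in only (it does not use ZeroMQ), the sketch below passes messages from a producer thread to a consumer thread through a bounded `java.util.concurrent.BlockingQueue`; the end-of-stream marker is a hypothetical convention for the demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class MessagePassingDemo {
    private static final String END = "__END__"; // hypothetical end-of-stream marker

    public static void main(String[] args) {
        System.out.println(transfer(List.of("t1", "t2", "t3"), 2)); // prints [t1, t2, t3]
    }

    // Sends messages from this thread to a consumer thread through a bounded queue.
    // A full queue blocks the sender, a simple form of backpressure.
    static List<String> transfer(List<String> messages, int capacity) {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(capacity);
        List<String> received = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                for (String msg = channel.take(); !msg.equals(END); msg = channel.take()) {
                    received.add(msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (String m : messages) channel.put(m);
            channel.put(END);
            consumer.join(); // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        }
        return received;
    }
}
```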
Easy to set up & operate
• Set up a ZooKeeper cluster
• Install dependencies on the Nimbus and worker machines
- ZeroMQ 2.1.7 and JZMQ
- Java 6 and Python 2.6.6
- unzip
• Download and extract a Storm release to the Nimbus and worker machines
• Fill in the mandatory configuration in storm.yaml
• Launch the daemons under supervision using the “storm” script
Cluster Summary
Topology Summary
Component Summary
Advanced Topics
• Distributed RPC
• Transactional topologies
• Trident
• Using non-JVM languages with Storm
• Unit testing
• Patterns
Real-time Twitter Analytics
Trending Topics and Sentiment Analysis
[Figure: system architecture with components Twitter, Twitter Crawler, Kafka, Storm Cluster, MySQL, Hadoop (HDFS and HBase)]
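The trending-topics part of this pipeline amounts to counting hashtags over a recent time window and reporting the most frequent ones. A hypothetical sketch of that counting step in plain Java, assuming tweets arrive as text and the window is a fixed number of time slots; in the architecture above it would presumably run inside Storm bolts fed from Kafka, which the sketch does not model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrendingDemo {
    public static void main(String[] args) {
        TrendingTopics trends = new TrendingTopics(2); // window = last 2 time slots (assumption)
        trends.addTweet("Learning #storm and #kafka");
        trends.addTweet("#storm is fast");
        trends.advance();                              // move to the next time slot
        trends.addTweet("#hadoop batch vs #storm stream");
        System.out.println(trends.top(2));             // prints [storm, hadoop]
    }
}

// Hypothetical trending-topics counter: hashtag counts over a sliding window of slots.
class TrendingTopics {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");
    private final List<Map<String, Integer>> slots = new ArrayList<>();
    private final int windowSlots;

    TrendingTopics(int windowSlots) {
        this.windowSlots = windowSlots;
        slots.add(new HashMap<>());
    }

    // Counts every hashtag in the tweet into the current (newest) slot.
    void addTweet(String text) {
        Matcher m = HASHTAG.matcher(text.toLowerCase());
        Map<String, Integer> current = slots.get(slots.size() - 1);
        while (m.find()) current.merge(m.group(1), 1, Integer::sum);
    }

    // Opens a new slot and drops the oldest one once the window is full.
    void advance() {
        slots.add(new HashMap<>());
        if (slots.size() > windowSlots) slots.remove(0);
    }

    // Sums counts across the window and returns the k most frequent hashtags.
    List<String> top(int k) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> slot : slots) {
            slot.forEach((tag, n) -> total.merge(tag, n, Integer::sum));
        }
        List<String> tags = new ArrayList<>(total.keySet());
        tags.sort((a, b) -> {
            int byCount = Integer.compare(total.get(b), total.get(a));
            return byCount != 0 ? byCount : a.compareTo(b); // ties broken alphabetically
        });
        return tags.subList(0, Math.min(k, tags.size()));
    }
}
```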
THANK YOU FOR YOUR ATTENTION
Any questions are welcome…

Apache Storm Internals