• What is Storm? 
• Storm Benefits 
• How Storm differs from Hadoop 
• Storm vs. Flume 
• Storm Example using Twitter Streaming API 
• Quiz
• Storm is a fault-tolerant, distributed, real-time computation system.
• It is a non-persistent API; Storm itself does not store the data it processes.
• On a Storm cluster, we execute topologies, which process streams of tuples (data).
• Each topology is a graph of spouts (which produce tuples) and bolts (which transform tuples).
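The spout-and-bolt idea above can be sketched in plain Java without the Storm library. This is only an illustrative model, not Storm's API: the class `MiniTopology`, the nested `Bolt` interface, and the `run` method are hypothetical names, tuples are simplified to strings, and the graph is simplified to a linear chain that drains a finite spout (a real topology runs continuously over an unbounded stream).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class MiniTopology {
    // Hypothetical stand-in for a bolt: maps one input tuple to zero or more output tuples.
    interface Bolt extends Function<String, List<String>> {}

    // Drains the spout, then pushes every tuple through the bolts in order.
    static List<String> run(Iterator<String> spout, List<Bolt> bolts) {
        List<String> current = new ArrayList<>();
        spout.forEachRemaining(current::add);
        for (Bolt bolt : bolts) {
            List<String> next = new ArrayList<>();
            for (String tuple : current) {
                next.addAll(bolt.apply(tuple));
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        // The "spout" here is just an iterator over two sample lines.
        Iterator<String> spout = Arrays.asList("a b", "c").iterator();
        Bolt split = line -> Arrays.asList(line.split(" "));
        Bolt upper = word -> Collections.singletonList(word.toUpperCase());
        List<Bolt> bolts = new ArrayList<>();
        bolts.add(split);
        bolts.add(upper);
        System.out.println(run(spout, bolts)); // prints [A, B, C]
    }
}
```

Each bolt only sees the tuples emitted by the stage before it, which is the property that lets Storm distribute the stages across a cluster.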
• Once a Storm topology is submitted, and provided the computation logic written in the bolts is correct, it just works.
Storm                           Hadoop
Distributed and fault tolerant  Distributed and fault tolerant
Real-time computation system    Batch processing system
Non-persistent                  Persistent; uses HDFS for file storage
Storm                               Flume
Real-time streaming system          Real-time streaming system
Real-time computation system        Not a real-time computation system
Does not use a message broker for   Uses a Channel as a message broker
real-time processing of data        between Source and Sink
Topology scenario:
• This project uses one spout (TwitterSampleSpout) and three bolts (WordSplitterBolt, IgnoreWordsBolt, WordCounterBolt).
• The spout (TwitterSampleSpout) downloads tweets from Twitter and sends them to WordSplitterBolt.
• WordSplitterBolt splits the tweet text into words using a space delimiter and sends those words to IgnoreWordsBolt.
• IgnoreWordsBolt acts as a filter: it drops determiners (a, an, the, etc.) and sends the remaining words to WordCounterBolt.
• WordCounterBolt does the actual counting and prints the most frequent words to the console, much like Twitter trends.
• This process runs continuously, aggregating the words and updating their counts.
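The logic of the three bolts can be sketched in plain Java as follows. This is not the project's code and does not use the Storm or Twitter APIs: the class `TweetWordCount` and its methods are hypothetical, the tweets are hard-coded samples, the ignore list contains only the three determiners named above, and lower-casing the words is an added assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TweetWordCount {
    // Words the filter stage drops (assumed list; the original only names a, an, the).
    static final Set<String> IGNORED = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Splitter stage: break tweet text into lower-cased words on whitespace.
    static List<String> splitWords(String tweet) {
        List<String> words = new ArrayList<>();
        for (String w : tweet.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w.toLowerCase());
            }
        }
        return words;
    }

    // Filter stage: keep a word only if it is not on the ignore list.
    static boolean keep(String word) {
        return !IGNORED.contains(word);
    }

    // Counter stage: aggregate counts over all tweets seen so far.
    static Map<String, Integer> count(List<String> tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String word : splitWords(tweet)) {
                if (keep(word)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the n most frequent words, ties broken alphabetically.
    static List<String> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("the storm is fast", "a storm is a storm");
        System.out.println(top(count(tweets), 2)); // prints [storm, is]
    }
}
```

In a real topology each stage would run as its own bolt and the counter would keep updating as new tweets arrive, rather than processing a fixed list once.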
Apache Storm and Twitter Streaming API integration