Course Instructor: Dr. Zarifzadeh
Presented by: Pouyan Rezazadeh, Ali Rezaie
Apache Storm
Introduction
Hadoop and related technologies have made it possible to store and process data at large scales.
Unfortunately, these data processing technologies are not realtime systems.
Hadoop does batch processing instead of realtime processing.
Introduction
Batch processing
- Jobs are processed in batches
- A batch job can take hours to complete
- E.g. a billing system
Realtime processing
- Jobs are processed one by one, immediately
- E.g. an airline reservation system
Introduction
Realtime data processing at massive scale is becoming more and more of a requirement for businesses.
The lack of a "Hadoop of realtime" has become the biggest hole in the data processing ecosystem.
There's no hack that will turn Hadoop into a realtime system.
Solution: Apache Storm
Apache Storm
A distributed realtime computation system
Open-sourced in 2011
Implemented in Clojure (a dialect of Lisp), with some Java
Advantages
Free, simple and open source
Can be used with any programming language
Very fast
Scalable
Fault-tolerant
Guarantees your data will be processed
Integrates with any database technology
Extremely robust
Storm Use Cases
[Logos of companies using Storm]
And many others …
Storm vs Hadoop
A Storm cluster is superficially similar to a Hadoop cluster.
Hadoop runs "MapReduce jobs", while Storm runs "topologies".
A MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it).
Spouts and Bolts
[Diagram: spout and bolt icons]
Spouts and Bolts
A stream is an unbounded sequence of tuples.
A spout is a source of streams.
[Diagram: a topology in which Spout 1 and Spout 2 feed into Bolt 1 through Bolt 4]
Spouts and Bolts
For example, a spout may read tuples off of a queue and emit them as a stream.
Spouts and Bolts
A bolt consumes any number of input streams, does some processing, and possibly emits new streams.
Spouts and Bolts
Each node (spout or bolt) in a Storm topology executes in parallel.
Architecture
A machine in a Storm cluster may run one or more worker processes.
Each topology runs across one or more worker processes.
Each worker process runs executors (threads) for a specific topology.
Each executor runs one or more tasks of the same component (spout or bolt).
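The executor/task relationship above can be sketched in plain Java, with no Storm dependency (the Task interface and names here are illustrative, not Storm classes): one executor is one thread, and the tasks assigned to it share that thread and run serially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorSketch {
    // Stand-in for a spout/bolt task instance (illustrative, not a Storm class)
    interface Task { void execute(String tuple); }

    // One executor (thread) serially runs every task assigned to it
    static List<String> run() throws Exception {
        List<String> log = new CopyOnWriteArrayList<>();
        List<Task> tasks = new ArrayList<>();           // two tasks of the SAME component
        tasks.add(t -> log.add("task0:" + t));
        tasks.add(t -> log.add("task1:" + t));

        ExecutorService executor = Executors.newSingleThreadExecutor(); // one executor = one thread
        executor.submit(() -> {
            for (String tuple : new String[]{"a", "b"}) {
                for (Task task : tasks) { task.execute(tuple); } // tasks share the thread
            }
        }).get();
        executor.shutdown();
        return log;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints [task0:a, task1:a, task0:b, task1:b]
    }
}
```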
[Diagram: a worker process containing executors, each running several tasks]
Architecture
[Diagram: Nimbus coordinating several Supervisors through a ZooKeeper ensemble]
Hadoop v1     Storm
JobTracker    Nimbus (only 1)
              - distributes code around the cluster
              - assigns tasks to machines/supervisors
              - monitors for failures
TaskTracker   Supervisor (many)
              - listens for work assigned to its machine
              - starts and stops worker processes as necessary, based on Nimbus
(none)        ZooKeeper
              - coordination between Nimbus and the Supervisors
Architecture
The Nimbus and Supervisors are stateless.
All state is kept in ZooKeeper (1 ZK instance per machine).
When the Nimbus or a Supervisor fails, it starts back up like nothing happened.
A topology is submitted to the cluster as a jar:
storm jar all-my-code.jar org.apache.storm.MyTopology arg1 arg2
Architecture
A running topology consists of many worker processes spread across many machines.
[Diagram: a topology spanning two worker processes, each running many tasks]
Topology with Tasks in Detail
[Diagram: a topology shown with task-level detail]
Stream Groupings
Shuffle grouping: tuples are distributed to the consumer's tasks in a randomized round-robin fashion
Fields grouping: all tuples with the same field value(s) are always routed to the same task
Direct grouping: the producer of the tuple decides which task of the consumer will receive it
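The fields grouping can be sketched in plain Java, without Storm (the hashing scheme below is an illustrative assumption, not Storm's exact partitioner): the target task is derived from a hash of the grouping field's value, so equal values always land on the same task.

```java
public class FieldsGroupingSketch {
    // Pick a consumer task for a tuple based on the grouping field's value.
    // Illustrative only: Storm's real partitioning differs in detail.
    static int chooseTask(String fieldValue, int numTasks) {
        return Math.abs(fieldValue.hashCode() % numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // The same word is always routed to the same task...
        System.out.println(chooseTask("dog", numTasks) == chooseTask("dog", numTasks)); // true
        // ...which is what lets a counting bolt keep per-word counts in local state.
        System.out.println(chooseTask("dog", numTasks));
        System.out.println(chooseTask("fleas", numTasks));
    }
}
```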
Sample Configuration Code
TopologyBuilder topologyBuilder = new TopologyBuilder();
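The single line above only creates the builder. A fuller configuration for the word-count topology shown later in the deck might look like the following sketch (a non-runnable fragment: it assumes the storm-core dependency, its imports, and the spout/bolt classes from the coming slides):

```java
TopologyBuilder builder = new TopologyBuilder();
// Wire the components together; the string ids name each node in the topology
builder.setSpout("sentence-spout", new SentenceSpout());
builder.setBolt("split-bolt", new SplitSentenceBolt())
       .shuffleGrouping("sentence-spout");                // randomized round-robin
builder.setBolt("count-bolt", new WordCountBolt())
       .fieldsGrouping("split-bolt", new Fields("word")); // same word -> same task
builder.setBolt("report-bolt", new ReportBolt())
       .globalGrouping("count-bolt");                     // everything to one task

Config config = new Config();
StormSubmitter.submitTopology("word-count", config, builder.createTopology());
```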
Fault Tolerance
Workers heartbeat back to Nimbus via ZooKeeper.
When a worker dies, the Supervisor will restart it.
If the worker continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign the worker to another machine.
If a supervisor node dies, Nimbus will reassign its work to other nodes.
If Nimbus dies, topologies continue to function normally, but worker reassignment is no longer possible.
This is in contrast to Hadoop v1, where all running jobs are lost if the JobTracker dies.
Preferably run ZooKeeper with 3 or more nodes, so that the failure of 1 ZK server can be tolerated.
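The sizing rule follows from ZooKeeper's majority quorum: an ensemble of n nodes keeps working only while a majority survives, so it tolerates (n - 1) / 2 failures. A quick check in plain Java:

```java
public class ZkQuorum {
    // Failures a ZooKeeper ensemble of n nodes can tolerate while keeping a majority
    static int toleratedFailures(int n) {
        return (n - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(toleratedFailures(3)); // 3 nodes -> tolerates 1 failure
        System.out.println(toleratedFailures(4)); // 4 nodes -> still only 1 (even sizes add nothing)
        System.out.println(toleratedFailures(5)); // 5 nodes -> tolerates 2
    }
}
```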
A Sample Word Count Topology
Sentence Spout: emits sentences, e.g. { "sentence": "my dog has fleas" }
Split Sentence Bolt: splits each sentence into words, e.g. { "word" : "my" } { "word" : "dog" } { "word" : "has" } { "word" : "fleas" }
Word Count Bolt: counts occurrences of each word, e.g. { "word" : "dog", "count" : 5 }
Report Bolt: prints the contents
[Diagram: Sentence Spout -> Split Sentence Bolt -> Word Count Bolt -> Report Bolt]
A Sample Word Count Code
public class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private String[] sentences = {
        "my dog has fleas", "i like cold beverages", "the dog ate my homework",
        "don't have a cow man", "i don't think i like fleas"
    };
    private int index = 0;

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    public void nextTuple() {
        // Emit the next sentence, cycling through the array forever
        this.collector.emit(new Values(sentences[index]));
        index++;
        if (index >= sentences.length) { index = 0; }
    }
}
public class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    public void execute(Tuple tuple) {
        // Split each incoming sentence on spaces and emit one tuple per word
        String sentence = tuple.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            this.collector.emit(new Values(word));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
public class WordCountBolt extends BaseRichBolt {
    private OutputCollector collector;
    private HashMap<String, Long> counts = null;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.counts = new HashMap<String, Long>();
    }

    public void execute(Tuple tuple) {
        // Increment the running count for this word and emit the updated total
        String word = tuple.getStringByField("word");
        Long count = this.counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        this.counts.put(word, count);
        this.collector.emit(new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
public class ReportBolt extends BaseRichBolt {
    private HashMap<String, Long> counts = null;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.counts = new HashMap<String, Long>();
    }

    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Long count = tuple.getLongByField("count");
        this.counts.put(word, count);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt does not emit anything
    }

    public void cleanup() {
        // Print the final counts, sorted by word, when the topology is killed
        List<String> keys = new ArrayList<String>();
        keys.addAll(this.counts.keySet());
        Collections.sort(keys);
        for (String key : keys) {
            System.out.println(key + " : " + this.counts.get(key));
        }
    }
}
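The same pipeline can be exercised without a Storm cluster. The plain-Java sketch below mirrors the bolts' logic (split each sentence, then count the words) over a fixed list of sentences; it is a local approximation for testing the logic, not Storm's execution model:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {
    // Mirror SplitSentenceBolt + WordCountBolt over a list of sentences
    static Map<String, Long> count(List<String> sentences) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys, like ReportBolt's cleanup()
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("my dog has fleas", "the dog ate my homework");
        Map<String, Long> counts = count(sentences);
        counts.forEach((w, c) -> System.out.println(w + " : " + c)); // e.g. "dog : 2"
    }
}
```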