storm-170531123446.dotx.pptx

Apache Storm
Course Instructor: Dr. Zarifzadeh
Presented By: Pouyan Rezazadeh, Ali Rezaie

Introduction
Hadoop and related technologies have made it possible to store and process data at large scales.
Unfortunately, these data processing technologies are not realtime systems.
Hadoop does batch processing instead of realtime processing.

Introduction
Batch processing
Processing jobs in batch
Batch processing jobs can take hours
E.g. billing system
Realtime processing
Processing jobs one by one
Processing jobs immediately
E.g. airline system

Introduction
Realtime data processing at massive scale is becoming more and more of a requirement for businesses.
The lack of a "Hadoop of realtime" has become the biggest hole in the data processing ecosystem.
There's no hack that will turn Hadoop into a realtime system.

Solution
Apache Storm
A distributed realtime computation system
Released as open source in 2011
Implemented in Clojure (a dialect of Lisp), with some Java

Advantages
Free, simple and open source
Can be used with any programming language
Very fast
Scalable
Fault-tolerant
Guarantees your data will be processed
Integrates with any database technology

Storm Use Cases
And many others …

Storm vs Hadoop
A Storm cluster is superficially similar to a Hadoop cluster.
Hadoop runs "MapReduce jobs", while Storm runs "topologies".
A MapReduce job eventually finishes, whereas a topology processes messages forever (or until you kill it).

Spouts and Bolts
[Figure: example topology containing Spout 1, Spout 2 and Bolt 1 to Bolt 4]
A stream is an unbounded sequence of tuples.
A spout is a source of streams.
For example, a spout may read tuples off of a queue and emit them as a stream.
A bolt consumes any number of input streams, does some processing, and possibly emits new streams.
Each node (spout or bolt) in a Storm topology executes in parallel.
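
The spout and bolt roles described above can be modelled in plain Java. This is only a sketch and does not use the Storm API: `StreamSketch`, `runSpout` and `runBolt` are hypothetical names, and the streams here are finite lists, whereas a real Storm stream is unbounded and the nodes run in parallel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

public class StreamSketch {
    // Hypothetical stand-in for a spout: reads items off a queue and emits them in order.
    static List<String> runSpout(Queue<String> source) {
        List<String> emitted = new ArrayList<>();
        String item;
        while ((item = source.poll()) != null) {
            emitted.add(item);
        }
        return emitted;
    }

    // Hypothetical stand-in for a bolt: consumes every tuple from all of its input
    // streams, applies some processing, and collects whatever the logic emits.
    static <I, O> List<O> runBolt(List<List<I>> inputStreams, Function<I, List<O>> logic) {
        List<O> emitted = new ArrayList<>();
        for (List<I> stream : inputStreams) {
            for (I tuple : stream) {
                emitted.addAll(logic.apply(tuple));
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        Queue<String> queue1 = new ArrayDeque<>(Arrays.asList("my dog", "has fleas"));
        Queue<String> queue2 = new ArrayDeque<>(Arrays.asList("cold beverages"));

        List<String> stream1 = runSpout(queue1);
        List<String> stream2 = runSpout(queue2);

        // One bolt consuming two input streams and emitting a new stream of words.
        List<String> words = runBolt(Arrays.asList(stream1, stream2),
                sentence -> Arrays.asList(sentence.split(" ")));
        System.out.println(words); // [my, dog, has, fleas, cold, beverages]
    }
}
```

The bolt logic is passed in as a function so that the same helper can stand in for any bolt that maps one input tuple to zero or more output tuples.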

Architecture
A machine in a Storm cluster may run one or more worker processes.
Each topology has one or more worker processes.
Each worker process runs executors (threads) for a specific topology.
Each executor runs one or more tasks of the same component (spout or bolt).
[Figure: a worker process containing executors, each running one or more tasks]
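
To make the executor/task relationship concrete, the sketch below spreads the tasks of one component over its executors as evenly as possible. It is an illustration only and not Storm's scheduler: `ExecutorSketch` and `assignTasks` are hypothetical names, and the contiguous split is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutorSketch {
    // Splits task ids 0..numTasks-1 into numExecutors contiguous groups whose
    // sizes differ by at most one. Assumes numTasks >= numExecutors >= 1.
    static List<List<Integer>> assignTasks(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors;
        int nextTask = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        // Five tasks of one bolt running on two executors (threads).
        System.out.println(assignTasks(5, 2)); // [[0, 1, 2], [3, 4]]
    }
}
```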

Architecture
[Figure: Nimbus, a three-node ZooKeeper cluster, and five Supervisors]
Hadoop v1     Storm               Responsibilities
JobTracker    Nimbus (only 1)     . distributes code around cluster
                                  . assigns tasks to machines/supervisors
                                  . failure monitoring
TaskTracker   Supervisor (many)   . listens for work assigned to its machine
                                  . starts and stops worker processes as necessary, based on Nimbus
              ZooKeeper           . coordination between Nimbus and the Supervisors

Architecture
The Nimbus and Supervisor are stateless.
All state is kept in ZooKeeper.
1 ZK instance per machine
When the Nimbus or Supervisor fails, they'll start back up like nothing happened.
storm jar all-my-code.jar org.apache.storm.MyTopology arg1 arg2

Architecture
A running topology consists of many worker processes spread across many machines.

Topology
[Figure: a topology spread across two worker processes, each running several tasks]

Topology With Tasks in Detail

Shuffle grouping: randomized round-robin
Fields grouping: all tuples with the same field value(s) are always routed to the same task
Direct grouping: producer of the tuple decides which task of the consumer will receive the tuple
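
The routing rules above can be sketched in plain Java. This is not Storm's implementation: `GroupingSketch` and its members are hypothetical, and hashing the field value modulo the task count is an assumption used to illustrate how equal values can always reach the same task.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class GroupingSketch {
    // Fields grouping idea: the same field value always maps to the same task index.
    static int fieldsGrouping(String fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    // Shuffle grouping idea: walk a shuffled list of tasks round-robin,
    // reshuffling each time the list is exhausted.
    static final class ShuffleGrouping {
        private final List<Integer> order = new ArrayList<>();
        private final Random rng;
        private int position = 0;

        ShuffleGrouping(int numTasks, Random rng) {
            this.rng = rng;
            for (int i = 0; i < numTasks; i++) {
                order.add(i);
            }
            Collections.shuffle(order, rng);
        }

        int nextTask() {
            if (position == order.size()) {
                Collections.shuffle(order, rng);
                position = 0;
            }
            return order.get(position++);
        }
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // Both calls print the same task index because the field value is the same.
        System.out.println(fieldsGrouping("dog", numTasks));
        System.out.println(fieldsGrouping("dog", numTasks));

        ShuffleGrouping shuffle = new ShuffleGrouping(numTasks, new Random());
        for (int i = 0; i < numTasks; i++) {
            // Every task is visited exactly once per pass, in random order.
            System.out.println(shuffle.nextTask());
        }
    }
}
```

Direct grouping needs no routing function on the consumer side, since the producer names the target task itself.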

Sample Configuration Code
TopologyBuilder topologyBuilder = new TopologyBuilder();

Fault Tolerance
Workers heartbeat back to Nimbus via ZooKeeper.
When a worker dies, the supervisor will restart it.
If it continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reschedule the worker.
If a supervisor node dies, Nimbus will reassign the work to other nodes.
If Nimbus dies, topologies will continue to function normally, but no reassignments can be made while it is down.
This is in contrast to Hadoop v1, where all running jobs are lost if the JobTracker dies.
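
Heartbeat-based failure detection can be sketched as a table of last-seen timestamps checked against a timeout. This is a plain-Java illustration and not how Storm stores heartbeats in ZooKeeper: `HeartbeatSketch`, the worker ids and the 30-second timeout are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatSketch {
    private final Map<String, Long> lastBeatMs = new HashMap<>();
    private final long timeoutMs;

    HeartbeatSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Records that a worker reported in at the given time.
    void heartbeat(String workerId, long nowMs) {
        lastBeatMs.put(workerId, nowMs);
    }

    // Returns the workers whose last heartbeat is older than the timeout,
    // i.e. the candidates for restart or rescheduling.
    List<String> timedOut(long nowMs) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastBeatMs.entrySet()) {
            if (nowMs - entry.getValue() > timeoutMs) {
                dead.add(entry.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }

    public static void main(String[] args) {
        HeartbeatSketch monitor = new HeartbeatSketch(30_000);
        monitor.heartbeat("worker-1", 0);
        monitor.heartbeat("worker-2", 25_000);
        System.out.println(monitor.timedOut(40_000)); // [worker-1]
    }
}
```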

Preferably run ZK with nodes >= 3 so that you can tolerate the failure of 1 ZK server.
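
The "3 nodes tolerate 1 failure" rule follows from majority quorum arithmetic, which the sketch below works through. `QuorumSketch` and its method names are hypothetical; the only assumption is that the ensemble stays available while a strict majority of servers is up.

```java
public class QuorumSketch {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can fail while a majority is still available.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " servers: quorum " + quorumSize(n)
                    + ", tolerates " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

Under this rule an even-sized ensemble tolerates no more failures than the next smaller odd size (4 servers tolerate 1, the same as 3), which is why odd sizes are the usual choice.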

A Sample Word Count Topology
[Figure: Sentence Spout, Split Sentence Bolt, Word Count Bolt, Report Bolt]
Sentence Spout: { "sentence": "my dog has fleas" }
Split Sentence Bolt: { "word" : "my" } { "word" : "dog" } { "word" : "has" } { "word" : "fleas" }
Word Count Bolt: { "word" : "dog", "count" : 5 }
Report Bolt: prints the contents
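
The data flow of this topology can be traced without a cluster. The sketch below is a single-threaded, plain-Java stand-in for the four stages and does not use the Storm API: `WordCountSketch` and `countWords` are hypothetical names, and the input is a fixed list rather than an unbounded stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    static Map<String, Long> countWords(List<String> sentences) {
        // A TreeMap keeps the words sorted, like the sorted report at the end.
        Map<String, Long> counts = new TreeMap<>();
        for (String sentence : sentences) {              // sentence spout
            for (String word : sentence.split(" ")) {    // split sentence bolt
                counts.merge(word, 1L, Long::sum);       // word count bolt
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords(Arrays.asList(
                "my dog has fleas",
                "the dog ate my homework"));
        // Report stage: print every word with its final count.
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " : " + entry.getValue());
        }
    }
}
```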

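A Sample Word Count Code
import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private String[] sentences = {
        "my dog has fleas",
        "i like cold beverages",
        "the dog ate my homework",
        "don't have a cow man",
        "i don't think i like fleas"
    };
    private int index = 0;

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }

    public void open(Map config, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    // Emits one sentence per call and wraps around at the end of the array.
    public void nextTuple() {
        this.collector.emit(new Values(sentences[index]));
        index++;
        if (index >= sentences.length) {
            index = 0;
        }
    }
}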

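A Sample Word Count Code
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    // Splits each incoming sentence on spaces and emits one tuple per word.
    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentence");
        String[] words = sentence.split(" ");
        for (String word : words) {
            this.collector.emit(new Values(word));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}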

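A Sample Word Count Code
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountBolt extends BaseRichBolt {
    private OutputCollector collector;
    private HashMap<String, Long> counts = null;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.counts = new HashMap<String, Long>();
    }

    // Increments the running count for the word and emits the updated pair.
    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Long count = this.counts.get(word);
        if (count == null) {
            count = 0L;
        }
        count++;
        this.counts.put(word, count);
        this.collector.emit(new Values(word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}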

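A Sample Word Count Code
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class ReportBolt extends BaseRichBolt {
    private HashMap<String, Long> counts = null;

    public void prepare(Map config, TopologyContext context, OutputCollector collector) {
        this.counts = new HashMap<String, Long>();
    }

    // Remembers the latest count received for each word.
    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        Long count = tuple.getLongByField("count");
        this.counts.put(word, count);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // this bolt does not emit anything
    }

    // Prints the final counts in alphabetical order of the words.
    public void cleanup() {
        List<String> keys = new ArrayList<String>();
        keys.addAll(this.counts.keySet());
        Collections.sort(keys);
        for (String key : keys) {
            System.out.println(key + " : " + this.counts.get(key));
        }
    }
}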
