StormWars - when the data stream shrinks
bvishnu
Apache Storm
• A Stream Processing framework
• Used to pull data from a stream and perform real-time analytics on the data
a Stream…
• Can be Apache Kafka or Amazon Kinesis.
• Normally has partitions/shards for better read & write throughput.
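As a quick sketch of why partitions/shards improve throughput: records are spread across shards (here by a simple hash of the record key; the key names and shard count are illustrative, not tied to any specific stream), so producers and consumers can work on different shards in parallel.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard for a record by hashing its key (illustrative scheme)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Four hypothetical record keys spread over 3 shards.
keys = ["user-1", "user-2", "user-3", "user-4"]
placement = {k: shard_for(k, 3) for k in keys}
```

The same key always lands on the same shard, which is what lets per-key ordering survive the parallelism.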
Partition Metadata
• Storm uses INTEGERS (0, 1, …) to identify partitions.
• Whereas…
• Amazon Kinesis uses STRINGS to identify partitions.
So how can we process data?
• User sorts the STRINGS (shard IDs)
• User maps the sorted shard IDs to integers 0…N

Shard-id-0001  <->  0
Shard-id-0002  <->  1
…
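The sort-and-map trick above can be sketched as follows (the shard IDs are illustrative, and this is a sketch of the idea, not the actual Storm spout code):

```python
# Map Kinesis string shard IDs to the integer partition IDs Storm expects:
# sort the strings, then assign each its index.
shard_ids = ["Shard-id-0002", "Shard-id-0001", "Shard-id-0003"]

partition_map = {shard: i for i, shard in enumerate(sorted(shard_ids))}

print(partition_map)
# {'Shard-id-0001': 0, 'Shard-id-0002': 1, 'Shard-id-0003': 2}
```

Because the mapping is derived purely from sorting, it is stable only as long as the set of shards does not change — which is exactly the problem the rest of the deck walks through.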
Storm API
Shard Split in Amazon Kinesis
Stream shrinks (3 to 2 shards)
Disturbance in the Force
• Storm partition metadata is NO longer valid, as the shard has been deleted.
• Storm partition metadata should now be:

shard-2  <->  0
shard-3  <->  1
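A quick sketch of why the old mapping breaks (shard names as in the slides; not actual spout code): re-sorting after the deletion shifts the remaining shards onto different integers, so any state checkpointed under the old partition numbers now points at the wrong shards.

```python
def partition_map(shard_ids):
    """Sort-and-enumerate mapping from string shard IDs to integer partitions."""
    return {shard: i for i, shard in enumerate(sorted(shard_ids))}

before = partition_map(["shard-1", "shard-2", "shard-3"])
# {'shard-1': 0, 'shard-2': 1, 'shard-3': 2}

# shard-1 is split into shard-2 and shard-3, then eventually deleted:
after = partition_map(["shard-2", "shard-3"])
# {'shard-2': 0, 'shard-3': 1}

# Partition 0's checkpoint was taken against shard-1,
# but partition 0 now refers to shard-2.
```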
a Solution:
• WHITE_LIST of shards for a Storm topology.
• A Storm topology pulls from a specific set of shards.
• So in our case:
  – start topology-1 with WHITELIST = "shard-1"
  – split the shard
  – start topology-2 with WHITELIST = "shard-2 & shard-3"
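The whitelist idea can be sketched like this (function and variable names are illustrative, not the actual spout API): each topology builds its integer mapping only over the shards in its own whitelist, so one topology's mapping never shifts because of shards it does not own.

```python
def assigned_partitions(all_shard_ids, whitelist):
    """Map only whitelisted shards to integer partitions for one topology."""
    visible = sorted(s for s in all_shard_ids if s in whitelist)
    return {shard: i for i, shard in enumerate(visible)}

shards = ["shard-1", "shard-2", "shard-3"]

topology_1 = assigned_partitions(shards, {"shard-1"})
topology_2 = assigned_partitions(shards, {"shard-2", "shard-3"})

print(topology_1)  # {'shard-1': 0}
print(topology_2)  # {'shard-2': 0, 'shard-3': 1}
```

When shard-1 disappears, topology-1's mapping simply becomes empty, while topology-2's mapping is unaffected — the two mappings never overlap.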
a Solution…
• When shard-1 gets deleted, topology-1 dies with it.
• Topology-2 continues processing data for the new shards.
a Solution…
So, there is NO metadata conflict, as there are 2 different topologies pulling data from different sets of shards.
Thank you
&
May the force be with you!

jaihind213@gmail.com
sweetweet213@twitter
mash213.wordpress.com
linkedin.com/in/213vishnu
