Samza Memory Capacity: 2015 IEEE Big Data, Data Quality Workshop

A Memory Capacity Model for High Performing Data-filtering Applications in Samza Framework

Tao Feng, Zhenyun Zhuang, Yi Pan, Haricharan Ramachandra

LinkedIn Corp

Agenda

•  Introduction
•  Memory capacity model
•  Evaluation
•  Summary

What Is Samza

[Figure: a Samza container running Tasks 1, 2 and 3, with an input stream, an output stream, a changelog stream, a local state store, and a checkpoint]

Samza-based Data Filtering Systems

•  Two main scenarios

[Figure panels: Data Filtering By Rules; Data Filtering By Joining Streams]
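The two scenarios can be illustrated with a minimal Python sketch. It is hypothetical and does not use Samza's API: the `(key, value)` message shape, `filter_by_rules`, and `JoinFilter` are assumptions made for illustration, and a plain dict stands in for a task's local state store. In the join case the dict holds one entry per unique key, which corresponds to the per-partition entry count E that the memory capacity model uses.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical message shape: (key, value) with a dict payload.
Message = Tuple[str, dict]
Rule = Callable[[dict], bool]


def filter_by_rules(messages: Iterable[Message], rules: List[Rule]) -> Iterator[Message]:
    """Data filtering by rules: pass a message only if every rule accepts it."""
    for key, value in messages:
        if all(rule(value) for rule in rules):
            yield key, value


class JoinFilter:
    """Data filtering by joining streams (illustrative only).

    Messages from a reference stream populate a keyed local store; messages
    from the event stream pass only if their key is present in that store,
    and are emitted enriched with the stored value.
    """

    def __init__(self) -> None:
        # Stand-in for the task's local state store: one entry per unique key.
        self.store: Dict[str, dict] = {}

    def on_reference(self, key: str, value: dict) -> None:
        # A newer reference message for the same key replaces the old entry,
        # so the store size tracks the number of unique keys.
        self.store[key] = value

    def on_event(self, key: str, value: dict) -> Optional[Message]:
        ref = self.store.get(key)
        if ref is None:
            return None  # no matching key on the reference stream: filter out
        # Reference fields win if both payloads share a field name.
        return key, {**value, **ref}


if __name__ == "__main__":
    events = [("u1", {"clicks": 5}), ("u2", {"clicks": 0})]
    print(list(filter_by_rules(events, [lambda v: v["clicks"] > 0])))

    jf = JoinFilter()
    jf.on_reference("u1", {"country": "US"})
    for key, value in events:
        print(jf.on_event(key, value))
```

Keeping only the latest value per key is a design choice of this sketch; it bounds the store by the number of unique keys rather than by the number of messages seen.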

MEMORY CAPACITY MODEL

Motivation

•  We need an accurate, predictive resource model for better capacity planning
•  We could run more containers within a single node
   •  Higher density without SLA violation
   •  Lower business cost

Memory Capacity Model

•  $L = T \cdot P \cdot E \cdot (B + B_k + B_m)$
   •  $L$: live data set size
   •  $T$: number of input topics
   •  $P$: number of partitions per topic
   •  $E$: number of unique entries per partition
   •  $B$: bytes per treemap entry
   •  $B_k$: bytes of the serialized key
   •  $B_m$: bytes of the serialized value message
•  Required heap size: $1H = 2L$
•  Details of the proof can be found in our paper
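The model reduces to a small calculator. The sketch below is an illustrative Python version, not code from the paper; the function names are hypothetical, and it assumes every byte count is a per-entry average. The third function, `max_entries_per_partition`, is not on the slide: it simply rearranges $1H = 2L$ to ask how many unique entries per partition fit in a given heap, which is the capacity-planning direction the motivation describes.

```python
def live_data_set_bytes(T: int, P: int, E: int, B: int, Bk: int, Bm: int) -> int:
    """Live data set size L = T * P * E * (B + Bk + Bm), in bytes.

    T:  number of input topics
    P:  number of partitions per topic
    E:  number of unique entries per partition
    B:  bytes per treemap entry
    Bk: bytes of the serialized key
    Bm: bytes of the serialized value message
    """
    return T * P * E * (B + Bk + Bm)


def required_heap_bytes(T: int, P: int, E: int, B: int, Bk: int, Bm: int) -> int:
    """Required heap size 1H = 2 * L, in bytes."""
    return 2 * live_data_set_bytes(T, P, E, B, Bk, Bm)


def max_entries_per_partition(heap_bytes: int, T: int, P: int,
                              B: int, Bk: int, Bm: int) -> int:
    """Rearranged model (an assumption, not from the slides): the largest E
    whose required heap 2 * T * P * E * (B + Bk + Bm) fits in heap_bytes."""
    return heap_bytes // (2 * T * P * (B + Bk + Bm))
```

With the evaluation parameters (T = 1, P = 8, E = 5 million, B = 40, Bk = Bm = 24), `required_heap_bytes` returns 7,040,000,000 bytes. Because the model is linear in T, P and E, doubling any one of them doubles the required heap.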

Test Setup

[Figure: test topology with Kafka clusters (brokers) and a test system running containers 1 … N]

•  Test system config
   •  24 cores
   •  1 Gbps NIC
   •  45 GB memory
•  JVM options:
   •  UseG1GC
   •  G1HeapRegionSize=4M

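Evaluation Methodology

•  First, we derive the heap size from the model and refer to it as 1H
   •  e.g., with $T = 1$, $P = 8$, $E = 5 \times 10^6$, $B = 40$ bytes, $B_k = 24$ bytes, $B_m = 24$ bytes:
      $1H = 2L = 2\,TPE(B + B_k + B_m) = 2 \cdot 1 \cdot 8 \cdot (5 \times 10^6) \cdot (40 + 24 + 24)\ \text{bytes} = 7.04 \times 10^9\ \text{bytes} \approx 7\ \text{GB}$
•  Second, we compare Samza job throughput and system performance metrics (GC time, CPU time) for 1H against the 2H and 3H cases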
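The worked example and the 1H/2H/3H comparison can be reproduced with the sketch below, which also turns each heap size into JVM options matching the test setup (G1 collector, 4 MB regions). Setting the heap through `-Xmx` and rounding up to whole MiB are assumptions for illustration; the slides do not say how the heap was configured. Note that the slide's 7 GB is decimal: 7.04 × 10^9 bytes is about 6.56 GiB.

```python
# Parameters from the worked example on the Evaluation Methodology slide.
PARAMS = dict(T=1, P=8, E=5_000_000, B=40, Bk=24, Bm=24)

MIB = 2**20


def one_h_bytes(T: int, P: int, E: int, B: int, Bk: int, Bm: int) -> int:
    """1H = 2 * L, where L = T * P * E * (B + Bk + Bm)."""
    return 2 * T * P * E * (B + Bk + Bm)


def jvm_opts(heap_bytes: int) -> str:
    """Hypothetical JVM option string for a container with this heap size.

    Rounds up to whole MiB (integer ceiling division) so the configured
    heap is never smaller than the model's estimate.
    """
    heap_mib = -(-heap_bytes // MIB)
    return f"-XX:+UseG1GC -XX:G1HeapRegionSize=4m -Xmx{heap_mib}m"


if __name__ == "__main__":
    h = one_h_bytes(**PARAMS)
    for multiple in (1, 2, 3):
        size = multiple * h
        print(f"{multiple}H = {size:,} bytes ({size / 1e9:.2f} GB): {jvm_opts(size)}")
```

Rounding up rather than down keeps the 1H case at or above the model's estimate, so any GC pressure observed there is not an artifact of truncating the heap.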

Performance Results

Performance Results (cont.)

GC type       Metric       1H      2H     3H
G1 young GC   Count        88      29     32
G1 young GC   Time (ms)    9850    5063   6144
G1 mixed GC   Count        24      0      0
G1 mixed GC   Time (ms)    70166   0      0
Total         Count        112     29     31
Total         Time (ms)    80117   5063   6144

•  No full GC occurred in the 1H case
•  As expected, CPU time and GC time are higher in the 1H case

Summary

•  The model accurately predicts the memory usage of Samza jobs and keeps them within their SLA with few SLA violations
•  With accurate memory estimation, Samza containers can be deployed 2X more densely on the same node
