Data modeling, cluster sizing, and capacity planning can be difficult when transitioning an existing product to Cassandra, especially when the new Cassandra deployment needs to handle millions of operations per second on day one! In this talk I'll discuss our strategy for data modeling and cluster sizing, and our novel approach to data replication across data centers.
Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product
1. Who We Are
• Holistic video advertising platform for publishers
• Most transparent global marketplace for sellers
• Founded in 2007, 180+ employees globally
• First to market with video RTB in 2010
• Integrated with over half of comScore top 100 pubs
• 2+ billion ad decisions per day
• Reaching 335+ million uniques every month
• Serving impressions in 100+ countries
• Integrated with 100,000+ publishers; connected to 35+ DSPs
• Partnerships with industry-leading trading desks; 10,000+ brand-name advertisers
2. How Big is Our Data?
● Over 2 billion ad auctions per day
● Each auction generates an average of 20-30 "records"
  ● Audience data
  ● Bid data
  ● Event tracking
● A "record everything" approach would result in approximately 50 billion records per day
  ● Normalized: ~1.5 TB / day uncompressed
  ● Denormalized: ~5 TB / day uncompressed
  ● Possibly up to 150 TB of data per month
● We are not currently using a "record everything" approach, but we want to get there
3. How Fast Does Our Data Grow?
[Chart: daily auctions, 10/9/13 through 9/9/14, on a 0-2.5 billion scale]
● Typically our numbers double every 6 months
● We expect more rapid growth over the next year or two
4. How Fast Does Our Data Grow?
[Chart: projected daily auctions continuing the same growth curve, on a 0-12 billion scale]
● Typically our numbers double every 6 months
● We expect more rapid growth over the next year or two
5. How Big Might Our Data Get in a Year?
● Over 10 billion ad auctions per day
● Each auction generates an average of 30-40 "records"
  ● Audience data
  ● Bid data
  ● Event tracking
● A "record everything" approach would result in approximately 350 billion records per day
  ● Normalized: ~10.5 TB / day uncompressed
  ● Denormalized: ~35 TB / day uncompressed
  ● Possibly up to 1 PB of data per month
9. Audience Data
● Information about the people that are viewing ads
  ● Segment data (demographics, browsing history, etc.)
  ● Ads viewed
  ● ID syncing
● Used by advertisers to reach their target audience
  ● "My product is relevant only to bald, left-handed, highly educated immigrants from Uzbekistan."
● Historically stored in cookies
● Technology advancement necessitates abandoning the cookie strategy
  ● Track users on multiple devices
  ● Mobile devices and connected TVs don't typically support cookies
● Offline availability of data provides analytics opportunities
  ● Discover trends
  ● Look-alike segments
10. Cookie-based Workflow (Browser ↔ SpotXchange ↔ Data Partner)
● Browser requests an ad via HTTP
● Server responds with an ad; the ad payload includes data partner URLs
● Browser requests the partner URL; the request payload includes the partner's cookies
● Data provider replies with a redirect containing segment information
● Browser redirects to us; we respond with our own cookies containing their segment data
● Browser requests an ad via HTTP, now including our cookies
● Server responds with an ad targeted at audience segments
11. Moving Away from Cookies
● Cookies are overly constraining and getting worse
  ● Limited to desktop traffic
  ● Payload is expensive
    ● Bandwidth
    ● Processing (encryption and encoding)
  ● Impossible to run deep analytics
  ● Impossible to perform server-to-server synchronization
● Newer identification standards are emerging
  ● Apple IDFA, Android ID, UIDH
  ● Facebook/Google ID
  ● Device fingerprinting
● Moving audience data onto the server allows data to be associated with any identifier, and even allows multiple identifiers to be tied together
12. Server-side Storage Workflow (Browser ↔ SpotXchange ↔ Data Partner)
● Browser requests an ad via HTTP
● Server responds with an ad; the ad payload includes data partner URLs
● Browser requests the partner URL with the SpotX audience ID attached
● Data provider replies with a redirect containing segment information and the partner audience ID
● Browser redirects to us; we store the segment information on the server
● Browser requests an ad via HTTP
● Server responds with an ad targeted at audience segments
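A minimal sketch of this server-side workflow, with a plain Python dict standing in for the Cassandra-backed store; the function names, sync-URL format, and cookie key are hypothetical illustrations, not the production API:

import uuid

# Dict stands in for the Cassandra-backed audience store (hypothetical).
AUDIENCE_STORE = {}  # audience_id -> {segment_id: value}
PARTNER_SYNC_URLS = ["https://partner.example/sync?spotx_id={audience_id}"]

def serve_ad(request_cookies):
    """Respond to an ad request, attaching our audience ID to partner URLs."""
    audience_id = request_cookies.get("spotx_audience_id") or str(uuid.uuid4())
    segments = AUDIENCE_STORE.get(audience_id, {})
    sync_urls = [u.format(audience_id=audience_id) for u in PARTNER_SYNC_URLS]
    # The ad payload includes the data partner URLs; the browser calls them,
    # and the partner's redirect lands in handle_partner_redirect below.
    ad = "targeted_ad" if segments else "untargeted_ad"
    return {"ad": ad, "sync_urls": sync_urls,
            "set_cookie": {"spotx_audience_id": audience_id}}

def handle_partner_redirect(audience_id, segment_data):
    """The partner's redirect carries segment data; store it server-side
    instead of writing it back into a cookie."""
    AUDIENCE_STORE.setdefault(audience_id, {}).update(segment_data)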
13. Additional Capabilities (Data Partner ↔ Browser ↔ SpotXchange)
● User visits a site that provides the partner new data about that user
● Provider recognizes that they have synced this user with us in the past
● Partner calls us server-to-server with the user information, including our ID and the new data
● We store the new information
● Browser requests an ad via HTTP
● Server responds with an ad targeted at the new audience segments
16. Data Modeling
● Solution must minimize latency
● Attempt to constrain to one read or one write per event whenever possible

{
  "audience_id" : "12345678-1234-1234-1234-123456789012",
  "segments" : {"123": 1, "456": 3, "789": 1},
  "foreign_ids" : {
    "7180" : "967992447104804725",
    "7347" : "bWv2-HOyJD8y6D",
    "6960" : "404_53e3bfa26d377"
  },
  "pacing" : {
    "2235" : 1412892591
  }
}
17. Data Modeling
● Ad auctioning requires reading nearly all the data at once
● Most events write to one and only one data type (segments, IDs, etc.)

{
  "audience_id" : "12345678-1234-1234-1234-123456789012",
  "segments" : {"123": 1, "456": 3, "789": 1},
  "foreign_ids" : {
    "7180" : "967992447104804725",
    "7347" : "bWv2-HOyJD8y6D",
    "6960" : "404_53e3bfa26d377"
  },
  "pacing" : {
    "2235" : 1412892591
  }
}
18. Data Modeling
● Store an entire user record in one row so it can be read all at once
● All data can be represented as a tuple with a unique identifier

-- One wide row per user; 'type' separates segments, foreign IDs, pacing, etc.
CREATE TABLE audience_data (
    audience_id uuid,
    type int,
    key text,
    value text,
    PRIMARY KEY (audience_id, type, key)
);

-- Read the entire user record at once (the ad-auction path):
SELECT * FROM audience_data WHERE
    audience_id = 12345678-1234-1234-1234-123456789012;

-- Read a single data type:
SELECT * FROM audience_data WHERE
    audience_id = 12345678-1234-1234-1234-123456789012 AND
    type = 1;

-- Write one tuple for one event:
INSERT INTO audience_data (audience_id, type, key, value) VALUES
    (12345678-1234-1234-1234-123456789012, 1, '123', '1');
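To make the tuple model concrete, here is a small sketch that flattens the JSON user record from the previous slides into (audience_id, type, key, value) rows, one INSERT each; the numeric type codes are assumptions for illustration, not the deck's actual mapping:

# Hypothetical type codes; the deck does not state the real mapping.
TYPE_CODES = {"segments": 1, "foreign_ids": 2, "pacing": 3}

def to_rows(record):
    """Flatten a user record into (audience_id, type, key, value) tuples,
    matching the audience_data schema above (values stored as text)."""
    audience_id = record["audience_id"]
    return [(audience_id, code, str(k), str(v))
            for field, code in TYPE_CODES.items()
            for k, v in record.get(field, {}).items()]

record = {
    "audience_id": "12345678-1234-1234-1234-123456789012",
    "segments": {"123": 1, "456": 3, "789": 1},
    "pacing": {"2235": 1412892591},
}
for row in to_rows(record):
    print(row)  # each tuple maps to one INSERT INTO audience_data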
20. Cluster Sizing
● Distributed a modified version of our implementation to production
  ● Replaced Cassandra calls with writes to a log file
● Created a spreadsheet detailing each operation and how much load to expect during peak times
● Used peak load to size the cluster for each data center
● Used the formula provided by Aaron Morton at The Last Pickle:

ops/sec = (system_constant * #cores * #nodes) / replication_factor

ops = 1 read or write to one row (cluster in a partition)

system_constant = 3000 for AWS
                  4000 for spinning disk
                  7-12K for SSD
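The node counts on slide 22 are consistent with this formula at a replication factor of 2, which is my assumption; the deck never states the RF. A quick sanity check in Python:

import math

def nodes_required(peak_ops, system_constant, cores=8, replication_factor=2):
    # Invert the formula: ops/sec = constant * cores * nodes / RF,
    # so nodes = peak_ops * RF / (constant * cores), rounded up.
    return math.ceil(peak_ops * replication_factor / (system_constant * cores))

# den01 peak migration load is 263,889 ops/sec (slide 22):
print(nodes_required(263889, 4000))  # spinning disk -> 17 nodes
print(nodes_required(263889, 7000))  # SSD -> 10 nodes
print(nodes_required(85764, 4000))   # lon01 -> 6 nodes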
21. Our Backwards Scenario
● Typically clusters start small and grow as product adoption grows
● Our cluster will be working hardest when we first turn it on
  ● Existing cookie data needs to migrate to Cassandra
  ● As data migrates, the load will decrease, normalize, and then increase slowly over the next few months
  ● We don't expect to match the original load for nearly a year
[Chart: peak OPS over time, on a 0-140,000 scale]
22. Cluster Sizing

                                    den01     iad02     lon01     hkg01
  % of total traffic                  40%       40%       13%        7%
  Normal tag rate                     0.1       0.1       0.1       0.1
  Migration tag rate                 0.75      0.75      0.75      0.75

  SELECT              DC Avg       46,296    46,296    15,046     8,102
                      DC Peak     138,889   138,889    45,139    24,306
                      FE Avg          126       263       684       675
                      FE Peak         377       789     2,052     2,025

  UPDATE tag          DC Avg        4,630     4,630     1,505       810
  (typical load)      DC Peak      13,889    13,889     4,514     2,431
                      FE Avg           13        26        68        68
                      FE Peak          38        79       205       203

  UPDATE tag          DC Avg       30,093    30,093     9,780     5,266
  (migration)         DC Peak      90,278    90,278    29,340    15,799
                      FE Avg           82       171       445       439
                      FE Peak         245       513     1,334     1,317

  Total DC ops        Avg          51,389    51,389    16,701     8,993
  (normal load)       Peak        154,167   154,167    50,104    26,979

  Total DC ops        Avg          87,963    87,963    28,588    15,394
  (migration)         Peak        263,889   263,889    85,764    46,181

  Nodes required      Constant
  (8 core)
    Spinning disk     4000             17        17         6         3
    SSD               7000             10        10         4         3
23. Cluster Sizing

                            den01     iad02     lon01     hkg01
  % total traffic          40.00%    40.00%    13.00%     7.00%
  Tag       Daily GB          0.9       0.9       0.3       0.2
            Total GB           84        84        27        15
  Frqcap    Daily GB          0.6       0.6       0.2       0.1
            Total GB          3.9       3.9       1.3       0.7
  Partner   Daily GB            8         8         3         1
            Total GB         1509      1509       490       264
  Total GB                   3193      3193      1038       559
  Per Node GB                 456       456       346       186
25. Replication Strategy
● Typical Cassandra replication is expensive
  ● Each write is replicated to all data centers
  ● Each cluster must be approximately the same size
  ● Need a large pipe between data centers
    ● 3.7 million columns updated per second at peak load
  ● Amount of replication needed increases with each new data center
26. Replication Strategy
● Alternate strategies suggested:
  ● Offline copying of SSTables
  ● Maintain a log of changed records and run a process to copy those periodically
● We realized that this data doesn't need to be available in all places at all times
  ● People don't often move far enough to switch data centers
● Data integrity is of fairly low importance
  ● If our data isn't replicated, the user will appear to be new when they switch data centers, but that only has a minor short-term impact on application performance
● Other replication strategies we considered:
  ● None
  ● Just-in-time
  ● Queued
27. Replication Strategy: None
● Don't replicate at all
● Each data center has its own completely self-contained cluster
● Advantage: simplicity
● Disadvantage: limits our ability to target users when they move or when we reassign regions to a different data center
28. Replication Strategy: Just-In-Time
● Each data center has its own completely self-contained cluster
● The user's identifier cookie contains a data center identifier
● When an incoming request's cookie says it's from a different data center, read from that data center in real time and replicate on the fly to the local data center (see the sketch after this list)
  ● Reassign the cookie using the new data center
● Advantage
  ● Audience data is (almost) always available (99.99%)
● Disadvantages
  ● Additional latency while waiting for user data
  ● In cookie-less situations we'd need to query all data centers if the local data center has no data
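A minimal sketch of the just-in-time read path, assuming the cookie carries a home data center ID and using dicts in place of the per-DC clusters; every name here is hypothetical:

LOCAL_DC = "den01"
DC_STORES = {"den01": {}, "iad02": {}, "lon01": {}, "hkg01": {}}  # stand-ins

def get_audience_data(audience_id, cookie_dc):
    """Return (data, dc_to_write_into_cookie)."""
    local = DC_STORES[LOCAL_DC]
    if audience_id in local:
        return local[audience_id], LOCAL_DC
    if cookie_dc != LOCAL_DC:
        # Cookie says the user belongs elsewhere: read that DC in real
        # time and replicate on the fly into the local cluster.
        remote = DC_STORES[cookie_dc].get(audience_id)
        if remote is not None:
            local[audience_id] = dict(remote)
            return local[audience_id], LOCAL_DC  # reassign the cookie
    return {}, LOCAL_DC  # no data found anywhere; treat as a new user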
30. Replication Strategy: Queued
● Each data center has its own completely self-contained cluster
● When a fetch attempt misses, the user ID is added to a queue for reconciliation
  ● Treat the user as a new user and store their data locally
● Background process consumes IDs from the queue and attempts to fetch data from other data centers for reconciliation (see the sketch after this list)
● Advantages
  ● Audience data is mostly available (98%)
  ● Minimal additional latency introduced
● Disadvantages
  ● Additional operational complexity
  ● Occasional data misses
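And a similar sketch of the queued strategy, again with an in-process queue and dicts standing in for the real queue and clusters; the merge policy shown is an assumption:

from queue import Empty, Queue

LOCAL_DC = "den01"
DC_STORES = {"den01": {}, "iad02": {}, "lon01": {}, "hkg01": {}}  # stand-ins
reconcile_queue = Queue()

def get_audience_data(audience_id):
    data = DC_STORES[LOCAL_DC].get(audience_id)
    if data is None:
        # Miss: enqueue the ID for reconciliation and treat the user as new.
        reconcile_queue.put(audience_id)
        data = DC_STORES[LOCAL_DC].setdefault(audience_id, {})
    return data

def reconcile_worker():
    """Background process: drain the queue, pulling each user's data
    from the other data centers into the local cluster."""
    while True:
        try:
            audience_id = reconcile_queue.get(timeout=1)
        except Empty:
            return  # queue drained
        for dc, store in DC_STORES.items():
            if dc != LOCAL_DC and audience_id in store:
                DC_STORES[LOCAL_DC][audience_id].update(store[audience_id])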