Cassandra at Lithium
Paul Cichonski, Senior Software Engineer
@paulcichonski
Lithium?
• Helping companies build social communities for their customers
• Founded in 2001
• ~300 customers
• ~84 million users
• ~5 million unique logins in past 20 days

2
Use Case: Notification Service
1. Stores subscriptions
2. Processes community events
3. Generates notifications when events match against subscriptions
4. Builds user activity feed out of notifications

3
Notification Service – System View

4
The Cluster (v1.2.6)
• 4 nodes, each node:
– CentOS 6.4
– 8 cores, 2TB for commit-log, 3x 512GB SSD for data

• Average writes/s: 100-150, peak: 2000
• Average reads/s: 100, peak: 1500
• Use Astyanax on client-side

5
Data Model

6
Data Model: Subscription Fulfillment

(figure callouts: "identifies target of subscription"; "identifies entity that is subscribed")

7
standard_subscription_index row
stored as:
row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13
columns:
  user:2:creationtimestamp  -> 1390939665
  user:53:creationtimestamp -> 1390939670
  user:88:creationtimestamp -> 1390939660

maps to (cqlsh):
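
A minimal CQL 3 sketch that would roughly produce the storage layout above (table and column names and types here are assumptions, not the production schema):

    CREATE TABLE standard_subscription_index (
        target            text,    -- row key, e.g. '66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13'
        subscriber_type   text,    -- e.g. 'user'
        subscriber_id     text,    -- e.g. '2'
        creationtimestamp bigint,  -- e.g. 1390939665
        PRIMARY KEY (target, subscriber_type, subscriber_id)
    );

    -- fulfillment read: one wide-row slice returns every subscriber
    -- for the entity an event touched
    SELECT subscriber_type, subscriber_id, creationtimestamp
    FROM standard_subscription_index
    WHERE target = '66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13';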

8
Data Model: Subscription Display (time series)

9
subscriptions_for_entity_by_time row
stored as:
row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0
columns:
  1390939670:label:testlabel
  1390939665:board:53
  1390939660:message:13

maps to (cqlsh):
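
A sketch of a matching CQL 3 definition for the time-series view (names are assumptions); a reversed clustering order keeps the newest subscriptions first, as in the layout above:

    CREATE TABLE subscriptions_for_entity_by_time (
        entity      text,    -- row key, e.g. '66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0'
        created     bigint,  -- subscription creation time
        target_type text,    -- 'message' | 'board' | 'label'
        target_id   text,
        PRIMARY KEY (entity, created, target_type, target_id)
    ) WITH CLUSTERING ORDER BY (created DESC, target_type ASC, target_id ASC);

    -- newest-first page of a user's subscriptions
    SELECT created, target_type, target_id
    FROM subscriptions_for_entity_by_time
    WHERE entity = '66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0'
    LIMIT 20;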

10
Data Model: Subscription Display (content browsing)

11
subscriptions_for_entity_by_type row
stored as:
row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2
columns:
  message:13:creationtimestamp      -> 1390939660
  board:53:creationtimestamp        -> 1390939665
  label:testlabel:creationtimestamp -> 1390939670

maps to (cqlsh):
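
A sketch of a matching CQL 3 definition for the content-browsing view (names are assumptions); "is user 2 subscribed to message 13?" becomes a single point read:

    CREATE TABLE subscriptions_for_entity_by_type (
        entity            text,    -- row key, e.g. '66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2'
        target_type       text,    -- 'message' | 'board' | 'label'
        target_id         text,
        creationtimestamp bigint,
        PRIMARY KEY (entity, target_type, target_id)
    );

    SELECT creationtimestamp
    FROM subscriptions_for_entity_by_type
    WHERE entity = '66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2'
      AND target_type = 'message' AND target_id = '13';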

12
Data Model: Activity Feed (fan-out writes)

JSON blob representing activity

13
activity_for_entity row
stored as:
row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0
columns:
  31aac580-8550-11e3-ad74-000c29351b9d:moderationAction:event_summary -> {moderation_json}
  f4efd590-82ca-11e3-ad74-000c29351b9d:badge:event_summary            -> {badge_json}
  1571b680-7254-11e3-8d70-000c29351b9d:kudos:event_summary            -> {kudos_json}

maps to (cqlsh):
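
A sketch of a matching CQL 3 definition for the fan-out feed (names are assumptions); the timeuuid clustering column keeps the feed in event-time order and the JSON blob sits in a regular column:

    CREATE TABLE activity_for_entity (
        entity        text,      -- row key, e.g. '66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0'
        event_id      timeuuid,  -- e.g. 31aac580-8550-11e3-ad74-000c29351b9d
        event_type    text,      -- 'moderationAction' | 'badge' | 'kudos'
        event_summary text,      -- JSON blob representing the activity
        PRIMARY KEY (entity, event_id, event_type)
    );

    -- most recent activity for a user's feed
    SELECT event_id, event_type, event_summary
    FROM activity_for_entity
    WHERE entity = '66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0'
    ORDER BY event_id DESC LIMIT 50;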

14
Migration Strategy
(mysql → cassandra)

15
Data Migration: Trust, but Verify
Fully repeatable due to idempotent writes.

1) Bulk migrate all subscription data (HTTP, lia → NS)

2) Consistency check all subscription data (HTTP, lia → NS) – also runs after migration to verify shadow-writes

16
Verify: Consistency Checking

17
Subscription Write Strategy
Reads for subscription fulfillment happen in NS.
Reads for UI fulfilled by legacy mysql (temporary).

(architecture diagram: user, lia, activemq, Notification Service, mysql, Cassandra; subscription_write and subscription_write (shadow_write) flows crossing the NS system boundary)

18
Path to Production: QA Issue #1
(many writes to same row kill cluster)

19
Problem: CQL INSERTs
Single thread SLOW, even with BATCH
(multiple-second latency for writing chunks of 1000 subscriptions)
Largest customer (~20 million subscriptions) would have taken weeks to migrate
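
For reference, the single-threaded load looked roughly like the following batched inserts (a sketch reusing the assumed column names from the data-model sketches above); even batched, a chunk of 1000 subscriptions took multiple seconds:

    BEGIN BATCH
      INSERT INTO standard_subscription_index (target, subscriber_type, subscriber_id, creationtimestamp)
        VALUES ('66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13', 'user', '2', 1390939660);
      INSERT INTO subscriptions_for_entity_by_time (entity, created, target_type, target_id)
        VALUES ('66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0', 1390939660, 'message', '13');
      INSERT INTO subscriptions_for_entity_by_type (entity, target_type, target_id, creationtimestamp)
        VALUES ('66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2', 'message', '13', 1390939660);
      -- ...repeated for each of the ~1000 subscriptions in the chunk, 3 CFs each...
    APPLY BATCH;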

20
Just Use More Threads? Not Quite

21
Cluster Essentially Died

22
Mutations Could Not Keep Up

23
Solution: Work Closer to Storage Layer
Work here (the storage view):

row key: 66edfdb7-6ff7-458c-94a8-421627c1b6f5:message:13
columns:
  user:2:creationtimestamp  -> 1390939665
  user:53:creationtimestamp -> 1390939670
  user:88:creationtimestamp -> 1390939660

Not here (the CQL view):

24
Solution: Thrift batch_mutate

More details: http://thelastpickle.com/blog/2013/09/13/CQL3-to-Astyanax-Compatibility.html
Allowed us to write 200,000 subscriptions to 3 CFs in ~45 seconds with almost no impact on the cluster.
NOTE: supposedly fixed in 2.0: CASSANDRA-4693
25
Path to Production: QA Issue #2
(read timeouts)

26
Tombstone Buildup and Timeouts

CF holding notification settings rewritten every 30 minutes
Eventually tombstone build-up caused reads to time out

27
Solution
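
Based on the gc_grace_seconds / compaction notes for this slide, one plausible shape of the fix (table name and value are assumptions, not the actual change) is to drop gc_grace_seconds on the frequently rewritten CF so compaction can purge tombstones quickly:

    -- safe here only because the data is fully rewritten every 30 minutes,
    -- so re-appearing deletes are not a concern (default gc_grace_seconds is 864000 = 10 days)
    ALTER TABLE notification_settings WITH gc_grace_seconds = 3600;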

28
Production Issue #1
(dead cluster)

29
Hard Drive Failure on All Nodes
4 days after release, we started seeing this in /var/log/cassandra/system.log

After following a bunch of dead ends, we also found this in /var/log/messages

This cascaded to all nodes and within an hour, cluster was dead

30
TRIM Support to the Rescue

* http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives

31
Production Issue #2
(repair causing tornadoes of destruction)

32
Activity Feed Data Explosion
• Activity data written with a TTL of 30 days (see the sketch below).
• Users in the 99th percentile were receiving multiple thousands of writes per day.
• Compacted row maximum size: ~85mb (after 30 days)

Here be Dragons:
– CASSANDRA-5799: Column can expire while lazy compacting it...
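
For context, a TTL'd feed write looks roughly like this in CQL (column names follow the assumed activity_for_entity sketch earlier); every cell written this way becomes a tombstone once it expires:

    INSERT INTO activity_for_entity (entity, event_id, event_type, event_summary)
    VALUES ('66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0',
            1571b680-7254-11e3-8d70-000c29351b9d, 'kudos', '{kudos_json}')
    USING TTL 2592000;  -- 30 days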
33
Problem Did Not Surface for 30 Days
• Repairs started taking up to a week
• Created 1000s of SSTables
• High latency:

34
Solution: Trim Feeds Manually
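
There is no range-slice delete in this version of Cassandra, so the manual trim is essentially "read the row, then delete old entries one by one" (a sketch using the assumed column names from earlier):

    -- 1) read the feed entries for an entity
    SELECT event_id, event_type
    FROM activity_for_entity
    WHERE entity = '66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0';

    -- 2) explicitly delete each entry older than the cutoff
    DELETE FROM activity_for_entity
    WHERE entity = '66edfdb7-6ff7-458c-94a8-421627c1b6f5:user:2:0'
      AND event_id = 1571b680-7254-11e3-8d70-000c29351b9d
      AND event_type = 'kudos';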

35
activity_for_entity cfstats

36
How we monitor in Prod
• Nodetool, OpsCenter and JMX to monitor the cluster
• Yammer Metrics at every layer of the Notification Service, with Graphite to visualize
• Netflix Hystrix in the Notification Service to guard against cluster failure

37
Lessons Learned
• Have a migration strategy that allows both systems to stay live until you have proven Cassandra in prod
• Longevity tests are key, especially if you will have tombstones
• Understand how gc_grace_seconds and compaction affect tombstone cleanup
• Test with production data loads if you can
38
Questions?
@paulcichonski

39


Editor's Notes

  • #3 - Talk about how we are leaning towards Cassandra as our go-to data store for most real-time query purposes; NS was the first step.
  • #4 - Internal multi-tenant Dropwizard service within Lithium. It is a shared-nothing, horizontally scalable service that uses Cassandra for data storage. Really more of a subscription fulfillment service that also emits notifications. Example subscription types: subscription to a board where a user posts messages (notified whenever someone posts in the board); subscription to a label that might be used on a message (notified whenever someone uses the label); subscription to a specific message (notified whenever someone replies to it).
  • #5 Our traditional infrastructure is primarily single tenant; NS is a multi-tenant service with many clients. All access to Cassandra goes through the Notification Service. Infrastructure events are dropped onto a queue, then NS queries Cassandra to find all subscriptions pertaining to that event, generates the correct notifications, and writes the necessary activity data.
  • #6 Running in prod. Expect load to increase by around 50% when all customers are on. We have a duplicate cluster running in EMEA, but they are each stand-alone.
  • #8 Key pattern dictates query plan. We denormalize subscriptions in three ways: one to support fast reading of all subscriptions associated with an event, so we can generate notifications quickly (standard_subscription_index); one to support quickly telling whether a user is subscribed to a specific "thing", for when a user is browsing the community (subscriptions_for_entity_by_type); and one to support fast reading of all subscriptions for a user, in a time-series view (subscriptions_for_entity_by_time). Caveat: we are only actively using standard_subscription_index because there is still work to do refactoring UI views to use the new data.
  • #9 This is essentially how the data is stored on disk and how cqlsh interprets it before presenting it to the user. Data distribution for standard_subscription_index rows: the 75th percentile of rows have ~5 subscriptions, the 99th percentile ~160, and the 99.9th percentile ~20k. The largest customer has ~5 million subscriptions; the average customer has ~100,000.
  • #10 rowindex is used to allow for expansion in the future (i.e., one user across multiple rows). We're not using a UUID to represent the timestamp because we didn't need that uniqueness constraint; having subscription_type in the composite key was enough. A timeUUID also makes writes to this (or re-migrations) non-idempotent.
  • #11 - This is essentially how the data is stored on disk and how cqlsh interprets it before presenting it to the user.
  • #12 Use Case: user is browsing a customer’s site and wants to know all the things they are subscribed to on a specific page (i.e., board, topic, specific message).
  • #13 - This is essentially how the data is stored on disk and how cqlsh interprets it before presenting it to the user.
  • #15 - This is essentially how the data is stored on disk and how cqlsh interprets it before presenting it to the user.
  • #17 Very simplistic view. Some key things not covered: NS is a cluster (currently 3 HA / shared-nothing nodes); we have hundreds of "lia" clients running in the infrastructure, and every customer gets one or many lia instances for their site.
  • #18 Actual code uses the last subscription creationTime, not currentTime(). A synchronous process in lia runs every n minutes and verifies all subscriptions it wrote in the last n minutes. It only increments its progress state if it is successful. This requires all writes in NS to be idempotent. This "consistency-repair" process has saved us multiple times during NS failures. Weak consistency necessitates a way to recover consistency eventually.
  • #19 Every shadow-write is also later verified with a synchronous HTTP request from lia to NS (offline anti-entropy)
  • #21 The data distribution for a single row was ~100-200 items at the 75th percentile and ~1000 items at the 99th percentile. The first approach involved using a single thread to write the subscriptions we received from 'lia'; there was no guarantee as to the ordering or key-distribution of the data we received. Migration time for the largest customer included buffer time to avoid performing migrations at peak times and overloading boxes. We don't have the exact latency numbers because we never wrote them down.
  • #22 Worked fine at first, but then we increased the chunk size to 100,000 subscriptions and row key hotspots appeared. We tried tweaking cluster settings like memory and concurrent_writes with little impact. The issue was that too much data was being written to the same row key at the same time (hotspots). NOTE: OpsCenter was running on a Cassandra node in QA, and that node was completely unresponsive during blackout times.
  • #25 - Remember that CQL is a row-oriented binding on top of a column-oriented db, but it is still possible to work on the columns directly; this is what is shown in the cassandra-cli view of the data.
  • #26 First we tried partitioning the writes so that a single thread would write all the data for a single row; this didn't really help. Then we switched the write for that row from many CQL inserts to a single Thrift batch_mutate against a single row, which allowed us to write 200,000 subscriptions in ~50 seconds. More details: http://mail-archives.apache.org/mod_mbox/cassandra-user/201309.mbox/%3C522F32FC.2030804@gmail.com%3E and http://thelastpickle.com/blog/2013/09/13/CQL3-to-Astyanax-Compatibility.html
  • #28 - Cassandra cannot remove tombstones until 1) compaction runs on the table and 2) gc_grace_seconds for the data has expired (default is 10 days). Normally gc_grace_seconds is useful for allowing anti-entropy (i.e., nodetool repair) to run on data before it is removed, which prevents problems like deletes re-appearing. However, in the case described in this slide, the data was re-written every 30 minutes so it was not an issue.
  • #32 - Someone (not me) remembered Rick Branson's talk* about how to reduce write amplification on SSDs. Enabled TRIM support and remounted all drives (the OS tells the disk that a certain region is no longer valid, so the SSD stops doing GC on that region). Everything worked; total downtime of ~4 hours. The consistency verification mechanism we had built into the lia code saved us here, since all data that could not be written while the cluster was down was just re-written when it came back up, fixing all data loss.
  • #34 Row maximum size was hit when the TTLs started expiring. https://issues.apache.org/jira/browse/CASSANDRA-5799
  • #36 Since this particular feature was in private beta, we were able to just truncate all the activity feed data we had previously collected. Unfortunately there is no easy way to do "range-slice" deletes in Cassandra, so you need to grab all the entries and then explicitly delete the old ones. The cache of "recently seen entities" is in Cassandra.
  • #37 Max row size: ~23mb