Cassandra for FamilySearch Family Tree Performance Tuning

© 2017 by Intellectual Reserve, Inc. All rights reserved.

1
Performance at Scale
Cassandra for FamilySearch Family Tree
John Sumsion

2
Cassandra Database
Metaphor:
• Set of Pipes
• Client => Disk
• Low Friction
• High Throughput
• Redundant

3
Cassandra Database
Presentation Scope:
• Read-Heavy Cluster
• Tuning Strategy
• Smooth Flow
• Peak Performance

4
Family Tree
• Free access at https://familysearch.org
• Supported by growing record collection
• World-wide user base
• Backed by Apache Cassandra (DSE)

5
Family Tree
• 63 TB database
• One RF=3 datacenter
• Another offsite RF=2 datacenter
• About 5700M DB reads / peak day
• About 70M DB writes / peak day

6
Family Tree
• Multiple views of person
• Full change history
• Flexible schema
• 4th major iteration over 10 years
• Constant schema change

11
Cassandra Database
Metaphor:
• Set of Pipes
• Client => Disk (or RAM)
• Low Friction
• High Throughput
• Redundant

12
Cassandra Database
Topics:
• Tuning Strategy
• Smooth Flow

13
Cassandra Database
Ideal Read Circumstances:
• Lots of RAM
• Lots of CPU capacity
• Lots of Disk bandwidth
• Lots of Network bandwidth
• Enough nodes to reduce impact of GC outliers

14
Cassandra Database
• Small records
• All records are same size
• Balanced data distribution
• Balanced access patterns
• No hotspots

15
Cassandra Database
1. No Friction
2. Full Throughput
vs

16
Cassandra Database
Realistic Read Circumstances:
• Ideal is RARELY going to happen!
• Cassandra can stay up under abuse
• But throughput suffers

17
Cassandra Database
Realistic Read Circumstances:
1. Minimize Friction
2. Maximize Throughput
vs

18
Friction in Cassandra
Sources of Friction:
• CPU spikes to 100%
• Disk Saturation spikes to 100%
• GC Pauses spike above 200ms
• Total GC Time exceeds 1-2% of a 5min window
• Network Saturation spikes to 100% (rare)
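
The thresholds above can be turned into a simple per-node check. This is a minimal sketch, not the dashboard from the talk: the `NodeSample` structure and its field names are hypothetical, and the 2% default for total GC time is just the upper end of the 1-2% range on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSample:
    """One observation window for a node (hypothetical structure)."""
    cpu_pct: float          # peak CPU utilization in the window, 0-100
    disk_pct: float         # peak disk saturation in the window, 0-100
    net_pct: float          # peak network saturation in the window, 0-100
    max_gc_pause_ms: float  # longest single GC pause in the window
    total_gc_ms: float      # sum of all GC pauses in the window
    window_ms: float = 5 * 60 * 1000  # 5 minutes, as on the slide


def friction_sources(s: NodeSample, gc_time_limit_pct: float = 2.0) -> List[str]:
    """Return the names of the friction sources this sample trips."""
    found = []
    if s.cpu_pct >= 100:
        found.append("cpu")
    if s.disk_pct >= 100:
        found.append("disk")
    if s.max_gc_pause_ms > 200:
        found.append("gc_pause")
    if 100.0 * s.total_gc_ms / s.window_ms > gc_time_limit_pct:
        found.append("gc_time")
    if s.net_pct >= 100:
        found.append("network")
    return found
```

Feeding it samples built from the gathered JMX, GC, and dstat metrics gives one list of tripped thresholds per node per window.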

19
Friction in Cassandra
Needed Visibility
• Gathered metrics (JMX, GC, dstat)
• Composed CPU/Disk/GC in a dashboard
• Example Dashboard

24
Turbulence in Cassandra
• Friction somewhere causes requests to queue
• Queued requests cause upstream delays
• Affected node tries to shed load to avoid dying
• Clients / Other nodes become affected

25
Situation: Compactions not throttled enough
Symptoms:
• Periods of heavy CPU utilization (plateau)
• Periods of full disk saturation (plateau)
• Periods of more-frequent GC
• Periods of higher request latency (p99+ plateau)

26
Situation: Compactions not throttled enough
Solutions:
• Throttle compaction dynamically using nodetool setcompactionthroughput
• Keep compaction backlog under 10-30min
• Bake the setting into cassandra.yaml
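
The backlog target above is plain arithmetic: pending compaction bytes divided by the throttle rate. A minimal sketch follows, assuming the pending byte count is read from monitoring and the rate is in MB/s, the unit `nodetool setcompactionthroughput` takes. The function names and the "raise / lower / hold" advice are illustrative assumptions, one reading of "keep the backlog under 10-30min".

```python
def backlog_minutes(pending_bytes: int, throughput_mb_per_s: float) -> float:
    """Estimate how long the pending compactions take at the current throttle."""
    if throughput_mb_per_s <= 0:
        # In Cassandra a value of 0 means "unthrottled"; no estimate is possible here.
        raise ValueError("throughput must be positive to estimate a backlog")
    seconds = pending_bytes / (throughput_mb_per_s * 1024 * 1024)
    return seconds / 60


def throttle_advice(pending_bytes: int, throughput_mb_per_s: float,
                    low_min: float = 10, high_min: float = 30) -> str:
    """Suggest a direction for the throttle (hypothetical policy)."""
    minutes = backlog_minutes(pending_bytes, throughput_mb_per_s)
    if minutes > high_min:
        return "raise"   # backlog too long: allow more compaction throughput
    if minutes < low_min:
        return "lower"   # backlog short: room to throttle harder
    return "hold"
```

Once a rate holds the backlog in range through a peak day, that is the value to bake into cassandra.yaml.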

27
Situation: Too frequent Memtable flushing
Symptoms:
• Very frequent compaction on tables with most writes
• "Forcing flush" in debug.log
• Constant compactions, Constant disk saturation
• Using OpsCenter Repair Service
• High number of cells read per query

28
Situation: Too frequent Memtable flushing
Solutions:
• Turn off OpsCenter Repair for short TTL tables
• Turn off OpsCenter Repair for other tables that don't need full consistency
• Google "dse excluding tables ignore_tables"
• Switch small tables from STCS to LCS

29
Situation: Not enough JVM Heap
Symptoms:
• Overly frequent GC, occasional OOM
• Lower query throughput
• No obvious bottleneck
• More CPU spent on GC than necessary

30
Situation: Not enough JVM Heap
Solutions:
• Increase JVM heap, but not more than 32G
• Ratchet up until occasional OOM stops
• Don't go too high, 32G max
• Stop if max GC pause increases

31
Situation: Too large JVM Heap
Symptoms:
• Much longer GC cycles once in a while
• Old Gen able to build up too much cruft
• Large variation in response time (p99+ spikes)
• Other nodes experience request queueing

32
Situation: Too large JVM Heap
Solutions:
• Reduce heap size
• But don't cause OOM
• Ratchet down while max GC pause times drop
• Remember extra RAM means extra buffer cache

33
Situation: GC not tuned for low-latency
Symptoms:
• Longer pause times
• GC gets behind and has to do long Full GCs
• Other nodes experience request queueing

34
Situation: GC not tuned for low-latency
Solutions:
• CMS wizard? Do that
• Easier? Use G1 with 40-50% new space
• Turn on GC logging, Plot GC over time

35
Situation: GC not tuned for low-latency
Solutions (G1 flags):
• -XX:G1RSetUpdatingPauseTimePercent=5
• -XX:InitiatingHeapOccupancyPercent=60
• -XX:+ParallelRefProcEnabled
• -XX:G1ReservePercent=20
• -XX:ParallelGCThreads=13 (on r4.4xlarge 16 CPU box)
• -XX:ConcGCThreads=13
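
The value 13 for a 16 CPU box matches what HotSpot's usual default heuristic for ParallelGCThreads produces, so a similar value can be derived for other instance sizes. A minimal sketch, assuming the common rule (all CPUs up to 8, then 5/8 of each additional CPU); whether the talk derived 13 this way is not stated, and the function name is hypothetical.

```python
def default_parallel_gc_threads(cpus: int) -> int:
    """Approximate HotSpot's default ParallelGCThreads for a given CPU count."""
    if cpus <= 8:
        return cpus
    # Above 8 CPUs, only 5/8 of each extra CPU is added (integer arithmetic).
    return 8 + (cpus - 8) * 5 // 8
```

Setting ConcGCThreads to the same number, as on the slide, is well above the JVM's own default for concurrent threads, so it is worth validating against the GC plots rather than copying.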

36
Situation: Disk spikes even when compaction throttled
Symptoms:
• No CPU spikes or plateaus
• No Disk activity during compaction
• But short periods of 100% disk saturation right after
• Also GC spike right after compaction complete
• Large response time variation around compaction

37
Situation: Disk spikes even when compaction throttled
Solutions:
• Use sysctl.conf to spread out writes during compaction
• See Amy's tuning guide
• https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

38
Situation: Disk readahead too large
Symptoms:
• Lower throughput than you expect
• No obvious bottleneck
• More bytes read from disk than sent over the network

39
Situation: Disk readahead too large
Solutions:
• blockdev --setra 128 <device> (for 64k chunks)
• See Amy's tuning guide
• https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
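
The readahead value is counted in 512-byte sectors, which is why 128 corresponds to 64k. A small sketch of the conversion, useful when the table's chunk size differs; the helper names are illustrative.

```python
SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors


def readahead_kib(sectors: int) -> int:
    """Convert a --setra value into KiB of readahead."""
    return sectors * SECTOR_BYTES // 1024


def sectors_for_kib(kib: int) -> int:
    """Convert a desired readahead in KiB into a --setra value."""
    return kib * 1024 // SECTOR_BYTES
```

For example, `sectors_for_kib(64)` gives the 128 used on the slide.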

40
Situation: Timeouts set too long
Symptoms:
• Much longer GC cycles once in a while
• Large GC delays on good nodes when one goes bad
• One bad node cascades to more

41
Situation: Timeouts set too long
Solutions:
• Reduce read timeout until it hurts (nodetool)
• Reduce write timeout until it hurts (nodetool)
• Leave general request timeout higher to avoid cqlsh timeout
• Bake timeouts into cassandra.yaml

42
Situation: Not enough free memory
Symptoms:
• More disk activity than working set size
• High query latency even for hot records
• More bytes read from disk than active set size

43
Situation: Not enough free memory
Solutions:
• Shrink heap if possible
• Maybe shrink row/key/chunk caches
• This makes more room for OS buffer cache
• Stop unnecessary processes

44
Situation: Disproportionately large records
Symptoms:
• Queries for certain keys always take longer
• Three nodes spike IO/CPU at the same time
• Slow query logging in C* 3.x

45
Situation: Disproportionately large records
Solutions:
• Alert or monitor slow query logs to find problems (C* 3.x)
• Median:p99 of 1:100 is ok, but 1:10000 is bad
• Optimize size of the largest records
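
The Median:p99 rule of thumb can be checked against any sample of read latencies. A minimal sketch, assuming latencies in milliseconds and a nearest-rank p99; the "watch" band between the two ratios named on the slide is an assumption, as are the function names.

```python
import statistics
from typing import Sequence


def p99_to_median_ratio(latencies_ms: Sequence[float]) -> float:
    """Return p99 / median for a non-empty sample of latencies."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    median = statistics.median(ordered)
    # Nearest-rank p99: smallest value with at least 99% of samples at or below it.
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n)
    p99 = ordered[rank - 1]
    return p99 / median


def classify_spread(ratio: float) -> str:
    """Map a p99:median ratio onto the slide's rule of thumb."""
    if ratio <= 100:
        return "ok"
    if ratio >= 10000:
        return "bad"
    return "watch"  # in-between band is an assumption, not from the talk
```

Running this per table or per query type helps show whether a few oversized records are stretching the tail.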

46
Situation: Disproportionate edit rate
Symptoms:
• Queries for certain keys take longer
• Large number of cells per read in tablestats
• Three nodes spike IO/CPU at the same time
• Slow query logging in C* 3.x

47
Situation: Disproportionate edit rate
Solutions:
• Postpone large edits until user is done
• Minimize number of redundant bytes rewritten

48
Situation: Too few nodes
Symptoms:
• Majority of nodes hitting same bottleneck
• Hour-long periods of poor p99+ response time
• One bad node cascades to more

49
Situation: Too few nodes
Solutions:
• Try all of the above first (the easy ones)
• Add nodes at a practical point
• Tweak & tune more, maybe you can shrink

50
Situation: ALL OF THE ABOVE
Symptoms:
• Cluster IO overwhelmed
• Bad p99+ response times
• Multiple sick nodes after large user edits

51
Before

52
After

53
Cassandra Database
Metaphor:
• Set of Pipes
• Client => Disk (or RAM)
• Low Friction
• High Throughput
• Redundant

54
Cassandra Database
Presentation Scope:
• Tuning Strategy
• Smooth Flow

55
Cassandra Database
1. Minimal Friction
2. Maximal Throughput
vs

56
Cassandra Database
Reprise:
• Clean the Pipes
• Smooth the Flow

57
Wrap-up
• Q&A
• Thanks for a great conference!
