What we've learned in developing the Family Tree data model -- focusing on what it took to overcome the problems encountered as traffic increased over 5x.
Family Tree
• Free access at https://familysearch.org
• Supported by growing record collection
• World-wide user base
• Backed by Apache Cassandra (DSE)
Family Tree
• 63 TB database
• One RF=3 datacenter
• Another offsite RF=2 datacenter
• About 5700M DB reads / peak day
• About 70M DB writes / peak day
Family Tree
• Multiple views of person
• Full change history
• Flexible schema
• 4th major iteration over 10 years
• Constant schema change
Cassandra Database
Ideal Read Circumstances:
• Lots of RAM
• Lots of CPU capacity
• Lots of Disk bandwidth
• Lots of Network bandwidth
• Enough nodes to reduce impact of GC outliers
Cassandra Database
Ideal Read Circumstances:
• Small records
• All records are same size
• Balanced data distribution
• Balanced access patterns
• No hotspots
Cassandra Database
Realistic Read Circumstances:
• Ideal is RARELY going to happen!
• Cassandra can stay up under abuse
• But throughput suffers
Friction in Cassandra
Sources of Friction:
• CPU spikes to 100%
• Disk Saturation spikes to 100%
• GC Pauses spike above 200ms
• Total GC time exceeds 1-2% of wall-clock time over a 5-minute window
• Network Saturation spikes to 100% (rare)
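The thresholds above can be turned into a simple check. This is a minimal sketch only: the NodeSample fields are hypothetical names for whatever your metrics pipeline (JMX, GC logs, dstat) provides, and the 2% GC limit is the upper end of the 1-2% range on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSample:
    """One monitoring sample for a node (field names are hypothetical)."""
    cpu_pct: float                     # CPU utilization, 0-100
    disk_util_pct: float               # disk saturation, 0-100
    max_gc_pause_ms: float             # longest GC pause in the window
    gc_time_ms: float                  # total time spent in GC in the window
    window_ms: float = 5 * 60 * 1000   # 5-minute window


def friction_sources(s: NodeSample, gc_pct_limit: float = 2.0) -> List[str]:
    """Return which of the friction sources listed above this sample shows."""
    found = []
    if s.cpu_pct >= 100:
        found.append("cpu")
    if s.disk_util_pct >= 100:
        found.append("disk")
    if s.max_gc_pause_ms > 200:
        found.append("gc_pause")
    if 100.0 * s.gc_time_ms / s.window_ms > gc_pct_limit:
        found.append("gc_time")
    return found


sample = NodeSample(cpu_pct=72, disk_util_pct=100,
                    max_gc_pause_ms=350, gc_time_ms=9000)
print(friction_sources(sample))
```

Network saturation is left out because the slide marks it as rare; it could be added as one more field and one more comparison.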
Friction in Cassandra
Needed Visibility
• Gathered metrics (JMX, GC, dstat)
• Composed CPU/Disk/GC in a dashboard
• Example Dashboard
Turbulence in Cassandra
• Friction somewhere causes requests to queue
• Queued requests cause upstream delays
• Affected node tries to shed load to avoid dying
• Clients / Other nodes become affected
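To see why a short stall turns into a longer period of elevated latency, here is a back-of-the-envelope sketch. It assumes a constant arrival rate and a constant service capacity, which is a simplification; the function names and the numbers are illustrative, not measurements from Family Tree.

```python
def backlog_after_stall(arrival_per_s: float, stall_ms: float) -> float:
    """Requests that pile up while a node is stalled (e.g. during a GC pause)."""
    return arrival_per_s * stall_ms / 1000.0


def drain_time_ms(backlog: float, arrival_per_s: float, service_per_s: float) -> float:
    """Time to work off the backlog while new requests keep arriving."""
    spare = service_per_s - arrival_per_s
    if spare <= 0:
        return float("inf")  # no spare capacity: the queue never drains
    return 1000.0 * backlog / spare


# Illustrative numbers: 2000 req/s arriving, 2500 req/s capacity, 200 ms pause.
queued = backlog_after_stall(2000, 200)
print(queued, drain_time_ms(queued, 2000, 2500))
```

Under these assumptions a 200 ms pause queues 400 requests and takes 800 ms to clear, so the visible disturbance lasts several times longer than the pause itself. The less spare capacity a node has, the longer the tail.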
Turbulence in Cassandra
Situation: Compactions not throttled enough
Symptoms:
• Periods of heavy CPU utilization (plateau)
• Periods of full disk saturation (plateau)
• Periods of more-frequent GC
• Periods of higher request latency (p99+ plateau)
Turbulence in Cassandra
Situation: Compactions not throttled enough
Solutions:
• Throttle compaction dynamically using nodetool setcompactionthroughput
• Keep compaction backlog under 10-30min
• Bake the setting into cassandra.yaml
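One way to pick a value for nodetool setcompactionthroughput is to estimate the backlog in minutes and nudge the rate up or down. The sketch below is an assumption about how that could be done, not the method used on Family Tree: pending_mb is a hypothetical input you would have to derive yourself (for example from nodetool compactionstats), and the step and floor values are arbitrary.

```python
def backlog_minutes(pending_mb: float, throughput_mb_s: int) -> float:
    """Minutes needed to compact the pending data at the current throttle."""
    return pending_mb / throughput_mb_s / 60.0


def next_throughput(pending_mb: float, current_mb_s: int,
                    low_min: float = 10.0, high_min: float = 30.0,
                    step_mb_s: int = 4, floor_mb_s: int = 4) -> int:
    """Raise the throttle when the backlog is too long, lower it when short."""
    minutes = backlog_minutes(pending_mb, current_mb_s)
    if minutes > high_min:
        return current_mb_s + step_mb_s
    if minutes < low_min:
        return max(floor_mb_s, current_mb_s - step_mb_s)
    return current_mb_s


new_rate = next_throughput(pending_mb=40000, current_mb_s=16)
# Print the command rather than run it; the value is in MB/s.
print(f"nodetool setcompactionthroughput {new_rate}")
```

Once a rate holds up over several days, the same number can go into cassandra.yaml so it survives restarts.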
Turbulence in Cassandra
Situation: Too frequent Memtable flushing
Symptoms:
• Very frequent compaction on tables with most writes
• "Forcing flush" in debug.log
• Constant compactions, Constant disk saturation
• Using Opscenter Repair Service
• High number of cells read per query
Turbulence in Cassandra
Situation: Too frequent Memtable flushing
Solutions:
• Turn off Opscenter Repair for short TTL tables
• Turn off Opscenter Repair for other tables that don't need full consistency
• Google "dse excluding tables ignore_tables"
• Switch small tables from STCS to LCS
Turbulence in Cassandra
Situation: Not enough JVM Heap
Symptoms:
• Overly frequent GC, occasional OOM
• Lower query throughput
• No obvious bottleneck
• More CPU spent on GC than necessary
Turbulence in Cassandra
Situation: Not enough JVM Heap
Solutions:
• Increase JVM heap, but not more than 32G
• Ratchet up until occasional OOM stops
• Don't go too high, 32G max
• Stop if max GC pause increases
Turbulence in Cassandra
Situation: Too large JVM Heap
Symptoms:
• Much longer GC cycles once in a while
• Old Gen able to build up too much cruft
• Large variation in response time (p99+ spikes)
• Other nodes experience request queueing
Turbulence in Cassandra
Situation: Too large JVM Heap
Solutions:
• Reduce heap size
• But don't cause OOM
• Ratchet down while max GC pause times drop
• Remember extra RAM means extra buffer cache
Turbulence in Cassandra
Situation: GC not tuned for low-latency
Symptoms:
• Longer pause times
• Large variation in response time (p99+ spikes)
• GC gets behind and has to do long Full GCs
• Other nodes experience request queueing
Turbulence in Cassandra
Situation: GC not tuned for low-latency
Solutions:
• If you are a CMS wizard, tune CMS
• Otherwise, the easier path: use G1 with 40-50% new space
• Turn on GC logging, Plot GC over time
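Plotting GC over time mostly comes down to bucketing pauses into windows and tracking two numbers per window: the longest pause and the share of time spent in GC. The sketch below assumes the pauses have already been extracted from the GC log as (timestamp in seconds, pause in ms) pairs; parsing is left out because log formats differ between JVM versions.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def gc_summary(pauses: Iterable[Tuple[float, float]],
               window_s: int = 300) -> List[Tuple[int, float, float]]:
    """Return (window_start_s, max_pause_ms, gc_time_pct) per window."""
    buckets = defaultdict(list)
    for ts_s, pause_ms in pauses:
        buckets[int(ts_s // window_s)].append(pause_ms)
    rows = []
    for idx in sorted(buckets):
        total_ms = sum(buckets[idx])
        pct = 100.0 * total_ms / (window_s * 1000.0)
        rows.append((idx * window_s, max(buckets[idx]), pct))
    return rows


# Made-up pause data for illustration only.
pauses = [(10, 120), (95, 80), (290, 400), (310, 150), (580, 3000)]
for start, max_pause, pct in gc_summary(pauses):
    print(f"t={start:>4}s  max pause={max_pause} ms  GC time={pct:.2f}%")
```

Each row can be fed to whatever plotting tool you already use. Windows with a max pause above 200 ms or GC time above 1-2% are the ones to look at.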
Turbulence in Cassandra
Situation: GC not tuned for low-latency
Solutions:
• -XX:G1RSetUpdatingPauseTimePercent=5
• -XX:InitiatingHeapOccupancyPercent=60
• -XX:+ParallelRefProcEnabled
• -XX:G1ReservePercent=20
• -XX:ParallelGCThreads=13 (on r4.4xlarge 16 CPU box)
• -XX:ConcGCThreads=13
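The value 13 on a 16 CPU box appears to match the heuristic HotSpot commonly uses for its default ParallelGCThreads: all CPUs up to 8, then 5/8 of each additional CPU. The sketch below reproduces that arithmetic so the number can be rechecked for other instance sizes. Treat the formula as an assumption and verify it against your own JVM (for example with -XX:+PrintFlagsFinal).

```python
def default_parallel_gc_threads(ncpus: int) -> int:
    """Assumed HotSpot heuristic: N for N <= 8, else 8 + 5/8 of the rest."""
    if ncpus <= 8:
        return ncpus
    return 8 + (ncpus - 8) * 5 // 8


for cpus in (8, 16, 32):
    print(cpus, default_parallel_gc_threads(cpus))
```

Note that the slide sets ConcGCThreads to the same value as ParallelGCThreads, which is an explicit tuning choice in the deck and not something this heuristic produces.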
Turbulence in Cassandra
Situation: Disk spikes even when compaction throttled
Symptoms:
• No CPU spikes or plateaus
• No Disk activity during compaction
• But short periods of 100% disk saturation right after
• Also GC spike right after compaction complete
• Large response time variation around compaction
Turbulence in Cassandra
Situation: Disk spikes even when compaction throttled
Solutions:
• Use sysctl.conf to spread out writes during compaction
• See Amy's tuning guide
• https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
Turbulence in Cassandra
Situation: Disk readahead too large
Symptoms:
• Lower throughput than you expect
• No obvious bottleneck
• More bytes read from disk than sent over the network
Turbulence in Cassandra
Situation: Disk readahead too large
Solutions:
• blockdev --setra 128 <device> (128 x 512-byte sectors = 64k, for 64k chunks)
• See Amy's tuning guide
• https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
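The readahead value passed to blockdev --setra is counted in 512-byte sectors, which is where 128 comes from for a 64k target. A small sketch of the conversion follows; the device path is a hypothetical placeholder and the command is only printed, not executed.

```python
SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors


def readahead_sectors(readahead_kib: int) -> int:
    """Convert a readahead size in KiB to the sector count --setra expects."""
    return readahead_kib * 1024 // SECTOR_BYTES


device = "/dev/xvdb"  # hypothetical data volume; substitute your own
print(f"blockdev --setra {readahead_sectors(64)} {device}")
```

The current setting can be read back with blockdev --getra on the same device before and after the change.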
Turbulence in Cassandra
Situation: Timeouts set too long
Symptoms:
• Much longer GC cycles once in a while
• Large variation in response time (p99+ spikes)
• Large GC delays on good nodes when one goes bad
• One bad node cascades to more
Turbulence in Cassandra
Situation: Timeouts set too long
Solutions:
• Reduce read timeout until it hurts (nodetool)
• Reduce write timeout until it hurts (nodetool)
• Leave general request timeout higher to avoid cqlsh timeout
• Bake timeouts into cassandra.yaml
Turbulence in Cassandra
Situation: Not enough free memory
Symptoms:
• More disk activity than working set size
• High query latency even for hot records
• More bytes read from disk than active set size
Turbulence in Cassandra
Situation: Not enough free memory
Solutions:
• Shrink heap if possible
• Maybe shrink row/key/chunk caches
• This makes more room for OS buffer cache
• Stop unnecessary processes
Turbulence in Cassandra
Situation: Disproportionately large records
Symptoms:
• Queries for certain keys always take longer
• Three nodes spike IO/CPU at the same time
• Slow query logging in C* 3.x
Turbulence in Cassandra
Situation: Disproportionately large records
Solutions:
• Alert or monitor slow query logs to find problems (C* 3.x)
• Median:p99 of 1:100 is ok, but 1:10000 is bad
• Optimize size of the largest records
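The median:p99 rule of thumb can be checked directly from a sample of request latencies. This is a minimal sketch: it uses a nearest-rank p99, the "watch" label for ratios between the two thresholds is an added assumption, and the sample data is made up.

```python
import statistics
from typing import List, Sequence


def p99(sorted_vals: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of an already sorted sequence."""
    rank = max(1, (99 * len(sorted_vals) + 99) // 100)  # ceil(0.99 * n)
    return sorted_vals[rank - 1]


def median_to_p99_ratio(latencies_ms: List[float]) -> float:
    vals = sorted(latencies_ms)
    return p99(vals) / statistics.median(vals)


def verdict(ratio: float) -> str:
    """1:100 is ok, 1:10000 is bad; in between is labeled 'watch' here."""
    if ratio <= 100:
        return "ok"
    if ratio >= 10000:
        return "bad"
    return "watch"


# Made-up sample: mostly 2 ms reads plus two slow reads of large records.
latencies = [2.0] * 98 + [250.0] * 2
ratio = median_to_p99_ratio(latencies)
print(ratio, verdict(ratio))
```

Running this per table or per query type, not cluster-wide, makes it easier to trace a bad ratio back to the specific records that need to be slimmed down.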
Turbulence in Cassandra
Situation: Disproportionate edit rate
Symptoms:
• Queries for certain keys take longer
• Large number of cells per read in tablestats
• Three nodes spike IO/CPU at the same time
• Slow query logging in C* 3.x
Turbulence in Cassandra
Situation: Disproportionate edit rate
Solutions:
• Postpone large edits until user is done
• Minimize number of redundant bytes rewritten
Turbulence in Cassandra
Situation: Too few nodes
Symptoms:
• Majority of nodes hitting same bottleneck
• Hour-long periods of poor p99+ response time
• One bad node cascades to more
Turbulence in Cassandra
Situation: Too few nodes
Solutions:
• Try all of the above first (the easy ones)
• Add nodes at a practical point
• Tweak & tune more, maybe you can shrink
Turbulence in Cassandra
Situation: ALL OF THE ABOVE
Symptoms:
• Cluster IO overwhelmed
• Bad p99+ response times
• Multiple sick nodes after large user edits