Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optimization


Ooyala has been using Apache Cassandra since version 0.4. Their data ingest volume has exploded since then, and Cassandra has scaled along with it. In this webinar, Al will share lessons he has learned across an array of operational topics, including how to manage, tune, and scale Cassandra in a production environment.

Speaker: Al Tobey, Tech Lead, Compute and Data Services at Ooyala

Al Tobey is Tech Lead of the Compute and Data Services team at Ooyala. His team develops and operates Ooyala's internal big data platform, consisting of Apache Cassandra, Hadoop, and internally developed tools. When not in front of a computer, Al is a father, husband, and trombonist.

Published in: Technology

  1. PRACTICE MAKES PERFECT: EXTREME CASSANDRA OPTIMIZATION
      @AlTobey
      Tech Lead, Compute and Data Services
      #CASSANDRA
      Thursday, August 8, 2013
  2. Outline
      ⁍ About me / Ooyala
      ⁍ How not to manage your Cassandra clusters
      ⁍ Make it suck less
      ⁍ How to be a heuristician
      ⁍ Tools of the trade
      ⁍ More Settings
      ⁍ Show & Tell
  3. @AlTobey
      ⁍ Tech Lead, Compute and Data Services at Ooyala, Inc.
      ⁍ C&D team is #devops: 3 ops, 3 eng, me
      ⁍ C&D team is #bdaas: Big Data as a Service
      ⁍ ~100 Cassandra nodes, expanding quickly
      ⁍ Obligatory: we’re hiring
  4. Ooyala
      ⁍ Founded in 2007
      ⁍ 230+ employees globally
      ⁍ 200M unique users, 110+ countries
      ⁍ Over 1 billion videos played per month
      ⁍ Over 2 billion analytic events per day
  5. Ooyala & Cassandra
      Ooyala has been using Cassandra since v0.4
      Use cases:
      ⁍ Analytics data (real-time and batch)
      ⁍ Highly available K/V store
      ⁍ Time series data
      ⁍ Play head tracking (cross-device resume)
      ⁍ Machine Learning Data
  6. Ooyala: Legacy Platform
      [Architecture diagram. Components: players, loggers, API, ABE Service, cassandra, hadoop, S3. Labels: "START HERE", "read-modify-write".]
  7. Avoiding read-modify-write
      cassandra13_drinks column family
      memTable:
        Albert      Tuesday  6    Wednesday 0
        Evan        Tuesday  0    Wednesday 0
        Frank       Tuesday  3    Wednesday 3
        Kelvin      Tuesday  0    Wednesday 0
        Krzysztof   Tuesday  0    Wednesday 0
        Phillip     Tuesday 12    Wednesday 0
  8. Avoiding read-modify-write
      cassandra13_drinks column family
      memTable:
        Albert      Tuesday  2    Wednesday 0
        Phillip     Tuesday  0    Wednesday 1
      ssTable:
        Albert      Tuesday  6    Wednesday 0
        Evan        Tuesday  0    Wednesday 0
        Frank       Tuesday  3    Wednesday 3
        Kelvin      Tuesday  0    Wednesday 0
        Krzysztof   Tuesday  0    Wednesday 0
        Phillip     Tuesday 12    Wednesday 0
  9. Avoiding read-modify-write
      cassandra13_drinks column family
      memTable:
        Albert      Tuesday 22    Wednesday 0
      ssTable:
        Albert      Tuesday  2    Wednesday 0
        Phillip     Tuesday  0    Wednesday 1
      ssTable:
        Albert      Tuesday  6    Wednesday 0
        Evan        Tuesday  0    Wednesday 0
        Frank       Tuesday  3    Wednesday 3
        Kelvin      Tuesday  0    Wednesday 0
        Krzysztof   Tuesday  0    Wednesday 0
        Phillip     Tuesday 12    Wednesday 0
  10. Avoiding read-modify-write
      cassandra13_drinks column family
      ssTable (after the earlier tables are merged):
        Albert      Tuesday 22    Wednesday 0
        Evan        Tuesday  0    Wednesday 0
        Frank       Tuesday  3    Wednesday 3
        Kelvin      Tuesday  0    Wednesday 0
        Krzysztof   Tuesday  0    Wednesday 0
        Phillip     Tuesday  0    Wednesday 1
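The merge walked through on slides 7 to 10 can be sketched in a few lines of shell. This is only an illustration, not Cassandra's implementation: it assumes each cell is one text line of `row column timestamp value`, and the timestamps 1, 2 and 3 are hypothetical stand-ins for the three write batches on the slides. The newest timestamp wins per (row, column), so a writer never has to read the old value first.

```shell
#!/bin/sh
# merge_cells: read "row column timestamp value" lines on stdin and keep,
# for every (row, column) pair, only the value with the newest timestamp.
merge_cells() {
  awk '
    {
      key = $1 " " $2
      # First sighting of this cell, or a strictly newer timestamp: keep it.
      if (!(key in ts) || $3 + 0 > ts[key] + 0) {
        ts[key] = $3
        val[key] = $4
      }
    }
    END { for (key in val) print key, val[key] }
  ' | sort
}

# Hypothetical timestamps: 1 = oldest ssTable, 2 = newer ssTable,
# 3 = current memTable. Values are taken from the slides.
merge_cells <<'EOF'
Albert Tuesday 1 6
Phillip Tuesday 1 12
Phillip Wednesday 1 0
Albert Tuesday 2 2
Phillip Tuesday 2 0
Phillip Wednesday 2 1
Albert Tuesday 3 22
EOF
```

The input order does not matter, because only the timestamp decides which value survives.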
  11. 2011: 0.6 ➜ 0.8
      ⁍ Migration is still a largely unsolved problem
      ⁍ Wrote a tool in Scala to scrub data and write via Thrift
      ⁍ Rebuilt indexes - faster than copying
      [Diagram labels: hadoop, cassandra, GlusterFS P2P, cassandra, Thrift, Scala Map/Reduce]
  12. Changes: 0.6 ➜ 0.8
      ⁍ Cassandra 0.8
      ⁍ 24GiB heap
      ⁍ Sun Java 1.6 update
      ⁍ Linux 2.6.36
      ⁍ XFS on MD RAID5
      ⁍ Disabled swap or at least vm.swappiness=1
  13. 2012: Capacity Increase
      ⁍ 18 nodes ➜ 36 nodes
      ⁍ DSE 3.0
      ⁍ Stale tombstones again!
      ⁍ No downtime!
      [Diagram labels: cassandra, GlusterFS P2P, DSE 3.0, Thrift, Scala Map/Reduce]
  14. System Changes: Apache 1.0 ➜ DSE 3.0
      ⁍ DSE 3.0 installed via apt packages
      ⁍ Unchanged: heap, distro
      ⁍ Ran much faster this time!
      ⁍ Mistake: Moved to MD RAID 0
        Fix: RAID10 or RAID5, MD, ZFS, or btrfs
      ⁍ Mistake: Running on Ubuntu Lucid
        Fix: Ubuntu Precise
  15. Config Changes: Apache 1.0 ➜ DSE 3.0
      ⁍ Schema: compaction_strategy = LCS
      ⁍ Schema: bloom_filter_fp_chance = 0.1
      ⁍ Schema: sstable_size_in_mb = 256
      ⁍ Schema: compression_options = Snappy
      ⁍ YAML: compaction_throughput_mb_per_sec: 0
  16. 2013: Datacenter Move
      ⁍ 36 nodes ➜ lots more nodes
      ⁍ As usual, no downtime!
      [Diagram: two DSE 3.1 clusters linked by replication]
  17. Coming Soon for Cassandra at Ooyala
      Upcoming use cases:
      ⁍ Store every event from our players at full resolution
      ⁍ Cache code for our Spark job server
      ⁍ AMPLab Tachyon backend?
  18. Next Generation Architecture: Ooyala Event Store
      [Architecture diagram. Components: players, loggers, API, kafka, ingest, spark, job server, DSE 3.1, Tachyon?]
  19. There’s more to tuning than performance:
      ⁍ Security
      ⁍ Cost of Goods Sold
      ⁍ Operations / support
      ⁍ Developer happiness
      ⁍ Physical capacity (cpu/memory/network/disk)
      ⁍ Reliability / Resilience
      ⁍ Compromise
  20. I am not a scientist ... heuristician?
      ⁍ I’d love to be more scientific, but production comes first
      ⁍ Sometimes you have to make educated guesses
      ⁍ It’s not as difficult as it’s made out to be
      ⁍ Your brain is great at heuristics. Trust it.
      ⁍ Concentrate on bottlenecks
      ⁍ Make incremental changes
      ⁍ Read Malcolm Gladwell’s “Blink”
  21. The OODA Loop
      Observe, Orient, Decide, Act:
      ⁍ Observe the system in production under load
      ⁍ Make small, safe changes
      ⁍ Observe
      ⁍ Commit or Revert
  22. Testing Shiny Things
      ⁍ Like kernels
      ⁍ And Linux distributions
      ⁍ And ZFS
      ⁍ And btrfs
      ⁍ And JVMs & parameters
      ⁍ Test them in production!
  23. Testing Shiny Things: In Production
      [Diagram: cluster nodes labeled ext4, ext4, ext4, ZFS, ext4, kernel upgrade, ext4, btrfs]
  24. Brendan Gregg’s Tool Chart
      http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x
  25. dstat -lrvn 10
  26. cl-netstat.pl
      https://github.com/tobert/perl-ssh-tools
  27. iostat -x 1
  28. htop
  29. jconsole
  30. opscenter
  31. nodetool ring
      10.10.10.10  Analytics  rack1  Up  Normal  47.73 MB    1.72%   1012046694721756637024691720378965
      10.10.10.10  Analytics  rack1  Up  Normal  63.94 MB    0.86%   1026714038123521225967078556906197
      10.10.10.10  Analytics  rack1  Up  Normal  85.73 MB    0.86%   1041381381525285814909465393433428
      10.10.10.10  Analytics  rack1  Up  Normal  47.87 MB    0.86%   1056048724927050403851852229960659
      10.10.10.10  Analytics  rack1  Up  Normal  39.73 MB    0.86%   1070716068328814992794239066487891
      10.10.10.10  Analytics  rack1  Up  Normal  40.74 MB    1.75%   1100423945662575060114582859200003
      10.10.10.10  Analytics  rack1  Up  Normal  40.08 MB    2.20%   1137814208669076757916163680305794
      10.10.10.10  Analytics  rack1  Up  Normal  56.19 MB    3.45%   1196501513956187970179620530735245
      10.10.10.10  Analytics  rack1  Up  Normal  214.88 MB   11.62%  1394248867770897155613247921498720
      10.10.10.10  Analytics  rack1  Up  Normal  214.29 MB   2.45%   1435882108713996181107000284314407
      10.10.10.10  Analytics  rack1  Up  Normal  158.49 MB   1.76%   1465773686249280216901752503449044
      10.10.10.10  Analytics  rack1  Up  Normal  40.3 MB     0.92%   1481401683578223483181070489250370
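The ring output on slide 31 is visibly unbalanced: one token range owns 11.62% while several own 0.86%. A small sketch for surfacing that spread follows. It assumes the whitespace-separated layout shown on the slide, with the ownership percentage in the eighth field; the column layout differs between Cassandra versions, so treat the field number as an assumption. The function name `ring_owns_spread` is made up for this example.

```shell
#!/bin/sh
# ring_owns_spread: read `nodetool ring`-style rows on stdin and print the
# smallest and largest ownership percentage (8th field, e.g. "1.72%").
ring_owns_spread() {
  awk '
    $8 ~ /%$/ {
      owns = $8
      sub(/%$/, "", owns)   # strip the trailing percent sign
      owns += 0             # force numeric comparison
      if (n == 0 || owns < min) min = owns
      if (n == 0 || owns > max) max = owns
      n++
    }
    END { if (n > 0) printf "min %.2f%% max %.2f%%\n", min, max }
  '
}
```

On a live node this would be used as `nodetool ring | ring_owns_spread`.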
  32. nodetool cfstats
      Keyspace: gostress
          Read Count: 0
          Read Latency: NaN ms.
          Write Count: 0
          Write Latency: NaN ms.
          Pending Tasks: 0
              Column Family: stressful
              SSTable count: 1
              Space used (live): 32981239
              Space used (total): 32981239
              Number of Keys (estimate): 128
              Memtable Columns Count: 0
              Memtable Data Size: 0
              Memtable Switch Count: 0
              Read Count: 0
              Read Latency: NaN ms.
              Write Count: 0
              Write Latency: NaN ms.
              Pending Tasks: 0
              Bloom Filter False Positives: 0
              Bloom Filter False Ratio: 0.00000
              Bloom Filter Space Used: 336
              Compacted row minimum size: 7007507
              Compacted row maximum size: 8409007
              Compacted row mean size: 8409007
      Slide callouts: "Could be using a lot of heap"; "Controllable by sstable_size_in_mb"
  33. nodetool proxyhistograms
      Offset  Read Latency  Write Latency  Range Latency
      35      0             20             0
      42      0             61             0
      50      0             82             0
      60      0             440            0
      72      0             3416           0
      86      0             17910          0
      103     0             48675          0
      124     1             97423          0
      149     0             153109         0
      179     2             186205         0
      215     5             139022         0
      258     134           44058          0
      310     2656          60660          0
      372     34698         742684         0
      446     469515        7359351        0
      535     3920391       31030588       0
      642     9852708       33070248       0
      770     4487796       9719615        0
      924     651959        984889         0
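A histogram like the one on slide 33 is easier to compare between runs once it is reduced to a percentile. The sketch below assumes the four-column layout shown on the slide (offset, read, write, range) and that each offset is the latency bucket, which I believe nodetool reports in microseconds. The function name `histogram_percentile` is hypothetical. It returns the first bucket at which the cumulative count reaches the requested percentage, so the answer is only as precise as the bucket boundaries.

```shell
#!/bin/sh
# histogram_percentile COLUMN PCT: read histogram rows on stdin and print
# the offset of the bucket where the cumulative count in COLUMN first
# reaches PCT percent of the total. COLUMN 2 = read, 3 = write, 4 = range.
histogram_percentile() {
  awk -v col="$1" -v pct="$2" '
    $1 ~ /^[0-9]+$/ {          # skip the header line
      offset[++n] = $1
      count[n] = $col
      total += $col
    }
    END {
      if (total == 0) exit     # nothing recorded in this column
      target = total * pct / 100
      for (i = 1; i <= n; i++) {
        seen += count[i]
        if (seen >= target) { print offset[i]; exit }
      }
    }
  '
}
```

For example, `nodetool proxyhistograms | histogram_percentile 3 99` would print the bucket holding the 99th-percentile write latency.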
  34. nodetool compactionstats
      al@node ~ $ nodetool compactionstats
      pending tasks: 3
      compaction type  keyspace  column family    bytes compacted  bytes total  progress
      Compaction       hastur    gauge_archive    9819749801       16922291634  58.03%
      Compaction       hastur    counter_archive  12141850720      16147440484  75.19%
      Compaction       hastur    mark_archive     647389841        1475432590   43.88%
      Active compaction remaining time : n/a
      al@node ~ $ nodetool compactionstats
      pending tasks: 3
      compaction type  keyspace  column family    bytes compacted  bytes total  progress
      Compaction       hastur    gauge_archive    10239806890      16922291634  60.51%
      Compaction       hastur    counter_archive  12544404397      16147440484  77.69%
      Compaction       hastur    mark_archive     1107897093       1475432590   75.09%
      Active compaction remaining time : n/a
  35. Stress Testing Tools
      ⁍ cassandra-stress
      ⁍ YCSB
      ⁍ Production
      ⁍ Terasort (DSE)
      ⁍ Homegrown
  36. /etc/sysctl.conf
      kernel.pid_max = 999999
      fs.file-max = 1048576
      vm.max_map_count = 1048576
      net.core.rmem_max = 16777216
      net.core.wmem_max = 16777216
      net.ipv4.tcp_rmem = 4096 65536 16777216
      net.ipv4.tcp_wmem = 4096 65536 16777216
      vm.dirty_ratio = 10
      vm.dirty_background_ratio = 2
      vm.swappiness = 1
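Settings like those on slide 36 only help if they are actually live on every node, so it can be worth checking the running kernel against the file. The sketch below assumes `key = value` lines and reads the live values from `/proc/sys`, mapping dots in the key to directory separators. The function names `squash` and `check_sysctl` and the optional second argument are inventions for this example; the second argument exists only so the check can be tried against a scratch directory.

```shell
#!/bin/sh
# squash: collapse runs of whitespace to single spaces and trim both ends.
squash() {
  tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# check_sysctl CONF [ROOT]: compare "key = value" lines in CONF with the
# live values under ROOT (default /proc/sys) and report each as ok or DIFF.
check_sysctl() {
  conf=$1
  root=${2:-/proc/sys}
  while IFS='=' read -r key want; do
    key=$(printf '%s' "$key" | squash)
    want=$(printf '%s' "$want" | squash)
    # Skip blank lines and comments.
    case $key in ''|'#'*) continue ;; esac
    # kernel.pid_max -> $root/kernel/pid_max
    path=$root/$(printf '%s' "$key" | tr . /)
    have=$(cat "$path" 2>/dev/null | squash)
    if [ "$have" = "$want" ]; then
      echo "ok   $key"
    else
      echo "DIFF $key want='$want' have='$have'"
    fi
  done < "$conf"
}
```

Running `check_sysctl /etc/sysctl.conf` on a node lists any setting that has drifted from the file.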
  37. /etc/rc.local
      ra=$((2**14)) # 16k
      ss=$(blockdev --getss /dev/sda)
      blockdev --setra $(($ra / $ss)) /dev/sda
      echo 256 > /sys/block/sda/queue/nr_requests
      echo cfq > /sys/block/sda/queue/scheduler
      echo 16384 > /sys/block/md7/md/stripe_cache_size
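One detail of the read-ahead arithmetic on slide 37 is worth spelling out. According to the blockdev man page, `--setra` counts in 512-byte sectors, so dividing by the value of `--getss` gives the intended 16 KiB only on devices that report 512-byte logical sectors. A minimal sketch of the conversion with the divisor fixed at 512 (the function name `readahead_sectors` is made up for this example):

```shell
#!/bin/sh
# readahead_sectors BYTES: convert a read-ahead size in bytes to the
# 512-byte sector count that `blockdev --setra` expects.
readahead_sectors() {
  echo $(( $1 / 512 ))
}

readahead_sectors $((16 * 1024))   # 16 KiB -> prints 32
```

As root, this could be applied with `blockdev --setra "$(readahead_sectors 16384)" /dev/sda`.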
  38. JVM Args
      -Xmx8G               leave it alone
      -Xms8G               leave it alone
      -Xmn1200M            100MiB * nCPU
      -Xss180k             should be fine
      -XX:+UseNUMA
      numactl --interleave
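The "100MiB * nCPU" rule of thumb for `-Xmn` on slide 38 is easy to compute at startup instead of hard-coding it. The sketch below takes the CPU count as an optional argument and otherwise asks `getconf` for the number of online processors. The function name `xmn_flag` is hypothetical, and the 12-CPU example is an inference from the slide's `-Xmn1200M`, not something the slide states.

```shell
#!/bin/sh
# xmn_flag [NCPU]: print a -Xmn flag sized at 100 MiB per CPU.
# NCPU defaults to the number of online processors reported by getconf.
xmn_flag() {
  ncpu=${1:-$(getconf _NPROCESSORS_ONLN)}
  echo "-Xmn$(( 100 * ncpu ))M"
}

xmn_flag 12   # prints -Xmn1200M
```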
  39. cgroups
      Provides fine-grained control over Linux resources
      ⁍ Makes the Linux scheduler better
      ⁍ Lets you manage systems under extreme load
      ⁍ Useful on all Linux machines
      ⁍ Can choose between determinism and flexibility
  40. cgroups
      # 'EOF' is quoted so that $cpucg and $$ are written literally and
      # expand later, when /etc/default/cassandra is sourced.
      cat >> /etc/default/cassandra <<'EOF'
      cpucg=/sys/fs/cgroup/cpu/cassandra
      mkdir $cpucg
      cat $cpucg/../cpuset.mems >$cpucg/cpuset.mems
      cat $cpucg/../cpuset.cpus >$cpucg/cpuset.cpus
      echo 100 > $cpucg/shares
      echo $$ > $cpucg/tasks
      EOF
  41. Successful Experiment: btrfs
      mkfs.btrfs -m raid10 -d raid0 /dev/sd[c-h]1
      mkfs.btrfs -m raid10 -d raid0 /dev/sd[c-h]1
      mount -o compress=lzo /dev/sdc1 /data
  42. Successful Experiment: ZFS on Linux
      zpool create data raidz /dev/sd[c-h]
      zfs create data/cassandra
      zfs set compression=lzjb data/cassandra
      zfs set atime=off data/cassandra
      zfs set logbias=throughput data/cassandra
  43. Conclusions
      ⁍ Tuning is multi-dimensional
      ⁍ Production load is your most important benchmark
      ⁍ Lean on Cassandra, experiment!
      ⁍ No one metric tells the whole story
  44. Questions?
      ⁍ Twitter: @AlTobey
      ⁍ Github: https://github.com/tobert
      ⁍ Email: al@ooyala.com / tobert@gmail.com