C* Summit EU 2013: Practice Makes Perfect: Extreme Cassandra Optimization
 


Speaker: Al Tobey, Open Source Mechanic at DataStax
Video: http://www.youtube.com/watch?v=AcPME94F13U&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=24
Ooyala has been using Apache Cassandra since version 0.4. Our data ingest volume has exploded since 0.4 and Cassandra has scaled along with us. Al will cover many topics from an operational perspective on how to manage, tune, and scale Cassandra in a production environment.

    Presentation Transcript

    • PRACTICE MAKES PERFECT: EXTREME CASSANDRA OPTIMIZATION @AlTobey Open Source Mechanic Datastax #CASSANDRAEU
    • Outline
      ⁍ About me
      ⁍ How not to manage your Cassandra clusters
      ⁍ Make it better
      ⁍ How to be a heuristician
      ⁍ Tools of the trade
      ⁍ More Settings
      ⁍ Show & Tell
    • Previously: @AlTobey / Ooyala
      ⁍ Tech Lead, Compute and Data Services at Ooyala, Inc.
      ⁍ C&D team is #devops: 3 ops, 3 eng, me
      ⁍ C&D team is #bdaas: Big Data as a Service
      ⁍ ~200 Cassandra nodes, expanding quickly
    • Ooyala
      ⁍ Founded in 2007
      ⁍ 230+ employees globally
      ⁍ 200M unique users, 110+ countries
      ⁍ Over 1 billion videos played per month
      ⁍ Over 2 billion analytic events per day
    • Ooyala & Cassandra
      Ooyala has been using Cassandra since v0.4.
      Use cases:
      ⁍ Analytics data (real-time and batch)
      ⁍ Highly available K/V store
      ⁍ Time series data
      ⁍ Play head tracking (cross-device resume)
      ⁍ Machine Learning Data
    • Ooyala: Legacy Platform
      [Architecture diagram. Labels: player, loggers, S3, API, hadoop (x5), ABE Service, cassandra (x5), "read-modify-write", "START HERE"]
    • Avoiding read-modify-write (1 of 4): cassandra13_drinks column family
      memTable:
        Albert     Tuesday 6   Wednesday 0
        Evan       Tuesday 0   Wednesday 0
        Frank      Tuesday 3   Wednesday 3
        Kelvin     Tuesday 0   Wednesday 0
        Krzysztof  Tuesday 0   Wednesday 0
        Phillip    Tuesday 12  Wednesday 0
    • Avoiding read-modify-write (2 of 4): cassandra13_drinks column family
      memTable:
        Albert     Tuesday 2   Wednesday 0
        Phillip    Tuesday 0   Wednesday 1
      ssTable:
        Albert     Tuesday 6   Wednesday 0
        Evan       Tuesday 0   Wednesday 0
        Frank      Tuesday 3   Wednesday 3
        Kelvin     Tuesday 0   Wednesday 0
        Krzysztof  Tuesday 0   Wednesday 0
        Phillip    Tuesday 12  Wednesday 0
    • Avoiding read-modify-write (3 of 4): cassandra13_drinks column family
      memTable:
        Albert     Tuesday 22  Wednesday 0
      ssTable:
        Albert     Tuesday 2   Wednesday 0
        Phillip    Tuesday 0   Wednesday 1
      ssTable:
        Albert     Tuesday 6   Wednesday 0
        Evan       Tuesday 0   Wednesday 0
        Frank      Tuesday 3   Wednesday 3
        Kelvin     Tuesday 0   Wednesday 0
        Krzysztof  Tuesday 0   Wednesday 0
        Phillip    Tuesday 12  Wednesday 0
    • Avoiding read-modify-write (4 of 4): cassandra13_drinks column family
      ssTable (after compaction):
        Albert     Tuesday 22  Wednesday 0
        Evan       Tuesday 0   Wednesday 0
        Frank      Tuesday 3   Wednesday 3
        Kelvin     Tuesday 0   Wednesday 0
        Krzysztof  Tuesday 0   Wednesday 0
        Phillip    Tuesday 0   Wednesday 1
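
The four slides above can be read as a last-write-wins merge: each write lands in the memtable without first reading the old value, flushed memtables become immutable sstables, and compaction later folds them into one. The sketch below is a toy model of that idea in Python, not Cassandra's implementation. The function names and the integer write sequence used as a timestamp are assumptions made for illustration; the row values come from the slides.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str]                 # (row key, column name)
Table = Dict[Cell, Tuple[int, int]]    # cell -> (write sequence, value)


def write(table: Table, seq: int, row: str, tuesday: int, wednesday: int) -> None:
    """Blind write: store both columns without reading what is already there."""
    table[(row, "Tuesday")] = (seq, tuesday)
    table[(row, "Wednesday")] = (seq, wednesday)


def compact(tables: List[Table]) -> Table:
    """Merge tables, keeping the newest (highest sequence) version of each cell."""
    merged: Table = {}
    for table in tables:
        for cell, versioned in table.items():
            if cell not in merged or versioned[0] > merged[cell][0]:
                merged[cell] = versioned
    return merged


# Oldest sstable: the first memtable after it was flushed.
sstable_1: Table = {}
for name, tue, wed in [("Albert", 6, 0), ("Evan", 0, 0), ("Frank", 3, 3),
                       ("Kelvin", 0, 0), ("Krzysztof", 0, 0), ("Phillip", 12, 0)]:
    write(sstable_1, 1, name, tue, wed)

# Second sstable: later writes, flushed separately.
sstable_2: Table = {}
write(sstable_2, 2, "Albert", 2, 0)
write(sstable_2, 2, "Phillip", 0, 1)

# Current memtable: the most recent write.
memtable: Table = {}
write(memtable, 3, "Albert", 22, 0)

merged = compact([sstable_1, sstable_2, memtable])
for (row, column), (_, value) in sorted(merged.items()):
    print(row, column, value)
```

In this model the merged table matches the final slide: the newest Albert and Phillip rows win, and no write ever had to read an existing value.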
    • 2011: 0.6 ➜ 0.8
      [Diagram labels: cassandra, cassandra, GlusterFS P2P, hadoop, Thrift, Scala Map/Reduce]
      ⁍ Migration is still a largely unsolved problem
      ⁍ Wrote a tool in Scala to scrub data and write via Thrift
      ⁍ Rebuilt indexes - faster than copying
    • Changes: 0.6 ➜ 0.8
      ⁍ Cassandra 0.8
      ⁍ 24GiB heap
      ⁍ Sun Java 1.6 update
      ⁍ Linux 2.6.36
      ⁍ XFS on MD RAID5
      ⁍ Disabled swap or at least vm.swappiness=1
    • 2012: Capacity Increase
      [Diagram labels: Thrift, cassandra, DSE 3.0, GlusterFS P2P, Scala Map/Reduce]
      ⁍ 18 nodes ➜ 36 nodes
      ⁍ DSE 3.0
      ⁍ Stale tombstones again!
      ⁍ No downtime!
    • System Changes: Apache 1.0 ➜ DSE 3.0
      ⁍ DSE 3.0 installed via apt packages
      ⁍ Unchanged: heap, distro
      ⁍ Ran much faster this time!
      ⁍ Mistake: Moved to MD RAID 0. Fix: RAID10 or RAID5, MD, ZFS
      ⁍ Mistake: Running on Ubuntu Lucid. Fix: Ubuntu Precise
    • Config Changes: Apache 1.0 ➜ DSE 3.0
      ⁍ Schema: compaction_strategy = LCS
      ⁍ Schema: bloom_filter_fp_chance = 0.1
      ⁍ Schema: sstable_size_in_mb = 256
      ⁍ Schema: compression_options = Snappy
      ⁍ YAML: compaction_throughput_mb_per_sec: 0
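
A bloom_filter_fp_chance of 0.1 accepts more false positives (and so more wasted sstable reads) in exchange for a smaller filter. As a rough guide to that trade-off, the sketch below applies the textbook formula for an optimally sized Bloom filter, bits per key = -ln(p) / (ln 2)^2. It is an assumption that this approximates Cassandra's own sizing, which may round or bucket differently; the function names and the key count in the usage line are hypothetical.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook bits-per-key for an optimal Bloom filter at the given false-positive rate."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be between 0 and 1 (exclusive)")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_mib(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in MiB for num_keys keys."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8 / 2**20


# Hypothetical node holding 1 billion keys: compare two false-positive settings.
for p in (0.01, 0.1):
    print(f"fp_chance={p}: {bloom_bits_per_key(p):.2f} bits/key, "
          f"{bloom_size_mib(1_000_000_000, p):.0f} MiB")
```

Under this formula, moving from 0.01 to 0.1 halves the bits needed per key.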
    • 2013: Datacenter Move
      [Diagram labels: DSE 3.1, DSE 3.1, replication]
      ⁍ 36 nodes ➜ lots more nodes
      ⁍ As usual, no downtime!
    • Coming Soon for Cassandra at Ooyala
      Upcoming use cases:
      ⁍ Store every event from the players at full resolution
      ⁍ Cache code for the Spark job server
      ⁍ AMPLab Tachyon backend?
    • Next Generation Architecture: Ooyala Event Store
      [Architecture diagram. Labels: player, loggers, kafka, API, job server, ingest, spark, Tachyon?, DSE 3.1]
    • There’s more to tuning than performance:
      ⁍ Security
      ⁍ Cost of Goods Sold
      ⁍ Operations / support
      ⁍ Developer happiness
      ⁍ Physical capacity (cpu/memory/network/disk)
      ⁍ Reliability / Resilience
      ⁍ Compromise
    • I am not a scientist ... heuristician?
      ⁍ I’d love to be more scientific, but good science takes time
      ⁍ Sometimes you have to make educated guesses
      ⁍ It’s not as difficult as it’s made out to be
      ⁍ Your brain is great at heuristics. Trust it.
      ⁍ Concentrate on bottlenecks
      ⁍ Make incremental changes
      ⁍ Read Malcolm Gladwell’s “Blink”
    • Testing Shiny Things
      ⁍ Like kernels
      ⁍ And Linux distributions
      ⁍ And ZFS
      ⁍ And btrfs
      ⁍ And JVMs & parameters
      ⁍ Test them in production (if you must)
    • Testing Shiny Things: In Production
      [Cluster diagram. Node labels: ext4, btrfs, ext4, ext4, ext4, kernel upgrade, ZFS, ext4]
    • Brendan Gregg’s Tool Chart: http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x
    • dstat -lrvn 10
    • cl-netstat.pl: https://github.com/tobert/perl-ssh-tools
    • iostat -x 1
    • htop
    • jconsole
    • opscenter
    • nodetool ring
        Address      DC         Rack   Status  State   Load       Owns    Token
        10.10.10.10  Analytics  rack1  Up      Normal  47.73 MB   1.72%   101204669472175663702469172037896580098
        10.10.10.10  Analytics  rack1  Up      Normal  63.94 MB   0.86%   102671403812352122596707855690619718940
        10.10.10.10  Analytics  rack1  Up      Normal  85.73 MB   0.86%   104138138152528581490946539343342857782
        10.10.10.10  Analytics  rack1  Up      Normal  47.87 MB   0.86%   105604872492705040385185222996065996624
        10.10.10.10  Analytics  rack1  Up      Normal  39.73 MB   0.86%   107071606832881499279423906648789135466
        10.10.10.10  Analytics  rack1  Up      Normal  40.74 MB   1.75%   110042394566257506011458285920000334950
        10.10.10.10  Analytics  rack1  Up      Normal  40.08 MB   2.20%   113781420866907675791616368030579466301
        10.10.10.10  Analytics  rack1  Up      Normal  56.19 MB   3.45%   119650151395618797017962053073524524487
        10.10.10.10  Analytics  rack1  Up      Normal  214.88 MB  11.62%  139424886777089715561324792149872061049
        10.10.10.10  Analytics  rack1  Up      Normal  214.29 MB  2.45%   143588210871399618110700028431440799305
        10.10.10.10  Analytics  rack1  Up      Normal  158.49 MB  1.76%   146577368624928021690175250344904436129
        10.10.10.10  Analytics  rack1  Up      Normal  40.3 MB    0.92%   148140168357822348318107048925037023042
    • nodetool cfstats
        Keyspace: gostress
            Read Count: 0
            Read Latency: NaN ms.
            Write Count: 0
            Write Latency: NaN ms.
            Pending Tasks: 0
                Column Family: stressful
                SSTable count: 1
                Space used (live): 32981239
                Space used (total): 32981239
                Number of Keys (estimate): 128
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 0
                Read Count: 0
                Read Latency: NaN ms.
                Write Count: 0
                Write Latency: NaN ms.
                Pending Tasks: 0
                Bloom Filter False Positives: 0
                Bloom Filter False Ratio: 0.00000
                Bloom Filter Space Used: 336
                Compacted row minimum size: 7007507
                Compacted row maximum size: 8409007
                Compacted row mean size: 8409007
      Slide annotations: "Controllable by sstable_size_in_mb"; "Could be using a lot of heap"
    • nodetool proxyhistograms
        Offset  Read Latency  Write Latency  Range Latency
        35      0             20             0
        42      0             61             0
        50      0             82             0
        60      0             440            0
        72      0             3416           0
        86      0             17910          0
        103     0             48675          0
        124     1             97423          0
        149     0             153109         0
        179     2             186205         0
        215     5             139022         0
        258     134           44058          0
        310     2656          60660          0
        372     34698         742684         0
        446     469515        7359351        0
        535     3920391       31030588       0
        642     9852708       33070248       0
        770     4487796       9719615        0
        924     651959        984889         0
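
The proxyhistograms output is a bucketed histogram: each row counts the requests whose latency fell into the bucket labelled by the offset. A percentile can be estimated by walking the buckets until the running count reaches the desired fraction of the total. The sketch below does that for the read-latency column, using only the rows visible on the slide. The function name is hypothetical, and treating the offsets as microseconds is an assumption, since the slide does not state the unit.

```python
from typing import Sequence

# Offsets and read-latency counts as shown on the slide.
OFFSETS = [35, 42, 50, 60, 72, 86, 103, 124, 149, 179,
           215, 258, 310, 372, 446, 535, 642, 770, 924]
READ_COUNTS = [0, 0, 0, 0, 0, 0, 0, 1, 0, 2,
               5, 134, 2656, 34698, 469515, 3920391, 9852708, 4487796, 651959]


def percentile_offset(offsets: Sequence[int], counts: Sequence[int], fraction: float) -> int:
    """Return the first bucket offset at which the cumulative count reaches fraction of the total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    threshold = fraction * total
    running = 0
    for offset, count in zip(offsets, counts):
        running += count
        if running >= threshold:
            return offset
    return offsets[-1]


for fraction in (0.5, 0.95, 0.99):
    print(f"p{int(fraction * 100)} read latency bucket:",
          percentile_offset(OFFSETS, READ_COUNTS, fraction))
```

Because the result is a bucket label rather than an exact latency, it is best used for comparing runs, not for reporting precise numbers.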
    • nodetool compactionstats
        al@node ~ $ nodetool compactionstats
        pending tasks: 3
        compaction type  keyspace  column family    bytes compacted  bytes total  progress
        Compaction       hastur    gauge_archive    9819749801       16922291634  58.03%
        Compaction       hastur    counter_archive  12141850720      16147440484  75.19%
        Compaction       hastur    mark_archive     647389841        1475432590   43.88%
        Active compaction remaining time : n/a

        al@node ~ $ nodetool compactionstats
        pending tasks: 3
        compaction type  keyspace  column family    bytes compacted  bytes total  progress
        Compaction       hastur    gauge_archive    10239806890      16922291634  60.51%
        Compaction       hastur    counter_archive  12544404397      16147440484  77.69%
        Compaction       hastur    mark_archive     1107897093       1475432590   75.09%
        Active compaction remaining time : n/a
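
Running compactionstats twice, as on the slide, shows how far each compaction advanced between samples. The sketch below recomputes the progress column and the bytes compacted between the two runs. The data structure and function names are assumptions for illustration; the numbers are copied from the slide, which does not record how much time passed between the samples, so no throughput is derived.

```python
from typing import Dict, Tuple

# column family -> (bytes compacted, bytes total), copied from the slide
SAMPLE_1: Dict[str, Tuple[int, int]] = {
    "gauge_archive": (9819749801, 16922291634),
    "counter_archive": (12141850720, 16147440484),
    "mark_archive": (647389841, 1475432590),
}
SAMPLE_2: Dict[str, Tuple[int, int]] = {
    "gauge_archive": (10239806890, 16922291634),
    "counter_archive": (12544404397, 16147440484),
    "mark_archive": (1107897093, 1475432590),
}


def progress_pct(compacted: int, total: int) -> float:
    """Percentage of the compaction that is finished, rounded like the tool's output."""
    return round(100.0 * compacted / total, 2)


def bytes_advanced(before: Dict[str, Tuple[int, int]],
                   after: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Bytes compacted between two samples, per column family."""
    return {cf: after[cf][0] - before[cf][0] for cf in before if cf in after}


for cf, delta in bytes_advanced(SAMPLE_1, SAMPLE_2).items():
    print(cf, progress_pct(*SAMPLE_1[cf]), "->", progress_pct(*SAMPLE_2[cf]),
          f"(+{delta} bytes)")
```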
    • Stress Testing Tools
      ⁍ cassandra-stress
      ⁍ YCSB
      ⁍ Production
      ⁍ Terasort (DSE)
      ⁍ Homegrown
    • /etc/sysctl.conf
        kernel.pid_max = 999999
        fs.file-max = 1048576
        vm.max_map_count = 1048576
        net.core.rmem_max = 16777216
        net.core.wmem_max = 16777216
        net.ipv4.tcp_rmem = 4096 65536 16777216
        net.ipv4.tcp_wmem = 4096 65536 16777216
        vm.swappiness = 1
        vm.dirty_ratio = 10
        vm.dirty_background_ratio = 5
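
Settings in /etc/sysctl.conf only take effect once they are loaded, so it can help to compare the file with what the running kernel reports. Each key maps to a file under /proc/sys with the dots replaced by slashes. The sketch below parses the slide's settings and reports any that differ on the current machine. The function names are hypothetical, the comparison is a simple whitespace-normalised string match, and on a non-Linux host every key is reported as unreadable.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

SLIDE_SETTINGS = """\
kernel.pid_max = 999999
fs.file-max = 1048576
vm.max_map_count = 1048576
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
vm.swappiness = 1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
"""


def parse_sysctl(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key: str) -> Path:
    """Map a sysctl key such as vm.swappiness to /proc/sys/vm/swappiness."""
    return Path("/proc/sys") / key.replace(".", "/")


def mismatches(settings: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (wanted, actual)} for keys whose live value differs or cannot be read."""
    result: Dict[str, Tuple[str, Optional[str]]] = {}
    for key, wanted in settings.items():
        try:
            actual: Optional[str] = " ".join(proc_path(key).read_text().split())
        except OSError:
            actual = None
        if actual != wanted:
            result[key] = (wanted, actual)
    return result


for key, (wanted, actual) in mismatches(parse_sysctl(SLIDE_SETTINGS)).items():
    print(f"{key}: wanted {wanted!r}, running kernel has {actual!r}")
```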
    • /etc/rc.local
        ra=$((2**14)) # 16k
        ss=$(blockdev --getss /dev/sda)
        blockdev --setra $(($ra / $ss)) /dev/sda

        echo 256 > /sys/block/sda/queue/nr_requests
        #echo cfq > /sys/block/sda/queue/scheduler
        #echo deadline > /sys/block/sda/queue/scheduler
        #echo noop > /sys/block/sda/queue/scheduler

        echo 16384 > /sys/block/md7/md/stripe_cache_size
        echo cfq > /sys/block/sda/queue/scheduler
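
The readahead lines in the rc.local snippet convert a size in bytes into a sector count before handing it to blockdev. The sketch below mirrors that arithmetic so the value can be checked before it is applied. The function name is hypothetical, and the 512-byte sector size in the usage line is an assumption about the disk, not something the slide states.

```python
def readahead_sectors(readahead_bytes: int, sector_size: int) -> int:
    """Mirror the slide's shell arithmetic: readahead in bytes divided by sector size."""
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    return readahead_bytes // sector_size


# 2**14 bytes (16 KiB) of readahead on a disk reporting 512-byte sectors.
print(readahead_sectors(2**14, 512))
```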
    • JVM Args
        -Xmx8G                 leave it alone
        -Xms8G                 leave it alone
        -Xmn1200M              100MiB * nCPU
        -Xss180k               should be fine

        -XX:+UseNUMA           (test it)
        numactl --interleave   (safe option)
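
The slide sizes the young generation with a rule of thumb, 100MiB per CPU, which gives -Xmn1200M on a 12-CPU machine. The sketch below turns that rule into a small helper. The function name and its parameters are hypothetical, and using os.cpu_count() as the CPU count is an assumption; it reports logical CPUs, which may not be what the rule intends.

```python
import os
from typing import Optional


def xmn_flag(cpu_count: Optional[int] = None, mib_per_cpu: int = 100) -> str:
    """Build a -Xmn flag from the slide's rule of thumb: 100MiB * nCPU."""
    n = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if n < 1:
        raise ValueError("cpu_count must be at least 1")
    return f"-Xmn{n * mib_per_cpu}M"


print(xmn_flag(12))   # the slide's example
print(xmn_flag())     # value for the machine running this script
```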
    • cgroups
      Provides fine-grained control over Linux resources
      ⁍ Makes the Linux scheduler behave
      ⁍ Lets you manage systems under extreme load
      ⁍ Useful on all Linux machines
      ⁍ Can choose between determinism and flexibility
    • cgroups
        # Quoting 'EOF' keeps $cpucg and $$ unexpanded until the file is sourced
        cat >> /etc/default/cassandra <<'EOF'
        cpucg=/sys/fs/cgroup/cpu/cassandra
        mkdir -p $cpucg
        cat $cpucg/../cpuset.mems > $cpucg/cpuset.mems
        cat $cpucg/../cpuset.cpus > $cpucg/cpuset.cpus
        echo 100 > $cpucg/cpu.shares
        echo $$ > $cpucg/tasks
        EOF
    • Successful Experiment: btrfs
        mkfs.btrfs -m raid10 -d raid0 /dev/sd[c-h]1
        mount -o compress=lzo /dev/sdc1 /data
    • Successful Experiment: ZFS on Linux
        zpool create data raidz /dev/sd[c-h]
        zfs create data/cassandra
        zfs set compression=lzjb data/cassandra
        zfs set atime=off data/cassandra
        zfs set primarycache=metadata data/cassandra
    • Conclusions
      ⁍ Tuning is multi-dimensional
      ⁍ Production load is your most important benchmark
      ⁍ Lean on Cassandra, experiment!
      ⁍ No one metric tells the whole story
    • Questions?
      ⁍ Twitter: @AlTobey
      ⁍ Email: atobey@datastax.com