Cassandra Community Webinar | In Case of Emergency Break Glass

The design of Apache Cassandra allows applications to provide constant uptime. Peer-to-Peer technology ensures there are no single points of failure, and the Consistency guarantees allow applications to function correctly while some nodes are down. There is also a wealth of information provided by the JMX API and the system log. All of this means that when things go wrong you have the time, information and platform to resolve them without downtime. This presentation will cover some of the common, and not so common, performance issues, failures and management tasks observed in running clusters. Aaron will discuss how to gather information and how to act on it. Operators, Developers and Managers will all benefit from this exposition of Cassandra in the wild.


Transcript

  • 1. CASSANDRA COMMUNITY WEBINARS AUGUST 2013 IN CASE OF EMERGENCY, BREAK GLASS Aaron Morton @aaronmorton Co-Founder & Principal Consultant www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  • 2. About The Last Pickle Work with clients to deliver and improve Apache Cassandra based solutions. Apache Cassandra Committer, DataStax MVP, Hector Maintainer, 6+ years combined Cassandra experience. Based in New Zealand & Austin, TX.
  • 3. Platform Tools Problems Maintenance www.thelastpickle.com
  • 4. The Platform www.thelastpickle.com
  • 5. The Platform & Clients www.thelastpickle.com
  • 6. The Platform & Running Clients www.thelastpickle.com
  • 7. The Platform & Reality Consistency Availability Partition Tolerance www.thelastpickle.com
  • 8. The Platform & Consistency Strong Consistency (R + W > N) Eventual Consistency (R + W <= N)www.thelastpickle.com
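    A quick worked example: with N = RF = 3, QUORUM is floor(3/2) + 1 = 2 replicas, so QUORUM reads plus QUORUM writes give R + W = 2 + 2 = 4 > 3 and every read overlaps the latest successful write. Reading and writing at ONE gives R + W = 1 + 1 = 2 <= 3, so a read may miss a recent write and consistency is eventual.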
  • 9. What Price Consistency? In a Multi DC cluster QUORUM and EACH_QUORUM involve cross DC latency. www.thelastpickle.com
  • 10. The Platform & Availability Maintain Consistency Level UP nodes for each Token Range. www.thelastpickle.com
  • 11. Best Case Failure with N=9 and RF 3, 100% Availability Replica 1 Replica 2 Replica 3 Range A www.thelastpickle.com
  • 12. Worst Case Failure with N=9 and RF 3, 78% Availability Range B Range A www.thelastpickle.com
  • 13. The Platform & Partition Tolerance A failed node does not create a partition. www.thelastpickle.com
  • 14. The Platform & Partition Tolerance www.thelastpickle.com
  • 15. The Platform & Partition Tolerance Partitions occur when the network fails. www.thelastpickle.com
  • 16. The Platform & Partition Tolerance www.thelastpickle.com
  • 17. The Storage Engine Optimised for Writes. www.thelastpickle.com
  • 18. Write Path Append to Write Ahead Log. (fsync every 10s by default, other options available) www.thelastpickle.com
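    The fsync behaviour is controlled in cassandra.yaml; a sketch with the 1.2-era option names (verify against your version):

        # periodic (the default): fsync the commit log on a timer
        commitlog_sync: periodic
        commitlog_sync_period_in_ms: 10000
        # batch: fsync before acknowledging writes, trading latency for durability
        # commitlog_sync: batch
        # commitlog_sync_batch_window_in_ms: 50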
  • 19. Write Path Merge new Columns into Memtable. (Lock free, always in memory.) www.thelastpickle.com
  • 20. Write Path... Later Asynchronously flush Memtable to a new SSTable on disk. (May be 10s or 100s of MB in size.) www.thelastpickle.com
  • 21. SSTable Files *-Data.db *-Index.db *-Filter.db (And others) www.thelastpickle.com
  • 22. Row Fragmentation SSTable 1 foo: dishwasher (ts 10): tomato purple (ts 10): cromulent SSTable 2 foo: frink (ts 20): flayven monkey (ts 10): embiggins SSTable 3 SSTable 4 foo: dishwasher (ts 15): tomacco SSTable 5 www.thelastpickle.com
  • 23. Read Path Read columns from each SSTable, then merge results. (Roughly speaking.) www.thelastpickle.com
  • 24. Read Path Use Bloom Filter to determine if a row key does not exist in a SSTable. (In memory) www.thelastpickle.com
  • 25. Read Path Search for prior key in *-Index.db sample. (In memory) www.thelastpickle.com
  • 26. Read Path Scan *-Index.db from prior key to find the search key and its *-Data.db offset. (On disk.) www.thelastpickle.com
  • 27. Read Path Read *-Data.db from offset, all columns or specific pages. www.thelastpickle.com
  • 28. Read purple, monkey, dishwasher SSTable 1-Data.db foo: dishwasher (ts 10): tomato purple (ts 10): cromulent SSTable 2-Data.db foo: frink (ts 20): flayven monkey (ts 10): embiggins SSTable 3-Data.db SSTable 4-Data.db foo: dishwasher (ts 15): tomacco SSTable 5-Data.db Bloom Filter Index Sample SSTable 1-Index.db Bloom Filter Index Sample SSTable 2-Index.db Bloom Filter Index Sample SSTable 3-Index.db Bloom Filter Index Sample SSTable 4-Index.db Bloom Filter Index Sample SSTable 5-Index.db Memory Disk www.thelastpickle.com
  • 29. Read With Key Cache SSTable 1-Data.db foo: dishwasher (ts 10): tomato purple (ts 10): cromulent SSTable 2-Data.db foo: frink (ts 20): flayven monkey (ts 10): embiggins SSTable 3-Data.db SSTable 4-Data.db foo: dishwasher (ts 15): tomacco SSTable 5-Data.db Key Cache Index Sample SSTable 1-Index.db Key Cache Index Sample SSTable 2-Index.db Key Cache Index Sample SSTable 3-Index.db Key Cache Index Sample SSTable 4-Index.db Key Cache Index Sample SSTable 5-Index.db Memory Disk Bloom Filter Bloom Filter Bloom Filter Bloom Filter Bloom Filter www.thelastpickle.com
  • 30. Read with Row Cache Row Cache SSTable 1-Data.db foo: dishwasher (ts 10): tomato purple (ts 10): cromulent SSTable 2-Data.db foo: frink (ts 20): flayven monkey (ts 10): embiggins SSTable 3-Data.db SSTable 4-Data.db foo: dishwasher (ts 15): tomacco SSTable 5-Data.db Key Cache Index Sample SSTable 1-Index.db Key Cache Index Sample SSTable 2-Index.db Key Cache Index Sample SSTable 3-Index.db Key Cache Index Sample SSTable 4-Index.db Key Cache Index Sample SSTable 5-Index.db Memory Disk Bloom Filter Bloom Filter Bloom Filter Bloom Filter Bloom Filter www.thelastpickle.com
  • 31. Performant Reads Design queries to read from a small number of SSTables. www.thelastpickle.com
  • 32. Performant Reads Read a small number of named columns or a slice of columns. www.thelastpickle.com
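    In CQL3 terms that means queries shaped like the following (schema and names hypothetical, for illustration only):

        -- a few named columns
        SELECT first_name, last_name FROM ks1.users WHERE user_id = 'foo';
        -- a bounded slice over a clustering column
        SELECT event_ts, value FROM ks1.timeline
          WHERE user_id = 'foo' AND event_ts > '2013-08-01' LIMIT 100;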
  • 33. Performant Reads Design data model to support current application requirements. www.thelastpickle.com
  • 34. Platform Tools Problems Maintenance www.thelastpickle.com
  • 35. Logging Configure via log4j-server.properties and StorageServiceMBean www.thelastpickle.com
  • 36. DEBUG Logging For One Class log4j.logger.org.apache.cassandra.thrift.CassandraServer=DEBUG www.thelastpickle.com
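    The same change can be made at runtime over JMX; a jmxterm sketch using the 1.2-era StorageService operation (the setLog4jLevel operation name is an assumption for this release line, confirm it with the info command first):

        $>bean org.apache.cassandra.db:type=StorageService
        $>run setLog4jLevel org.apache.cassandra.thrift.CassandraServer DEBUG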
  • 37. Reading Logs INFO [OptionalTasks:1] 2013-04-20 14:03:50,787 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='KS1', ColumnFamily='CF1') (estimated 403858136 bytes) INFO [OptionalTasks:1] 2013-04-20 14:03:50,787 ColumnFamilyStore.java (line 634) Enqueuing flush of Memtable- CF1@1333396270(145839277/403858136 serialized/live bytes, 1742365 ops) INFO [FlushWriter:42] 2013-04-20 14:03:50,788 Memtable.java (line 266) Writing Memtable-CF1@1333396270(145839277/403858136 serialized/live bytes, 1742365 ops) www.thelastpickle.com
  • 38. GC Logs cassandra-env.sh # GC logging options -- uncomment to enable # JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails" # JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps" # JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC" # JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution" # JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime" # JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure" # JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1" # JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log" www.thelastpickle.com
  • 39. ParNew GC Starting {Heap before GC invocations=224115 (full 111): par new generation total 873856K, used 717289K ...) eden space 699136K, 100% used ...) from space 174720K, 10% used ...) to space 174720K, 0% used ...) www.thelastpickle.com
  • 40. Tenuring Distribution 240217.053: [ParNew Desired survivor size 89456640 bytes, new threshold 4 (max 4) - age 1: 22575936 bytes, 22575936 total - age 2: 350616 bytes, 22926552 total - age 3: 4380888 bytes, 27307440 total - age 4: 1155104 bytes, 28462544 total www.thelastpickle.com
  • 41. ParNew GC Finishing Heap after GC invocations=224116 (full 111): par new generation total 873856K, used 31291K ...) eden space 699136K, 0% used ...) from space 174720K, 17% used ...) to space 174720K, 0% used ...) www.thelastpickle.com
  • 42. nodetool info Token : 0 Gossip active : true Load : 130.64 GB Generation No : 1369334297 Uptime (seconds) : 29438 Heap Memory (MB) : 3744.27 / 8025.38 Data Center : east Rack : rack1 Exceptions : 0 Key Cache : size 104857584 (bytes), capacity 104857584 (bytes), 25364985 hits, 34874180 requests, 0.734 recent hit rate, 14400 save period in seconds Row Cache : size 0 (bytes), capacity 0... www.thelastpickle.com
  • 43. nodetool ring Note: Ownership information does not include topology, please specify a keyspace. Address DC Rack Status State Load Owns Token 10.1.64.11 east rack1 Up Normal 130.64 GB 12.50% 0 10.1.65.8 west rack1 Up Normal 88.79 GB 0.00% 1 10.1.64.78 east rack1 Up Normal 52.66 GB 12.50% 212...216 10.1.65.181 west rack1 Up Normal 65.99 GB 0.00% 212...217 10.1.66.8 east rack1 Up Normal 64.38 GB 12.50% 425...432 10.1.65.178 west rack1 Up Normal 77.94 GB 0.00% 425...433 10.1.64.201 east rack1 Up Normal 56.42 GB 12.50% 638...648 10.1.65.59 west rack1 Up Normal 74.5 GB 0.00% 638...649 10.1.64.235 east rack1 Up Normal 79.68 GB 12.50% 850...864 10.1.65.16 west rack1 Up Normal 62.05 GB 0.00% 850...865 10.1.66.227 east rack1 Up Normal 106.73 GB 12.50% 106...080 10.1.65.226 west rack1 Up Normal 79.26 GB 0.00% 106...081 10.1.66.247 east rack1 Up Normal 66.68 GB 12.50% 127...295 10.1.65.19 west rack1 Up Normal 102.45 GB 0.00% 127...297 10.1.66.141 east rack1 Up Normal 53.72 GB 12.50% 148...512 10.1.65.253 west rack1 Up Normal 54.25 GB 0.00% 148...513 www.thelastpickle.com
  • 44. nodetool ring KS1 Address DC Rack Status State Load Effective-Ownership Token 10.1.64.11 east rack1 Up Normal 130.72 GB 12.50% 0 10.1.65.8 west rack1 Up Normal 88.81 GB 12.50% 1 10.1.64.78 east rack1 Up Normal 52.68 GB 12.50% 212...216 10.1.65.181 west rack1 Up Normal 66.01 GB 12.50% 212...217 10.1.66.8 east rack1 Up Normal 64.4 GB 12.50% 425...432 10.1.65.178 west rack1 Up Normal 77.96 GB 12.50% 425...433 10.1.64.201 east rack1 Up Normal 56.44 GB 12.50% 638...648 10.1.65.59 west rack1 Up Normal 74.57 GB 12.50% 638...649 10.1.64.235 east rack1 Up Normal 79.72 GB 12.50% 850...864 10.1.65.16 west rack1 Up Normal 62.12 GB 12.50% 850...865 10.1.66.227 east rack1 Up Normal 106.72 GB 12.50% 106...080 10.1.65.226 west rack1 Up Normal 79.28 GB 12.50% 106...081 10.1.66.247 east rack1 Up Normal 66.73 GB 12.50% 127...295 10.1.65.19 west rack1 Up Normal 102.47 GB 12.50% 127...297 10.1.66.141 east rack1 Up Normal 53.75 GB 12.50% 148...512 10.1.65.253 west rack1 Up Normal 54.24 GB 12.50% 148...513 www.thelastpickle.com
  • 45. nodetool status $ nodetool status Datacenter: ams01 (Replication Factor 3) ================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.70.48.23 38.38 GB 256 19.0% 7c5fdfad-63c6-4f37-bb9f-a66271aa3423 RAC1 UN 10.70.6.78 58.13 GB 256 18.3% 94e7f48f-d902-4d4a-9b87-81ccd6aa9e65 RAC1 UN 10.70.47.126 53.89 GB 256 19.4% f36f1f8c-1956-4850-8040-b58273277d83 RAC1 Datacenter: wdc01 (Replication Factor 3) ================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.24.116.66 65.81 GB 256 22.1% f9dba004-8c3d-4670-94a0-d301a9b775a8 RAC1 UN 10.55.104.90 63.31 GB 256 21.2% 4746f1bd-85e1-4071-ae5e-9c5baac79469 RAC1 UN 10.55.104.27 62.71 GB 256 21.2% 1a55cfd4-bb30-4250-b868-a9ae13d81ae1 RAC1 www.thelastpickle.com
  • 46. nodetool cfstats Keyspace: KS1 Column Family: CF1 SSTable count: 11 Space used (live): 32769179336 Space used (total): 32769179336 Number of Keys (estimate): 73728 Memtable Columns Count: 1069137 Memtable Data Size: 216442624 Memtable Switch Count: 3 Read Count: 95 Read Latency: NaN ms. Write Count: 1039417 Write Latency: 0.068 ms. Bloom Filter False Positives: 345 Bloom Filter False Ratio: 0.00000 Bloom Filter Space Used: 230096 Compacted row minimum size: 150 Compacted row maximum size: 322381140 Compacted row mean size: 2072156 www.thelastpickle.com
  • 47. nodetool cfhistograms $nodetool cfhistograms KS1 CF1 Offset SSTables Write Latency Read Latency Row Size Column Count 1 67264 0 0 0 1331591 2 19512 0 0 0 4241686 3 35529 0 0 0 474784 ... 10 10299 1150 0 0 21768 12 5475 3569 0 0 3993135 14 1986 9098 0 0 1434778 17 258 30916 0 0 366895 20 0 52980 0 0 186524 24 0 104463 0 0 25439063 ... 179 0 93 1823 1597 1284167 215 0 84 3880 1231655 1147150 258 0 170 5164 209282 956487 www.thelastpickle.com
  • 48. nodetool proxyhistograms $nodetool proxyhistograms Offset Read Latency Write Latency Range Latency 60 0 15 0 72 0 51 0 86 0 241 0 103 2 2003 0 124 9 5798 0 149 67 7348 0 179 222 6453 0 215 184 6071 0 258 134 5436 0 310 104 4936 0 372 89 4997 0 446 39 6383 0 535 76797 7518 0 642 9364748 96065 0 770 16406421 152663 0 924 7429538 97612 0 1109 6781835 176829 0 www.thelastpickle.com
  • 49. JMX via JConsole www.thelastpickle.com
  • 50. JMX via MX4J www.thelastpickle.com
  • 51. JMX via JMXTERM $ java -jar jmxterm-1.0-alpha-4-uber.jar Welcome to JMX terminal. Type "help" for available commands. $>open localhost:7199 #Connection to localhost:7199 is opened $>bean org.apache.cassandra.db:type=StorageService #bean is set to org.apache.cassandra.db:type=StorageService $>info #mbean = org.apache.cassandra.db:type=StorageService #class name = org.apache.cassandra.service.StorageService # attributes %0 - AllDataFileLocations ([Ljava.lang.String;, r) %1 - CommitLogLocation (java.lang.String, r) %2 - CompactionThroughputMbPerSec (int, rw) ... # operations %1 - void bulkLoad(java.lang.String p1) %2 - void clearSnapshot(java.lang.String p1,[Ljava.lang.String; p2) %3 - void decommission() www.thelastpickle.com
  • 52. JVM Heap Dump via JMAP jmap -dump:format=b,file=heap.bin <pid> www.thelastpickle.com
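    If the pid is not at hand, something like this works on a typical Linux install (the pgrep pattern is an assumption about your process name):

        jmap -dump:format=b,file=/tmp/cassandra-heap.bin $(pgrep -f CassandraDaemon)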
  • 53. JVM Heap Dump with YourKit www.thelastpickle.com
  • 54. Platform Tools Problems Maintenance www.thelastpickle.com
  • 55. Corrupt SSTable (Very rare.) www.thelastpickle.com
  • 56. Compaction Error ERROR [CompactionExecutor:36] 2013-04-29 07:50:49,060 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:36,1,main] java.lang.RuntimeException: Last written key DecoratedKey(138024912283272996716128964353306009224, 61386330356130622d616666362d376330612d666531662d373738616630636265396535) >= current key DecoratedKey(127065377405949402743383718901402082101, 64323962636163652d646561372d333039322d386166322d663064346132363963386131) writing into *-tmp-hf-7372-Data.db at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134) at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153) at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:160) at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50) at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164) www.thelastpickle.com
  • 57. Cause Change in KeyValidator or bug in older versions. www.thelastpickle.com
  • 58. Fix nodetool scrub www.thelastpickle.com
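    Scrub can be scoped to the affected data rather than the whole node (keyspace and column family names hypothetical):

        nodetool -h localhost scrub KS1 CF1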
  • 59. Dropped Messages www.thelastpickle.com
  • 60. Logs MessagingService.java (line 658) 173 READ messages dropped in last 5000ms StatusLogger.java (line 57) Pool Name Active Pending StatusLogger.java (line 72) ReadStage 32 284 StatusLogger.java (line 72) RequestResponseStage 1 254 StatusLogger.java (line 72) ReadRepairStage 0 0 www.thelastpickle.com
  • 61. nodetool tpstats Message type Dropped RANGE_SLICE 0 READ_REPAIR 0 BINARY 0 READ 721 MUTATION 1262 REQUEST_RESPONSE 196 www.thelastpickle.com
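    The dropped counts are cumulative since the node started, so sample them over time to see the rate, e.g.:

        watch -n 60 nodetool tpstats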
  • 62. Causes Excessive GC. Overloaded IO. Overloaded Node. Wide Reads / Large Batches. www.thelastpickle.com
  • 63. High Read Latency www.thelastpickle.com
  • 64. nodetool info Token : 113427455640312814857969558651062452225 Gossip active : true Thrift active : true Load : 291.13 GB Generation No : 1368569510 Uptime (seconds) : 1022629 Heap Memory (MB) : 5213.01 / 8025.38 Data Center : 1 Rack : 20 Exceptions : 0 Key Cache : size 104857584 (bytes), capacity 104857584 (bytes), 13436862 hits, 16012159 requests, 0.907 recent hit rate, 14400 save period in seconds Row Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds www.thelastpickle.com
  • 65. nodetool cfstats Column Family: page_views SSTable count: 17 Space used (live): 289942843592 Space used (total): 289942843592 Number of Keys (estimate): 1071416832 Memtable Columns Count: 2041888 Memtable Data Size: 539015124 Memtable Switch Count: 83 Read Count: 267059 Read Latency: NaN ms. Write Count: 10516969 Write Latency: 0.054 ms. Pending Tasks: 0 Bloom Filter False Positives: 128586 Bloom Filter False Ratio: 0.00000 Bloom Filter Space Used: 802906184 Compacted row minimum size: 447 Compacted row maximum size: 3973 Compacted row mean size: 867 www.thelastpickle.com
  • 66. nodetool cfhistograms KS1 CF1 Offset SSTables Write Latency Read Latency Row Size Column Count 1 178437 0 0 0 0 2 20042 0 0 0 0 3 15275 0 0 0 0 4 11632 0 0 0 0 5 4771 0 0 0 0 6 4942 0 0 0 0 7 5540 0 0 0 0 8 4967 0 0 0 0 10 10682 0 0 0 284155 12 8355 0 0 0 15372508 14 1961 0 0 0 137959096 17 322 3 0 0 625733930 20 61 253 0 0 252953547 24 53 15114 0 0 39109718 29 18 255730 0 0 0 35 1 1532619 0 0 0 ... www.thelastpickle.com
  • 67. nodetool cfhistograms KS1 CF1 Offset SSTables Write Latency Read Latency Row Size Column Count 446 0 120 233 0 0 535 0 155 261 21361 0 642 0 127 284 19082720 0 770 0 88 218 498648801 0 924 0 86 2699 504702186 0 1109 0 22 3157 48714564 0 1331 0 18 2818 241091 0 1597 0 15 2155 2165 0 1916 0 19 2098 7 0 2299 0 10 1140 56 0 2759 0 10 1281 0 0 3311 0 6 1064 0 0 3973 0 4 676 3 0 ... www.thelastpickle.com
  • 68. jmx-term $ java -jar jmxterm-1.0-alpha-4-uber.jar  Welcome to JMX terminal. Type "help" for available commands. $>open localhost:7199 #Connection to localhost:7199 is opened $>bean org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies #bean is set to org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies $>get BloomFilterFalseRatio #mbean = org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies: BloomFilterFalseRatio = 0.5693801541828607; www.thelastpickle.com
  • 69. Back to cfstats Column Family: page_views Read Count: 270075 Bloom Filter False Positives: 131294 www.thelastpickle.com
  • 70. Cause bloom_filter_fp_chance had been set to 0.1 to reduce memory requirements when storing 1+ Billion rows per Node. www.thelastpickle.com
  • 71. Fix Changed read queries to select by column name to limit SSTables per query. Long term, migrate to Cassandra v1.2 for off heap Bloom Filters. www.thelastpickle.com
  • 72. GC Problems www.thelastpickle.com
  • 73. WARN WARN [ScheduledTasks:1] 2013-03-29 18:40:48,158 GCInspector.java (line 145) Heap is 0.9355130159566108 full. You may need to reduce memtable and/or cache sizes. INFO [ScheduledTasks:1] 2013-03-26 16:36:06,383 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 207 ms for 1 collections, 10105891032 used; max is 13591642112 INFO [ScheduledTasks:1] 2013-03-28 22:18:17,113 GCInspector.java (line 122) GC for ParNew: 256 ms for 1 collections, 6504905688 used; max is 13591642112 www.thelastpickle.com
  • 74. Serious GC Problems INFO [ScheduledTasks:1] 2013-04-30 23:21:11,959 GCInspector.java (line 122) GC for ParNew: 1115 ms for 1 collections, 9355247296 used; max is 12801015808 www.thelastpickle.com
  • 75. Flapping Node INFO [GossipTasks:1] 2013-03-28 17:42:07,944 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead. INFO [GossipStage:1] 2013-03-28 17:42:54,740 Gossiper.java (line 816) InetAddress /10.1.20.144 is now UP INFO [GossipTasks:1] 2013-03-28 17:46:00,585 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead. INFO [GossipStage:1] 2013-03-28 17:46:13,855 Gossiper.java (line 816) InetAddress /10.1.20.144 is now UP INFO [GossipStage:1] 2013-03-28 17:48:48,966 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead. www.thelastpickle.com
  • 76. “GC Problems are the result of workload and configuration.” Aaron Morton, Just Now. www.thelastpickle.com
  • 77. Workload Correlation? Look for wide rows, large writes, wide reads, un- bounded multi row reads or writes. www.thelastpickle.com
  • 78. Compaction Correlation? Slow down Compaction to improve stability. concurrent_compactors: 2 compaction_throughput_mb_per_sec: 8 in_memory_compaction_limit_in_mb: 32 (Monitor and reverse when resolved.) www.thelastpickle.com
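    Compaction throughput can also be throttled at runtime, matching the CompactionThroughputMbPerSec attribute shown in the jmxterm output earlier (the other two settings need a cassandra.yaml edit and restart):

        nodetool setcompactionthroughput 8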
  • 79. GC Logging Insights Slow down rate of tenuring and enable full GC logging. HEAP_NEWSIZE="1200M" JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4" JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4" www.thelastpickle.com
  • 80. GC’ing Objects in ParNew {Heap before GC invocations=7937 (full 205): par new generation total 1024000K, used 830755K ...) eden space 819200K, 100% used ...) from space 204800K, 5% used ...) to space 204800K, 0% used ...) Desired survivor size 104857600 bytes, new threshold 4 (max 4) - age 1: 8090240 bytes, 8090240 total - age 2: 565016 bytes, 8655256 total - age 3: 330152 bytes, 8985408 total - age 4: 657840 bytes, 9643248 total www.thelastpickle.com
  • 81. GC’ing Objects in ParNew {Heap before GC invocations=7938 (full 205): par new generation total 1024000K, used 835015K ...) eden space 819200K, 100% used ...) from space 204800K, 7% used ...) to space 204800K, 0% used ...) Desired survivor size 104857600 bytes, new threshold 4 (max 4) - age 1: 1315072 bytes, 1315072 total - age 2: 541072 bytes, 1856144 total - age 3: 499432 bytes, 2355576 total - age 4: 316808 bytes, 2672384 total www.thelastpickle.com
  • 82. Cause Nodes had wide rows & 1.3+ Billion rows and 3+GB of Bloom Filters. (Using older bloom_filter_fp_chance of 0.000744.) www.thelastpickle.com
  • 83. Fix Increased FP chance to 0.1 on one CF’s and .01 on others. (One CF reduced from 770MB to 170MB of Bloom Filters.) www.thelastpickle.com
  • 84. Fix Increased index_interval from 128 to 512. (Increased key_cache_size_in_mb to 200.) www.thelastpickle.com
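    The key cache capacity can be changed without a restart; a sketch assuming the 1.2-era argument order (key cache MB, then row cache MB):

        nodetool setcachecapacity 200 0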
  • 85. Fix MAX_HEAP_SIZE="8G" HEAP_NEWSIZE="1000M" -XX:SurvivorRatio=4" -XX:MaxTenuringThreshold=2" www.thelastpickle.com
  • 86. Platform Tools Problems Maintenance www.thelastpickle.com
  • 87. Maintenance Expand to Multi DC www.thelastpickle.com
  • 88. Expand to Multi DC Update Snitch Update Replication Strategy Add Nodes Update Replication Factor Rebuild www.thelastpickle.com
  • 89. DC Aware Snitch? SimpleSnitch puts all nodes in rack1 and datacenter1. www.thelastpickle.com
  • 90. More Snitches? PropertyFileSnitch RackInferringSnitch www.thelastpickle.com
  • 91. Gossip Based Snitch? Ec2Snitch Ec2MultiRegionSnitch GossipingPropertyFileSnitch* www.thelastpickle.com
  • 92. Changing the Snitch Do Not change the DC or Rack for an existing node. (Cassandra will not be able to find your data.) www.thelastpickle.com
  • 93. Moving to the GossipingPropertyFileSnitch Update cassandra-topology.properties on existing nodes with existing DC/Rack settings for all existing nodes. Set default to new DC. www.thelastpickle.com
  • 94. Moving to the GossipingPropertyFileSnitch Update cassandra-rackdc.properties on existing nodes with existing DC/Rack for the node. www.thelastpickle.com
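    For a node staying in the original DC the file is just two lines (values hypothetical, matching whatever the node had under the old snitch):

        # cassandra-rackdc.properties
        dc=datacenter1
        rack=rack1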
  • 95. Moving to the GossipingPropertyFileSnitch Use a rolling restart to upgrade existing nodes to GossipingPropertyFileSnitch www.thelastpickle.com
  • 96. Expand to Multi DC Update Snitch Update Replication Strategy Add Nodes Update Replication Factor Rebuild www.thelastpickle.com
  • 97. Got NTS ? Must use NetworkTopologyStrategy for Multi DC deployments. www.thelastpickle.com
  • 98. SimpleStrategy Order Token Ranges. Start with range that contains Row Key. Count to RF. www.thelastpickle.com
  • 99. SimpleStrategy "foo" www.thelastpickle.com
  • 100. NetworkTopologyStrategy Order Token Ranges in the DC. Start with range that contains the Row Key. Add first unselected Token Range from each Rack. Repeat until RF selected. www.thelastpickle.com
  • 101. NetworkTopologyStrategy "foo" Rack 1 Rack 2 Rack 3 www.thelastpickle.com
  • 102. NetworkTopologyStrategy & 1 Rack "foo" Rack 1 www.thelastpickle.com
  • 103. Changing the Replication Strategy Be Careful if existing configuration has multiple Racks. (Cassandra may not be able to find your data.) www.thelastpickle.com
  • 104. Changing the Replication Strategy Update Keyspace configuration to use NetworkTopologyStrategy with datacenter1:3 and new_dc:0. www.thelastpickle.com
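    In CQL3 that change looks something like this (keyspace name hypothetical):

        ALTER KEYSPACE KS1 WITH replication =
          {'class': 'NetworkTopologyStrategy', 'datacenter1': 3, 'new_dc': 0};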
  • 105. Preparing The Client Disable auto node discovery or use DC aware methods. Use LOCAL_QUORUM or EACH_QUORUM. www.thelastpickle.com
  • 106. Expand to Multi DC Update Snitch Update Replication Strategy Add Nodes Update Replication Factor Rebuild www.thelastpickle.com
  • 107. Configuring New Nodes Add auto_bootstrap: false to cassandra.yaml. Use GossipingPropertyFileSnitch. Three Seeds from each DC. (Use cluster_name as a safety.) www.thelastpickle.com
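    A sketch of the relevant cassandra.yaml settings on each new node (cluster name and seed addresses hypothetical):

        cluster_name: 'ProductionCluster'
        auto_bootstrap: false
        endpoint_snitch: GossipingPropertyFileSnitch
        seed_provider:
            - class_name: org.apache.cassandra.locator.SimpleSeedProvider
              parameters:
                  - seeds: "10.1.64.11,10.1.64.78,10.1.64.201,10.2.0.1,10.2.0.2,10.2.0.3"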
  • 108. Configuring New Nodes Update cassandra-rackdc.properties on new nodes with new DC/Rack for the node. (Ignore cassandra-topology.properties) www.thelastpickle.com
  • 109. Start The New Nodes New Nodes in the Ring in the new DC without data or traffic. www.thelastpickle.com
  • 110. Expand to Multi DC Update Snitch Update Replication Strategy Add Nodes Update Replication Factor Rebuild www.thelastpickle.com
  • 111. Change the Replication Factor Update Keyspace configuration to use NetworkTopologyStrategy with datacenter1:3 and new_dc:3. www.thelastpickle.com
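    Again in CQL3 (same hypothetical keyspace as above):

        ALTER KEYSPACE KS1 WITH replication =
          {'class': 'NetworkTopologyStrategy', 'datacenter1': 3, 'new_dc': 3};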
  • 112. Change the Replication Factor New DC nodes will start receiving writes from old DC coordinators. www.thelastpickle.com
  • 113. Expand to Multi DC Update Snitch Update Replication Strategy Add Nodes Update Replication Factor Rebuild www.thelastpickle.com
  • 114. Y U No Bootstrap? DC 1 DC 2 www.thelastpickle.com
  • 115. nodetool rebuild DC1 DC 1 DC 2 www.thelastpickle.com
  • 116. Rebuild Complete New Nodes now performing Strong Consistency reads. (If EACH_QUORUM used for writes.) www.thelastpickle.com
  • 117. Summary Relax. Understand the Platform and the Tools. Always maintain Availability. www.thelastpickle.com
  • 118. Thanks. www.thelastpickle.com
  • 119. Aaron Morton @aaronmorton Co-Founder & Principal Consultant www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License