Manage your compactions before they manage you!
About Pythian
•18 years of data infrastructure management consulting
•200+ top brands
•6000+ databases under management
•Over 300 DBAs in 29 countries
•Top 5% of DBA work force, 9 Oracle ACEs, 2 Microsoft MVPs
© 2015. All Rights Reserved.
About Me
•Cassandra Consultant
–First contact was Cassandra 0.8
•Cassandra MVP & DataStax Certified Architect
•Lisbon Cassandra Meetup
•Passion for distributed systems
•Loves a good challenge
•Water polo is my sport
•@cjrolo
1 Why Compact
2 Compaction strategies and tuning
3 General tuning
4 Take aways, compacted
5 Q&A
Compaction Impact
•High I/O usage
•Temporary increase in disk space usage
•High CPU usage
•Increased latency on operations!
Commonly heard
•"My system is compacting 100% of the time"
•"All disk I/O is used by compaction"
•"Compaction is far behind"
•"Hundreds or thousands of SSTables!"
Why do we compact?
•Quick recap on the write path
–Commitlog -> Memtable -> SSTable
–SSTables are immutable
Why do we compact? (2)
•Tombstones
•Row duplicates
•Rows spread across multiple SSTables
•Consolidation of data is imperative
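The reasons above can be illustrated with a toy merge. This is only a sketch, not Cassandra's implementation: each SSTable is modelled as a dict of key -> (write timestamp, value), `TOMBSTONE` and `compact` are hypothetical names, and tombstoned rows are dropped immediately (no grace period is modelled).

```python
# Toy model: each SSTable is an immutable mapping of key -> (timestamp, value).
TOMBSTONE = object()  # hypothetical deletion marker


def compact(sstables):
    """Merge SSTables, keeping only the newest version of each row."""
    newest = {}
    for sstable in sstables:
        for key, (timestamp, value) in sstable.items():
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # Simplification: deleted rows are purged at once.
    return {k: v for k, v in newest.items() if v[1] is not TOMBSTONE}


sstables = [
    {"a": (1, "x"), "b": (1, "y")},  # oldest flush
    {"a": (2, "z")},                 # row "a" updated: duplicate across SSTables
    {"b": (3, TOMBSTONE)},           # row "b" deleted: tombstone
]
merged = compact(sstables)
print(merged)
```

Three SSTables go in and one comes out, with the duplicate and the tombstone gone; a read for "a" now touches a single file.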
Size-tiered compaction
•The original compaction type!
•"This strategy triggers a minor compaction when there are a
number of similar sized SSTables on disk as configured by the
table subproperty, min_threshold. A minor compaction does not
involve all the tables in a keyspace"
•When to use: write-once data, write-heavy scenarios, limited I/O
available
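How "similar sized" SSTables get grouped can be sketched roughly as below. This is a simplified illustration, not the actual strategy code: the function names are hypothetical, sizes are in MB, and the default values are assumptions taken from commonly documented settings.

```python
def bucket_sstables(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50):
    """Group SSTable sizes into buckets of similar size (simplified)."""
    buckets = []  # each entry is [average_size, [member sizes]]
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = bucket[0]
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            # No existing bucket fits, so start a new one.
            buckets.append([size, [size]])
    return [members for _, members in buckets]


def compaction_candidates(sizes_mb, min_threshold=4, max_threshold=32, **options):
    """Buckets with at least min_threshold members, capped at max_threshold."""
    return [
        bucket[:max_threshold]
        for bucket in bucket_sstables(sizes_mb, **options)
        if len(bucket) >= min_threshold
    ]


sizes = [10, 12, 11, 9, 200, 210, 1000]
print(bucket_sstables(sizes))
print(compaction_candidates(sizes))
```

In this example only the four small SSTables form a bucket large enough to trigger a minor compaction; the larger ones wait for more peers of similar size.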
Tuning size-tiered
•CQL Properties:
–bucket_high
–bucket_low
–cold_reads_to_omit
–max_threshold
–min_threshold
–min_sstable_size
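These properties are set through the table's compaction map in CQL. A small helper to build such a statement might look like the sketch below; `compaction_cql` and the table name `ks.events` are hypothetical, and the values shown are examples rather than recommendations.

```python
def compaction_cql(table, strategy, **options):
    """Build an ALTER TABLE statement that sets compaction subproperties."""
    settings = {"class": strategy}
    settings.update({name: str(value) for name, value in options.items()})
    body = ", ".join(f"'{name}': '{value}'" for name, value in settings.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


statement = compaction_cql(
    "ks.events",                      # hypothetical keyspace.table
    "SizeTieredCompactionStrategy",
    min_threshold=4,
    max_threshold=32,
)
print(statement)
```

The same helper works for the other strategies by changing the class name and the subproperties passed in.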
Levelled compaction
•Strategy appeared in Cassandra 1.0.
•"The leveled compaction strategy creates SSTables of a fixed,
relatively small size (160 MB by default) that are grouped into
levels. Within each level, SSTables are guaranteed to be non-
overlapping. Each level (L0, L1, L2 and so on) is 10 times as large
as the previous."
•When to use: random reads, latency-sensitive reads,
highly updated rows
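The "10 times as large" rule makes level sizes easy to estimate. The arithmetic below is a rough sketch under stated assumptions: the function names are hypothetical, L0 is ignored because it is handled separately, and L1 is assumed to hold ten SSTables of the configured size.

```python
def level_capacity_mb(level, sstable_size_in_mb=160, fanout=10):
    """Approximate capacity of level L1 and above, in MB."""
    if level < 1:
        raise ValueError("L0 is handled separately in this sketch")
    return sstable_size_in_mb * fanout ** level


def levels_needed(data_mb, sstable_size_in_mb=160, fanout=10):
    """Smallest number of levels whose combined capacity holds data_mb."""
    level = 1
    total = level_capacity_mb(level, sstable_size_in_mb, fanout)
    while total < data_mb:
        level += 1
        total += level_capacity_mb(level, sstable_size_in_mb, fanout)
    return level


for level in (1, 2, 3):
    print(f"L{level}: {level_capacity_mb(level)} MB")
print(levels_needed(100_000))
```

Because SSTables within a level do not overlap, a read touches at most one SSTable per level, which is why the level count matters for read latency.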
Tuning levelled compaction
•CQL Properties:
–sstable_size_in_mb
Time-series compaction
•Available since Cassandra 2.0.11 and 2.1.1
•"DateTieredCompactionStrategy stores data written within a
certain period of time in the same SSTable"
•When to use? Time series data!
Tuning time-series compaction
•CQL Properties:
–base_time_seconds
–max_sstable_age_days
–max_threshold
–min_threshold
–timestamp_resolution
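How two of these properties interact can be sketched with fixed time windows. This is a deliberate simplification: the real strategy merges older windows into progressively larger ones, which is not modelled here. Function names, SSTable names and default values are assumptions, and timestamps are in seconds.

```python
SECONDS_PER_DAY = 86400


def window_start(timestamp_s, base_time_seconds=3600):
    """Start of the fixed-size window that contains timestamp_s."""
    return timestamp_s - (timestamp_s % base_time_seconds)


def group_by_window(sstable_max_ts, now_s, base_time_seconds=3600, max_sstable_age_days=365):
    """Group SSTables by window, skipping those older than max_sstable_age_days."""
    cutoff = now_s - max_sstable_age_days * SECONDS_PER_DAY
    windows = {}
    for name, timestamp_s in sstable_max_ts.items():
        if timestamp_s < cutoff:
            continue  # too old: no longer considered for compaction
        start = window_start(timestamp_s, base_time_seconds)
        windows.setdefault(start, []).append(name)
    return windows


now = 10 * SECONDS_PER_DAY
sstables = {"a": 863000, "b": 862000, "c": 860000, "old": 100}
windows = group_by_window(sstables, now, max_sstable_age_days=5)
print(windows)
```

SSTables "a" and "b" share a window and could be compacted together, "c" sits in the previous window, and "old" is past the age limit and left alone.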
DTCS Quirks...
•Out-of-order data
–Hints
–Clients not in sync
–Repairs
–Someone inserted out-of-order data...
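Why out-of-order data hurts can be seen by counting how many time windows a single SSTable covers. A minimal sketch, assuming fixed windows of `base_time_seconds` and timestamps in seconds; `windows_spanned` is a hypothetical helper.

```python
def windows_spanned(min_timestamp_s, max_timestamp_s, base_time_seconds=3600):
    """Number of fixed-size windows covered by an SSTable's timestamp range."""
    first = min_timestamp_s // base_time_seconds
    last = max_timestamp_s // base_time_seconds
    return last - first + 1


in_order = windows_spanned(7200, 7300)    # all writes are recent
out_of_order = windows_spanned(100, 7300)  # one late write with an old timestamp
print(in_order, out_of_order)
```

A single late write, for example from a hint or a repair, stretches the SSTable across windows it would otherwise never touch, so data is no longer neatly grouped by time.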
Monitoring Compaction
•nodetool cfstats
•nodetool compactionstats
•CompactionManagerMBean:
–CompletedTasks: Number of completed compactions since the last start of this Cassandra
instance
–PendingTasks: Number of estimated tasks remaining to perform
–ColumnFamilyInProgress: The table currently being compacted.
–BytesTotalInProgress: Total number of data bytes (index and filter are not included) being
compacted.
–BytesCompacted: The progress of the current compaction.
•strace
•iotop
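A monitoring script can be built on these numbers. The sketch below assumes the output of `nodetool compactionstats` contains a line of the form `pending tasks: N`; the exact layout differs between Cassandra versions, so treat the sample text and the helper names as assumptions.

```python
import re


def pending_tasks(compactionstats_output):
    """Extract the pending task count, or None if the line is missing."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(match.group(1)) if match else None


def progress_percent(bytes_compacted, bytes_total_in_progress):
    """Progress derived from BytesCompacted and BytesTotalInProgress."""
    if not bytes_total_in_progress:
        return 0.0
    return 100.0 * bytes_compacted / bytes_total_in_progress


sample_output = "pending tasks: 12\n"  # illustrative, not captured from a real node
print(pending_tasks(sample_output))
print(progress_percent(50, 200))
```

A pending count that keeps growing over time is the "compaction is far behind" symptom, so it is a natural value to graph and alert on.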
Disk Tuning
•Compaction means large I/O
•Big RAID stripes
•SSDs!!
•Dedicated non-striped disks
•No SAN/NAS
•I/O scheduler can have some impact
•Some Linux settings can be used in emergencies.
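To check which I/O scheduler a disk is using, Linux exposes a sysfs file per block device in which the active scheduler is shown in square brackets. A small parsing sketch follows; `active_scheduler` is a hypothetical helper and the sample string is illustrative.

```python
import re


def active_scheduler(scheduler_file_contents):
    """Return the scheduler name shown in [brackets], or None."""
    match = re.search(r"\[([^\]]+)\]", scheduler_file_contents)
    return match.group(1) if match else None


# On a real host the text would be read from
# /sys/block/<device>/queue/scheduler for the data disk.
sample = "noop deadline [cfq]"
print(active_scheduler(sample))
```

This only reports the current setting; which scheduler works best depends on the hardware and should be tested under a realistic compaction load.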
Take Aways, compacted
•What did we learn here...
–Selecting the proper compaction strategy can improve your cluster
performance
•Doing the opposite can create serious issues...
–Monitor your compactions!
–You can try compaction strategies out without changing your tables!
Q&A
•Thanks for listening!
•Questions?
Thank you
