ALTER TABLE WITH
Understanding Cassandra’s Table Options
A Little About Me
• Cassandra in production since 2010
• Infrastructure @ Crowdstrike
• Hundreds of terabytes in Cassandra
• Occasional code contributions
• Cassandra MVP
• Cassandra Day LA: 5 years of Hindsight
• Cassandra Summit 2015: DTCS is Broken (unofficial title)
An Introduction
to CrowdStrike
We Are a Cybersecurity Technology Company
We Detect, Prevent, And Respond To All Attack Types In Real Time, Protecting Organizations From Catastrophic Breaches
We Provide Next Generation Endpoint Protection, Threat Intelligence & Pre & Post IR Services
A Little About Tonight
• Cassandra Write paths
• Cassandra Read paths
• Knowing what Cassandra is doing helps you understand how
to tune
• It’s not just about performance, it’s also about latencies,
stability, and correctness
• Feel free to interrupt me! Ask questions before, during, after
Write Path, Simplified
• Writes first go to the commitlog
• Then, memtable
• Then, eventually flushed to sstables
• If RF > ONE, the coordinator sends the mutation to replicas
• Depending on CL, the coordinator waits until enough respond before reporting
success to the client
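For example, in cqlsh you can choose the consistency level the coordinator waits for before acknowledging a write (the keyspace, table, and columns here are hypothetical):
CONSISTENCY LOCAL_QUORUM;
INSERT INTO sensors.readings (sensor_id, reading_time, value) VALUES (42, '2016-06-01 00:00:00+0000', 21.5);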
Write Path, Simplified
• Writes first go to the commitlog
- Append only journal
- Replayed on node startup
- Purged once the node knows that all relevant data is written into sstables (nodetool flush)
- If you use spinning disks, append-only model avoids seeks (as long as commitlog is on its own
partition)
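These knobs live in cassandra.yaml rather than in the table schema; a minimal sketch (paths are illustrative):
commitlog_directory: /var/lib/cassandra/commitlog    # ideally on its own partition/spindle
data_file_directories:
    - /var/lib/cassandra/data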
Write Path, Simplified
• Then the memtable
- Effectively a write-back cache of rows as they’re written
- Once a row is written to the memtable, the mutation can be counted towards the CONSISTENCY
LEVEL of the query
- Writes are batched in the memtable until it’s ready to flush
Write Path, Simplified
• Then flushed to sstables
- At specified thresholds ( memtable_(off)heap_space_in_mb * memtable_cleanup_threshold ), the
memtable is flushed to disk
- Each sstable is written exactly one time - never changed once it’s written
- If a new write comes in for the same value, it’s written to a new sstable with a new timestamp
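Those flush thresholds are also cassandra.yaml settings; a sketch with illustrative (not recommended) values:
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
memtable_cleanup_threshold: 0.11    # fraction of that space that triggers flushing the largest memtable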
Table Option #1: Compaction Strategy
• If sstables are never re-written, how do updates and deletions work? Compaction! Multiple sstables
are joined together, duplicate cells are merged, and deleted data is purged (eventually)
• Each table specifies a compaction strategy. Cassandra ships with 3 by default
• SizeTieredCompactionStrategy is the oldest, most mature, tuned for writes
• LeveledCompactionStrategy is tuned for read latency
• DateTieredCompactionStrategy is meant for time series, TTL heavy workloads
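All three are chosen through the table's compaction map; a minimal sketch against a hypothetical table:
ALTER TABLE sensor_readings WITH compaction = {'class': 'SizeTieredCompactionStrategy'};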
Table Option #1: Compaction Strategy
• SizeTieredCompactionStrategy
• Every time min_threshold (default 4) sstables of roughly the same size appear, combine them
Table Option #1: Compaction Strategy
• SizeTieredCompactionStrategy
• Every time min_threshold (default 4) sstables of roughly the same size appear, combine them
• Over time, older data naturally ends up in larger files
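min_threshold is a sub-option of that same compaction map (table name and max_threshold value are illustrative):
ALTER TABLE sensor_readings
  WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': '4', 'max_threshold': '32'};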
Table Option #1: Compaction Strategy
• SizeTieredCompactionStrategy Advantages
• Minimizes write amplification
• Very easy to reason about
• Simple algorithm, so unlikely to cause extra CPU/memory usage at flush time
• Flushing is important – complicated compaction strategies that block flushing can be bad (if the
memtable fills before it can flush, the node stops accepting writes)
Table Option #1: Compaction Strategy
• SizeTieredCompactionStrategy Disadvantages
• Deleted data from old files may not be compacted away for a very long time
• Frequently changed cells will live in many files, and must be merged on read
• Read queries may touch a number of files, which is SLOW
Table Option #1: Compaction Strategy
• LeveledCompactionStrategy
• Spends extra effort compacting sstables to ensure that each row exists in at most one sstable per
‘level’
• Expected number of sstables touched per read: ~1.11
• Advantage: lower read latency
• Disadvantage: much more IO required
• Typically advantageous when you:
- Read much more than you write
- Are highly sensitive to read latency
- Have rows that change over time (values updated, or values expire)
• Prefer STCS if:
- You can't spare the IO
- Rows are write-once
- You write far more than you read
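Switching a read-latency-sensitive table to LCS is a one-line change (table name and sstable_size_in_mb are illustrative):
ALTER TABLE sensor_readings
  WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': '160'};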
Table Option #1: Compaction Strategy
• DateTieredCompactionStrategy
• Designed for time series, often TTL heavy workloads
• Assumes writes come in order
• Tries to group sstables by date
• Great in theory
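A sketch for a TTL-heavy time-series table (names and values are illustrative; DTCS sub-options vary by version):
ALTER TABLE sensor_readings
  WITH compaction = {'class': 'DateTieredCompactionStrategy', 'max_sstable_age_days': '7'}
  AND default_time_to_live = 604800;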
Table Option #1: Compaction Strategy
• Takeaway: Choosing the right compaction strategy not only impacts latency, but also IO/CPU, and can
have a huge impact on disk space if you use TTLs
Read Path
1. Find the right server using the partition key and the partitioner (probably Murmur3)
2. Find the sstables on disk that contain the row in question
3. Find the partition offset in the data files (use cache if possible, otherwise use the partition index
data)
4. The data is then read from the appropriate file
5. Duplicate cells are merged with timestamp resolution (last write wins)
6. If CL > ONE, the coordinator checks multiple replicas, and repairs any that are incorrect
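You can watch most of these steps on a live query from cqlsh (table and key are hypothetical):
TRACING ON;
SELECT * FROM sensors.readings WHERE sensor_id = 42;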
Read Path
[Diagram: Cassandra read path, from https://academy.datastax.com/demos/brief-introduction-apache-cassandra]
Table Option #2: Bloom Filters
• Off-heap data structure that tells Cassandra that the row either “might” or “does not” exist in a
given data file
• Probabilistic: bloom_filter_fp_chance
• Defaults to 0.01 on STCS, 0.1 on LCS (LCS already defragments, so false positives are less
costly)
• Cost: RAM (off-heap) – 0.01 uses approximately 3x as much memory as 0.1
• Tuning: Adjust based on RAM available and number of sstables.
• For slow disks or lots of sstables, lower fp chance to decrease disk IO
• If you’re memory starved and have few sstables on disk, raise the fp chance and use the RAM
for page cache
• WITH bloom_filter_fp_chance=0.01
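Like every fragment in this deck, that WITH clause attaches to ALTER TABLE (or CREATE TABLE); for a memory-starved node with few sstables you might go the other way (table name is hypothetical):
ALTER TABLE sensor_readings WITH bloom_filter_fp_chance = 0.1;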
Table Option #3: Key Cache
• There’s a row cache – don’t use it
• The key cache helps find the data in the sstable quickly
• If you set the key cache low, there’s a good chance the OS page cache will help, but key cache
will be faster
• WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
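To sanity-check the tradeoff at runtime (the cache's overall capacity comes from key_cache_size_in_mb in cassandra.yaml):
nodetool info    # includes node-wide key cache entries, size, capacity, and hit rate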
Table Option #4: Partition Summary / Index
• Maps row key to offset in data file
• It’s not every row key – it’s a sorted sampling
• You can tune the sample parameters: max_index_interval , min_index_interval
• Cassandra will adapt the sampling based on sstable read hotness – more frequently read sstables will
get a denser index sample, for more accurate locations on disk
• Again, primarily a RAM tradeoff – lower interval = more RAM = less IO
• WITH max_index_interval = 2048 AND min_index_interval = 128
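For a read-heavy table on slow disks you might densify the sample instead (a sketch; table name and values are illustrative):
ALTER TABLE sensor_readings WITH min_index_interval = 64 AND max_index_interval = 512;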
Table Option #5: Compression
• The sstable is compressed chunk-by-chunk as it’s written (either during flush or compaction)
• Compression offsets are mapped like index offsets
• Larger chunks typically mean better compression ratios for most data sets
• Smaller chunks means that if you do go to disk, you have less over-read
• Very literal tradeoff between disk IO and storage capacity – larger chunks = better ratios, but you
may have to read larger chunks off the disk when it’s not cached in RAM
• Data size dependent: a 64KB read for 500 bytes of data may severely limit your read performance
• WITH compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
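If your reads are small and random, a smaller chunk cuts the over-read at the cost of compression ratio (a sketch; table name and chunk size are illustrative):
ALTER TABLE sensor_readings
  WITH compression = {'chunk_length_in_kb': '4', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'};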
Table Option #6: Correctness (CRC)
• Compressed tables have a checksum embedded in the compression data
• Cassandra can verify that checksum on decompression, IF you want
• WITH crc_check_chance = 1.0
• Uncompressed files have NO CORRECTNESS VALIDATION in the read path – if you have disk-based
bit rot, Cassandra won't know unless you run a manual sstable verification (nodetool verify)
Table Option #7: Clustering
• Each partition is written once per sstable
• Values within the partition are sorted based on clustering order
• In CQL3 terms, this means the clustering column values (see the example after this slide)
• Because records are sorted when written, retrieving a range of clustering keys is incredibly fast
(nearly free)
• Normal sort order is ascending! If you need descending, flip the order in the schema so the read
path can do a single linear pass:
• WITH CLUSTERING ORDER BY (sensor_reading_timestamp DESC)
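A minimal sketch of a table that stores each sensor's newest readings first (names are hypothetical):
CREATE TABLE sensor_readings (
  sensor_id int,
  sensor_reading_timestamp timestamp,
  value double,
  PRIMARY KEY (sensor_id, sensor_reading_timestamp)
) WITH CLUSTERING ORDER BY (sensor_reading_timestamp DESC);
SELECT * FROM sensor_readings WHERE sensor_id = 42 LIMIT 10;   -- newest 10 readings, one linear pass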
Table Option #8: Correctness (Read Repair)
• Depending on your consistency level, the coordinator will ask multiple replicas for the data
• One will return the data; others will return a digest
• If the digest doesn't match the data, the coordinator will choose the value with the highest
timestamp, and make sure the replicas it queried have that value – you cannot disable this type of read
repair, except by querying with CL:ONE
• If the digests do match for the replicas returned, but you're using CL < ALL, you can have
Cassandra read-repair that cell anyway just in case:
• WITH dclocal_read_repair_chance = 0.01 AND read_repair_chance = 0.0
Table Option #9: Avoiding Timeouts
• Typical Cassandra use cases have RF > 1
• You may ask for data from X nodes, where X < RF
• If one of those nodes is slow to respond (query load, compaction load, JVM GC), Cassandra can
try other replicas before waiting for the full 10s timeout
• “Speculative Retry” is configurable based on logical time limits, like the table's 99th percentile read latency
• WITH speculative_retry = '99PERCENTILE'
• Watch out: speculative retry may violate LOCAL_* (datacenter-local) consistency levels (for now)
Lots of Options, Lots of Flexibility
• Choose compaction based on write / read PATTERNS
• Choose bloom filter FP chance based on read latency and memory available
• Enable the key cache, but probably not the row cache
• You can tune the index interval if you have really hot and really cold sstables
• Compression chunk size can control how much data you read off of the disk at a time, or how
well your data compresses
• Compression gives you CRCs to guard against corruption, and you can tune whether or not
they’re used
• SSTables are inherently sorted; use clustering order options to fit your data
• Foreground read repair can’t be disabled, but background read repair can be used to help speed
up ‘eventual’ consistency
• Speculative retry can help avoid timeouts and/or drop your 99.9% latencies
That’s it!
• You can talk to me about Cassandra on Twitter ( @jjirsa )
• There’s an active Cassandra community in IRC: irc.freenode.net #cassandra
• Crowdstrike is hiring: www.crowdstrike.com/careers/
• Huge thanks to Datastax and Hulu for making this meetup happen!