Cassandra Anti-Patterns

Matthew Dennis

Short lightning talk on Apache Cassandra anti-patterns from Cassandra SF 2011

Cassandra Anti-Patterns (in 5m)
Matthew F. Dennis // @mdennis

Non-Sun (err, Non-Oracle) JVM
● No OpenJDK
● No Blackdown (anyone still use this?)
● Etc, etc, etc; just use the Sun (Oracle) JVM
● At least u22, but in general the latest release
(unless you have specific reasons otherwise)

CommitLog+Data On The Same Disk
● Don't put the commit log and data directories on
the same set of spindles
– commit log gets a single spindle entirely to itself (standard
consumer SATA disks easily sustain > 80 MB/s in
sequential writes)
● DOES NOT APPLY TO SSDs or EC2
● SSDs have no seek time
● EC2 ephemeral drives are still virtualized (but not the
same as EBS)
● On EC2 or SSDs: use one RAID set for both the
commit log and data directories
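
A minimal sketch of how one might check the spinning-disk rule above. It only compares the device IDs reported by `os.stat`, so it catches the obvious case of both directories sitting on one filesystem. Two partitions of the same physical disk would still pass, so treat a "different device" answer as a hint and not as proof. The function name and the example paths in the comment are assumptions for illustration.

```python
import os


def same_device(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Example (hypothetical paths; use the directories from your own config):
#   same_device("/var/lib/cassandra/commitlog", "/var/lib/cassandra/data")
```

On spinning disks a True result suggests the commit log is competing with data files for the same spindles. On SSDs or EC2 ephemeral drives a True result is what the slide recommends.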

EBS volumes on EC2
● Sounds great, nice feature set, but …
● Not predictable
● “freezes” are common
● Throughput limited in many cases
● Use ephemeral drives instead
● Stripe them
● Both commit log and data directory on the same
RAID set

Oversized JVM heaps
● 6 – 8 GB is good (assuming sufficient ram on
your boxen)
● 10 – 12 GB is possible and in some
circumstances “correct”
● 16GB == max JVM heap size
● > 16GB => badness
● JVM heap ~= boxen RAM => badness (always)
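
The heap rules above can be written down as a small sanity check. This is a sketch only: the 16GB ceiling comes from the slide, while the `ram_fraction_limit` of 0.5 used to decide that the heap is "too close" to physical RAM is an assumption, since the slide gives no exact ratio.

```python
def heap_warnings(heap_gb: float, ram_gb: float,
                  ram_fraction_limit: float = 0.5) -> list:
    """Return warnings for a proposed JVM heap size on a box with ram_gb of RAM."""
    warnings = []
    # From the slide: more than 16GB of heap leads to GC trouble.
    if heap_gb > 16:
        warnings.append("heap > 16GB")
    # Assumed threshold: the slide only says heap ~= RAM is always bad.
    if heap_gb >= ram_gb * ram_fraction_limit:
        warnings.append("heap too close to physical RAM")
    return warnings
```

For example, `heap_warnings(8, 32)` returns an empty list, while `heap_warnings(8, 8)` flags the heap as too close to RAM.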

JVM heap size -v- GC suckage
[Figure: GC suckage plotted against JVM heap size, with ~6GB, ~10GB and ~16GB marked]

Large batch mutations
(large in number of distinct rows)

● Timeout / failure => entire mutation must be
retried => wasted work
● Larger mutations => higher likelihood of
timeout
● 1000 mutations to perform? Do 100 batches of
10 in parallel instead of one batch of 1000
● Exact number of rows/batch is variable
depending on HW, network, load, etc;
experiment! (10-100 is a good starting point)
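
The "100 batches of 10 in parallel" advice can be sketched as follows. `send_batch` is a hypothetical stand-in for whatever batch-mutate call your client library provides. Here it just returns the batch length so the example runs without a cluster. The `batch_size` and `workers` defaults are starting points to experiment with, not recommendations from the talk beyond the 10-100 range.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_batch(batch):
    """Hypothetical stand-in for a client's batch-mutate call."""
    return len(batch)


def apply_in_small_batches(mutations, batch_size=10, workers=8):
    """Send many small batches in parallel instead of one large batch."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_batch, chunked(mutations, batch_size)))


# 1000 mutations become 100 batches of 10.
results = apply_in_small_batches(list(range(1000)))
```

With this shape, a timeout affects only the 10 rows in the failed batch, so only that batch needs to be retried.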

OPP / BOP partitioner
● You probably shouldn't use it
● No really, you almost certainly shouldn't use it
● Creates hot spots
● Requires “baby sitting” from ops
● Not as well tested, nor as widely deployed
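
A toy illustration of the hot spot problem, not Cassandra's actual partitioner code. Both placement functions are simplified assumptions: the order-preserving one splits the range of the first character evenly across nodes, and the hashed one splits the 128-bit MD5 space evenly. The key format and node count are made up for the example.

```python
import hashlib
from collections import Counter

NODES = 4


def node_order_preserving(key: str) -> int:
    """Toy order-preserving placement based on the key's first character."""
    return ord(key[0]) * NODES // 256


def node_hashed(key: str) -> int:
    """Toy hashed placement: split the 128-bit MD5 space evenly."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return digest * NODES // 2**128


# Keys sharing a common prefix, as is typical for application-generated keys.
keys = ["user%06d" % i for i in range(1000)]
ordered = Counter(node_order_preserving(k) for k in keys)
hashed = Counter(node_hashed(k) for k in keys)
```

With the order-preserving placement every key lands on the same node because they all start with the same character, while the hashed placement spreads them across all four nodes.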

C* auto selection of tokens
● Always specify your initial token.
● Auto select doesn't do what you think it does
nor does it do what you want
– loadbalance is even worse, it doesn't currently do what
you think, what you want or what it claims; “F#@* my
cluster” would be a much more apt name than
“loadbalance”
– Future (next?) release of OPSC will remove your
balancing woes
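
A sketch of how evenly spaced initial tokens are commonly computed by hand. It assumes the RandomPartitioner with a token space of 0 to 2**127 and a single datacenter. Other partitioners or multi-datacenter layouts need a different calculation. The function name is illustrative.

```python
def initial_tokens(node_count: int, ring_size: int = 2**127) -> list:
    """Return evenly spaced tokens, one per node, over a ring of ring_size."""
    return [i * ring_size // node_count for i in range(node_count)]


# One token per node for a four-node cluster.
tokens = initial_tokens(4)
```

Each value would then be set as the initial token of one node before that node first starts.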

Super Columns
● 10 – 15 percent performance penalty on reads and writes
● Easier / better to use composite columns
– 0.8.x makes this a lot easier
– Done manually in 0.7.x and is still better
● Devs working in C* code despise (loathe?) them
● API probably won't be deprecated, but implementation will be
replaced behind the scenes with composites (may be “ok” at that point
to use them, but should probably just use composite API directly)
● Cassandra and DataStax are committed to maintaining the API going
forward, even if the implementation changes
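
One way to picture "done manually": fold the super column name and the sub-column name into a single column name, then read a group back by prefix. This is a sketch under assumptions. The `:` separator and both helper names are made up for illustration, a plain dict stands in for a row, and this is not the byte encoding used by Cassandra's own composite comparator.

```python
SEP = ":"


def composite_name(*parts: str) -> str:
    """Join name components into one column name, rejecting the separator."""
    for part in parts:
        if SEP in part:
            raise ValueError("component must not contain %r" % SEP)
    return SEP.join(parts)


def slice_prefix(row: dict, *prefix: str) -> dict:
    """Return the columns whose names start with the given components."""
    start = composite_name(*prefix) + SEP
    return {name: value for name, value in sorted(row.items())
            if name.startswith(start)}


row = {
    composite_name("address", "city"): "SF",
    composite_name("address", "zip"): "94105",
    composite_name("phone", "home"): "555-0100",
}
address = slice_prefix(row, "address")
```

Here `address` holds the two `address:` columns, which is the same grouping a super column named `address` would have provided.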

Read Before Write
● Race conditions
● Abuses/Thrashes cache (row, key and page)
● Increases latency
● Increases IO requirements (by a lot)
● Increases size in the client
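
The race condition can be shown with a deterministic interleaving of two clients. A plain dict stands in for a row, and the column names are invented for the example. This is a sketch of the failure mode and of one common alternative (each client blindly writing its own column), not a description of any particular client API.

```python
def read_modify_write_interleaved(store: dict) -> list:
    """Two clients read a list, modify it, and write it back; one update is lost."""
    seen_by_a = list(store.get("tags", []))  # client A reads
    seen_by_b = list(store.get("tags", []))  # client B reads the same state
    store["tags"] = seen_by_a + ["red"]      # client A writes back
    store["tags"] = seen_by_b + ["blue"]     # client B overwrites A's change
    return store["tags"]


def blind_writes(store: dict) -> list:
    """Each client writes its own column; no read is needed, nothing is lost."""
    store["tag:red"] = ""   # client A
    store["tag:blue"] = ""  # client B
    return sorted(store)


lost = read_modify_write_interleaved({})
kept = blind_writes({})
```

After the interleaved run only `blue` survives, while the blind writes keep both tags and never touch the read path.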

Winblows
● Try to avoid it, you'll be happier
– Not always possible? Then, “I'm sorry for your pain”
● Run 'nix (in particular, probably Linux)
● Easier to get help (IRC, email, meetups, etc)
● C* performs better
● Better tested
● Cheaper
● More widely deployed (by a lot)

Q?
Cassandra Anti-Patterns
Matthew F. Dennis // @mdennis
