SlideShare a Scribd company logo
Top 5 Tips & Tricks with Cassandra / DSE
Cliff Gilmore
Solution Engineer @ DataStax
1 Data Modeling
2 Compaction
3 POC Mistakes
4 Hardware Selection
5 Two Common Anti-Patterns
Avoid Secondary Indexes For
Most Cases
Assume a user table and the following queries
create table users (
userid uuid,
first_name text,
last_name text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
1)  Get user details for a given userid
2)  Get all the users with the first name of John
3)  Get user details for user given an email
4)  Get all the users created on June 3rd 2014
Wrong Way
create index users_first on users (first_name);
create index users_last on users (last_name);
create index users_date on users (created_date);
Why Not?
•  High Cardinality
•  Many Nodes Required to Deliver Result
•  Tombstones
•  Heavy Resource Usage
•  Slow Performance!!!
Avoid Secondary Indexes For Most Queries
Better Way
All the users with the first name of John
create table users_by_fname (
userid uuid,
first_name text,
last_name text,
email text,
created_time timestamp,
PRIMARY KEY (first_name,userid)
);
User details for user with the email
‘john.doe@datastax.com’
create table users_by_email (
userid uuid,
first_name text,
last_name text,
email text,
created_time timestamp,
PRIMARY KEY (email)
);
All the users for a create day
create table users_by_day (
userid uuid,
first_name text,
last_name text,
email text,
created_time timestamp,
created_day text, //yyyynndd date
PRIMARY KEY (
created_day, created_time, userid)
);
Also can sort by create time within the day in
query!
1 Data Modeling
2 Compaction
3 POC Mistakes
4 Hardware Selection
5 Two Common Anti-Patterns
Compaction Choices
Why Does It Matter?
•  Write performance
•  Read Performance
•  Sizing impact – different free space requirements
What Are my Options?
•  Size Tiered
•  Leveled
•  Date Tiered
Compaction Choices
Size-Tiered
When to Use?
•  Slow Storage (spinning disk)
•  Insert Heavy Workload
•  Few Updates
Negatives
•  Requires Lots of Free Space
•  Can read many sstables to satisfy a query
Compaction Choices
Leveled
When to Use?
•  Read Latency Sensitive Queries
•  SSD hardware
•  Less Free Space
Negatives
•  Uses significantly more IO to compact
•  No performance gain on partitions written
to once and never updated
Compaction Choices
Date-Tiered
When to Use?
•  Time Series Tables
–  Higher node density
•  Few if any updates
•  Predictable deletes (Default TTL)
Negatives
•  Not good if frequent updates/deletes
•  Only appropriate for time ordered data
1 Data Modeling
2 Compaction
3 POC Mistakes
4 Hardware Selection
5 Two Common Anti-Patterns
Common Proof of Concept Mistakes
•  Performance testing on different hardware
–  VMs on SAN will not reflect production performance
•  Not using queries that represent the final product
–  Use Cassandra 2.1’s cassandra-stress to test against actual tables
–  Understand the queries against the system
•  Using empty nodes for performance testing
–  OS Buffer cache masks IO system
1 Data Modeling
2 Compaction
3 POC Mistakes
4 Hardware Selection
5 Two Common Anti-Patterns
Hardware
•  More Moderate Sized Nodes > Few Larger Nodes
–  Cassandra is JVM based, Heap limitations
–  Scale out not up
–  Scaling really is linear à
•  CPU
–  4-16 Cores
•  RAM
–  32+GB
–  OS Buffer Cache via mmap
Hardware
•  Local Storage
–  SAN -> SPOF, Latency Spikes, Throughput Issues
•  Why SSD?
–  Read Latency
–  Compaction
–  Repair
–  Performance
12ms 7200RPM
7ms 10k
5ms 15k
.04 ms SSD
1 Data Modeling
2 Compaction
3 POC Mistakes
4 Hardware Selection
5 Two Common Anti-Patterns
Loading Data via Batch
•  Cassandra Provides Logged and Unlogged Batch Statements
–  Logged protects against partial completion
•  Often used to keep multiple tables with same data points in sync
–  Unlogged is just a grouping of statements
Don’t do this!
BEGIN UNLOGGED BATCH;
insert into users (userid,first_name,…) values(1,”John”,…);
insert into users (userid,first_name,…) values(2,”Jeff”,…);
insert into users (userid,first_name,…) values(3,”Joe”,…);
insert into users (userid,first_name,…) values(4,”Jason”,…);
APPLY BATCH
Loading Data via Batch
•  Why Not?
–  Puts extra work load on the coordinator
–  Adds a network hop by nullifying token awareness
–  JVM Pressure on Coordinator
•  What Should you Do?
–  Asynchronous inserts via Prepared Statements
–  Faster execution wait
–  Better usage of built in load balancing
Cassandra Queues with Improper Data Model
•  What is the anti-pattern?
–  Cassandra has to scan across
tombstones in a partition to read from
that partition
–  gc_grace_period poses a dilema
•  Example
create table queue (
element_name text,
queue_time timeuuid,
payload blob,
PRIMARY KEY (element_name,queue_time)
);
If we queue 1000 payloads for a given
element_name and delete the payloads as they are
dequeued there will be 1000 tombstones in this
partition that need to be scanned across.
•  Workarounds
–  Multiple Tables with some time
element and truncate the tables
when empty
–  Have each worker create it’s own
table for a workload and truncate
the table when the work is
complete
–  Both of these methods have pros
and cons, so be careful and
understand your workload!
Questions?
Thank You
cgilmore@datastax.com

More Related Content

What's hot

2014 CrossRef Workshops: System Update
2014 CrossRef Workshops: System Update2014 CrossRef Workshops: System Update
2014 CrossRef Workshops: System Update
Crossref
 
Back tobasicswebinar part6-rev.
Back tobasicswebinar part6-rev.Back tobasicswebinar part6-rev.
Back tobasicswebinar part6-rev.
MongoDB
 
BloodHound 1.3 - The ACL Attack Path Update - Paranoia17, Oslo
BloodHound 1.3 - The ACL Attack Path Update - Paranoia17, OsloBloodHound 1.3 - The ACL Attack Path Update - Paranoia17, Oslo
BloodHound 1.3 - The ACL Attack Path Update - Paranoia17, Oslo
Andy Robbins
 
CCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloudCCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloud
walk2talk srl
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
Santanu Dey
 
Using Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems EasyUsing Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems Easy
nathanmarz
 
CSL Seminar presented by Cassiano Campes - 17-03-13
CSL Seminar presented by Cassiano Campes - 17-03-13CSL Seminar presented by Cassiano Campes - 17-03-13
CSL Seminar presented by Cassiano Campes - 17-03-13
Cassiano Campes
 
Elasticsearch: An Overview
Elasticsearch: An OverviewElasticsearch: An Overview
Elasticsearch: An Overview
Ruby Shrestha
 
Automation testing of REST endpoints in a less coded way
Automation testing of REST endpoints in a less coded wayAutomation testing of REST endpoints in a less coded way
Automation testing of REST endpoints in a less coded way
Aleh Struneuski
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
otisg
 
Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012
Datadog
 

What's hot (11)

2014 CrossRef Workshops: System Update
2014 CrossRef Workshops: System Update2014 CrossRef Workshops: System Update
2014 CrossRef Workshops: System Update
 
Back tobasicswebinar part6-rev.
Back tobasicswebinar part6-rev.Back tobasicswebinar part6-rev.
Back tobasicswebinar part6-rev.
 
BloodHound 1.3 - The ACL Attack Path Update - Paranoia17, Oslo
BloodHound 1.3 - The ACL Attack Path Update - Paranoia17, OsloBloodHound 1.3 - The ACL Attack Path Update - Paranoia17, Oslo
BloodHound 1.3 - The ACL Attack Path Update - Paranoia17, Oslo
 
CCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloudCCI2018 - Benchmarking in the cloud
CCI2018 - Benchmarking in the cloud
 
Cloud Performance Benchmarking
Cloud Performance BenchmarkingCloud Performance Benchmarking
Cloud Performance Benchmarking
 
Using Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems EasyUsing Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems Easy
 
CSL Seminar presented by Cassiano Campes - 17-03-13
CSL Seminar presented by Cassiano Campes - 17-03-13CSL Seminar presented by Cassiano Campes - 17-03-13
CSL Seminar presented by Cassiano Campes - 17-03-13
 
Elasticsearch: An Overview
Elasticsearch: An OverviewElasticsearch: An Overview
Elasticsearch: An Overview
 
Automation testing of REST endpoints in a less coded way
Automation testing of REST endpoints in a less coded wayAutomation testing of REST endpoints in a less coded way
Automation testing of REST endpoints in a less coded way
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012Lifting the Blinds: Monitoring Windows Server 2012
Lifting the Blinds: Monitoring Windows Server 2012
 

Viewers also liked

DZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling WebinarDZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling Webinar
Matthew Dennis
 
Cassandra Anti-Patterns
Cassandra Anti-PatternsCassandra Anti-Patterns
Cassandra Anti-Patterns
Matthew Dennis
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
Matthew Dennis
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
Matthew Dennis
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
Dave Gardner
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And Techniques
Knoldus Inc.
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
Patrick McFadin
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
nkorla1share
 

Viewers also liked (8)

DZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling WebinarDZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling Webinar
 
Cassandra Anti-Patterns
Cassandra Anti-PatternsCassandra Anti-Patterns
Cassandra Anti-Patterns
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
 
Cassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patternsCassandra concepts, patterns and anti-patterns
Cassandra concepts, patterns and anti-patterns
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And Techniques
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 

Similar to Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE

MongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOLMongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOL
MongoDB
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Daniel Coupal
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
Robbie Strickland
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB
 
Seven deadly sins of ElasticSearch Benchmarking
Seven deadly sins of ElasticSearch BenchmarkingSeven deadly sins of ElasticSearch Benchmarking
Seven deadly sins of ElasticSearch Benchmarking
Fan Robbin
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
Jon Haddad
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
Splunk
 
Silicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in productionSilicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in production
Daniel Coupal
 
Back to Basics: Build Something Big With MongoDB
Back to Basics: Build Something Big With MongoDB Back to Basics: Build Something Big With MongoDB
Back to Basics: Build Something Big With MongoDB
MongoDB
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the Server
ColdFusionConference
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
David Martínez Rego
 
KYSUC - Keep Your Schema Under Control
KYSUC - Keep Your Schema Under ControlKYSUC - Keep Your Schema Under Control
KYSUC - Keep Your Schema Under Control
Coimbra JUG
 
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
DataStax Academy
 
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
DataStax Academy
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
MongoDB
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Kristofferson A
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
Bryan Bende
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
DataStax
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
DataStax
 

Similar to Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE (20)

MongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOLMongoDB: How We Did It – Reanimating Identity at AOL
MongoDB: How We Did It – Reanimating Identity at AOL
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
 
Seven deadly sins of ElasticSearch Benchmarking
Seven deadly sins of ElasticSearch BenchmarkingSeven deadly sins of ElasticSearch Benchmarking
Seven deadly sins of ElasticSearch Benchmarking
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Silicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in productionSilicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in production
 
Back to Basics: Build Something Big With MongoDB
Back to Basics: Build Something Big With MongoDB Back to Basics: Build Something Big With MongoDB
Back to Basics: Build Something Big With MongoDB
 
My Database Skills Killed the Server
My Database Skills Killed the ServerMy Database Skills Killed the Server
My Database Skills Killed the Server
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
KYSUC - Keep Your Schema Under Control
KYSUC - Keep Your Schema Under ControlKYSUC - Keep Your Schema Under Control
KYSUC - Keep Your Schema Under Control
 
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
 
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
Cassandra Day Chicago 2015: Introduction to Apache Cassandra & DataStax Enter...
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd KnownCassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
Cassandra Community Webinar: MySQL to Cassandra - What I Wish I'd Known
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
 

More from DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
DataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
DataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
DataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
DataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
DataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
DataStax Academy
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
DataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
DataStax Academy
 

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Recently uploaded

GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 

Recently uploaded (20)

GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 

Cassandra Day Chicago 2015: Top 5 Tips/Tricks with Apache Cassandra and DSE

  • 1. Top 5 Tips & Tricks with Cassandra / DSE Cliff Gilmore Solution Engineer @ DataStax
  • 2. 1 Data Modeling 2 Compaction 3 POC Mistakes 4 Hardware Selection 5 Two Common Anti-Patterns
  • 3. Avoid Secondary Indexes For Most Cases Assume a user table and the following queries create table users ( userid uuid, first_name text, last_name text, email text, created_date timestamp, PRIMARY KEY (userid) ); 1)  Get user details for a given userid 2)  Get all the users with the first name of John 3)  Get user details for user given an email 4)  Get all the users created on June 3rd 2014 Wrong Way create index users_first on users (first_name); create index users_last on users (last_name); create index users_date on users (created_date); Why Not? •  High Cardinality •  Many Nodes Required to Deliver Result •  Tombstones •  Heavy Resource Usage •  Slow Performance!!!
  • 4. Avoid Secondary Indexes For Most Queries Better Way All the users with the first name of John create table users_by_fname ( userid uuid, first_name text, last_name text, email text, created_time timestamp, PRIMARY KEY (first_name,userid) ); User details for user with the email ‘john.doe@datastax.com’ create table users_by_email ( userid uuid, first_name text, last_name text, email text, created_time timestamp, PRIMARY KEY (email) ); All the users for a create day create table users_by_day ( userid uuid, first_name text, last_name text, email text, created_time timestamp, created_day text, //yyyynndd date PRIMARY KEY ( created_day, created_time, userid) ); Also can sort by create time within the day in query!
  • 5. 1 Data Modeling 2 Compaction 3 POC Mistakes 4 Hardware Selection 5 Two Common Anti-Patterns
  • 6. Compaction Choices Why Does It Matter? •  Write performance •  Read Performance •  Sizing impact – different free space requirements What Are my Options? •  Size Tiered •  Leveled •  Date Tiered
  • 7. Compaction Choices Size-Tiered When to Use? •  Slow Storage (spinning disk) •  Insert Heavy Workload •  Few Updates Negatives •  Requires Lots of Free Space •  Can read many sstables to satisfy a query
  • 8. Compaction Choices Leveled When to Use? •  Read Latency Sensitive Queries •  SSD hardware •  Less Free Space Negatives •  Uses significantly more IO to compact •  No performance gain on partitions written to once and never updated
  • 9. Compaction Choices Date-Tiered When to Use? •  Time Series Tables –  Higher node density •  Few if any updates •  Predictable deletes (Default TTL) Negatives •  Not good if frequent updates/deletes •  Only appropriate for time ordered data
  • 10. 1 Data Modeling 2 Compaction 3 POC Mistakes 4 Hardware Selection 5 Two Common Anti-Patterns
  • 11. Common Proof of Concept Mistakes •  Performance testing on different hardware –  VMs on SAN will not reflect production performance •  Not using queries that represent the final product –  Use Cassandra 2.1’s cassandra-stress to test against actual tables –  Understand the queries against the system •  Using empty nodes for performance testing –  OS Buffer cache masks IO system
  • 12. 1 Data Modeling 2 Compaction 3 POC Mistakes 4 Hardware Selection 5 Two Common Anti-Patterns
  • 13. Hardware •  More Moderate Sized Nodes > Few Larger Nodes –  Cassandra is JVM based, Heap limitations –  Scale out not up –  Scaling really is linear à •  CPU –  4-16 Cores •  RAM –  32+GB –  OS Buffer Cache via mmap
  • 14. Hardware •  Local Storage –  SAN -> SPOF, Latency Spikes, Throughput Issues •  Why SSD? –  Read Latency –  Compaction –  Repair –  Performance 12ms 7200RPM 7ms 10k 5ms 15k .04 ms SSD
  • 15. 1 Data Modeling 2 Compaction 3 POC Mistakes 4 Hardware Selection 5 Two Common Anti-Patterns
  • 16. Loading Data via Batch •  Cassandra Provides Logged and Unlogged Batch Statements –  Logged protects against partial completion •  Often used to keep multiple tables with same data points in sync –  Unlogged is just a grouping of statements Don’t do this! BEGIN UNLOGGED BATCH; insert into users (userid,first_name,…) values(1,”John”,…); insert into users (userid,first_name,…) values(2,”Jeff”,…); insert into users (userid,first_name,…) values(3,”Joe”,…); insert into users (userid,first_name,…) values(4,”Jason”,…); APPLY BATCH
  • 17. Loading Data via Batch •  Why Not? –  Puts extra work load on the coordinator –  Adds a network hop by nullifying token awareness –  JVM Pressure on Coordinator •  What Should you Do? –  Asynchronous inserts via Prepared Statements –  Faster execution wait –  Better usage of built in load balancing
  • 18. Cassandra Queues with Improper Data Model •  What is the anti-pattern? –  Cassandra has to scan across tombstones in a partition to read from that partition –  gc_grace_period poses a dilema •  Example create table queue ( element_name text, queue_time timeuuid, payload blob, PRIMARY KEY (element_name,queue_time) ); If we queue 1000 payloads for a given element_name and delete the payloads as they are dequeued there will be 1000 tombstones in this partition that need to be scanned across. •  Workarounds –  Multiple Tables with some time element and truncate the tables when empty –  Have each worker create it’s own table for a workload and truncate the table when the work is complete –  Both of these methods have pros and cons, so be careful and understand your workload!