Cassandra and Linux
An Introduction
Nick Bailey
@nickmbailey
nick@datastax.com
Saturday, June 1, 13
©2012 DataStax
Background
Big Data
Analytics + Real Time
Dynamo + BigTable
Who is using it?
Why do people like Cassandra?
Availability
http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html
Scalability
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Performance
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
Multi Datacenter Support
Hadoop Support
• Data Locality
• Workload Partitioning
Architecture - Cluster
Architecture - Node
Writes
Reads
Compaction
• Periodically merge sstables
• Multiple strategies
  • SizeTieredCompaction
  • LeveledCompaction
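To make "periodically merge sstables" concrete, here is a minimal sketch of the merge step. It assumes a hypothetical in-memory stand-in for an sstable (a list of (key, timestamp, value) tuples sorted by key) and a newest-timestamp-wins rule. It is not Cassandra's implementation and ignores tombstones, strategies, and on-disk formats.

```python
import heapq
from itertools import groupby

# Hypothetical stand-in for an sstable: a list of
# (key, timestamp, value) tuples, already sorted by key.
def compact(sstables):
    """Merge sorted runs into one run, keeping the newest value per key."""
    merged = heapq.merge(*sstables, key=lambda row: row[0])
    result = []
    for _, rows in groupby(merged, key=lambda row: row[0]):
        # Among rows sharing a key, the highest timestamp wins.
        result.append(max(rows, key=lambda row: row[1]))
    return result

old = [("alice", 1, "a@old.example"), ("bob", 1, "b@example")]
new = [("alice", 2, "a@new.example"), ("carol", 2, "c@example")]
for row in compact([old, new]):
    print(row)
```

Because every input run is already sorted, the merge is a single sequential pass over each run, which is why this kind of work is cheap on seeks but heavy on disk throughput.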
Hardware
Remember:
Cassandra scales horizontally
Memory
• More is better
• Sweet spot: 16-64GB
• Don’t give it all to the JVM
  • Generally no more than 8GB
  • Rest for page cache
• Can run with less for quick testing
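As a rough illustration of the heap-versus-page-cache split, the sketch below caps the JVM heap at 8GB and leaves the rest to the OS. The function name and the "quarter of RAM" starting point are assumptions made for the example, not a rule from the slides. Only the 8GB ceiling comes from them.

```python
def suggest_heap_mb(system_ram_mb, cap_mb=8192):
    """Hypothetical rule of thumb: a quarter of RAM, never more than cap_mb."""
    return min(system_ram_mb // 4, cap_mb)

for ram_gb in (16, 32, 64):
    ram_mb = ram_gb * 1024
    heap_mb = suggest_heap_mb(ram_mb)
    print(f"{ram_gb}GB RAM: {heap_mb}MB heap, "
          f"{ram_mb - heap_mb}MB left for page cache and the OS")
```

On a 64GB machine this still yields an 8GB heap, so most of the extra memory goes to the page cache rather than to the JVM.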
CPU
• Cassandra is almost always IO bound
• Sweet spot: 8 cores
• Additional CPU required for:
  • compression
  • leveled compaction
Disks
• SSDs are awesome, but not required
• Without SSDs:
  • At least 2 disks (commitlog, data) (more on that later)
  • Faster is better
• Before Cassandra 1.2: ~500GB per node
A Note on SSDs
• Write Amplification
  • http://en.wikipedia.org/wiki/Write_amplification
• Consumer grade SSDs are fine
• See talk by Rick Branson for more
  • http://www.youtube.com/watch?v=zQdDi9pdf3I
  • http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives
Homogeneous Nodes
• Usually, keep nodes the same
• Vnodes
  • Make heterogeneous clusters easier
  • Added in version 1.2
Configuration
Disks
Commit Log
• Keep separate from data drives
• Caveats
  • SSDs
  • Virtualized Environments
Data Drives
• Before Cassandra 1.2
  • RAID0/RAID10
• Cassandra 1.2
  • JBOD
  • Configuration option disk_failure_policy: stop/best_effort
• XFS
Note on SAN/NAS
• Don’t use them
• Cassandra is already distributed
• SPOF (single point of failure)
• Cassandra is already IO bound
Firewall
• Ports:
  • 7000 - cluster communication
  • 9160 - client communication
• JMX:
  • Unfortunately, the JMX protocol sucks
  • Ports 7199 and 1024+ for remote access
  • Solution: only access JMX locally
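One way to sanity-check firewall rules is to try a TCP connection to each port from another machine. The sketch below uses only the Python standard library. The helper names and the 127.0.0.1 placeholder host are assumptions for illustration. The port numbers are the ones listed above.

```python
import socket

# Ports from the list above.
CASSANDRA_PORTS = {7000: "cluster communication", 9160: "client communication"}

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_node(host):
    """Map each Cassandra port to whether it is reachable on host."""
    return {port: port_is_open(host, port) for port in CASSANDRA_PORTS}

# Placeholder host: replace with the address of a real node.
for port, reachable in check_node("127.0.0.1").items():
    status = "open" if reachable else "closed or filtered"
    print(f"{port} ({CASSANDRA_PORTS[port]}): {status}")
```

A successful connection only shows that the port is reachable, not that Cassandra is healthy behind it.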
Virtualized Environments (EC2)
EC2
• Large/XLarge instances
• Don’t use EBS
• phi_convict_threshold
• Don’t fix nodes, replace them
• DataStax provides an AMI
Miscellaneous
Swap
• Disable it
  • sudo swapoff --all
• JVM swaps to disk, Cassandra explodes
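To confirm that swap really is off, one option on Linux is to read SwapTotal from /proc/meminfo, which reports 0 kB once no swap is active. The sketch below is a hypothetical helper, not part of Cassandra. The demo parses sample text so it runs anywhere.

```python
def swap_total_kb(meminfo_text):
    """Extract SwapTotal (in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    raise ValueError("SwapTotal not found")

def swap_enabled(path="/proc/meminfo"):
    """Return True if the running Linux system has any swap configured."""
    with open(path) as f:
        return swap_total_kb(f.read()) > 0

# Sample text standing in for /proc/meminfo after swap has been disabled.
sample = "MemTotal:       16384000 kB\nSwapTotal:             0 kB\n"
print("swap enabled" if swap_total_kb(sample) > 0 else "swap disabled")
```

On a real node, call swap_enabled() instead of parsing the sample string.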
Limits
• /etc/security/limits.conf
  • nofile
  • memlock
  • as
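After editing limits.conf, it helps to check what a process actually gets. The sketch below uses Python's standard resource module (Unix only) to print the soft and hard values for the three limits named above. It reports the limits of the process that runs it, so it would need to run as the same user, in the same kind of session, as Cassandra. The helper names are made up for the example.

```python
import resource

# limits.conf item names mapped to the matching resource constants.
LIMITS = {
    "nofile": resource.RLIMIT_NOFILE,    # max open file descriptors
    "memlock": resource.RLIMIT_MEMLOCK,  # max locked-in-memory bytes
    "as": resource.RLIMIT_AS,            # max address space in bytes
}

def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def current_limits():
    """Return {name: (soft, hard)} for the current process."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}

for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```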
NTP
• Install it on
  • Cassandra servers
  • Clients
Monitor your cluster!
• Cassandra exposes tons of metrics
  • Via JMX
  • Recently, more options available
• DataStax OpsCenter
  • http://www.datastax.com/what-we-offer/products-services/datastax-opscenter
• Or integrate with your own system
Don’t use Windows
• I’m not presenting at Texas Windows Fest
• Technically supported
• Not widely deployed
• Reduced performance
Resources
• http://www.datastax.com/docs
• #cassandra on freenode
• http://www.planetcassandra.org
• Mailing Lists
  • http://cassandra.apache.org to subscribe
Or...
Come to the Summit!
Ask me for a discount code (nick@datastax.com)
June 11-12, 2013
San Francisco, CA
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013
Want a job?
http://www.datastax.com/company/careers
Questions?