2. What We’ll Cover
Cassandra Basics
Common API Usage
Storage Model
Ring Overview
Web Application Integration
CONFIDENTIAL | 2
3. Getting Started
Requirements
JDK 1.6 or greater
Apache Maven 3.0.2 or greater
Apache Cassandra 1.0.7
– DataStax community edition:
http://www.datastax.com/download/community/versions
IDE such as Eclipse or IntelliJ will be helpful but not necessary
Several thumb drives available (please share)
All source on GitHub: https://github.com/zznate/strata-west-2012
4. How We’ll Cover It
Learning by doing
Looking at and writing code
Examples are constructed explicitly to show off certain concepts
Move ahead if it gets slow – just start hacking
You must be comfortable writing and debugging software
5. Getting Down To It
It does not have to be hard.
9. Getting Down To It
You can leverage a mature language with stable clients
against a proven, best-of-breed solution in use in
high-traffic production environments right now
11. Scale Out. But Really Though.
Best of Breed
Linear scaling
Real multi-datacenter support
“Fix it on Monday” fault tolerance
12. Static Column Family
GOOG Price=589.55 Name=Google
AAPL Price=401.76 Name=Apple
NFLX Price=78.73 Name=Netflix
NOK Price=6.90 Name=Nokia Exchange=NYSE
Schema optional: not all columns are required
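The slide above can be mimicked with plain Java collections. This is only a rough in-memory sketch of the idea (row key to a sorted set of named columns), not how Cassandra or any client stores data; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a static column family.
final class StaticColumnFamilySketch {
    // Row key -> (column name -> value); columns stay sorted by name.
    private final Map<String, TreeMap<String, String>> rows = new LinkedHashMap<>();

    void insert(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Returns null when the row has no such column: the schema is optional.
    String get(String rowKey, String column) {
        TreeMap<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }

    // Two of the rows from the slide.
    static StaticColumnFamilySketch sample() {
        StaticColumnFamilySketch cf = new StaticColumnFamilySketch();
        cf.insert("GOOG", "Price", "589.55");
        cf.insert("GOOG", "Name", "Google");
        cf.insert("NOK", "Price", "6.90");
        cf.insert("NOK", "Name", "Nokia");
        cf.insert("NOK", "Exchange", "NYSE"); // only this row has the column
        return cf;
    }

    public static void main(String[] args) {
        StaticColumnFamilySketch cf = sample();
        System.out.println("NOK Exchange  = " + cf.get("NOK", "Exchange"));
        System.out.println("GOOG Exchange = " + cf.get("GOOG", "Exchange"));
    }
}
```

Only the NOK row carries an Exchange column; asking GOOG for it simply yields nothing.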
13. Dynamic Column Family
GOOG 10/25/11=583.16 10/24/11=596.42 10/23/11=590.49
AAPL 10/25/11=397.77 10/24/11=405.77 10/23/11=392.87
NFLX 10/25/11=77.37 10/24/11=118.14 10/23/11=117.23
NOK 10/25/11=6.71 10/24/11=6.76 10/23/11=6.61
Prematerialized queries: store it how you read it
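A wide row like the ones above behaves much like a sorted map whose keys are the column names. The sketch below assumes that model and uses a TreeMap to show why "store it how you read it" makes a date-range read a single contiguous slice. The class name is hypothetical and nothing here touches a real cluster.

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of one wide row (e.g. row key "GOOG").
final class WideRowSketch {
    // Column name = trading date, column value = closing price.
    private final NavigableMap<LocalDate, Double> columns = new TreeMap<>();

    void insert(LocalDate date, double price) {
        columns.put(date, price);
    }

    // Contiguous slice of columns between two dates, both inclusive.
    NavigableMap<LocalDate, Double> slice(LocalDate start, LocalDate end) {
        return columns.subMap(start, true, end, true);
    }

    // The GOOG row from the slide.
    static WideRowSketch goog() {
        WideRowSketch row = new WideRowSketch();
        row.insert(LocalDate.of(2011, 10, 25), 583.16);
        row.insert(LocalDate.of(2011, 10, 24), 596.42);
        row.insert(LocalDate.of(2011, 10, 23), 590.49);
        return row;
    }

    public static void main(String[] args) {
        System.out.println(goog().slice(LocalDate.of(2011, 10, 24), LocalDate.of(2011, 10, 25)));
    }
}
```

Because columns are kept in sorted order, the query result is already laid out the way it will be read.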
14. The API
15. Common API Usage
Starting up
If you didn’t look beforehand:
http://www.datastax.com/docs/1.0/getting_started/index
Run the Cassandra process in the foreground to see what’s going on:
cd $CASSANDRA_HOME
bin/cassandra -f
16. Common API Usage
DataStax OpsCenter
Even if you are not sure why you need monitoring, keep this running at all times.
http://www.datastax.com/docs/opscenter/index
17. Common API Usage
Static Column Families
See org.apache.tutorial.BasicUsageExample
18. Common API Usage
Dynamic Column Families
See org.apache.tutorial.TimeseriesInserter
– A Cassandra row can hold up to 2 billion columns
19. Common API Usage
Dynamic Column Families
See org.apache.tutorial.TimeseriesIterationQuery
– Encapsulate paging in iteration for easier traversal of wide rows
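To make "encapsulate paging in iteration" concrete, here is a minimal sketch that hides page-at-a-time fetching behind a standard java.util.Iterator. It is not the tutorial's TimeseriesIterationQuery class: the wide row is faked with a NavigableMap, and fetchPage is a hypothetical stand-in for one slice query of at most pageSize columns.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical iterator that walks a wide row one page at a time.
final class PagedColumnIterator implements Iterator<Map.Entry<Integer, String>> {
    private final NavigableMap<Integer, String> row; // stands in for the wide row
    private final int pageSize;
    private Iterator<Map.Entry<Integer, String>> page;
    private Integer lastSeen; // last column name handed out; null before the first page

    PagedColumnIterator(NavigableMap<Integer, String> row, int pageSize) {
        this.row = row;
        this.pageSize = pageSize;
        this.page = fetchPage();
    }

    // Stand-in for one slice query: at most pageSize columns after lastSeen.
    private Iterator<Map.Entry<Integer, String>> fetchPage() {
        NavigableMap<Integer, String> rest =
                lastSeen == null ? row : row.tailMap(lastSeen, false);
        List<Map.Entry<Integer, String>> buffer = new ArrayList<>();
        for (Map.Entry<Integer, String> column : rest.entrySet()) {
            if (buffer.size() == pageSize) break;
            buffer.add(column);
        }
        return buffer.iterator();
    }

    @Override
    public boolean hasNext() {
        if (!page.hasNext()) page = fetchPage(); // current page drained: ask for the next one
        return page.hasNext();
    }

    @Override
    public Map.Entry<Integer, String> next() {
        if (!hasNext()) throw new NoSuchElementException();
        Map.Entry<Integer, String> column = page.next();
        lastSeen = column.getKey();
        return column;
    }

    // Helper for demos: column names seen when walking a row of the given width.
    static List<Integer> drain(int columnCount, int pageSize) {
        NavigableMap<Integer, String> row = new TreeMap<>();
        for (int i = 1; i <= columnCount; i++) row.put(i, "value-" + i);
        List<Integer> seen = new ArrayList<>();
        Iterator<Map.Entry<Integer, String>> it = new PagedColumnIterator(row, pageSize);
        while (it.hasNext()) seen.add(it.next().getKey());
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(drain(7, 3));
    }
}
```

The caller just loops; remembering the last column seen and starting the next page after it is the iterator's business.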
20. Common API Usage
Using CQL
See comments in class files as we go
– Use cqlsh for queries, some administration tasks
– Caveat: no composites or super column support
21. Common API Usage
JdbcTemplate
Some compiling required
– Not quite there on the typing support
– Pooling library needs work
– Give this a try if you want: https://github.com/riptano/jdbc-conn-pool
Specifically:
– https://github.com/riptano/jdbc-conn-pool/tree/master/portfolio-example
26. Storage and On-Disk Structure
27. Merge-On-Read
Benefits
On-disk structure is immutable
No read-before-write
Highest timestamp wins
Delete markers (“tombstones”) thrown out on merge
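The merge rules above fit in a few lines. This is a simplified sketch, assuming each immutable table is just a map from column name to a timestamped cell and that a null value marks a tombstone; the types are invented for illustration, and a real node also keeps tombstones around for a grace period before discarding them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of merging immutable on-disk tables at read time.
final class MergeOnReadSketch {
    // One version of a column; a null value marks a tombstone (an assumption of this sketch).
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Highest timestamp wins per column; tombstones are left out of the result.
    static Map<String, String> merge(List<Map<String, Cell>> tables) {
        Map<String, Cell> newest = new HashMap<>();
        for (Map<String, Cell> table : tables) {
            for (Map.Entry<String, Cell> e : table.entrySet()) {
                newest.merge(e.getKey(), e.getValue(),
                        (a, b) -> b.timestamp > a.timestamp ? b : a);
            }
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, Cell> e : newest.entrySet()) {
            if (!e.getValue().isTombstone()) result.put(e.getKey(), e.getValue().value);
        }
        return result;
    }

    // Three tables: an insert, a later overwrite of Price, and a later delete of Name.
    static Map<String, String> demo() {
        Map<String, Cell> first = new HashMap<>();
        first.put("Price", new Cell("589.55", 1L));
        first.put("Name", new Cell("Google", 1L));
        Map<String, Cell> second = new HashMap<>();
        second.put("Price", new Cell("583.16", 2L));
        Map<String, Cell> third = new HashMap<>();
        third.put("Name", new Cell(null, 3L)); // tombstone
        return merge(Arrays.asList(first, second, third));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Nothing is updated in place: the overwrite and the delete are just newer cells, and the reader sorts it out at merge time.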
28. Compaction
Benefits
Merge SSTables
Keeps SSTable count down
Makes merge-on-read process more efficient
Groups rows into single SSTable
Strategy can vary by workload:
– Size-tiered compaction
– Leveled compaction
29. Common API Usage
Indexing Techniques
See org.apache.tutorial.CompositeDataLoader
– Store a static index in a single row
30. Common API Usage
Indexing Techniques
See org.apache.tutorial.CompositeQuery
– Use slice of composites to narrow in on query
31. Common API Usage
Indexing Techniques
See org.apache.tutorial.CompositeQuery
– Let’s add another level to the composite
32. Common API Usage
Indexing Techniques
See org.apache.tutorial.CompositeQuery
– Add a third level to the composite to narrow the search to “cities in California
starting with ‘Ag’”
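The slicing idea can be approximated without a cluster. The sketch below fakes a three-level composite (country, state, city) as a colon-joined string in a sorted map and reads a contiguous range of it. Real composites compare component by component rather than as one string, so treat this purely as an illustration. The sample city entries and all names are assumptions, not data from the tutorial's CompositeDataLoader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a single index row with composite column names.
final class CompositeSliceSketch {
    // Builds the flattened composite "country:state:city" (an approximation of a real composite).
    static String composite(String country, String state, String city) {
        return country + ":" + state + ":" + city;
    }

    // Slice from the prefix up to the highest name sharing that prefix.
    static List<String> citiesStartingWith(NavigableMap<String, String> index,
                                           String country, String state, String prefix) {
        String start = composite(country, state, prefix);
        String end = start + "\uffff"; // sorts after every name that begins with the prefix
        return new ArrayList<>(index.subMap(start, true, end, true).values());
    }

    // Small made-up index: column name is the composite, value is the city.
    static NavigableMap<String, String> sampleIndex() {
        NavigableMap<String, String> index = new TreeMap<>();
        String[][] entries = {
            {"US", "CA", "Agoura Hills"}, {"US", "CA", "Alameda"},
            {"US", "CA", "Agua Dulce"}, {"US", "CA", "Zamora"},
            {"US", "TX", "Austin"},
        };
        for (String[] e : entries) index.put(composite(e[0], e[1], e[2]), e[2]);
        return index;
    }

    public static void main(String[] args) {
        System.out.println(citiesStartingWith(sampleIndex(), "US", "CA", "Ag"));
    }
}
```

Each extra level in the composite narrows the range; the read is still one contiguous slice of a single row.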
33. Common API Usage
Revisiting the Time Series Example
See org.apache.tutorial.BucketingTimeSeriesInserter
– Uses buckets for granularity
Every minute gets a distinct row, e.g. row key 2012_02_28_13_30
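A bucket key in the format shown above can be derived from a timestamp with a formatter. This is a minimal sketch; the class name is made up, and the tutorial's BucketingTimeSeriesInserter may build its keys differently.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that maps a timestamp to its one-minute bucket row key.
final class MinuteBucketKey {
    // Year, month, day, hour, minute joined by underscores; seconds are dropped.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm");

    static String forTime(LocalDateTime time) {
        return time.format(FORMAT);
    }

    public static void main(String[] args) {
        // Every event within the same minute lands in the same row.
        System.out.println(forTime(LocalDateTime.of(2012, 2, 28, 13, 30, 45)));
    }
}
```

Changing the pattern changes the granularity: dropping the minute field, for instance, would give hourly rows.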
34. Storage Model
Revisiting the Time Series Example
See org.apache.tutorial.BucketingTimeSeriesQuery
– More advanced slicing examples
– Keys can be rebuilt for any time window
– Keep rows grouped tightly on disk
Example query: “I need the 30 minutes between 3 and 4pm for every day last week”
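Because the bucket keys are deterministic, the row keys for a window like the one above can be computed on the client and fetched directly. The sketch assumes one-minute buckets in the yyyy_MM_dd_HH_mm format and reads the request as 15:00 to 15:29 on seven consecutive days; both the reading and the names are assumptions for illustration.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that rebuilds bucket row keys for a recurring daily window.
final class BucketWindowKeys {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy_MM_dd_HH_mm");

    // One key per minute, for `minutes` minutes from `start`, on each of `days` days.
    static List<String> keysFor(LocalDate firstDay, int days, LocalTime start, int minutes) {
        List<String> keys = new ArrayList<>();
        for (int d = 0; d < days; d++) {
            for (int m = 0; m < minutes; m++) {
                keys.add(firstDay.plusDays(d).atTime(start).plusMinutes(m).format(FORMAT));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = keysFor(LocalDate.of(2012, 2, 20), 7, LocalTime.of(15, 0), 30);
        System.out.println(keys.size() + " keys, first " + keys.get(0)
                + ", last " + keys.get(keys.size() - 1));
    }
}
```

The resulting list could then be handed to a multi-row read, so no scan or secondary index is needed to find the window.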
35. Storage Model
Tombstones
See org.apache.tutorial.TombstoneDemoInserter and
TombstoneDemoQuery
38. Understanding the Ring and Consistency
39. The Ring
Token Distribution
Distributed Hashing
Lexicographically similar keys are hashed to (very) different token values
[Figure: keys 'fon' and 'foo' mapped onto a token line running from 0 to 100]
Provides for shared knowledge of key location
The actual token range is from 0 to 2^127
The token is created by converting an MD5 hash of the key to a
java.math.BigInteger
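The hashing step described above can be tried with the JDK alone. This is a sketch of the idea rather than Cassandra's partitioner code: encoding the key as UTF-8 and taking the absolute value of the resulting BigInteger (to keep the token non-negative) are assumptions of this example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that turns a row key into an MD5-based token.
final class Md5TokenSketch {
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // The 16 digest bytes are read as a signed number; abs() is this sketch's assumption.
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Keys that differ by one letter end up with unrelated tokens.
        System.out.println("fon -> " + token("fon"));
        System.out.println("foo -> " + token("foo"));
    }
}
```

Running it prints two very large numbers with no visible relationship, which is what spreads similar keys around the ring.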
40. The Ring
Token Distribution as a Ring
Wrapping Ranges
The next token after the highest possible value is the lowest possible value.
[Figure: the token line closed into a ring, with 100 adjacent to 1 and the keys 'foo' and 'fon' placed on it]
41. The Ring
4 Node Token Distribution
Simplified Ring Example
Nodes distribute ownership via token ranges
A node owns its token and the range immediately before
Nodes continuously “gossip” ring ownership
Any node can act as a coordinator to service requests for any other node
[Figure: ring with Node 1 (token: 0), Node 2 (token: 25), Node 3 (token: 50) and Node 4 (token: 75); the key “foo” is marked on the ring]
42. The Ring
          Initial Token   First Token   Last Token
Node 1    0               76            0
Node 2    25              1             25
Node 3    50              26            50
Node 4    75              51            75
Inclusive token ranges for a four-node cluster
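The ownership table above reduces to one lookup rule: a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around to the lowest token when none is. A small sketch with a sorted map, using the simplified 0 to 100 token space from the slides (class and method names are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of token ownership on a ring.
final class RingSketch {
    private final TreeMap<Integer, String> nodesByToken = new TreeMap<>();

    void addNode(String name, int token) {
        nodesByToken.put(token, name);
    }

    // First node with token >= keyToken; past the highest token, wrap to the lowest.
    String ownerOf(int keyToken) {
        if (nodesByToken.isEmpty()) throw new IllegalStateException("ring has no nodes");
        Map.Entry<Integer, String> owner = nodesByToken.ceilingEntry(keyToken);
        if (owner == null) owner = nodesByToken.firstEntry();
        return owner.getValue();
    }

    // The four-node ring from the table.
    static RingSketch fourNodes() {
        RingSketch ring = new RingSketch();
        ring.addNode("Node 1", 0);
        ring.addNode("Node 2", 25);
        ring.addNode("Node 3", 50);
        ring.addNode("Node 4", 75);
        return ring;
    }

    public static void main(String[] args) {
        RingSketch ring = fourNodes();
        for (int token : new int[] {0, 1, 25, 26, 75, 76, 100}) {
            System.out.println("token " + token + " -> " + ring.ownerOf(token));
        }
    }
}
```

Tokens 76 through 100 fall past the last node and wrap to Node 1, matching the first row of the table.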
43. Integrating with Web Applications
45. Web Application Integration
Probably as far as we’ll get…
DataStax Documentation: http://www.datastax.com/docs/1.0/index
Apache Cassandra project wiki: http://wiki.apache.org/cassandra/
“The Dynamo Paper”:
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
P. Helland. Building on Quicksand: http://arxiv.org/pdf/0909.1788
P. Helland. Life Beyond Distributed Transactions:
http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf
“The Megastore Paper”:
http://research.google.com/pubs/archive/36971.pdf
The Hector Client: http://hector-client.org