Introduction to Apache Cassandra (for Java Developers!)
Nate McCall [email_address] @zznate
Apache Cassandra is NOT a "key/value store". Columns are dynamic inside a column family (but they don't have to be).
Gain an understanding of concepts in Apache Cassandra that have a particular effect on application development
Brief Intro - Storage
SSTables are immutable. SSTables are merged on reads.
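To make the read path concrete, here is a minimal in-memory sketch, not Cassandra's implementation: each SSTable is modeled as an unmodifiable sorted map, and a read checks every table and keeps the cell with the highest write timestamp. The names (`MergeOnRead`, `sstable`, `read`) are hypothetical, and memtables, bloom filters and tombstones are left out.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class MergeOnRead {
    // A column value tagged with the timestamp supplied when it was written.
    record Cell(String value, long timestamp) {}

    // Hypothetical model of an SSTable: a sorted map that can no longer be modified.
    static Map<String, Cell> sstable(Map<String, Cell> cells) {
        return Collections.unmodifiableMap(new TreeMap<>(cells));
    }

    // A read consults every SSTable and keeps the cell with the highest timestamp.
    static Optional<String> read(List<Map<String, Cell>> sstables, String column) {
        Cell newest = null;
        for (Map<String, Cell> table : sstables) {
            Cell candidate = table.get(column);
            if (candidate != null && (newest == null || candidate.timestamp() > newest.timestamp())) {
                newest = candidate;
            }
        }
        return Optional.ofNullable(newest).map(Cell::value);
    }

    public static void main(String[] args) {
        Map<String, Cell> older = sstable(Map.of("city", new Cell("Austin", 100L), "state", new Cell("TX", 100L)));
        Map<String, Cell> newer = sstable(Map.of("city", new Cell("Round Rock", 200L)));
        List<Map<String, Cell>> tables = List.of(older, newer);
        System.out.println(read(tables, "city").orElse("<none>"));  // Round Rock
        System.out.println(read(tables, "state").orElse("<none>")); // TX
    }
}
```

Because no table is ever rewritten in place, the order in which tables are consulted does not matter; only the timestamps decide the winner.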
Brief Intro - Compaction
Keep SSTable count down. Discard tombstones (more on this later).
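A similarly simplified compaction sketch, assuming Cassandra's rule that a tombstone may only be purged once it is older than `gc_grace_seconds` (default 864000, i.e. ten days): several SSTables are merged into one, the newest cell per column wins, and sufficiently old tombstones are dropped. Everything here (`CompactionSketch`, `compact`, the `localDeletionTime` field) is a hypothetical model, and the real check that no other SSTable still holds older data for the row is omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CompactionSketch {
    // value == null marks a tombstone; localDeletionTime is when the delete was applied
    // (seconds) and is ignored for live cells.
    record Cell(String value, long timestamp, long localDeletionTime) {
        boolean isTombstone() { return value == null; }
    }

    // Merge several SSTables into one: the newest cell per column wins (tie-breaking omitted),
    // then tombstones older than the grace period are purged.
    static Map<String, Cell> compact(List<Map<String, Cell>> sstables, long nowSeconds, long gcGraceSeconds) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : sstables) {
            table.forEach((name, cell) ->
                merged.merge(name, cell, (a, b) -> b.timestamp() > a.timestamp() ? b : a));
        }
        merged.values().removeIf(c -> c.isTombstone() && c.localDeletionTime() + gcGraceSeconds < nowSeconds);
        return merged;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long grace = 864_000L; // gc_grace_seconds default: ten days
        Map<String, Cell> older = Map.of(
            "a", new Cell("1", 10L, 0L),
            "b", new Cell("2", 10L, 0L));
        Map<String, Cell> newer = Map.of(
            "a", new Cell(null, 20L, now - 5L),          // recent delete: must be kept
            "b", new Cell(null, 20L, now - grace - 1L)); // old delete: can be purged
        System.out.println(compact(List.of(older, newer), now, grace).keySet()); // [a]
    }
}
```

The recent tombstone for `a` survives compaction so that a replica which missed the delete cannot resurrect the old value during repair; only after the grace period is it safe to drop.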
*** CONSISTENCY LEVEL FAILURE IS NOT A ROLLBACK *** Idempotent: an operation can be applied multiple times without changing the result beyond the first application.
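A minimal sketch of why idempotence matters after a consistency level failure, assuming the client picks the write timestamp once and reuses it on retry: replaying the same column write leaves a replica unchanged even if the first attempt already landed there, while a non-idempotent increment changes the result on every retry. `IdempotentRetry`, `applyWrite` and `applyIncrement` are hypothetical names; no client library is used.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotentRetry {
    record Cell(String value, long timestamp) {}

    // Apply a column write: the cell with the higher timestamp is kept.
    // Re-applying an identical cell (same value, same timestamp) changes nothing.
    static void applyWrite(Map<String, Cell> replica, String column, Cell cell) {
        replica.merge(column, cell, (stored, incoming) -> incoming.timestamp() > stored.timestamp() ? incoming : stored);
    }

    // For contrast, a non-idempotent operation: every application changes the result.
    static void applyIncrement(Map<String, Long> replica, String column, long delta) {
        replica.merge(column, delta, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, Cell> replica = new HashMap<>();
        Cell write = new Cell("Austin", 1_000L); // timestamp chosen once, by the client
        applyWrite(replica, "city", write);      // first attempt lands even though the client saw a failure
        applyWrite(replica, "city", write);      // retry with the same timestamp
        System.out.println(replica.get("city")); // Cell[value=Austin, timestamp=1000]

        Map<String, Long> counters = new HashMap<>();
        applyIncrement(counters, "hits", 1L);
        applyIncrement(counters, "hits", 1L);    // the "same" retry double-counts
        System.out.println(counters.get("hits")); // 2
    }
}
```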
vs. RDBMS - Append Only
Proper data modeling will minimize seeks. No read before write. (Go to Matt's presentation for more!)
How does this impact development?
Substantially. For operations affecting the same data, that data will become consistent eventually, as determined by the timestamps. Trade availability for consistency. Store whatever you want: it's all just bytes. Think about how you will query the data before you write it.
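The availability/consistency trade-off comes down to simple arithmetic. The sketch below assumes a single data center and uses the usual rules: QUORUM is `RF / 2 + 1` replicas (integer division), a read is guaranteed to overlap the latest successful write when `R + W > RF`, and an operation needing `k` acknowledgements still succeeds with `RF - k` replicas down. The class and method names are hypothetical.

```java
public class ConsistencyMath {
    // Replicas that must acknowledge for QUORUM: a strict majority of the replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A read is guaranteed to see the latest successful write when R + W > RF.
    static boolean readSeesLatestWrite(int readAcks, int writeAcks, int replicationFactor) {
        return readAcks + writeAcks > replicationFactor;
    }

    // How many replicas can be down while an operation needing `acks` replies still succeeds.
    static int toleratedFailures(int acks, int replicationFactor) {
        return replicationFactor - acks;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf); // 2
        System.out.printf("QUORUM/QUORUM: consistent=%b, tolerates %d down%n",
                readSeesLatestWrite(q, q, rf), toleratedFailures(q, rf));   // true, 1
        System.out.printf("ONE/ONE: consistent=%b, tolerates %d down%n",
                readSeesLatestWrite(1, 1, rf), toleratedFailures(1, rf));   // false, 2
        System.out.printf("ALL writes/ONE reads: consistent=%b, writes tolerate %d down%n",
                readSeesLatestWrite(1, rf, rf), toleratedFailures(rf, rf)); // true, 0
    }
}
```

Raising the consistency level buys read-your-writes guarantees at the cost of tolerating fewer unavailable replicas.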
JDBC Driver implementation means lots of possibilities
Encapsulate API changes
In-tree support on the way for:
Gone. Added too much complexity after Thrift caught up. “None of the libraries distinguished themselves as being a particularly crappy choice for serialization.” (See CASSANDRA-1765.)
Thrift API Methods
Five general categories
Writing/Updating/Removing (all the same op!)
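A minimal sketch of what "all the same op" means, using a simplified single-row model: insert, update and remove are all one `Mutation` carrying a timestamp, and a remove just writes a tombstone (modeled here as a null value). Reconciliation follows Cassandra's column reconcile rule: the higher timestamp wins, a tombstone wins a tie, and between two live values at the same timestamp the greater value wins (this sketch compares Java strings rather than raw bytes). `UnifiedMutation` and its methods are hypothetical names.

```java
import java.util.Optional;
import java.util.TreeMap;

public class UnifiedMutation {
    // Insert, update and delete are all this one shape; a null value marks a tombstone.
    record Mutation(String column, String value, long timestamp) {
        boolean isDelete() { return value == null; }
    }

    // One row of a column family: column name -> the winning mutation so far.
    private final TreeMap<String, Mutation> row = new TreeMap<>();

    void apply(Mutation incoming) {
        row.merge(incoming.column(), incoming, UnifiedMutation::reconcile);
    }

    // Higher timestamp wins; on a tie a tombstone wins; two live values tie-break on value.
    static Mutation reconcile(Mutation stored, Mutation incoming) {
        if (stored.timestamp() != incoming.timestamp()) {
            return incoming.timestamp() > stored.timestamp() ? incoming : stored;
        }
        if (stored.isDelete() || incoming.isDelete()) {
            return stored.isDelete() ? stored : incoming;
        }
        return incoming.value().compareTo(stored.value()) > 0 ? incoming : stored;
    }

    Optional<String> get(String column) {
        Mutation m = row.get(column);
        return (m == null || m.isDelete()) ? Optional.empty() : Optional.of(m.value());
    }

    public static void main(String[] args) {
        UnifiedMutation cf = new UnifiedMutation();
        cf.apply(new Mutation("city", "Austin", 1L));     // "insert"
        cf.apply(new Mutation("city", "Round Rock", 2L)); // "update"
        cf.apply(new Mutation("city", null, 3L));         // "remove" writes a tombstone
        System.out.println(cf.get("city"));               // Optional.empty

        // Timestamp granularity trap: a re-insert carrying the delete's timestamp loses the tie.
        cf.apply(new Mutation("city", "Austin", 3L));
        System.out.println(cf.get("city"));               // Optional.empty
    }
}
```

The last two lines are the "deletes and timestamp granularity" pitfall: if a client deletes and immediately re-inserts within one clock tick, the insert can be silently shadowed.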
On to the Code...
https://github.com/zznate/cassandra-tutorial
Uses Maven. Really basic. Modify/abuse/alter as needed.
Descriptions of what is going on and how to run each example are in the Javadoc comments.
Sample data is based on the North American Numbering Plan (easy to find thanks to InfoChimps): http://infochimps.com/datasets/area-code-and-exchange-to-location-north-america-npanxx
512 202 30.27 097.74 W TX Austin
512 203 30.27 097.74 L TX Austin
512 204 30.32 097.73 W TX Austin
512 205 30.32 097.73 W TX Austin
512 206 30.32 097.73 L TX Austin
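One hypothetical way to lay out these sample rows, to illustrate "think about how you will query the data before you write it": if the query is "which exchanges in area code N fall in a range", make the area code the row key and the exchange the column name, so the answer is a single column slice over one sorted row. `NpaNxxModel` is an in-memory stand-in (a `TreeMap` per row, assuming a string comparator such as UTF8Type), not the tutorial's actual schema. It also assumes the third and fourth fields are latitude and longitude; those and the fifth field are ignored.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class NpaNxxModel {
    // Hypothetical layout: row key = area code (NPA), column name = exchange (NXX),
    // column value = "city,state". Each row's columns stay sorted by name.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Parse one sample line: npa nxx lat lng <fifth field> state city.
    // The split limit of 7 keeps multi-word city names intact in the last field.
    void insertLine(String line) {
        String[] f = line.trim().split(" ", 7);
        rows.computeIfAbsent(f[0], k -> new TreeMap<>()).put(f[1], f[6] + "," + f[5]);
    }

    // A column slice: all exchanges in [start, finish] for one area code, from a single row.
    NavigableMap<String, String> slice(String npa, String start, String finish) {
        TreeMap<String, String> row = rows.getOrDefault(npa, new TreeMap<>());
        return row.subMap(start, true, finish, true);
    }

    public static void main(String[] args) {
        NpaNxxModel model = new NpaNxxModel();
        String[] sample = {
            "512 202 30.27 097.74 W TX Austin",
            "512 203 30.27 097.74 L TX Austin",
            "512 204 30.32 097.73 W TX Austin",
            "512 205 30.32 097.73 W TX Austin",
            "512 206 30.32 097.73 L TX Austin",
        };
        for (String line : sample) {
            model.insertLine(line);
        }
        System.out.println(model.slice("512", "202", "204")); // {202=Austin,TX, 203=Austin,TX, 204=Austin,TX}
    }
}
```

A different query (say, "all exchanges in a city") would call for a different row key; the layout follows the query, not the other way around.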
meta_* methods: CassandraClusterTest.java L43-81 @ hector
system_* methods: SchemaManipulation.java @ hector-examples; CassandraClusterTest.java L84-157 @ hector
ORM (it works and is in production): https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
Multiple nodes and failure scenarios
Data modeling (go see Matt's presentation)
Things to Remember
deletes and timestamp granularity
“range ghosts” and “tombstones”
using the wrong column comparator, key/default validators and InvalidRequestException
“Schema-less” -> “Schema Optional”
use column-level TTL to automate deletion
"How do I iterate over all the rows in a column family?"
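Iterating over every row is usually done by paging through range slices: request a page of rows, use the last key of that page as the (inclusive) start key of the next request, and skip the repeated first row. Deleted rows can come back with no columns (the "range ghosts" above) until compaction removes their tombstones, so empty rows are filtered out. In this in-memory sketch `fetchPage` is a hypothetical stand-in for a range-slice call; keys come back sorted here, whereas with RandomPartitioner real results arrive in token order, which does not change the paging pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowPager {
    // Stand-in for a column family: row key -> columns. An empty column map models a
    // "range ghost": a deleted row that range scans keep returning until compaction.
    private final TreeMap<String, Map<String, String>> rows;

    RowPager(TreeMap<String, Map<String, String>> rows) {
        this.rows = rows;
    }

    // Hypothetical stand-in for a range-slice call: up to `count` rows, start key inclusive.
    private List<Map.Entry<String, Map<String, String>>> fetchPage(String startKey, int count) {
        List<Map.Entry<String, Map<String, String>>> page = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : rows.tailMap(startKey, true).entrySet()) {
            if (page.size() == count) break;
            page.add(e);
        }
        return page;
    }

    // Visit every row: each page after the first starts at the previous page's last key,
    // so its first row is a repeat and is skipped; empty rows (range ghosts) are dropped.
    List<String> allLiveKeys(int pageSize) {
        if (pageSize < 2) throw new IllegalArgumentException("pageSize must be at least 2");
        List<String> keys = new ArrayList<>();
        String start = "";  // empty start key = beginning of the range
        boolean firstPage = true;
        while (true) {
            List<Map.Entry<String, Map<String, String>>> page = fetchPage(start, pageSize);
            for (int i = firstPage ? 0 : 1; i < page.size(); i++) {
                Map.Entry<String, Map<String, String>> row = page.get(i);
                if (!row.getValue().isEmpty()) keys.add(row.getKey());
            }
            if (page.size() < pageSize) break;  // a short page means the end was reached
            start = page.get(page.size() - 1).getKey();
            firstPage = false;
        }
        return keys;
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> cf = new TreeMap<>();
        cf.put("512202", Map.of("city", "Austin"));
        cf.put("512203", Map.of());                  // deleted row: a range ghost
        cf.put("512204", Map.of("city", "Austin"));
        cf.put("512205", Map.of("city", "Austin"));
        System.out.println(new RowPager(cf).allLiveKeys(2)); // [512202, 512204, 512205]
    }
}
```

A page size of 1 would loop forever on the same start key, which is why the sketch rejects it.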