CAP Theorem: Consistency, Availability, Partition Tolerance “Thou shalt have but 2” - Conjecture made by Eric Brewer in 2000 - Published as formal proof in 2002 - See: http://en.wikipedia.org/wiki/CAP_theorem for more
<ul>Apache Cassandra Concepts </ul>- Explicit choice of partition tolerance and availability. Consistency is tunable. - No read before write - Merge on read - Idempotent - Schema Optional - All nodes share the same role - Still performs well with larger-than-memory data sets
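"Consistency is tunable" is usually explained with the replica overlap rule: with N replicas of a row, a read that consults R replicas is guaranteed to see a write acknowledged by W replicas whenever R + W > N. The following is a minimal sketch of that arithmetic in plain Java. It does not use the Cassandra or Hector API, and the class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not part of Cassandra or Hector.
class ConsistencySketch {
    /** Number of replicas that form a quorum for replication factor n. */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** True when any r replicas read must overlap any w replicas written. */
    static boolean readsSeeLatestWrite(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        // Quorum reads with quorum writes: the read and write sets always overlap.
        System.out.println(readsSeeLatestWrite(n, quorum(n), quorum(n)));
        // One replica read, one replica written: a read may miss the latest write.
        System.out.println(readsSeeLatestWrite(n, 1, 1));
    }
}
```

Lower R and W favor latency and availability; higher values favor consistency. That trade-off is what "tunable" refers to.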
Generally complements other systems (not intended to be one-size-fits-all) *** You should always use the right tool for the right job anyway
How does this differ from an RDBMS? Substantially.
vs. RDBMS - No Joins Unless: - you do them on the client - you do them via Map/Reduce
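Doing a join "on the client" means fetching two result sets separately and combining them in application code. A minimal sketch, assuming two in-memory maps stand in for the results of two separate queries; the class name, the sample data, and the join key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the maps stand in for two separately fetched result sets.
class ClientSideJoinSketch {
    /**
     * Inner join on a shared area code.
     * userToAreaCode maps user key -> area code; areaCodeToCity maps area code -> city.
     * Users whose area code has no matching city are skipped.
     */
    static List<String> join(Map<String, String> userToAreaCode,
                             Map<String, String> areaCodeToCity) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> user : userToAreaCode.entrySet()) {
            String city = areaCodeToCity.get(user.getValue());
            if (city != null) {
                joined.add(user.getKey() + ":" + city);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> users = new LinkedHashMap<>();
        users.put("zznate", "512");
        users.put("other", "999");
        Map<String, String> cities = new LinkedHashMap<>();
        cities.put("512", "Austin");
        System.out.println(join(users, cities));
    }
}
```

The cost of the join moves from the database to the application, which is why data is typically modeled so that one read answers one query.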
vs. RDBMS - Schema Optional (Though you can add meta information for validation and type checking) *** Supports secondary indexes too: “… WHERE state = 'TX'”
vs. RDBMS - Prematerialized and Transaction-less - No ACID transactions - Limited support for ad-hoc queries *** You are going to give up both of these anyway when you shard an RDBMS ***
<ul>vs. RDBMS - Facilitates Consolidation </ul>It can be your caching layer * Off-heap cache (provided you install JNA) It can be your analytics infrastructure * true map/reduce * pig driver * hive driver coming soon
vs. RDBMS - Shared-Nothing Architecture Every node plays the same role: no masters, no slaves, no special nodes *** No single point of failure
<ul>vs. RDBMS - Real Linear Scalability </ul>Want 2x performance? Add 2x nodes. *** 'No downtime' included!
<ul>vs. RDBMS - Performance </ul>Reads on par with writes
Five general categories <ul>Retrieving Writing/Updating/Removing (all the same op!) <ul>Increment counters </ul>Meta Information Schema Manipulation CQL Execution </ul>
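One way to picture why writing, updating, and removing are "all the same op": each is a timestamped append, and the current value is worked out when the data is read. The sketch below is a simplified in-memory model, not Cassandra's storage engine or the Hector API. The class name, the use of a null value as a deletion marker, and the tie-breaking rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; not Cassandra's storage engine or the Hector API.
class WriteSketch {
    /** One timestamped entry; a null value marks a tombstone (a recorded deletion). */
    private static final class Cell {
        final String column;
        final String value;
        final long timestamp;

        Cell(String column, String value, long timestamp) {
            this.column = column;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Append-only: nothing is looked up or overwritten at write time.
    private final List<Cell> log = new ArrayList<>();

    void insert(String column, String value, long timestamp) {
        log.add(new Cell(column, value, timestamp));
    }

    /** An update is the same append as an insert. */
    void update(String column, String value, long timestamp) {
        insert(column, value, timestamp);
    }

    /** A removal appends a tombstone, even if the column was never written. */
    void delete(String column, long timestamp) {
        log.add(new Cell(column, null, timestamp));
    }

    /** Merge on read: the highest timestamp wins (ties keep the earlier entry). */
    String read(String column) {
        Cell newest = null;
        for (Cell cell : log) {
            if (cell.column.equals(column)
                    && (newest == null || cell.timestamp > newest.timestamp)) {
                newest = cell;
            }
        }
        return newest == null ? null : newest.value;
    }

    int entryCount() {
        return log.size();
    }

    public static void main(String[] args) {
        WriteSketch row = new WriteSketch();
        row.insert("email", "a@example.com", 1L);
        row.update("email", "b@example.com", 2L);
        System.out.println(row.read("email"));
        row.delete("email", 3L);
        System.out.println(row.read("email"));
    }
}
```

Because every operation only appends, replaying the same write with the same timestamp leaves the read result unchanged, which is the sense in which writes are idempotent.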
Using a Client Hector Client: http://hector-client.org - Most popular Java client - In use at very large installations - A number of tools and utilities built on top - Very active community - MIT Licensed *** Like any open source project fully dependent on another open source project, it has its warts
<ul>Sample Project for Experimenting </ul>https://github.com/zznate/cassandra-tutorial https://github.com/zznate/hector-examples Built using Hector Really basic – designed to be beginner level w/ very few moving parts Modify/abuse/alter as needed *** Descriptions of what is going on and how to run each example are in the Javadoc comments.
<ul>ColumnFamilyTemplate </ul>Familiar, type-safe approach - based on template-method design pattern - generic: ColumnFamilyTemplate<K,N> (K is the key type, N the column name type) ColumnFamilyTemplate template = new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName, StringSerializer.get(), StringSerializer.get()); *** (no generics for clarity)
<ul>ColumnFamilyTemplate </ul>new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName, StringSerializer.get(), StringSerializer.get()); Key Format Column Name Format - Cassandra calls this a “comparator” - Remember: defines column order in on-disk format
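The point that the comparator defines column order can be illustrated without Cassandra: the same column names sort differently depending on how they are compared. This sketch uses a `java.util.TreeMap` as a stand-in for a sorted row. It is not the on-disk format, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration; a TreeMap stands in for a row sorted by column name.
class ComparatorSketch {
    /** Returns the column names in the order the given comparator would keep them. */
    static List<String> storedOrder(Comparator<String> comparator, String... names) {
        TreeMap<String, String> row = new TreeMap<>(comparator);
        for (String name : names) {
            row.put(name, "value-of-" + name);
        }
        return new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        Comparator<String> asText = Comparator.naturalOrder();
        Comparator<String> asNumber = Comparator.comparingLong(Long::parseLong);
        // Compared as text, "10" and "100" sort before "9".
        System.out.println(storedOrder(asText, "9", "10", "100"));
        // Compared as numbers, the order is 9, 10, 100.
        System.out.println(storedOrder(asNumber, "9", "10", "100"));
    }
}
```

Since range queries over columns follow this order, the comparator is worth choosing to match how the columns will be read.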
<ul>ColumnFamilyTemplate </ul>ColumnFamilyResult<String, String> res = cft.queryColumns("zznate"); String value = res.getString("email"); Date startDate = res.getDate("startDate"); Key Format Column Name Format
<ul>Deletion – FYI </ul><ul>mutator.addDeletion("202230", "Npanxx", "city", stringSerializer); </ul><ul>Does not exist? You just inserted a tombstone! </ul><ul>Sending a deletion for a non-existing row: </ul><ul>[default@Tutorial] list Npanxx; <li>Using default limit of 100