Cassandra Essentials
● Inspired by BigTable (Google) and Dynamo (Amazon)
● Eventually consistent
● Multi-level map-like
● Column store
● Released by Facebook, adopted by Apache
● Supported by DataStax
  – EC2 AMI
  – Commercial product on top: Brisk
Data Model in Brief
● Atomic unit of storage: the Column
  – Possibly stored in a Super Column
● Collections of columns: the Row
  – Or Super Columns
● Collections of rows: the Column Family
  – Or the Super Column Family
● Collections of column families: the Keyspace
The Column
● Key, value and timestamp
● Example: key Age, value 29, timestamp 1330945017654
The Row
● Many (many, many) columns
● Columns are sorted on key, good for range queries
● Scales wildly – just keep on adding columns
● In practice, a persistent hash map
● Rows can be stored sorted, or hashed
● Example: row Kjetil with column Age = 29 @ 1330945017654
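The "columns sorted on key" property can be illustrated with a plain Java `TreeMap` – a hypothetical in-memory stand-in for a Cassandra row, not the real storage engine:

```java
import java.util.TreeMap;

public class SortedRowDemo {
    // Sample row: column key -> value (illustration only).
    static TreeMap<String, Integer> sampleRow() {
        TreeMap<String, Integer> row = new TreeMap<>();
        row.put("Age", 29);
        row.put("Height", 180);
        row.put("Shoe", 43);
        return row;
    }

    // Column keys in the range [from, to): cheap precisely because
    // the columns are kept sorted on key.
    static String rangeKeys(String from, String to) {
        return sampleRow().subMap(from, to).keySet().toString();
    }

    public static void main(String[] args) {
        System.out.println(rangeKeys("A", "H")); // prints "[Age]"
    }
}
```

The same idea is what makes prefix scans over synthetic column names (seen later in this deck) efficient.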
The Column Family
● Consists of many (many) rows
● Example: column family YOUNG_AND_PROMISING containing row Kjetil with column Age = 29 @ 1330945017654
The Keyspace
● Consists of (many) column families
● Usually a statically known set
● Example: keyspace JUST_YOUNG containing column family YOUNG_AND_PROMISING
WTF a Super Column is
● Columns holding (a few) other columns
● Serialized as a single value – does NOT scale wildly
● Example: super column Kjetil @ 1330945017654
Can You Relate?
● Concepts mapped to RDB data model levels
  – Keyspace => Schema
  – Column family => Table
  – Row => Row, but without known columns
  – Column => Column name and value found in a row
● RDB: rows and column values are dynamic/data; column names are static/structure
● NoSQL: column keys are dynamic/data, too
The Column Revisited
● Columns are dynamic
  – Columns are data, not structure
● Column keys don't have to be strings
  – They can be any supported, sortable primitive type, e.g. timestamps (Long)
  – Don't say column name, say column key
● Columns are sorted
● Some RDB unlearning required
What's in a Keyspace Schema?
● Keyspace settings
  – Partitioning: decides which node(s) will store rows
  – Replication factor
  – Custom strategies for partitioning, placement etc.
● The set of Column Families
  – For each Column Family, the type of its keys
● Optional meta-data
  – Pre-defined columns
Data Model Notes/(Anti-)Patterns
● Super columns are losing favor
  – Prefer "synthetic" columns (e.g. columns grouped by prefix)
  – Columns in super columns are schema, NOT data!
  – Cassandra devs hate them
● Partitioning inside of rows is common
  – For x partitions, compute a hash value from the column name and mod by x, obtaining i. E.g. if "Age" hashes to 2 modulo x, write to a partition-specific variant of row Kjetil
  – Helps to distribute r/w traffic among nodes, for column families with busy/crowded rows
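The in-row partitioning pattern can be sketched in a few lines. This is an illustration, not Cassandra API; the `#` separator and the partition count are assumptions:

```java
public class RowPartitioner {
    // Number of partitions per logical row (x in the slide); illustrative choice.
    static final int PARTITIONS = 3;

    // Hash the column name, mod by the partition count to obtain i,
    // and derive a partition-specific physical row key from the logical key.
    static String physicalRowKey(String logicalKey, String columnName) {
        int i = Math.floorMod(columnName.hashCode(), PARTITIONS);
        return logicalKey + "#" + i;
    }

    public static void main(String[] args) {
        // The same column always lands in the same partition,
        // so reads know which physical row to consult.
        System.out.println(physicalRowKey("Kjetil", "Age"));
    }
}
```

Because the mapping is deterministic, readers recompute it instead of storing it anywhere.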
What We Do
● Count displays of, and clicks on, ads
● Use Cassandra to track # of hits, in time intervals, for:
  – Ads
  – Groups of ads
  – Advertiser campaigns
  – Display boxes
  – Publisher channels
  – Publisher sites
  – Other
  – ... and combinations thereof
Example List of Updates
● Count +1 for:
  – 6 ads, 6 ad groups, 6 campaigns (no overlap)
  – 2 display boxes, 1 channel (in this case, same channel), 1 site
  – 2 channel/ad combinations
  – Various secret sauce, e.g. another 4
● 28 updates
● If click: 11 updates, count +1 for:
  – 1 ad, ad group, campaign, box, channel, etc.
But wait, there's more!
● Spec says "in time intervals" => +1 for each of:
  – The current hour
  – Today
  – This week
  – This month
  – This year
  – Total
● Total: 6 x 28 = 168 updates
● For an average of 500 requests/sec, ~100 updates/req:
  – ~50,000 writes/second
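The write-volume arithmetic above can be checked in a few lines; the numbers are taken from the slides:

```java
public class WriteLoad {
    // Counter updates hit per ad display: per-event updates x time intervals.
    static int updatesPerDisplay(int perEventUpdates, int timeIntervals) {
        return perEventUpdates * timeIntervals;
    }

    // Sustained write rate for a given request rate and updates per request.
    static long writesPerSecond(int requestsPerSecond, int avgUpdatesPerRequest) {
        return (long) requestsPerSecond * avgUpdatesPerRequest;
    }

    public static void main(String[] args) {
        // 28 counter updates per display, 6 time intervals each.
        System.out.println(updatesPerDisplay(28, 6));  // 168
        // 500 requests/sec at ~100 updates per request on average.
        System.out.println(writesPerSecond(500, 100)); // 50000
    }
}
```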
Cassandra 1.0 Applied
● New feature/godsend: Counter columns!
  – Like Long values, but
  – Accept updates that are increments to the current value
● Combined with batched updates
  – Phew!
● Scale out for write traffic and workable read speed
  – Done!
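A minimal sketch of the batching idea: coalesce increments per (row, column) in memory, then ship them as one batch mutation. This is not the Hector or Thrift API – `flush()` here just returns the batch a real client would send:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterBatch {
    // Pending increments, keyed by "rowKey/columnKey" (key format is illustrative).
    private final Map<String, Long> pending = new LinkedHashMap<>();

    // Record a counter increment; repeated increments coalesce into one delta.
    void increment(String rowKey, String columnKey, long delta) {
        pending.merge(rowKey + "/" + columnKey, delta, Long::sum);
    }

    // Drain the accumulated deltas as a single batch.
    Map<String, Long> flush() {
        Map<String, Long> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        CounterBatch batch = new CounterBatch();
        batch.increment("D#20120121", "channel_ad/Channel:b29-Ad:e13083", 1);
        batch.increment("D#20120121", "channel_ad/Channel:b29-Ad:e13083", 1);
        // Two +1s coalesce into a single +2 counter update.
        System.out.println(batch.flush());
    }
}
```

Coalescing client-side is what turns ~168 logical +1s per request into far fewer physical writes.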
Real data: Row and columns
● Row key parts:
  – D: daily interval; partition 0 (hashed from key)
  – 20120121: the day, January 21 this year
● Column key: channel_ad/Channel:b29-Ad:e13083
  – 1 click, 7 hits for ad 13083 in channel 29 on that day
Stupid Pet Tricks for Sorting
● Funny-looking values in the column key?
  – a1
  – b29
  – c432
  – d2345
  – e34345
● The leading letter encodes the digit count (a = 1 digit, b = 2, ...), so lexicographic order matches numeric order
● Sortable, more compact and scalable than:
  – 00000000029
  – 00000000432
  – ...
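The encoding can be reproduced in one small method – a sketch inferred from the examples above (a = 1 digit, b = 2 digits, etc.), limited here to non-negative numbers:

```java
public class SortableNumber {
    // Encode a non-negative number so that lexicographic order on the
    // encoded strings equals numeric order: prefix with a letter giving
    // the digit count (a = 1 digit, b = 2 digits, ...).
    static String encode(long n) {
        String digits = Long.toString(n);
        char prefix = (char) ('a' + digits.length() - 1);
        return prefix + digits;
    }

    public static void main(String[] args) {
        System.out.println(encode(29));    // b29
        System.out.println(encode(432));   // c432
        System.out.println(encode(13083)); // e13083
        // "b29" < "c432" lexicographically, matching 29 < 432 numerically.
    }
}
```

Without the prefix, "29" would sort after "13083" as a string; with it, shorter numbers always sort first.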
Given a hit in channel 29 ...
● Read from an application-configured set of rows
● Example config: last 4 hours, 3 days, 2 weeks
  – 9 logical rows to read from
  – Assume 3 partitions for each logical row
  – Read from 27 physical rows, all (or a minimum count of) columns beginning with:
    – channel_ad/Channel:b29-Ad:
● Obtain synthetic clicks/hits ratio for each ad
● And channel_ad is just one of the ratios to use
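The fan-out from 9 logical rows to 27 physical rows can be sketched as follows. The logical row names and the `#` separator are assumptions for illustration, not the real key format:

```java
import java.util.ArrayList;
import java.util.List;

public class ReadFanOut {
    // Assume 3 partitions for each logical row, as in the example config.
    static final int PARTITIONS = 3;

    // Expand each logical row into its partition-specific physical rows.
    static List<String> physicalRows(List<String> logicalRows) {
        List<String> rows = new ArrayList<>();
        for (String logical : logicalRows) {
            for (int i = 0; i < PARTITIONS; i++) {
                rows.add(logical + "#" + i); // "#" separator is illustrative
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Example config: last 4 hours, 3 days, 2 weeks = 9 logical rows
        // (hypothetical interval/date names).
        List<String> logical = List.of(
            "H2012012110", "H2012012111", "H2012012112", "H2012012113",
            "D20120119", "D20120120", "D20120121",
            "W201202", "W201203");
        // Each physical row is then scanned for columns with the prefix
        // "channel_ad/Channel:b29-Ad:".
        System.out.println(physicalRows(logical).size()); // 27
    }
}
```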
Caching of Synthetic Ratios
● Use ehcache
  – In-memory, fast
  – In-memory, clutters heap, provokes stop-the-world GC
● Cache in Cassandra
  – Store synthetic reads back in Cassandra (on-demand "denormalization")
  – Still sensitive to high Cassandra loads
● Instance-local Redis on each box
  – Stand-alone: isolated from high Cassandra loads
  – Off-heap: reduces stop-the-world GC
  – Fast: configured for in-memory caching behavior
  – Typical time to retrieve a Java object: from 200µs to 2ms
  – Good trade-off
Client Libraries
● Out-of-the-box: Thrift
  – Usable, but should not be mixed up with business logic
● Java recommendation: Hector
  – https://github.com/rantav/hector
  – Connection pooling
  – Just-above-Thrift-level
  – Type-safe(r) r/w
Operations: Quickstart on EC2
● DataStax AMI: http://datastax.com/docs/1.0/install/install_ami
  – Ready-made cluster of N nodes
  – Free OpsCenter
Operations: Scaling
● Scaling strategy:
  – Doubling/halving capacity is very convenient
  – => New nodes naturally take over half of each existing node's load
Operations: Backup
● System-wide backups
  – Nodes can be asked to dump snapshots
  – Recovery: new nodes started from snapshots
● Selective backups
  – Selected data can be dumped to/read from JSON
  – sstable2json/json2sstable
● Incremental backups
Introducing Cassandra
● Look for data that
  – Grows fast
  – Holds useful information, given time to analyze it
  – Can be reproduced from source data (e.g. log files)
● Avoid business-critical data
  – Let the RDBMS handle all that
Living with Cassandra
● Columns are data that live in a context:
  – Sorted in pre-defined ways, determining query efficiency
  – Queried for by the application in other ways
● Columns are data coupled to your logic
  – Typical: encoding and parsing column names
  – Queries will change in development/maintenance
    – Persisted formats should change
    – Code must change
Cost of Change
● Your NoSQL data are, relative to your RDB data:
  – Bigger
  – More loosely defined
  – More closely coupled to application code
  – Harder to query (and easier queries => bigger data)
  – Less supported by mature tools
● All of this affects the cost of change
● Rebuild-from-source-data is a better option than migrate-existing-data – if it's practical