How does Riak compare to Cassandra? [Cassandra London User Group July 2011]
/usr/bin/whoami
• Russell Smith
• Work for UKD1, a consultancy for web-related tech
• Help with application design, infrastructure, capacity planning, etc.
• Mainly for the video-games industry & web startups
• Twitter: @ukd1
What is Riak?
• Pronounced ‘ree-ack’
• A scalable, high-availability, distributed key-value store
• Modelled on Amazon’s description of Dynamo, like Cassandra
• Commercially supported / developed by Basho
• Written in Erlang
• Open source - Apache License (2.0)
What isn’t Riak?
• Schema enforced - store what you want
• A relational database - no joins or constraint enforcement, as there are no global locks
• A competitor to in-memory, column-based databases
What versions are available?
• Riak
• Riak Search (Riak + distributed full-text indexing / search)
• Riak Enterprise - commercially licensed, supports extra features for enterprise use (SNMP, data-centre awareness, etc.)
• Luwak (Riak + app for storing large files; it’s bundled by default)
Riak’s take on CAP
• Exposed to the end user - allowing tuning of N, R & W
• N - # of nodes each value is replicated to, set per bucket (default of 3)
• R - # of nodes that must respond for a successful read (per request)
• W - # of nodes that must respond for a successful write (a number, all, quorum or default for the bucket)
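A minimal sketch of how the N, R & W settings above relate to each other. The helper names (`quorum`, `resolve`, `overlaps`) are made up for illustration and are not part of any Riak client; the `R + W > N` check is the textbook Dynamo-style overlap rule, not a guarantee Riak itself makes.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def resolve(setting, n, bucket_default=None):
    """Turn an R or W setting (a number, 'all', 'quorum' or 'default') into a node count."""
    if setting == "all":
        return n
    if setting == "quorum":
        return quorum(n)
    if setting == "default":
        # Fall back to the bucket's own setting; assume quorum if none was given.
        return resolve(bucket_default or "quorum", n)
    return int(setting)


def overlaps(n, r, w):
    """True when every read set must share at least one node with every write set."""
    return resolve(r, n) + resolve(w, n) > n


if __name__ == "__main__":
    n = 3  # default N per bucket
    for r, w in [(1, 1), ("quorum", "quorum"), (1, "all")]:
        print(f"N={n} R={r} W={w} -> read/write sets overlap: {overlaps(n, r, w)}")
```

Low R and W favour availability and latency; raising them trades that for a better chance of reading the latest write.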
What can you store?
• Values against keys
• Keys are organised into buckets
• Practical value limit of 64 MB
• For large files: Luwak (built in > 0.13) splits them into smaller blocks
Querying
• Two main interfaces: HTTP & Protocol Buffers
• HTTP API is mainly REST - GET, PUT, DELETE
• Riak stores the key, value & metadata about the key:
  • Content type, charset, encoding & link data
  • Also: any custom metadata
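A rough sketch of what the HTTP interface looks like from Python's standard library. It assumes a node on `127.0.0.1:8098` and the `/riak/<bucket>/<key>` URL layout; the bucket, key and metadata names are invented for the example. The helpers only build requests, so nothing is sent until you pass one to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request

BASE = "http://127.0.0.1:8098"  # assumption: a local node on the default HTTP port


def object_url(bucket, key, base=BASE):
    """URL of one object under the /riak/<bucket>/<key> layout."""
    b = urllib.parse.quote(bucket, safe="")
    k = urllib.parse.quote(key, safe="")
    return f"{base}/riak/{b}/{k}"


def put_request(bucket, key, body, content_type="text/plain", meta=None):
    """Build a PUT that stores `body` (bytes) plus optional custom metadata."""
    req = urllib.request.Request(object_url(bucket, key), data=body, method="PUT")
    req.add_header("Content-Type", content_type)
    for name, value in (meta or {}).items():
        # Custom metadata travels as X-Riak-Meta-* headers.
        req.add_header(f"X-Riak-Meta-{name}", value)
    return req


def get_request(bucket, key, r=None):
    """Build a GET, optionally overriding R for this one request."""
    url = object_url(bucket, key)
    if r is not None:
        url += "?" + urllib.parse.urlencode({"r": r})
    return urllib.request.Request(url, method="GET")


def delete_request(bucket, key):
    """Build a DELETE for one object."""
    return urllib.request.Request(object_url(bucket, key), method="DELETE")


if __name__ == "__main__":
    req = put_request("people", "bob", b'{"name": "Bob"}', "application/json")
    print(req.get_method(), req.full_url)
    # To actually send it (needs a running node):
    # urllib.request.urlopen(req)
```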
Links
• Used to store one-way relationships between objects
• Stored in object metadata
• Link-walking uses MapReduce
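As an illustration of the two pieces involved over HTTP: links are sent in a `Link` header on the object, and a link-walk is a GET on the object's URL with one `bucket,tag,keep` step per hop appended (`_` acts as a wildcard). The helper names and the `people` / `friend` data below are hypothetical; treat this as a sketch of the string formats only.

```python
def link_header(links):
    """Format (bucket, key, tag) triples as a Link header value."""
    return ", ".join(f'</riak/{b}/{k}>; riaktag="{t}"' for b, k, t in links)


def walk_path(bucket, key, steps):
    """Build a link-walking path from (bucket, tag, keep) steps.

    None in any position becomes the '_' wildcard; keep=True/False becomes 1/0.
    """
    def part(value):
        if value is None:
            return "_"
        if isinstance(value, bool):
            return "1" if value else "0"
        return str(value)

    suffix = "/".join(",".join(part(v) for v in step) for step in steps)
    return f"/riak/{bucket}/{key}/{suffix}"


if __name__ == "__main__":
    # bob -> alice, tagged "friend"
    print("Link:", link_header([("people", "alice", "friend")]))
    # friends of bob's friends, keeping only the final hop
    print(walk_path("people", "bob", [("people", "friend", False),
                                      ("people", "friend", True)]))
```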
Vector clocks
• Each value is tagged with a vector clock
• Riak can determine if values:
  • Are direct descendants of a single object
  • Share a common parent
  • Are unrelated
• In Riak each object has a vector clock
• Cassandra uses timestamps - problems can occur with out-of-sync clocks
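The comparison above can be sketched with a generic vector clock, modelled here as a dict of `actor -> counter`. This is the standard textbook algorithm, not Riak's Erlang implementation, and the actor names are invented. Note that a plain vector clock only tells you "concurrent"; it does not by itself separate "share a common parent" from "unrelated".

```python
def increment(clock, actor):
    """Return a copy of `clock` with `actor`'s counter bumped by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def relation(a, b):
    """Classify two clocks relative to each other."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    return "concurrent"


if __name__ == "__main__":
    base = increment({}, "client-x")          # first write
    left = increment(base, "client-x")        # x updates again
    right = increment(base, "client-y")       # y updates the same base
    print(relation(left, base))               # direct descendant
    print(relation(left, right))              # neither has seen the other
```

Timestamps, by contrast, always produce a winner, which is why clock skew between nodes can silently drop a write.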
Siblings
• Siblings are different versions of the same document which Riak has not merged
• Occur only if allow_mult is enabled on a bucket AND one of:
  • Concurrent writes with the same vector clock value
  • Stale vector clock
  • No vector clock passed
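A toy model of when siblings appear, assuming a single store and per-writer actor ids (both simplifications; this is not how Riak is implemented internally). `put` and its arguments are hypothetical names. A write replaces the stored value only when the clock it was based on has seen everything in the stored clock; otherwise the values are kept side by side if `allow_mult` is on.

```python
def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def put(stored, value, context, actor, allow_mult=True):
    """Apply one write. `stored` is (clock, values); returns the new (clock, values).

    `context` is the vector clock the writer last read, or None if none was passed.
    """
    clock, values = stored
    context = context or {}                  # no vector clock passed
    if descends(context, clock):
        new_values = [value]                 # writer saw the latest state: replace
    elif allow_mult:
        new_values = values + [value]        # stale or missing clock: keep siblings
    else:
        new_values = [value]                 # allow_mult off: last write wins
    merged = {a: max(clock.get(a, 0), context.get(a, 0))
              for a in clock.keys() | context.keys()}
    merged[actor] = merged.get(actor, 0) + 1
    return merged, new_values


if __name__ == "__main__":
    state = put(({}, []), "v1", None, "a")
    seen = dict(state[0])                    # two clients read the same clock
    state = put(state, "v2", seen, "b")      # first writer replaces v1
    state = put(state, "v3", seen, "c")      # second writer's clock is now stale
    print(state[1])                          # both values kept as siblings
    state = put(state, "merged", state[0], "a")  # resolve using the current clock
    print(state[1])
```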
Admin
• Super simple:
  • riak-admin join <node-in-cluster>
  • riak-admin leave
• Backup tools are provided...
Backup / restore
• riak-admin backup|restore <node> <cookie> <output_file> [node|all]
• Alternative is a filesystem backup for Bitcask, as it uses append-only files
• riak-admin backup is storage-engine agnostic
• riak-admin only backs up KV data, not search indexes (Riak Search)
Storage engines
• Ships with two default storage engines:
  • Bitcask - default, best when keyspace < RAM
  • InnoDB - suggested when keyspace > RAM
• Also available - Google’s LevelDB. It’s BSD licensed & recently integrated; good for large data sets.
Riak Search
• Full-text search engine built on top of Riak
• Realtime
• Uses Lucene analyzers; custom ones may be written in Erlang / Java
• Supports term / field searches, boolean operators, grouping, lexical range queries and end-of-word wildcards
• Will be part of Riak by default from 1.0
Riak > Cassandra
• Extremely simple to add or remove nodes from a cluster
• No pre-setup of the data model
• REST & Protobuf API access
• Commercial support from the original developers, Basho
Riak = Cassandra
• No single point of failure
• Linearly scalable
• High availability
• Eventually consistent
• You can choose your own consistency requirements
Riak < Cassandra
• CQL: an SQL-ish language
• Range / cover queries are built in (no need to write MapReduce functions)
• ‘Enterprise’ features (DC / rack awareness) are free & in the open-source build
• Wide support / training from 3rd-party commercial vendors: DataStax / Acunu / Impetus / Onzra http://wiki.apache.org/cassandra/ThirdPartySupport
• Cassandra is seemingly more popular & has a bigger community
• Fixed partitions vs MD5 of RandomPartitioner: Riak’s partition count can’t be reconfigured later if you need to, so plan carefully with Riak! http://wiki.basho.com/Cluster-Capacity-Planning.html
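To see why the partition count matters, here is a simplified sketch of a fixed-partition ring: the bucket/key pair is hashed with SHA-1 into a 160-bit space that is cut into a fixed number of equal partitions, and a value lives on N consecutive partitions. The way the key is encoded before hashing and the function names are assumptions for illustration; Riak's real ring code differs in detail.

```python
import hashlib

RING_SIZE = 64        # assumed default partition count; chosen once per cluster
KEYSPACE = 2 ** 160   # size of the SHA-1 output space


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Index of the partition whose slice of the hash space contains this key."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * ring_size // KEYSPACE


def preference_list(bucket, key, n=3, ring_size=RING_SIZE):
    """The n consecutive partitions (wrapping around the ring) that hold the key."""
    first = partition_for(bucket, key, ring_size)
    return [(first + i) % ring_size for i in range(n)]


if __name__ == "__main__":
    print(partition_for("people", "bob"))
    print(preference_list("people", "bob"))
    # A different ring size maps the same key to a different partition index,
    # which is why the count has to be picked before data is loaded.
    print(partition_for("people", "bob", ring_size=128))
```

Adding nodes only changes which node owns which partition; the partitions themselves stay put, so the count chosen up front caps how far the cluster can spread.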