4. CQL3 examples
CREATE KEYSPACE shire WITH
REPLICATION = {'class': 'NetworkTopologyStrategy', 'eu' : 3, 'us-east' : 2};
SELECT * FROM emp WHERE empID IN (130,104) ORDER BY deptID DESC;
INSERT INTO excelsior.clicks (userid, url, date, name)
VALUES (
3715e600-2eb0-11e2-81c1-0800200c9a66,
'http://cassandra.apache.org',
'2013-10-09',
'Mary')
USING TTL 86400;
UPDATE users
SET email = 'charlie@wonka.com'
WHERE login = 'cbucket64'
IF email = 'cbucket@wonka.com';
CREATE USER bombadil WITH PASSWORD 'goldberry4ever' SUPERUSER;
GRANT ALTER ON KEYSPACE shire TO gandalf;
5. Ops Friendly
•Simple design
  •no special roles, no single point of failure
•Lots of exposed metrics via JMX
•Nodes and entire datacenters can go down with no loss of service
•Rapid read protection
•DataStax OpsCenter
  •Visual monitoring tool
  •REST interface to metric data
  •Free version
  •Hands-off services
8. Fully Distributed
•Distributed systems introduce complex problems
•What is “down”?
  •Individual server is down
  •Network link is down
  •Long server pause (e.g. GC pause)
  •Variable network latency
•What do I do when a server is overloaded?
•How can I stay available/reliable in such circumstances?
•How can I maintain consistency?
•How do I reconcile differences?
10. Eventual Consistency
•Individual server durability
  •Write to commitlog (batch or periodic sync)
  •Write to memtable (which gets flushed to disk)
•Achieving consistency level
  •ONE, QUORUM, ALL
  •LOCAL_ONE, LOCAL_QUORUM
  •ANY, EACH_QUORUM (for writes)
•Important to note:
  •All replicas always get a copy of the write
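The consistency level is chosen per request. As a minimal illustration, reusing the excelsior.clicks table from the earlier slide: CONSISTENCY is a cqlsh session command (not CQL proper), and QUORUM means a majority of replicas, floor(RF/2) + 1, must respond.

```sql
-- cqlsh session command: require a majority of replicas per request
CONSISTENCY QUORUM;

-- With RF = 3, this read succeeds once 2 of the 3 replicas respond
SELECT * FROM excelsior.clicks
WHERE userid = 3715e600-2eb0-11e2-81c1-0800200c9a66;
```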
12. Continually cleaning
•Hinted handoff
  •valid for a window of time
  •replays back to node restored to service
•Read repair
  •after a read, check that data for agreement (digest)
  •read_repair_chance defaults to 0.1
  •also dclocal_read_repair_chance
•Anti-entropy service (manual repair)
  •check for agreement across all data in a token range A–B
  •run manual repair at least every gc_grace_seconds
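A manual (anti-entropy) repair is typically kicked off with nodetool; the keyspace name below is carried over from the earlier examples, and the flag choice is illustrative.

```shell
# Repair only the primary ranges owned by this node (-pr),
# so a rolling repair across the cluster does not redo work
nodetool repair -pr excelsior
```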
13. Advanced Repair
•Manual repairs have limited resolution
  •“There is something different in these 1000 rows”
  •Therefore you have to stream all 1000 rows
  •Leads to overstreaming, waste
•You can specify start/end tokens for the repair
  •Get finer, row-level precision
  •More complicated to execute
•DataStax has a repair service to help
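Subrange repair can be sketched with nodetool's start/end token flags; the token values below are placeholders, not taken from the slides.

```shell
# Repair only the data in one narrow token range (-st/-et),
# instead of streaming everything a full-range repair would touch
nodetool repair -st -9223372036854775808 -et -9100000000000000000 excelsior
```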
15. Netflix Study
•Two datacenters (US-East and US-West)
•Wrote 500,000 records in each datacenter
•50k write operations per second in each DC
•Wrote at consistency level ONE
•All data read back correctly in other DC
•Tried 5 different runs, introduced failures along the way
See planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis/
16. Practical Consistency
•ONE is not suitable for all cases
•Review your requirements, SLA
•Do your own testing to get comfortable
•Flexibility translates into the best performance for your use case