2. @chbatey
Who am I?
⢠Technical Evangelist for Apache Cassandra
â˘Founder of Stubbed Cassandra
â˘Help out Apache Cassandra users
⢠DataStax
â˘Builds enterprise ready version of Apache
Cassandra
⢠Previous: Cassandra backed apps at BSkyB
3. @chbatey
Overview
⢠Distributed databases: What and why?
⢠Cassandra use cases
⢠Replication
⢠Fault tolerance
⢠Read and write path
⢠Data modelling
⢠Java Driver
13. @chbatey
Decisions decisions⌠CAP theorem
Are these really that different??
Relational Database
Mongo, Redis
Highly Available Databases:
Voldermort, Cassandra
14. @chbatey
PAC-ELC
⢠If Partition: Trade off Availability vs Consistency
⢠Else: Trade off Latency vs Consistency
⢠http://dbmsmusings.blogspot.co.uk/2010/04/problems-
with-cap-and-yahoos-little.html
18. @chbatey
Common use cases
â˘Ordered data such as time series
-Event stores
-Financial transactions
-Sensor data e.g IoT
â˘Non functional requirements:
-Linear scalability
-High throughout durable writes
-Multi datacenter including active-active
-Analytics without ETL
19. @chbatey
Work loads
⢠High write and/or throughout
⢠Many small queries
⢠Distributed across the world (eventually consistent)
22. @chbatey
Datacenter and rack aware
Europe
⢠Distributed master less
database (Dynamo)
⢠Column family data model
(Google BigTable)
⢠Multi data centre replication
built in from the start
USA
23. @chbatey
Cassandra
Online
⢠Distributed master less
database (Dynamo)
⢠Column family data model
(Google BigTable)
⢠Multi data centre replication
built in from the start
⢠Analytics with Apache SparkAnalytics
25. @chbatey
Dynamo 101
⢠The parts Cassandra took
- Consistent hashing
- Replication
- Strategies for replication
- Gossip
- Hinted handoff
- Anti-entropy repair
⢠And the parts it left behind
- Key/Value
- Vector clocks
26. @chbatey
Picking the right nodes
⢠You donât want a full table scan on a 1000 node cluster!
⢠Dynamo to the rescue: Consistent Hashing
⢠Then the replication strategy takes over:
- Network topology
- Simple
27. @chbatey
Murmer3 Example
⢠Data:
⢠Murmer3 Hash Values:
jim age: 36 car: ford gender: M
carol age: 37 car: bmw gender: F
johnny age: 12 gender: M
suzy: age: 10 gender: F
Primary Key Murmur3 hash value
jim 350
carol 998
johnny 50
suzy 600
Primary Key
Real hash range: -9223372036854775808 to 9223372036854775807
30. @chbatey
Murmer3 Example
Data is distributed as:
Node Start range End range Primary
key
Hash value
A 0 249 johnny 50
B 250 499 jim 350
C 500 749 suzy 600
D 750 999 carol 998
32. @chbatey
Replication strategy
⢠Simple
- Give it to the next node in the ring
- Donât use this in production
⢠NetworkTopology
- Every Cassandra node knows its DC and Rack
- Replicas wonât be put on the same rack unless Replication Factor > # of racks
- Unfortunately Cassandra canât create servers and racks on the fly to fix this :(
34. @chbatey
Replication strategy
⢠Simple
- Give it to the next node in the ring
- Donât use this in production
⢠NetworkTopology
- Every Cassandra node knows its DC and Rack
- Replicas wonât be put on the same rack unless Replication Factor > # of racks
- Unfortunately Cassandra canât create servers and racks on the fly to fix this :(
36. @chbatey
Tunable Consistency
â˘Data is replicated N times
â˘Every query that you execute you give a consistency
-ALL
-QUORUM
-LOCAL_QUORUM
-ONE
⢠Christos Kalantzis Eventual Consistency != Hopeful Consistency: http://
youtu.be/A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
42. @chbatey
But what happens when they come back?
⢠Hinted handoff to the rescue
⢠Coordinators keep writes for downed nodes for a
configurable amount of time, default 3 hours
⢠Longer than that run a repair
44. @chbatey
Donât forget to be social
⢠Each node talks to a few of its other and shares
information
Did you hear node
1 was down???
45. @chbatey
Scaling shouldnât be hard
⢠Throw more nodes at a cluster
⢠Bootstrapping + joining the ring
⢠For large data sets this can take some time
49. @chbatey
CQL
â˘Cassandra Query Language
-SQL like query language
â˘Keyspace â analogous to a schema
- The keyspace determines the RF (replication factor)
â˘Table â looks like a SQL Table CREATE TABLE scores (
name text,
score int,
date timestamp,
PRIMARY KEY (name, score)
);
INSERT INTO scores (name, score, date)
VALUES ('bob', 42, '2012-06-24');
INSERT INTO scores (name, score, date)
VALUES ('bob', 47, '2012-06-25');
SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
51. @chbatey
UUID
⢠Universal Unique ID
- 128 bit number represented in character form e.g.
99051fe9-6a9c-46c2-b949-38ef78858dd0
⢠Easily generated on the client
- Version 1 has a timestamp component (TIMEUUID)
- Version 4 has no timestamp component
52. @chbatey
Company Confidential
TIMEUUID
TIMEUUID data type supports Version 1 UUIDs
Generated using time (60 bits), a clock sequence number (14 bits), and MAC address (48 bits)
â CQL function ânow()â generates a new TIMEUUID
Time can be extracted from TIMEUUID
â CQL function dateOf() extracts the timestamp as a date
TIMEUUID values in clustering columns or in column names are ordered based on time
â DESC order on TIMEUUID lists most recent data first
Š 2014 DataStax, All Rights Reserved.
54. @chbatey
Data Model - User Defined Types
⢠Complex data in one place
⢠No multi-gets (multi-partitions)
⢠Nesting!
CREATE TYPE address (
street text,
city text,
zip_code int,
country text,
cross_streets set<text>
);
55. Data Model - Updated
⢠We can embed video_metadata in videos
CREATE TYPE video_metadata (
height int,
width int,
video_bit_rate set<text>,
encoding text
);
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);
57. @chbatey
Tuple type
⢠Type to represent a
group
⢠Up to 256 different
elements
CREATE TABLE tuple_table (
id int PRIMARY KEY,
three_tuple frozen <tuple<int, text, float>>,
four_tuple frozen <tuple<int, text, float, inet>>,
five_tuple frozen <tuple<int, text, float, inet, ascii>>
);
58. @chbatey
Counters
⢠Old has been around since .8
⢠Commit log replay changes counters
⢠Repair can change a counter
59. @chbatey
Time-to-Live (TTL)
TTL a row:
INSERT INTO users (id, first, last) VALUES (âabc123â, âcatherineâ, âcachartâ)
USING TTL 3600; // Expires data in one hourâ¨
TTL a column:
UPDATE users USING TTL 30 SET last = âmillerâ WHERE id = âabc123â
â TTL in seconds
â Can also set default TTL at a table level
â Expired columns/values automatically deleted
â With no TTL specified, columns/values never expire
â TTL is useful for automatic deletion
â Re-inserting the same row before it expires will overwrite TTL
68. @chbatey
Languages
⢠DataStax (open source)
- C#, Java, C++, Python, Node, Ruby
- Very similar programming API
⢠Other open source
- Go
- Clojure
- Erlang
- Haskell
- Many more Java/Python drivers
- Perl
74. @chbatey
Mapping API
@Accessorâ¨
public interface CustomerEventDao {â¨
@Query("select * from customers.customer_events where customer_id = :customerId")â¨
Result<CustomerEvent> getCustomerEvents(String customerId);â¨
â¨
@Query("select * from customers.customer_events")â¨
Result<CustomerEvent> getAllCustomerEvents();â¨
â¨
@Query("select * from customers.customer_events where customer_id = :customerId
and time > minTimeuuid(:startTime) and time < maxTimeuuid(:endTime)")â¨
Result<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime,
long endTime);â¨
}
â¨
@Beanâ¨
public CustomerEventDao customerEventDao() {â¨
MappingManager mappingManager = new MappingManager(session);â¨
return mappingManager.createAccessor(CustomerEventDao.class);â¨
}
75. @chbatey
Adding some type safety
public enum StoreType {â¨
ONLINE, RETAIL, FRANCHISE, MOBILEâ¨
}
@Table(keyspace = "customers", name = "customer_events")â¨
public class CustomerEvent {â¨
@PartitionKeyâ¨
@Column(name = "customer_id")â¨
private String customerId;â¨
â¨
@ClusteringColumn()â¨
private UUID time;â¨
â¨
@Column(name = "staff_id")â¨
private String staffId;â¨
â¨
@Column(name = "store_type")â¨
@Enumerated(EnumType.STRING) // could be EnumType.ORDINALâ¨
private StoreType storeType;â¨
76. @chbatey
User defined types
create TYPE store (name text, type text, postcode text) ;â¨
â¨
â¨
CREATE TABLE customer_events_type(
customer_id text,
staff_id text,
time timeuuid,
store frozen<store>,
event_type text,
tags map<text, text>,
PRIMARY KEY ((customer_id), time));â¨
77. @chbatey
Mapping user defined types
@UDT(keyspace = "customers", name = "store")â¨
public class Store {â¨
private String name;â¨
private StoreType type;â¨
private String postcode;
// getters etc
}
@Table(keyspace = "customers", name = "customer_events_type")â¨
public class CustomerEventType {â¨
@PartitionKeyâ¨
@Column(name = "customer_id")â¨
private String customerId;â¨
â¨
@ClusteringColumn()â¨
private UUID time;â¨
â¨
@Column(name = "staff_id")â¨
private String staffId;â¨
â¨
@Frozenâ¨
private Store store;â¨
â¨
@Column(name = "event_type")â¨
private String eventType;â¨
â¨
private Map<String, String> tags;â¨
78. @chbatey
Mapping user defined types
@UDT(keyspace = "customers", name = "store")â¨
public class Store {â¨
private String name;â¨
private StoreType type;â¨
private String postcode;
// getters etc
}
@Table(keyspace = "customers", name = "customer_events_type")â¨
public class CustomerEventType {â¨
@PartitionKeyâ¨
@Column(name = "customer_id")â¨
private String customerId;â¨
â¨
@ClusteringColumn()â¨
private UUID time;â¨
â¨
@Column(name = "staff_id")â¨
private String staffId;â¨
â¨
@Frozenâ¨
private Store store;â¨
â¨
@Column(name = "event_type")â¨
private String eventType;â¨
â¨
private Map<String, String> tags;â¨
@Query("select * from customers.customer_events_type")â¨
Result<CustomerEventType> getAllCustomerEventsWithStoreType();
81. @chbatey
Light weight transactions
⢠Often referred to as compare and set (CAS)
INSERT INTO USERS (login, email, name, login_count)
values ('jbellis', 'jbellis@datastax.com', 'Jonathan Ellis', 1)
IF NOT EXISTS
82. @chbatey
Batch Statements
BEGIN BATCH
INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user')
UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2'
INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c')
DELETE name FROM users WHERE userID = 'user2â
APPLY BATCH;â¨
BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single logical
operation
â¨
Atomic operation
If any statement in the batch succeeds, all will
No batch isolation
Other âtransactionsâ can read and write data being affected by a partially executed batch
83. @chbatey
Company Confidential
Batch Statements with LWT
BEGIN BATCH
UPDATE foo SET z = 1 WHERE x = 'a' AND y = 1;
UPDATE foo SET z = 2 WHERE x = 'a' AND y = 2 IF t = 4;
APPLY BATCH;
Allows you to group multiple conditional updates in a batch as long as all
those updates apply to the same partitionâ¨
Š 2014 DataStax, All Rights Reserved.
84. @chbatey
Summary
⢠Cassandra is a shared nothing masterless datastore
⢠Availability a.k.a up time is king
⢠Biggest hurdle is learning to model differently
⢠Modern drivers make it easy to work with
85. @chbatey
Thanks for listening
⢠Follow me on twitter @chbatey
⢠Cassandra + Fault tolerance posts a plenty:
⢠http://christopher-batey.blogspot.co.uk/
⢠Cassandra resources: http://planetcassandra.org/