@chbatey
Christopher Batey
Cassandra + Spark
Start your downloads!
Linux/Mac:
curl -L http://downloads.datastax.com/community/dsc-cassandra-2.1.6-bin.tar.gz | tar xz
or google: Cassandra Cluster Manager (CCM)
Windows:
http://downloads.datastax.com/community/
@chbatey
Who am I?
• Built a lot of systems with Apache Cassandra at Sky
• Work on a testing library for Cassandra
• Help out Cassandra users
• Twitter: @chbatey
@chbatey
Overview
• Cassandra re-cap
• Replication
• Fault tolerance
• Data modelling
• Cassandra 2.2/3.0 (not released yet)
• Spark 101
• Spark Cassandra: how it is implemented
• (Maybe) A use case: KillrWeather
@chbatey
Why Cassandra?
@chbatey
Cassandra for Applications
APACHE
CASSANDRA
@chbatey
Common use cases
•Ordered data such as time series
-Event stores
-Financial transactions
-Sensor data e.g. IoT
•Non-functional requirements:
-Linear scalability
-High throughput durable writes
-Multi datacenter including active-active
-Analytics without ETL
@chbatey
Cassandra deep dive
@chbatey
Cassandra
• Distributed masterless
database (Dynamo)
• Column family data model
(Google BigTable)
@chbatey
Datacenter and rack aware
• Distributed masterless database (Dynamo)
• Column family data model (Google BigTable)
• Multi data centre replication built in from the start
(Diagram: datacenters in Europe and the USA)
@chbatey
Cassandra
• Distributed masterless database (Dynamo)
• Column family data model (Google BigTable)
• Multi data centre replication built in from the start
• Analytics with Apache Spark
(Diagram: Online and Analytics datacenters)
@chbatey
Dynamo 101
@chbatey
Dynamo 101
• The parts Cassandra took
- Consistent hashing
- Replication
- Gossip
- Hinted handoff
- Anti-entropy repair
• And the parts it left behind
- Key/Value
- Vector clocks
@chbatey
Picking the right nodes
• You don’t want a full table scan on a 1000 node cluster!
• Dynamo to the rescue: Consistent Hashing
@chbatey
Example
• Data:
Primary Key   age   car    gender
jim           36    ford   M
carol         37    bmw    F
johnny        12           M
suzy          10           F
• Murmur3 hash values:
Primary Key   Murmur3 hash value
jim           350
carol         998
johnny        50
suzy          600
@chbatey
Example
Four node cluster:
Node   Murmur3 start range   Murmur3 end range
A      0                     249
B      250                   499
C      500                   749
D      750                   999
@chbatey
Pictures are better
(Diagram: token ring with four nodes. A owns 0-249, B owns 250-499, C owns 500-749, D owns 750-999.)
@chbatey
Example
Data is distributed as:
Node   Start range   End range   Primary key   Hash value
A      0             249         johnny        50
B      250           499         jim           350
C      500           749         suzy          600
D      750           999         carol         998
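The lookup in the table above can be sketched in a few lines of Java. This is a toy model rather than Cassandra's implementation: it uses the slides' illustrative 0-999 range instead of real Murmur3 tokens, and the names ToyRing and ownerOf are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy token ring: uses the illustrative 0-999 hash range from the slides,
// not real Murmur3 tokens. Class and method names are hypothetical.
class ToyRing {
    // Key = first hash value of a range, value = node that owns the range.
    private final TreeMap<Integer, String> rangeStarts = new TreeMap<>();

    ToyRing() {
        rangeStarts.put(0, "A");
        rangeStarts.put(250, "B");
        rangeStarts.put(500, "C");
        rangeStarts.put(750, "D");
    }

    // The owner is the node whose range start is the largest one <= the hash.
    String ownerOf(int hash) {
        if (hash < 0 || hash > 999) {
            throw new IllegalArgumentException("hash outside the toy range 0-999: " + hash);
        }
        return rangeStarts.floorEntry(hash).getValue();
    }

    public static void main(String[] args) {
        ToyRing ring = new ToyRing();
        Map<String, Integer> hashes = new LinkedHashMap<>();
        hashes.put("jim", 350);
        hashes.put("carol", 998);
        hashes.put("johnny", 50);
        hashes.put("suzy", 600);
        hashes.forEach((key, hash) ->
                System.out.println(key + " -> node " + ring.ownerOf(hash)));
    }
}
```

The point of the sorted map is that finding the owner is a single lookup on the hash value, so no node has to be asked "do you have this key?".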
@chbatey
Replication
@chbatey
Replication strategy
• NetworkTopologyStrategy
- Every Cassandra node knows its DC and Rack
- Replicas won’t be put on the same rack unless Replication Factor > # of racks
- Unfortunately Cassandra can’t create servers and racks on the fly to fix this :(
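A rough sketch of the rack rule above, assuming a single datacenter and a ring walked clockwise from the primary replica. This is a simplification for illustration and not Cassandra's actual placement code; RackAwarePlacement, Node, pickReplicas and exampleRing are hypothetical names, and the rack assignments are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified rack-aware replica placement (illustrative only).
class RackAwarePlacement {
    static final class Node {
        final String name;
        final String rack;

        Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }
    }

    // Walk the ring from the primary, preferring nodes on racks not used yet.
    // Nodes on an already-used rack are only taken if we run out of racks.
    static List<String> pickReplicas(List<Node> ring, int primaryIndex, int replicationFactor) {
        List<String> chosen = new ArrayList<>();
        List<String> sameRackLeftovers = new ArrayList<>();
        Set<String> racksUsed = new HashSet<>();
        for (int i = 0; i < ring.size() && chosen.size() < replicationFactor; i++) {
            Node node = ring.get((primaryIndex + i) % ring.size());
            if (racksUsed.add(node.rack)) {
                chosen.add(node.name);
            } else {
                sameRackLeftovers.add(node.name);
            }
        }
        for (String name : sameRackLeftovers) {
            if (chosen.size() == replicationFactor) {
                break;
            }
            chosen.add(name);
        }
        return chosen;
    }

    // Invented layout: four nodes spread over two racks.
    static List<Node> exampleRing() {
        return Arrays.asList(
                new Node("A", "rack1"), new Node("B", "rack1"),
                new Node("C", "rack2"), new Node("D", "rack2"));
    }

    public static void main(String[] args) {
        System.out.println("RF 2: " + pickReplicas(exampleRing(), 0, 2));
        System.out.println("RF 3: " + pickReplicas(exampleRing(), 0, 3));
    }
}
```

With two racks, RF 2 can keep the replicas on different racks, while RF 3 has to reuse a rack, which is the "Replication Factor > # of racks" case from the slide.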
@chbatey
Replication
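(Diagram: a client sends a WRITE at CL = 1 to a coordinator (C) in DC1. DC1 and DC2 each hold RF 3 replicas, and the write reaches DC2 through RC.)
We have replication!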
@chbatey
Tunable Consistency
•Data is replicated N times
•You give every query you execute a consistency level
-ALL
-QUORUM
-LOCAL_QUORUM
-ONE
• Christos Kalantzis, Eventual Consistency != Hopeful Consistency: http://youtu.be/A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
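The arithmetic behind the consistency levels can be sketched as follows. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number read exceeds the replication factor. ConsistencyMath and its method names are made up for this example.

```java
// Back-of-the-envelope consistency arithmetic (hypothetical helper class).
class ConsistencyMath {
    // A quorum is a majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read set must share at least one replica with every write set.
    static boolean readOverlapsWrite(int replicasWritten, int replicasRead, int replicationFactor) {
        return replicasWritten + replicasRead > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM for RF " + rf + " = " + q);
        System.out.println("QUORUM write + QUORUM read overlap: " + readOverlapsWrite(q, q, rf));
        System.out.println("ONE write + ONE read overlap: " + readOverlapsWrite(1, 1, rf));
    }
}
```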
@chbatey
Lightweight transactions
• IF NOT EXISTS
• Compare and set
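As an in-memory analogy only, the two operations above behave like putIfAbsent and the three-argument replace on a ConcurrentHashMap. Cassandra achieves the same semantics across replicas using Paxos, which this sketch does not attempt to model; LwtAnalogy and its methods are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

// Single-process analogy for IF NOT EXISTS and compare-and-set (illustrative only).
class LwtAnalogy {
    private final ConcurrentHashMap<String, String> rows = new ConcurrentHashMap<>();

    // Like INSERT ... IF NOT EXISTS: applied only when the key is absent.
    boolean insertIfNotExists(String key, String value) {
        return rows.putIfAbsent(key, value) == null;
    }

    // Like UPDATE ... IF column = expected: applied only when the current value matches.
    boolean updateIf(String key, String expected, String newValue) {
        return rows.replace(key, expected, newValue);
    }

    String get(String key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        LwtAnalogy table = new LwtAnalogy();
        System.out.println("first insert applied: " + table.insertIfNotExists("bob", "v1"));
        System.out.println("second insert applied: " + table.insertIfNotExists("bob", "v2"));
        System.out.println("update with stale value applied: " + table.updateIf("bob", "v0", "v3"));
        System.out.println("update with current value applied: " + table.updateIf("bob", "v1", "v3"));
    }
}
```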
@chbatey
Load balancing
•Data centre aware policy
•Token aware policy
•Latency aware policy
•Whitelist policy
(Diagram: two APP instances, with async replication between DC1 and DC2)
@chbatey
But what happens when they come back?
• Hinted handoff to the rescue
• Coordinators keep writes for downed nodes for a configurable amount of time (default 3 hours)
• If a node is down for longer than that, run a repair
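The decision described above is simple enough to write down. A minimal sketch, assuming the 3 hour default from the slide; HintWindow and needsRepair are hypothetical names, not part of any Cassandra API.

```java
import java.time.Duration;

// Decides whether hints can cover an outage or a repair is needed (illustrative only).
class HintWindow {
    // Default window from the slide; it is configurable in a real cluster.
    static final Duration DEFAULT_WINDOW = Duration.ofHours(3);

    // Hints only cover the outage if the node came back within the window.
    static boolean needsRepair(Duration downtime, Duration hintWindow) {
        return downtime.compareTo(hintWindow) > 0;
    }

    public static void main(String[] args) {
        System.out.println("down 2h, repair needed: "
                + needsRepair(Duration.ofHours(2), DEFAULT_WINDOW));
        System.out.println("down 4h, repair needed: "
                + needsRepair(Duration.ofHours(4), DEFAULT_WINDOW));
    }
}
```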
@chbatey
Anti entropy repair
• Not exciting but mandatory :)
• New in 2.1: incremental repair (awesome)
@chbatey
Scaling shouldn’t be hard
• Throw more nodes at a cluster
• Bootstrapping + joining the ring
• For large data sets this can take some time
@chbatey
Data modelling
@chbatey
You must denormalise
@chbatey
Cassandra cannot join
Client
Join what?
@chbatey
CQL
•Cassandra Query Language
-SQL like query language
•Keyspace – analogous to a schema
- The keyspace determines the RF (replication factor)
•Table – looks like a SQL Table
CREATE TABLE scores (
  name text,
  score int,
  date timestamp,
  PRIMARY KEY (name, score)
);
INSERT INTO scores (name, score, date)
VALUES ('bob', 42, '2012-06-24');
INSERT INTO scores (name, score, date)
VALUES ('bob', 47, '2012-06-25');
SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
@chbatey
Lots of types
@chbatey
UUID
• Universally Unique Identifier
- 128 bit number represented in character form e.g. 99051fe9-6a9c-46c2-b949-38ef78858dd0
• Easily generated on the client
- Version 1 has a timestamp component (TIMEUUID)
- Version 4 has no timestamp component
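The version is encoded in the UUID itself, which the JDK can read back. A small sketch using only java.util.UUID: the JDK generates version 4 UUIDs but has no built-in version 1 generator, so the version 1 value below is a hand-built constant with a zero timestamp, used purely for illustration. UuidVersions is a made-up class name.

```java
import java.util.UUID;

// Inspecting UUID versions with the JDK (illustrative only).
class UuidVersions {
    // The example UUID from the slide.
    static final String SLIDE_EXAMPLE = "99051fe9-6a9c-46c2-b949-38ef78858dd0";
    // Hand-built version 1 UUID with a zero timestamp (assumption for the demo).
    static final String ZERO_TIME_V1 = "00000000-0000-1000-8000-000000000000";

    static int versionOf(String text) {
        return UUID.fromString(text).version();
    }

    public static void main(String[] args) {
        System.out.println("slide example version: " + versionOf(SLIDE_EXAMPLE));
        System.out.println("randomUUID version: " + UUID.randomUUID().version());
        UUID v1 = UUID.fromString(ZERO_TIME_V1);
        // timestamp() is only defined for version 1 UUIDs.
        System.out.println("v1 version: " + v1.version() + ", timestamp: " + v1.timestamp());
    }
}
```

Calling timestamp() on a version 4 UUID throws UnsupportedOperationException, which is the "no timestamp component" point in code form.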
@chbatey
Collections
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
added_date timestamp,
PRIMARY KEY (videoid)
);
@chbatey
Data Model - User Defined Types
• Complex data in one place
• No multi-gets (multi-partitions)
• Nesting!
CREATE TYPE address (
street text,
city text,
zip_code int,
country text,
cross_streets set<text>
);
@chbatey
Time-to-Live (TTL)
TTL a row:
INSERT INTO users (id, first, last) VALUES ('abc123', 'catherine', 'cachart')
USING TTL 3600; // Expires data in one hour

TTL a column:
UPDATE users USING TTL 30 SET last = 'miller' WHERE id = 'abc123';
– TTL in seconds
– Can also set default TTL at a table level
– Expired columns/values automatically deleted
– With no TTL specified, columns/values never expire
– TTL is useful for automatic deletion
– Re-inserting the same row before it expires will overwrite the TTL
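The expiry rules in the list above can be modelled in a few lines. This is a toy model of a single value with a TTL, not Cassandra's storage code: TtlCell is a hypothetical class, a TTL of 0 stands for "no TTL specified", and treating the value as expired at exactly write time plus TTL is an assumption of this sketch.

```java
// Toy model of one value with a TTL in seconds (illustrative only).
class TtlCell {
    private long writtenAtSeconds;
    private int ttlSeconds;

    TtlCell(long nowSeconds, int ttlSeconds) {
        write(nowSeconds, ttlSeconds);
    }

    // Re-writing the value replaces both the write time and the TTL.
    final void write(long nowSeconds, int newTtlSeconds) {
        this.writtenAtSeconds = nowSeconds;
        this.ttlSeconds = newTtlSeconds;
    }

    // A TTL of 0 means the value never expires.
    boolean isLive(long nowSeconds) {
        return ttlSeconds == 0 || nowSeconds < writtenAtSeconds + ttlSeconds;
    }

    public static void main(String[] args) {
        TtlCell last = new TtlCell(0, 3600);
        System.out.println("live after 3599s: " + last.isLive(3599));
        System.out.println("live after 3600s: " + last.isLive(3600));
        last.write(3000, 3600); // re-insert before expiry resets the clock
        System.out.println("live at 6000s after re-insert at 3000s: " + last.isLive(6000));
    }
}
```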
@chbatey
But isn’t Cassandra a columnar store?
Storing weather data
CREATE TABLE raw_weather_data (
  weather_station text,
  year int,
  month int,
  day int,
  hour int,
  temp double,
  PRIMARY KEY ((weather_station), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
Primary key relationship
PRIMARY KEY ((weather_station), year, month, day, hour)
Partition Key: weather_station
Clustering Columns: year, month, day, hour
WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
(Diagram: one partition, 10010:99999, holding a cell per hour)
2005:12:1:7:temp    -5.6
2005:12:1:8:temp    -5.1
2005:12:1:9:temp    -4.9
2005:12:1:10:temp   -5.3
Data Locality
weather_station='10010:99999' ?
1000 Node Cluster
You are here!
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weather_station, hour, temp
FROM raw_weather_data
WHERE weather_station = '10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10;
Single seek on disk
Partition key for locality
(Diagram: partition 10010:99999 stores the cells for 2005:12:1:7 through 2005:12:1:12 next to each other; the query reads 2005:12:1:7 to 2005:12:1:10 as one contiguous slice.)
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weather_station, hour, temp
FROM raw_weather_data
WHERE weather_station = '10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10;
Programmers like this
Sorted by event_time
weather_station   hour           temperature
10010:99999       2005:12:1:7    -5.6
10010:99999       2005:12:1:8    -5.1
10010:99999       2005:12:1:9    -4.9
10010:99999       2005:12:1:10   -5.3
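Why the range query above is cheap can be sketched with a sorted map standing in for one partition. The map is kept in descending hour order to mirror CLUSTERING ORDER BY ... DESC, and a range query becomes one contiguous sub-map. This is an in-memory analogy, not the on-disk format; PartitionSlice and its methods are hypothetical names, and the values are the four readings from the slides.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// One partition (weather station 10010:99999, 2005-12-01) as a sorted map (illustrative only).
class PartitionSlice {
    // Hour -> temperature, stored in descending hour order like the clustering order.
    static NavigableMap<Integer, Double> december1st2005() {
        NavigableMap<Integer, Double> cells =
                new TreeMap<Integer, Double>(Comparator.<Integer>reverseOrder());
        cells.put(7, -5.6);
        cells.put(8, -5.1);
        cells.put(9, -4.9);
        cells.put(10, -5.3);
        return cells;
    }

    // Equivalent of "hour >= fromHour AND hour <= toHour".
    // In a descending map the later hour sorts first, so it is the sub-map's lower bound.
    static NavigableMap<Integer, Double> slice(NavigableMap<Integer, Double> cells,
                                               int fromHour, int toHour) {
        return cells.subMap(toHour, true, fromHour, true);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, Double> result = slice(december1st2005(), 8, 9);
        result.forEach((hour, temp) -> System.out.println("hour " + hour + ": " + temp));
    }
}
```

Because neighbouring hours are neighbours in the sorted structure, the slice never has to look at anything outside the requested range.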
@chbatey
Drivers
@chbatey
Languages
• DataStax (open source)
- C#, Java, C++, Python, Node, Ruby
- Very similar programming API
• Other open source
- Go
- Clojure
- Erlang
- Haskell
- Many more Java/Python drivers
- Perl
@chbatey
DataStax Java Driver
• Open source
@chbatey
Cluster and Session
• Cluster is a singleton per cluster you want to connect to
• Session per keyspace
@chbatey
Get all the events
public List<CustomerEvent> getAllCustomerEvents() {
    return session.execute("select * from customers.customer_events")
            .all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}

// Converts one driver Row into a CustomerEvent
private Function<Row, CustomerEvent> mapCustomerEvent() {
    return row -> new CustomerEvent(
            row.getString("customer_id"),
            row.getUUID("time"),
            row.getString("staff_id"),
            row.getString("store_type"),
            row.getString("event_type"),
            row.getMap("tags", String.class, String.class));
}
@chbatey
All events for a particular customer
private PreparedStatement getEventsForCustomer;

// Prepare once at startup, then bind and execute per request
@PostConstruct
public void prepareStatements() {
    getEventsForCustomer =
            session.prepare("select * from customers.customer_events where customer_id = ?");
}

public List<CustomerEvent> getCustomerEvents(String customerId) {
    BoundStatement boundStatement = getEventsForCustomer.bind(customerId);
    return session.execute(boundStatement)
            .all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}
@chbatey
Customer events for a time slice
public List<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime,
                                                    long endTime) {
    // eq, gt and lt are static imports from QueryBuilder
    Select.Where getCustomers = QueryBuilder.select()
            .all()
            .from("customers", "customer_events")
            .where(eq("customer_id", customerId))
            .and(gt("time", UUIDs.startOf(startTime)))
            .and(lt("time", UUIDs.endOf(endTime)));

    return session.execute(getCustomers).all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}
@chbatey
Mapping API
@Table(keyspace = "customers", name = "customer_events")
public class CustomerEvent {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;

    @ClusteringColumn
    private UUID time;

    @Column(name = "staff_id")
    private String staffId;

    @Column(name = "store_type")
    private String storeType;

    @Column(name = "event_type")
    private String eventType;

    private Map<String, String> tags;
    // ctr / getters etc
}

@chbatey
Mapping API
@Accessor
public interface CustomerEventDao {
    @Query("select * from customers.customer_events where customer_id = :customerId")
    Result<CustomerEvent> getCustomerEvents(String customerId);

    @Query("select * from customers.customer_events")
    Result<CustomerEvent> getAllCustomerEvents();

    // The query string is split with + so that it is a valid Java literal
    @Query("select * from customers.customer_events where customer_id = :customerId "
            + "and time > minTimeuuid(:startTime) and time < maxTimeuuid(:endTime)")
    Result<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime,
                                                   long endTime);
}

@Bean
public CustomerEventDao customerEventDao() {
    MappingManager mappingManager = new MappingManager(session);
    return mappingManager.createAccessor(CustomerEventDao.class);
}
@chbatey
Adding some type safety
public enum StoreType {
    ONLINE, RETAIL, FRANCHISE, MOBILE
}

@Table(keyspace = "customers", name = "customer_events")
public class CustomerEvent {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;

    @ClusteringColumn()
    private UUID time;

    @Column(name = "staff_id")
    private String staffId;

    @Column(name = "store_type")
    @Enumerated(EnumType.STRING) // could be EnumType.ORDINAL
    private StoreType storeType;

@chbatey
User defined types
CREATE TYPE store (name text, type text, postcode text);

CREATE TABLE customer_events_type (
  customer_id text,
  staff_id text,
  time timeuuid,
  store frozen<store>,
  event_type text,
  tags map<text, text>,
  PRIMARY KEY ((customer_id), time));

@chbatey
Mapping user defined types
@UDT(keyspace = "customers", name = "store")
public class Store {
    private String name;
    private StoreType type;
    private String postcode;
    // getters etc
}

@Table(keyspace = "customers", name = "customer_events_type")
public class CustomerEventType {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;

    @ClusteringColumn()
    private UUID time;

    @Column(name = "staff_id")
    private String staffId;

    @Frozen
    private Store store;

    @Column(name = "event_type")
    private String eventType;

    private Map<String, String> tags;

@chbatey
On to C* 2.2 and 3.0
@chbatey
Summary
• Cassandra is a shared-nothing, masterless datastore
• Availability, a.k.a. uptime, is king
• Biggest hurdle is learning to model differently
• Modern drivers make it easy to work with
@chbatey
Thanks for listening
• Follow me on Twitter @chbatey
• Cassandra + fault tolerance posts aplenty: http://christopher-batey.blogspot.co.uk/
• Cassandra resources: http://planetcassandra.org/

1 Dundee - Cassandra 101
