LJC Conference 2014 Cassandra for Java Developers


  1. Building awesome applications with Apache Cassandra. Christopher Batey (@chbatey). ©2013 DataStax Confidential. Do not distribute without consent.
  2. Who am I?
     • Technical Evangelist for Apache Cassandra
     • Founder of Stubbed Cassandra
     • Help out Apache Cassandra users
     • Previous: Cassandra-backed apps at BSkyB
  4. Overview
     • Topics covered
       • Cassandra overview
       • Customer events example
       • DataStax Java Driver
       • Java Mapping API
       • Other features
         • Lightweight transactions
         • Load balancing
         • Reconnection policies
     • Not covered
       • Cassandra read and write paths
       • Cassandra failure modes
  6. Common use cases
     • Ordered data such as time series
     • Event stores
     • Financial transactions
     • Sensor data, e.g. IoT
     • Non-functional requirements:
       • Linear scalability
       • High-throughput durable writes
     • Multi-datacenter, including active-active
     • Analytics without ETL
  7. Cassandra overview
  9. Cassandra
     • Distributed masterless database (Dynamo)
     • Column family data model (Google BigTable)
     • Multi data centre replication built in from the start
     • [Diagram labels: Europe, USA]
  10. Cassandra
      • Distributed masterless database (Dynamo)
      • Column family data model (Google BigTable)
      • Multi data centre replication built in from the start
      • Analytics with Apache Spark
      • [Diagram labels: Online, Analytics]
  11. Replication: We have replication!
      • [Diagram: a client issues a WRITE at CL = 1; nodes labelled C and RC; two data centres, DC1 and DC2, each with RF3]
  12. Tunable Consistency
      • Data is replicated N times
      • Every query you execute is given a consistency level
        • ALL
        • QUORUM
        • LOCAL_QUORUM
        • ONE
      • Christos Kalantzis, "Eventual Consistency != Hopeful Consistency": http://youtu.be/A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
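The consistency levels above come down to counting replicas. The following is a minimal sketch of that arithmetic in plain Java, not driver code: the class and method names are made up for illustration, and it assumes the usual rules that QUORUM is a strict majority of the replication factor and that a read overlaps the latest acknowledged write when replicas written plus replicas read exceed the replication factor.

```java
/** Sketch of the replica arithmetic behind tunable consistency (illustrative names, not a driver API). */
final class ConsistencyMath {

    /** Replicas that must respond for QUORUM: a strict majority of the replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * True when every read set must intersect every write set, i.e. when
     * replicas written + replicas read is greater than the replication factor.
     */
    static boolean readOverlapsWrite(int replicasWritten, int replicasRead, int replicationFactor) {
        return replicasWritten + replicasRead > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3; // RF3, as on the replication slide
        System.out.println("QUORUM at RF 3 needs " + quorum(rf) + " replicas");
        System.out.println("QUORUM write + QUORUM read overlap: "
                + readOverlapsWrite(quorum(rf), quorum(rf), rf));
        System.out.println("ONE write + ONE read overlap: "
                + readOverlapsWrite(1, 1, rf));
    }
}
```

With RF 3, writing and reading at QUORUM touches two replicas each, so at least one replica is common to both; writing and reading at ONE gives no such guarantee.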
  13. CQL
      • Cassandra Query Language
      • SQL-like query language
      • Keyspace: analogous to a schema
        • The keyspace determines the RF (replication factor)
      • Table: looks like a SQL table

          CREATE TABLE scores (
            name text,
            score int,
            date timestamp,
            PRIMARY KEY (name, score)
          );
          INSERT INTO scores (name, score, date) VALUES ('bob', 42, '2012-06-24');
          INSERT INTO scores (name, score, date) VALUES ('bob', 47, '2012-06-25');
          SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
  14. Example Time: Customer event store
  15. An example: Customer event store
      • Customer event
        • customer_id - ChrisBatey
        • staff_id - Charlie
        • store_type - Website, PhoneApp, Phone, Retail
        • event_type - login, logout, add_to_basket, remove_from_basket, buy_item
        • time
        • tags
  16. Requirements
      • Get all events
      • Get all events for a particular customer
      • As above for a time slice
  17. Modelling in Cassandra

          CREATE TABLE customer_events(
            customer_id text,
            staff_id text,
            time timeuuid,
            store_type text,
            event_type text,
            tags map<text, text>,
            -- (customer_id) is the partition key; time is the clustering column
            PRIMARY KEY ((customer_id), time));
  18. How it is stored on disk

      As queried:

          customer_id  time                 event_type  store_type  tags
          charles      2014-11-18 16:52:04  basket_add  online      {'item': 'coffee'}
          charles      2014-11-18 16:53:00  basket_add  online      {'item': 'wine'}
          charles      2014-11-18 16:53:09  logout      online      {}
          chbatey      2014-11-18 16:52:21  login       online      {}
          chbatey      2014-11-18 16:53:21  basket_add  online      {'item': 'coffee'}
          chbatey      2014-11-18 16:54:00  basket_add  online      {'item': 'cheese'}

      As stored, one partition per customer_id with its events laid out side by side:

          charles  [event_type basket_add, staff_id n/a, store_type online, tags:item coffee]
                   [event_type basket_add, staff_id n/a, store_type online, tags:item wine]
                   [event_type logout, staff_id n/a, store_type online]
          chbatey  [event_type login, staff_id n/a, store_type online]
                   [event_type basket_add, staff_id n/a, store_type online, tags:item coffee]
                   [event_type basket_add, staff_id n/a, store_type online, tags:item cheese]
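One way to picture the layout above is as a map of partitions, each holding its rows sorted by clustering key. The sketch below is an in-memory analogy in plain Java, not how Cassandra is implemented: the class name is hypothetical, event times are simplified to `long` values instead of timeuuids, and only `event_type` is stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory analogy of a partitioned, clustered table (illustrative only). */
final class PartitionSketch {
    // partition key (customer_id) -> clustering key (event time) -> event_type
    private final Map<String, NavigableMap<Long, String>> partitions = new HashMap<>();

    /** Rows land in their partition and are kept sorted by time, whatever the insert order. */
    void insert(String customerId, long time, String eventType) {
        partitions.computeIfAbsent(customerId, id -> new TreeMap<>()).put(time, eventType);
    }

    /** "All events for a customer" reads a single partition, already in time order. */
    List<String> eventsFor(String customerId) {
        return new ArrayList<>(partition(customerId).values());
    }

    /** "Time slice" is a contiguous range of one partition (both bounds exclusive). */
    List<String> eventsBetween(String customerId, long start, long end) {
        return new ArrayList<>(partition(customerId).subMap(start, false, end, false).values());
    }

    private NavigableMap<Long, String> partition(String customerId) {
        return partitions.getOrDefault(customerId, new TreeMap<>());
    }

    public static void main(String[] args) {
        PartitionSketch table = new PartitionSketch();
        table.insert("charles", 300L, "logout");
        table.insert("charles", 100L, "basket_add");
        table.insert("chbatey", 150L, "login");
        System.out.println(table.eventsFor("charles"));
        System.out.println(table.eventsBetween("charles", 50L, 200L));
    }
}
```

The requirements map onto this shape directly: per-customer queries touch one partition, and a time slice is a range scan inside it, which is why `time` is chosen as the clustering column.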
  19. DataStax Java Driver
      • Open source
  20. Get all the events

          public List<CustomerEvent> getAllCustomerEvents() {
              return session.execute("select * from customers.customer_events")
                      .all().stream()
                      .map(mapCustomerEvent())
                      .collect(Collectors.toList());
          }

          private Function<Row, CustomerEvent> mapCustomerEvent() {
              return row -> new CustomerEvent(
                      row.getString("customer_id"),
                      row.getUUID("time"),
                      row.getString("staff_id"),
                      row.getString("store_type"),
                      row.getString("event_type"),
                      row.getMap("tags", String.class, String.class));
          }
  21. All events for a particular customer

          private PreparedStatement getEventsForCustomer;

          // Prepare once at start-up, then bind per request
          @PostConstruct
          public void prepareStatements() {
              getEventsForCustomer = session.prepare(
                      "select * from customers.customer_events where customer_id = ?");
          }

          public List<CustomerEvent> getCustomerEvents(String customerId) {
              BoundStatement boundStatement = getEventsForCustomer.bind(customerId);
              return session.execute(boundStatement)
                      .all().stream()
                      .map(mapCustomerEvent())
                      .collect(Collectors.toList());
          }
  22. Customer events for a time slice

          // eq, gt and lt are static imports from QueryBuilder
          public List<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime, long endTime) {
              Select.Where getCustomers = QueryBuilder.select()
                      .all()
                      .from("customers", "customer_events")
                      .where(eq("customer_id", customerId))
                      .and(gt("time", UUIDs.startOf(startTime)))
                      .and(lt("time", UUIDs.endOf(endTime)));
              return session.execute(getCustomers).all().stream()
                      .map(mapCustomerEvent())
                      .collect(Collectors.toList());
          }
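The time slice query works because a timeuuid (a version 1 UUID) carries a timestamp, so a millisecond value can be turned into a UUID bound. The sketch below shows that encoding with only `java.util.UUID`. It is not the driver's `UUIDs.startOf` or `UUIDs.endOf`, which additionally pick the smallest and largest possible UUID for a timestamp; the class name and the fixed clock-sequence/node bits here are assumptions made for illustration.

```java
import java.util.UUID;

/** Sketch of how a Unix timestamp is packed into a version 1 (time-based) UUID. */
final class TimeUuidSketch {
    // Number of 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    // Arbitrary clock sequence and node bits with the RFC 4122 variant set (assumed, for illustration).
    private static final long FIXED_CLOCK_SEQ_AND_NODE = 0x8080808080808080L;

    static UUID fromUnixMillis(long unixMillis) {
        long ts = unixMillis * 10_000L + UUID_EPOCH_OFFSET; // 60-bit timestamp in 100 ns units
        long msb = ((ts & 0xFFFFFFFFL) << 32)        // time_low: lowest 32 bits of the timestamp
                | (((ts >>> 32) & 0xFFFFL) << 16)    // time_mid: next 16 bits
                | 0x1000L                            // version 1
                | ((ts >>> 48) & 0x0FFFL);           // time_hi: top 12 bits
        return new UUID(msb, FIXED_CLOCK_SEQ_AND_NODE);
    }

    static long toUnixMillis(UUID timeUuid) {
        // UUID.timestamp() reassembles the 60-bit value and rejects non-version-1 UUIDs.
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
    }

    public static void main(String[] args) {
        UUID bound = fromUnixMillis(System.currentTimeMillis());
        System.out.println(bound + " -> " + toUnixMillis(bound));
    }
}
```

Note that `UUID.compareTo` in Java does not sort by this timestamp, whereas Cassandra's timeuuid ordering does, so range comparisons belong in the query rather than in client-side UUID comparison.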
  23. Mapping API

          @Table(keyspace = "customers", name = "customer_events")
          public class CustomerEvent {
              @PartitionKey
              @Column(name = "customer_id")
              private String customerId;
              @ClusteringColumn
              private UUID time;
              @Column(name = "staff_id")
              private String staffId;
              @Column(name = "store_type")
              private String storeType;
              @Column(name = "event_type")
              private String eventType;
              private Map<String, String> tags;
              // ctr / getters etc
          }
  24. Mapping API

          @Accessor
          public interface CustomerEventDao {
              @Query("select * from customers.customer_events where customer_id = :customerId")
              Result<CustomerEvent> getCustomerEvents(String customerId);

              @Query("select * from customers.customer_events")
              Result<CustomerEvent> getAllCustomerEvents();

              @Query("select * from customers.customer_events where customer_id = :customerId and time > minTimeuuid(:startTime) and time < maxTimeuuid(:endTime)")
              Result<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime, long endTime);
          }

          @Bean
          public CustomerEventDao customerEventDao() {
              MappingManager mappingManager = new MappingManager(session);
              return mappingManager.createAccessor(CustomerEventDao.class);
          }
  25. Adding some type safety

          public enum StoreType {
              ONLINE, RETAIL, FRANCHISE, MOBILE
          }

          @Table(keyspace = "customers", name = "customer_events")
          public class CustomerEvent {
              @PartitionKey
              @Column(name = "customer_id")
              private String customerId;
              @ClusteringColumn()
              private UUID time;
              @Column(name = "staff_id")
              private String staffId;
              @Column(name = "store_type")
              @Enumerated(EnumType.STRING) // could be EnumType.ORDINAL
              private StoreType storeType;
              // remaining fields and getters are not shown on the slide
          }
  26. User defined types

          CREATE TYPE store (name text, type text, postcode text);

          CREATE TABLE customer_events_type(
            customer_id text,
            staff_id text,
            time timeuuid,
            store frozen<store>,
            event_type text,
            tags map<text, text>,
            PRIMARY KEY ((customer_id), time));
  28. Mapping user defined types

          @UDT(keyspace = "customers", name = "store")
          public class Store {
              private String name;
              private StoreType type;
              private String postcode;
              // getters etc
          }

          @Table(keyspace = "customers", name = "customer_events_type")
          public class CustomerEventType {
              @PartitionKey
              @Column(name = "customer_id")
              private String customerId;
              @ClusteringColumn()
              private UUID time;
              @Column(name = "staff_id")
              private String staffId;
              @Frozen
              private Store store;
              @Column(name = "event_type")
              private String eventType;
              private Map<String, String> tags;

          @Query("select * from customers.customer_events_type")
          Result<CustomerEventType> getAllCustomerEventsWithStoreType();
  29. What else can I do?
  30. Lightweight Transactions (LWT)
      • Consequences of Lightweight Transactions
        • 4 round trips vs. 1 for normal updates (uses the Paxos algorithm)
        • Operations are done on a per-partition basis
        • Will be going across data centres to obtain consensus (unless you use LOCAL_SERIAL consistency)
        • Cassandra user will need read and write access, i.e. you get back the row!
      • Great for 1% of your app, but eventual consistency is still your friend!
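A lightweight transaction is a conditional write: it is applied only if a condition on the current row holds, and the caller learns whether it was applied. The sketch below is a single-JVM analogy using `ConcurrentHashMap`, which gives the same compare-and-set shape as CQL's `INSERT ... IF NOT EXISTS` and `UPDATE ... IF column = value`. It says nothing about how Cassandra reaches consensus across replicas with Paxos, and the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Single-process analogy for conditional writes (illustrative; no Paxos, no replication). */
final class ConditionalWriteSketch {
    private final ConcurrentMap<String, String> passwordByUser = new ConcurrentHashMap<>();

    /**
     * Analogous to INSERT ... IF NOT EXISTS: returns null when the write was applied,
     * otherwise the value that was already there (compare "you get back the row").
     */
    String insertIfNotExists(String userId, String password) {
        return passwordByUser.putIfAbsent(userId, password);
    }

    /** Analogous to UPDATE ... IF password = expected: applied only when the current value matches. */
    boolean updateIfEquals(String userId, String expected, String replacement) {
        return passwordByUser.replace(userId, expected, replacement);
    }

    public static void main(String[] args) {
        ConditionalWriteSketch users = new ConditionalWriteSketch();
        System.out.println("first insert applied: " + (users.insertIfNotExists("user2", "a") == null));
        System.out.println("second insert applied: " + (users.insertIfNotExists("user2", "b") == null));
        System.out.println("update with stale value applied: " + users.updateIfEquals("user2", "b", "c"));
    }
}
```

The analogy also shows why these writes cost more than plain ones: the store has to read and compare before it writes, which in Cassandra means extra round trips between replicas.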
  31. Batch Statements

          BEGIN BATCH
            INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user')
            UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2'
            INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c')
            DELETE name FROM users WHERE userID = 'user2'
          APPLY BATCH;

      • A BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single logical operation
      • Atomic operation: if any statement in the batch succeeds, all will
      • No batch isolation: other "transactions" can read and write data being affected by a partially executed batch
      © 2014 DataStax, All Rights Reserved. Company Confidential
  32. Batch Statements with LWT

          BEGIN BATCH
            UPDATE foo SET z = 1 WHERE x = 'a' AND y = 1;
            UPDATE foo SET z = 2 WHERE x = 'a' AND y = 2 IF t = 4;
          APPLY BATCH;

      • Allows you to group multiple conditional updates in a batch as long as all those updates apply to the same partition
  33. Load balancing
      • Data centre aware policy
      • Token aware policy
      • Latency aware policy
      • Whitelist policy
      • [Diagram: one APP per data centre, DC1 and DC2, with async replication between them]
  35. Reconnection Policies
      • Policy that decides how often reconnection to a dead node is attempted.

          Cluster cluster = Cluster.builder()
                  .addContactPoints("127.0.0.1", "127.0.0.2")
                  .withReconnectionPolicy(new ConstantReconnectionPolicy(1000))
                  // TokenAwarePolicy wraps a child policy; RoundRobinPolicy is an assumed choice here
                  .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
                  .build();

      • ConstantReconnectionPolicy
      • ExponentialReconnectionPolicy (default)
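The two policies named above differ only in the schedule of delays between attempts. The following plain-Java sketch computes such schedules; it is not the driver's implementation, the class and method names are invented for illustration, and the exponential variant assumes the common scheme of doubling a base delay on each attempt up to a cap.

```java
/** Sketch of constant vs. exponential reconnection schedules (illustrative, not driver code). */
final class ReconnectionSchedules {

    /** Constant schedule: the same delay before every attempt. */
    static long constantDelayMs(long delayMs, int attempt) {
        return delayMs;
    }

    /**
     * Exponential schedule: baseDelayMs doubled once per previous attempt, capped at maxDelayMs.
     * attempt is zero-based, so attempt 0 waits exactly baseDelayMs.
     */
    static long exponentialDelayMs(long baseDelayMs, long maxDelayMs, int attempt) {
        long delay = baseDelayMs;
        for (int i = 0; i < attempt; i++) {
            if (delay > maxDelayMs / 2) {
                return maxDelayMs; // doubling again would pass the cap (and could overflow)
            }
            delay *= 2;
        }
        return Math.min(delay, maxDelayMs);
    }

    public static void main(String[] args) {
        // Base and cap values below are arbitrary example numbers.
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt
                    + ": constant=" + constantDelayMs(1000, attempt) + "ms"
                    + ", exponential=" + exponentialDelayMs(1000, 16000, attempt) + "ms");
        }
    }
}
```

A constant schedule notices a recovered node quickly but keeps probing a dead one at full rate; an exponential one backs off, trading slower rediscovery for less wasted work.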
  37. Summary
      • Cassandra overview
      • Customer events example
      • DataStax Java Driver
      • Java Mapping API
      • Other features
        • Lightweight transactions
        • Load balancing
        • Reconnection policies
  38. Thanks for listening
      • Badger me on Twitter: @chbatey
      • https://github.com/chbatey/cassandra-customer-events
      • https://academy.datastax.com/
      • http://christopher-batey.blogspot.co.uk/
  39. Cassandra Summit Europe 2014
      • Training Day | December 3rd
        • Beginner Track
          • Introduction to Cassandra
          • Introduction to Spark, Shark, Scala and Cassandra
        • Advanced Track
          • Data Modeling
          • Performance Tuning
      • Conference Day | December 4th
      • Cassandra Summit Europe 2014 will be the single largest gathering of Cassandra users in Europe. Learn how the world's most successful companies are transforming their businesses and growing faster than ever using Apache Cassandra.
      • http://bit.ly/cassandrasummit2014
