LJC Conference 2014 Cassandra for Java Developers

Christopher Batey, Freelance Software Engineer
Building awesome applications with Apache Cassandra
Christopher Batey 
@chbatey 
©2013 DataStax Confidential. Do not distribute without consent. 
Who am I? 
• Technical Evangelist for Apache Cassandra
• Founder of Stubbed Cassandra 
• Help out Apache Cassandra users 
• Previous: Cassandra backed apps at BSkyB 
Overview
• Topics covered
  • Cassandra overview
  • Customer events example
  • DataStax Java Driver
  • Java Mapping API
  • Other features
    • Lightweight transactions
    • Load balancing
    • Reconnection policies
• Not covered
  • Cassandra read and write paths
  • Cassandra failure modes
Common use cases
• Ordered data such as time series
• Event stores
• Financial transactions
• Sensor data, e.g. IoT
• Non-functional requirements:
  • Linear scalability
  • High-throughput durable writes
  • Multi-datacenter, including active-active
  • Analytics without ETL
Cassandra overview 
Cassandra
• Distributed masterless database (Dynamo)
• Column family data model (Google BigTable)
• Multi data centre replication built in from the start
• Analytics with Apache Spark
[Diagrams: data centres labelled Europe and USA; workloads labelled Online and Analytics]
Replication
[Diagram: a client sends a WRITE at CL = 1; nodes labelled C and RC; two data centres, DC1 and DC2, each with RF 3. Caption: "We have replication!"]
Tunable Consistency
• Data is replicated N times
• Every query you execute is given a consistency level:
  • ALL
  • QUORUM
  • LOCAL_QUORUM
  • ONE
• Christos Kalantzis, "Eventual Consistency != Hopeful Consistency": http://youtu.be/A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
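The consistency levels above trade latency against how many replicas must answer. A rough sketch of the arithmetic, in plain Java with no driver involved (the class and method names are made up for illustration): QUORUM is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```java
/** Back-of-the-envelope arithmetic for tunable consistency (hypothetical helper, not a driver API). */
final class ConsistencyMath {

    /** Replicas that must respond at QUORUM: a majority of the replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every possible read replica set intersects every possible write replica set. */
    static boolean readSeesLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM at RF=3 needs " + q + " replicas");
        System.out.println("QUORUM read + QUORUM write overlap: " + readSeesLatestWrite(q, q, rf));
        System.out.println("ONE read + ONE write overlap: " + readSeesLatestWrite(1, 1, rf));
    }
}
```

With RF 3, reading and writing at QUORUM overlaps on at least one replica, while ONE plus ONE does not, which is where eventual consistency shows up.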
CQL
• Cassandra Query Language
• SQL-like query language
• Keyspace – analogous to a schema
  • The keyspace determines the RF (replication factor)
• Table – looks like a SQL table
CREATE TABLE scores (
    name text,
    score int,
    date timestamp,
    PRIMARY KEY (name, score)
);
INSERT INTO scores (name, score, date)
    VALUES ('bob', 42, '2012-06-24');
INSERT INTO scores (name, score, date)
    VALUES ('bob', 47, '2012-06-25');
SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
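One way to picture why the range predicate on score works: name is the partition key and score is the clustering column, so rows for one name are kept sorted by score. The following is only an in-memory analogy in plain Java (the class name and the extra row with score 33 are assumptions, not part of the talk), not how Cassandra is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory analogy for the scores table: rows grouped by partition key, sorted by clustering column. */
final class ScoresSketch {
    // partition key (name) -> clustering column (score) -> remaining column (date)
    private final Map<String, NavigableMap<Integer, String>> partitions = new HashMap<>();

    /** Like an INSERT: the same (name, score) pair overwrites the earlier row. */
    void insert(String name, int score, String date) {
        partitions.computeIfAbsent(name, k -> new TreeMap<>()).put(score, date);
    }

    /** Analogue of: SELECT date, score FROM scores WHERE name = ? AND score >= ? */
    NavigableMap<Integer, String> select(String name, int minScore) {
        NavigableMap<Integer, String> partition = partitions.getOrDefault(name, new TreeMap<>());
        return partition.tailMap(minScore, true);
    }

    public static void main(String[] args) {
        ScoresSketch scores = new ScoresSketch();
        scores.insert("bob", 42, "2012-06-24");
        scores.insert("bob", 47, "2012-06-25");
        scores.insert("bob", 33, "2012-06-23"); // extra row, not on the slide
        System.out.println(scores.select("bob", 40)); // {42=2012-06-24, 47=2012-06-25}
    }
}
```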
Example Time: Customer event store 
An example: Customer event store
• Customer event
  • customer_id - ChrisBatey
  • staff_id - Charlie
  • store_type - Website, PhoneApp, Phone, Retail
  • event_type - login, logout, add_to_basket, remove_from_basket, buy_item
  • time
  • tags
Requirements 
• Get all events 
• Get all events for a particular customer 
• As above for a time slice
Modelling in Cassandra
CREATE TABLE customer_events(
    customer_id text,
    staff_id text,
    time timeuuid,
    store_type text,
    event_type text,
    tags map<text, text>,
    PRIMARY KEY ((customer_id), time));
Partition key: customer_id
Clustering column(s): time
How it is stored on disk
Logical (CQL) view:
customer_id | time                | event_type | store_type | tags
charles     | 2014-11-18 16:52:04 | basket_add | online     | {'item': 'coffee'}
charles     | 2014-11-18 16:53:00 | basket_add | online     | {'item': 'wine'}
charles     | 2014-11-18 16:53:09 | logout     | online     | {}
chbatey     | 2014-11-18 16:52:21 | login      | online     | {}
chbatey     | 2014-11-18 16:53:21 | basket_add | online     | {'item': 'coffee'}
chbatey     | 2014-11-18 16:54:00 | basket_add | online     | {'item': 'cheese'}
Storage view (one partition per customer_id, events in clustering order):
charles
  event_type: basket_add | staff_id: n/a | store_type: online | tags:item: coffee
  event_type: basket_add | staff_id: n/a | store_type: online | tags:item: wine
  event_type: logout | staff_id: n/a | store_type: online
chbatey
  event_type: login | staff_id: n/a | store_type: online
  event_type: basket_add | staff_id: n/a | store_type: online | tags:item: coffee
  event_type: basket_add | staff_id: n/a | store_type: online | tags:item: cheese
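The storage picture above can be read as each event being flattened into named cells, with every entry of the tags map becoming its own cell (tags:item). A small plain-Java sketch of that flattening, following the slide's simplified picture rather than Cassandra's actual on-disk format; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Flattens one event into named cells, mirroring the slide's simplified storage picture. */
final class CellLayoutSketch {

    static Map<String, String> toCells(String eventType, String staffId, String storeType,
                                       Map<String, String> tags) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("event_type", eventType);
        cells.put("staff_id", staffId);
        cells.put("store_type", storeType);
        // Each map entry becomes its own cell, named after the column and the map key.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            cells.put("tags:" + tag.getKey(), tag.getValue());
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("item", "coffee");
        // {event_type=basket_add, staff_id=n/a, store_type=online, tags:item=coffee}
        System.out.println(toCells("basket_add", "n/a", "online", tags));
    }
}
```

An event with an empty tags map, such as the logout rows, produces only the three fixed cells.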
DataStax Java Driver 
• Open source 
Get all the events
public List<CustomerEvent> getAllCustomerEvents() {
    return session.execute("select * from customers.customer_events")
            .all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}
// Converts a driver Row into the domain object
private Function<Row, CustomerEvent> mapCustomerEvent() {
    return row -> new CustomerEvent(
            row.getString("customer_id"),
            row.getUUID("time"),
            row.getString("staff_id"),
            row.getString("store_type"),
            row.getString("event_type"),
            row.getMap("tags", String.class, String.class));
}
All events for a particular customer
private PreparedStatement getEventsForCustomer;
// Prepare once at start-up, then bind per request
@PostConstruct
public void prepareStatements() {
    getEventsForCustomer =
            session.prepare("select * from customers.customer_events where customer_id = ?");
}
public List<CustomerEvent> getCustomerEvents(String customerId) {
    BoundStatement boundStatement = getEventsForCustomer.bind(customerId);
    return session.execute(boundStatement)
            .all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}
Customer events for a time slice
public List<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime,
                                                    long endTime) {
    Select.Where getCustomers = QueryBuilder.select()
            .all()
            .from("customers", "customer_events")
            .where(eq("customer_id", customerId))
            .and(gt("time", UUIDs.startOf(startTime)))
            .and(lt("time", UUIDs.endOf(endTime)));
    return session.execute(getCustomers).all().stream()
            .map(mapCustomerEvent())
            .collect(Collectors.toList());
}
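The time-slice query compares timeuuid values, so the start and end timestamps first have to be turned into time-based UUIDs. As a rough illustration of what that involves, here is a JDK-only sketch of how Unix milliseconds relate to the timestamp field of a version 1 UUID. It does not reproduce the driver's UUIDs helper: the class name is made up, and the clock sequence and node bits are zeroed placeholders.

```java
import java.util.UUID;

/** How a Unix timestamp relates to the timestamp inside a version 1 (time-based) UUID. */
final class TimeUuidSketch {
    // 100-nanosecond intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Builds a version 1 UUID for the given Unix millis; clock sequence and node are placeholders. */
    static UUID timeUuidFor(long unixMillis) {
        long ts = unixMillis * 10_000 + UUID_EPOCH_OFFSET;
        long timeLow = ts & 0xFFFFFFFFL;
        long timeMid = (ts >>> 32) & 0xFFFFL;
        long timeHigh = (ts >>> 48) & 0x0FFFL;
        long msb = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHigh; // version nibble = 1
        long lsb = 0x8000000000000000L; // IETF variant bits only
        return new UUID(msb, lsb);
    }

    /** Recovers Unix millis from a version 1 UUID. */
    static long unixMillisOf(UUID timeUuid) {
        return (timeUuid.timestamp() - UUID_EPOCH_OFFSET) / 10_000;
    }

    public static void main(String[] args) {
        long millis = 1_416_329_524_000L; // 2014-11-18 16:52:04 UTC
        UUID id = timeUuidFor(millis);
        System.out.println(id + " -> " + unixMillisOf(id));
    }
}
```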
Mapping API
@Table(keyspace = "customers", name = "customer_events")
public class CustomerEvent {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;
    @ClusteringColumn
    private UUID time;
    @Column(name = "staff_id")
    private String staffId;
    @Column(name = "store_type")
    private String storeType;
    @Column(name = "event_type")
    private String eventType;
    private Map<String, String> tags;
    // constructor / getters etc
}
Mapping API
@Accessor
public interface CustomerEventDao {
    @Query("select * from customers.customer_events where customer_id = :customerId")
    Result<CustomerEvent> getCustomerEvents(String customerId);
    @Query("select * from customers.customer_events")
    Result<CustomerEvent> getAllCustomerEvents();
    // The query string is split with + so that it remains a valid Java constant
    @Query("select * from customers.customer_events where customer_id = :customerId "
            + "and time > minTimeuuid(:startTime) and time < maxTimeuuid(:endTime)")
    Result<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime,
                                                   long endTime);
}
@Bean
public CustomerEventDao customerEventDao() {
    MappingManager mappingManager = new MappingManager(session);
    return mappingManager.createAccessor(CustomerEventDao.class);
}
Adding some type safety
public enum StoreType {
    ONLINE, RETAIL, FRANCHISE, MOBILE
}
@Table(keyspace = "customers", name = "customer_events")
public class CustomerEvent {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;
    @ClusteringColumn()
    private UUID time;
    @Column(name = "staff_id")
    private String staffId;
    @Column(name = "store_type")
    @Enumerated(EnumType.STRING) // could be EnumType.ORDINAL
    private StoreType storeType;
    // remaining fields as before
}
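The comment on the slide notes that the enum could be stored as a STRING or as an ORDINAL. A plain-Java sketch of the difference, with no mapper involved (the class and helper names are hypothetical): the name survives if constants are later reordered, while the ordinal is just a position in the declaration.

```java
/** Contrasts storing an enum by name and by ordinal (plain Java, no mapper involved). */
final class EnumStorageSketch {
    enum StoreType { ONLINE, RETAIL, FRANCHISE, MOBILE }

    static String asString(StoreType type) { return type.name(); }

    static int asOrdinal(StoreType type) { return type.ordinal(); }

    static StoreType fromString(String stored) { return StoreType.valueOf(stored); }

    static StoreType fromOrdinal(int stored) { return StoreType.values()[stored]; }

    public static void main(String[] args) {
        System.out.println(asString(StoreType.RETAIL));  // RETAIL: stable if constants are reordered
        System.out.println(asOrdinal(StoreType.RETAIL)); // 1: changes if a constant is inserted before it
    }
}
```

Storing the name costs more bytes per value but keeps old rows readable after the enum changes, which is usually the safer default.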
User defined types
CREATE TYPE store (name text, type text, postcode text);
CREATE TABLE customer_events_type(
    customer_id text,
    staff_id text,
    time timeuuid,
    store frozen<store>,
    event_type text,
    tags map<text, text>,
    PRIMARY KEY ((customer_id), time));
Mapping user defined types
@UDT(keyspace = "customers", name = "store")
public class Store {
    private String name;
    private StoreType type;
    private String postcode;
    // getters etc
}
@Table(keyspace = "customers", name = "customer_events_type")
public class CustomerEventType {
    @PartitionKey
    @Column(name = "customer_id")
    private String customerId;
    @ClusteringColumn()
    private UUID time;
    @Column(name = "staff_id")
    private String staffId;
    @Frozen
    private Store store;
    @Column(name = "event_type")
    private String eventType;
    private Map<String, String> tags;
    // getters etc
}
// Accessor method returning the mapped type:
@Query("select * from customers.customer_events_type")
Result<CustomerEventType> getAllCustomerEventsWithStoreType();
What else can I do? 
Lightweight Transactions (LWT)
Consequences of Lightweight Transactions
• 4 round trips vs. 1 for normal updates (uses the Paxos algorithm)
• Operations are done on a per-partition basis
• Will go across data centres to obtain consensus (unless you use LOCAL_SERIAL consistency)
• The Cassandra user needs read and write access, i.e. you get back the row!
• Great for 1% of your app, but eventual consistency is still your friend!
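A lightweight transaction is a conditional write: it is applied only if a condition on the current row holds. The single-process analogy below uses a ConcurrentHashMap to show those compare-and-set semantics. It is only a sketch with hypothetical names and illustrative values; a real LWT reaches the same decision across replicas with Paxos, which is where the extra round trips come from.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Single-process analogy for conditional writes; real LWTs reach consensus across replicas. */
final class ConditionalWriteSketch {
    private final ConcurrentMap<String, String> passwords = new ConcurrentHashMap<>();

    /** Analogue of INSERT ... IF NOT EXISTS: applied only if no row existed. */
    boolean insertIfNotExists(String userId, String password) {
        return passwords.putIfAbsent(userId, password) == null;
    }

    /** Analogue of UPDATE ... IF password = expected: applied only when the current value matches. */
    boolean updateIfEquals(String userId, String expected, String replacement) {
        return passwords.replace(userId, expected, replacement);
    }

    /** The current value, which a failed conditional write would hand back to the caller. */
    String current(String userId) {
        return passwords.get(userId);
    }

    public static void main(String[] args) {
        ConditionalWriteSketch users = new ConditionalWriteSketch();
        System.out.println(users.insertIfNotExists("user2", "first"));     // true: applied
        System.out.println(users.insertIfNotExists("user2", "second"));    // false: row already exists
        System.out.println(users.updateIfEquals("user2", "wrong", "new")); // false: condition failed
        System.out.println(users.current("user2"));                        // first
    }
}
```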
Batch Statements
BEGIN BATCH
    INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user');
    UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2';
    INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c');
    DELETE name FROM users WHERE userID = 'user2';
APPLY BATCH;
• A BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single logical operation
• Atomic operation
  • If any statement in the batch succeeds, all will
• No batch isolation
  • Other "transactions" can read and write data being affected by a partially executed batch
© 2014 DataStax, All Rights Reserved.
Batch Statements with LWT
BEGIN BATCH
    UPDATE foo SET z = 1 WHERE x = 'a' AND y = 1;
    UPDATE foo SET z = 2 WHERE x = 'a' AND y = 2 IF t = 4;
APPLY BATCH;
• Allows you to group multiple conditional updates in a batch, as long as all those updates apply to the same partition
Load balancing
• Data centre aware policy
• Token aware policy
• Latency aware policy
• Whitelist policy
[Diagram: an APP in each of DC1 and DC2, with async replication between the data centres]
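The idea behind a token aware policy is that the client can work out which node owns a partition key and send the request straight there. A much simplified ring lookup in plain Java, under stated assumptions: String.hashCode stands in for Cassandra's partitioner (the default one uses Murmur3), only the primary owner is found, and all names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Simplified token ring: each node owns the range ending at its token. */
final class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** A node owns (previous token, its token], so pick the first node at or after the token. */
    String ownerOf(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(token);
        return owner != null ? owner.getValue() : ring.firstEntry().getValue(); // wrap around
    }

    /** Stand-in hash for illustration only; not the partitioner Cassandra uses. */
    static long tokenFor(String partitionKey) {
        return partitionKey.hashCode();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(0L, "node1");
        ring.addNode(100L, "node2");
        ring.addNode(200L, "node3");
        System.out.println(ring.ownerOf(50L));  // node2
        System.out.println(ring.ownerOf(250L)); // node1 (wraps around the ring)
        System.out.println(ring.ownerOf(tokenFor("chbatey")));
    }
}
```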
Reconnection Policies
• Policy that decides how often the reconnection to a dead node is attempted.
Cluster cluster = Cluster.builder()
        .addContactPoints("127.0.0.1", "127.0.0.2")
        .withReconnectionPolicy(new ConstantReconnectionPolicy(1000))
        // TokenAwarePolicy wraps a child policy; RoundRobinPolicy is used here as an example
        .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy()))
        .build();
• ConstantReconnectionPolicy
• ExponentialReconnectionPolicy (Default)
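The two policies differ only in the schedule of delays between attempts. A plain-Java sketch of the schedules, not the driver's implementation: the class name, method names and the base and maximum delays used here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Delay schedules for constant and exponential reconnection (illustrative values only). */
final class ReconnectionScheduleSketch {

    /** The same delay before every attempt. */
    static List<Long> constant(long delayMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            delays.add(delayMs);
        }
        return delays;
    }

    /** The delay doubles after each attempt until it reaches the cap. */
    static List<Long> exponential(long baseDelayMs, long maxDelayMs, int attempts) {
        List<Long> delays = new ArrayList<>();
        long delay = baseDelayMs;
        for (int i = 0; i < attempts; i++) {
            delays.add(Math.min(delay, maxDelayMs));
            delay = Math.min(delay * 2, maxDelayMs);
        }
        return delays;
    }

    public static void main(String[] args) {
        System.out.println(constant(1000, 3));          // [1000, 1000, 1000]
        System.out.println(exponential(1000, 8000, 5)); // [1000, 2000, 4000, 8000, 8000]
    }
}
```

A constant schedule retries a dead node at a steady rate, while the exponential one backs off so that a long outage does not generate a steady stream of failed connection attempts.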
Summary
• Cassandra overview
• Customer events example
• DataStax Java Driver
• Java Mapping API
• Other features
  • Lightweight transactions
  • Load balancing
  • Reconnection policies
Thanks for listening
• Badger me on Twitter @chbatey
• https://github.com/chbatey/cassandra-customer-events
• https://academy.datastax.com/
• http://christopher-batey.blogspot.co.uk/
Training Day | December 3rd
Beginner Track
• Introduction to Cassandra
• Introduction to Spark, Shark, Scala and Cassandra
Advanced Track
• Data Modeling
• Performance Tuning
Conference Day | December 4th
Cassandra Summit Europe 2014 will be the single largest gathering of Cassandra users in Europe. Learn how the world's most successful companies are transforming their businesses and growing faster than ever using Apache Cassandra.
http://bit.ly/cassandrasummit2014
LJC Conference 2014 Cassandra for Java Developers

  • 1. Building awesome applications with Apache Cassandra Christopher Batey @chbatey ©2013 DataStax Confidential. Do not distribute without consent. 1
  • 2. Who am I? •Technical Evangelist for Apache Cassandra • Founder of Stubbed Cassandra • Help out Apache Cassandra users • Previous: Cassandra backed apps at BSkyB @chbatey
  • 3. @chbatey Overview • Topics covered • Cassandra overview • Customer events example • DataStax Java Driver • Java Mapping API • Other features • Light weight transactions • Load balancing • Reconnection policies
  • 4. @chbatey Overview • Topics covered • Cassandra overview • Customer events example • DataStax Java Driver • Java Mapping API • Other features • Light weight transactions • Load balancing • Reconnection policies • Not covered • Cassandra read and write paths • Cassandra failure nodes
  • 5. Common use cases •Ordered data such as time series •Event stores •Financial transactions •Sensor data e.g IoT @chbatey
  • 6. Common use cases •Ordered data such as time series •Event stores •Financial transactions •Sensor data e.g IoT •Non functional requirements: • Linear scalability • High throughout durable writes •Multi datacenter including active-active •Analytics without ETL @chbatey
  • 8. Cassandra Cassandra • Distributed master less database (Dynamo) • Column family data model (Google BigTable)
  • 9. Cassandra Europe • Distributed master less database (Dynamo) • Column family data model (Google BigTable) • Multi data centre replication built in from the start USA
  • 10. Cassandra Online • Distributed master less database (Dynamo) • Column family data model (Google BigTable) • Multi data centre replication built in from the start • Analytics with Apache Spark Analytics
  • 11. Replication WRITE CL = 1 We have replication! DC1 DC2 client C RC RF3 RF3
  • 12. Tunable Consistency • Data is replicated N times • Every query that you execute you give a consistency • ALL • QUORUM • LOCAL_QUORUM • ONE • Christos Kalantzis Eventual Consistency != Hopeful Consistency: http://youtu.be/ A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU @chbatey
  • 13. CQL •Cassandra Query Language •SQL like query language •Keyspace – analogous to a schema • The keyspace determines the RF (replication factor) •Table – looks like a SQL Table CREATE TABLE scores ( @chbatey name text, score int, date timestamp, PRIMARY KEY (name, score) ); INSERT INTO scores (name, score, date) VALUES ('bob', 42, '2012-06-24'); INSERT INTO scores (name, score, date) VALUES ('bob', 47, '2012-06-25'); SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
  • 14. Example Time: Customer event store @chbatey
  • 15. An example: Customer event store • Customer event • customer_id - ChrisBatey • staff_id - Charlie • store_type Website, PhoneApp, Phone, Retail • event_type - login, logout, add_to_basket, remove_from_basket, buy_item • time • tags
  • 16. Requirements • Get all events • Get all events for a particular customer • As above for a time slice
  • 17. Modelling in Cassandra CREATE TABLE customer_events( customer_id text, staff_id text, Partition Key time timeuuid, store_type text, event_type text, tags map<text, text>, PRIMARY KEY ((customer_id), time)); Clustering Column(s)
  • 18. How it is stored on disk customer _id time event_type store_type tags charles 2014-11-18 16:52:04 basket_add online {'item': 'coffee'} charles 2014-11-18 16:53:00 basket_add online {'item': ‘wine'} charles 2014-11-18 16:53:09 logout online {} chbatey 2014-11-18 16:52:21 login online {} chbatey 2014-11-18 16:53:21 basket_add online {'item': 'coffee'} chbatey 2014-11-18 16:54:00 basket_add online {'item': 'cheese'} charles event_type basket_add staff_id n/a store_type online tags:item coffee event_type basket_add staff_id n/a store_type online tags:item wine event_type logout staff_id n/a store_type online chbatey event_type login staff_id n/a store_type online event_type basket_add staff_id n/a store_type online tags:item coffee event_type basket_add staff_id n/a store_type online tags:item cheese
  • 19. DataStax Java Driver • Open source
  • 20. Get all the events public List<CustomerEvent> getAllCustomerEvents() { return session.execute("select * from customers.customer_events") .all().stream() .map(mapCustomerEvent()) .collect(Collectors.toList()); } private Function<Row, CustomerEvent> mapCustomerEvent() { return row -> new CustomerEvent( row.getString("customer_id"), row.getUUID("time"), row.getString("staff_id"), row.getString("store_type"), row.getString("event_type"), row.getMap("tags", String.class, String.class)); }
  • 21. All events for a particular customer private PreparedStatement getEventsForCustomer; @PostConstruct public void prepareStatements() { getEventsForCustomer = session.prepare("select * from customers.customer_events where customer_id = ?"); } public List<CustomerEvent> getCustomerEvents(String customerId) { BoundStatement boundStatement = getEventsForCustomer.bind(customerId); return session.execute(boundStatement) .all().stream() .map(mapCustomerEvent()) .collect(Collectors.toList()); }
  • 22. Customer events for a time slice public List<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime, long endTime) { Select.Where getCustomers = QueryBuilder.select() .all() .from("customers", "customer_events") .where(eq("customer_id", customerId)) .and(gt("time", UUIDs.startOf(startTime))) .and(lt("time", UUIDs.endOf(endTime))); return session.execute(getCustomers).all().stream() .map(mapCustomerEvent()) .collect(Collectors.toList()); }
  • 23. Mapping API @Table(keyspace = "customers", name = "customer_events") public class CustomerEvent { @PartitionKey @Column(name = "customer_id") private String customerId; @ClusteringColumn private UUID time; @Column(name = "staff_id") private String staffId; @Column(name = "store_type") private String storeType; @Column(name = "event_type") private String eventType; private Map<String, String> tags; // ctr / getters etc }
  • 24. Mapping API @Accessor public interface CustomerEventDao { @Query("select * from customers.customer_events where customer_id = :customerId") Result<CustomerEvent> getCustomerEvents(String customerId); @Query("select * from customers.customer_events") Result<CustomerEvent> getAllCustomerEvents(); @Query("select * from customers.customer_events where customer_id = :customerId and time > minTimeuuid(:startTime) and time < maxTimeuuid(:endTime)") Result<CustomerEvent> getCustomerEventsForTime(String customerId, long startTime, long endTime); } @Bean public CustomerEventDao customerEventDao() { MappingManager mappingManager = new MappingManager(session); return mappingManager.createAccessor(CustomerEventDao.class); }
  • 25. Adding some type safety public enum StoreType { ONLINE, RETAIL, FRANCHISE, MOBILE } @Table(keyspace = "customers", name = "customer_events") public class CustomerEvent { @PartitionKey @Column(name = "customer_id") private String customerId; @ClusteringColumn() private UUID time; @Column(name = "staff_id") private String staffId; @Column(name = "store_type") @Enumerated(EnumType.STRING) // could be EnumType.ORDINAL private StoreType storeType;
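The choice between EnumType.STRING and EnumType.ORDINAL comes down to what value ends up in the column. The sketch below is plain Java with no mapper involved; it assumes STRING corresponds to storing the constant's name and ORDINAL to storing its position, which is worth checking against the mapper version in use. The helper names are hypothetical.

```java
/** Plain-Java illustration of name-based versus position-based enum storage. */
final class EnumStorageDemo {

    enum StoreType { ONLINE, RETAIL, FRANCHISE, MOBILE }

    /** What a text column would hold under name-based storage. */
    static String toText(StoreType type) {
        return type.name();
    }

    /** What an int column would hold under position-based storage. */
    static int toOrdinal(StoreType type) {
        return type.ordinal();
    }

    /** Reads a name-based value back; throws IllegalArgumentException for unknown names. */
    static StoreType fromText(String stored) {
        return StoreType.valueOf(stored);
    }

    /** Reads a position-based value back; depends on the declaration order of the constants. */
    static StoreType fromOrdinal(int stored) {
        return StoreType.values()[stored];
    }

    public static void main(String[] args) {
        System.out.println(toText(StoreType.RETAIL));
        System.out.println(toOrdinal(StoreType.RETAIL));
        System.out.println(fromOrdinal(toOrdinal(StoreType.MOBILE)));
    }
}
```

Name-based storage survives reordering the constants but not renaming them; position-based storage is the reverse, so inserting a new constant in the middle silently changes what existing rows mean.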
  • 26. User defined types CREATE TYPE store (name text, type text, postcode text); CREATE TABLE customer_events_type( customer_id text, staff_id text, time timeuuid, store frozen<store>, event_type text, tags map<text, text>, PRIMARY KEY ((customer_id), time));
  • 27. Mapping user defined types @UDT(keyspace = "customers", name = "store") public class Store { private String name; private StoreType type; private String postcode; // getters etc } @Table(keyspace = "customers", name = "customer_events_type") public class CustomerEventType { @PartitionKey @Column(name = "customer_id") private String customerId; @ClusteringColumn() private UUID time; @Column(name = "staff_id") private String staffId; @Frozen private Store store; @Column(name = "event_type") private String eventType; private Map<String, String> tags;
  • 28. Mapping user defined types @UDT(keyspace = "customers", name = "store") public class Store { private String name; private StoreType type; private String postcode; // getters etc } @Table(keyspace = "customers", name = "customer_events_type") public class CustomerEventType { @PartitionKey @Column(name = "customer_id") private String customerId; @ClusteringColumn() private UUID time; @Column(name = "staff_id") private String staffId; @Frozen private Store store; @Column(name = "event_type") private String eventType; private Map<String, String> tags; @Query("select * from customers.customer_events_type") Result<CustomerEventType> getAllCustomerEventsWithStoreType();
  • 29. What else can I do?
  • 30. Lightweight Transactions (LWT) Consequences of Lightweight Transactions • 4 round trips vs. 1 for normal updates (uses the Paxos algorithm) • Operations are done on a per-partition basis • Will go across data centres to obtain consensus (unless you use LOCAL_SERIAL consistency) • The Cassandra user needs read and write access, i.e. you get back the row! • Great for 1% of your app, but eventual consistency is still your friend!
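The "you get back the row" point can be illustrated with the semantics of a conditional insert such as INSERT ... IF NOT EXISTS: the write applies only when no row is present, and otherwise the caller is told it was not applied and sees the existing row. The sketch below is an in-memory analogy in plain Java; it does not model Paxos, replication or the driver, and every name in it is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory analogy for a conditional insert; single process, no consensus protocol. */
final class ConditionalInsertDemo {

    /** Outcome of a conditional write: whether it applied, and the existing value if it did not. */
    static final class Result {
        final boolean applied;
        final String existingValue; // null when the write applied

        Result(boolean applied, String existingValue) {
            this.applied = applied;
            this.existingValue = existingValue;
        }
    }

    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();

    /** Writes only if the key is absent; otherwise reports the row that is already there. */
    Result insertIfNotExists(String key, String value) {
        String existing = rows.putIfAbsent(key, value);
        return new Result(existing == null, existing);
    }

    public static void main(String[] args) {
        ConditionalInsertDemo users = new ConditionalInsertDemo();
        Result first = users.insertIfNotExists("chbatey", "first registration");
        Result second = users.insertIfNotExists("chbatey", "second registration");
        System.out.println("first applied: " + first.applied);
        System.out.println("second applied: " + second.applied + ", existing: " + second.existingValue);
    }
}
```

In a cluster the same check-then-write has to be agreed between replicas, which is where the extra round trips come from.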
  • 31. Batch Statements BEGIN BATCH INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user'); UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2'; INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c'); DELETE name FROM users WHERE userID = 'user2'; APPLY BATCH; • A BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single logical operation • Atomic operation: if any statement in the batch succeeds, all will • No batch isolation: other “transactions” can read and write data being affected by a partially executed batch © 2014 DataStax, All Rights Reserved.
  • 32. Batch Statements with LWT BEGIN BATCH UPDATE foo SET z = 1 WHERE x = 'a' AND y = 1; UPDATE foo SET z = 2 WHERE x = 'a' AND y = 2 IF t = 4; APPLY BATCH; • Allows you to group multiple conditional updates in a batch as long as all those updates apply to the same partition
  • 33. Load balancing • Data centre aware policy • Token aware policy • Latency aware policy • Whitelist policy • (Diagram: two APP instances, data centres DC1 and DC2, async replication between them)
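The idea behind a token aware policy is to send each request to a node that owns the partition key's token, saving a hop through an unrelated coordinator. The sketch below is a toy ring in plain Java: the hash function, token range and node names are stand-ins chosen for illustration, not Cassandra's partitioner or the driver's implementation.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring; each node owns the range ending at its token. */
final class ToyTokenRing {

    // token -> node that owns the range (previous token, this token]
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    /** Stand-in for the partitioner: maps a partition key onto the range [0, 100). */
    static int tokenFor(String partitionKey) {
        return Math.floorMod(partitionKey.hashCode(), 100);
    }

    /** First node at or after the token, wrapping around to the lowest token. */
    String ownerOf(int token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Integer, String> owner = ring.ceilingEntry(token);
        return owner != null ? owner.getValue() : ring.firstEntry().getValue();
    }

    /** The node a token-aware client would prefer for this partition key. */
    String coordinatorFor(String partitionKey) {
        return ownerOf(tokenFor(partitionKey));
    }

    public static void main(String[] args) {
        ToyTokenRing ring = new ToyTokenRing();
        ring.addNode(25, "node-a");
        ring.addNode(50, "node-b");
        ring.addNode(75, "node-c");
        System.out.println("token 60 -> " + ring.ownerOf(60));
        System.out.println("token 90 -> " + ring.ownerOf(90));
        System.out.println("chbatey -> " + ring.coordinatorFor("chbatey"));
    }
}
```

A data centre aware policy would additionally restrict the candidates to nodes in the client's local data centre before applying this kind of lookup.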
  • 35. Reconnection Policies • Policy that decides how often reconnection to a dead node is attempted. Cluster cluster = Cluster.builder() .addContactPoints("127.0.0.1", "127.0.0.2") .withReconnectionPolicy(new ConstantReconnectionPolicy(1000)) /* retry every 1000 ms */ .withLoadBalancingPolicy(new TokenAwarePolicy(new RoundRobinPolicy())) /* TokenAwarePolicy wraps a child policy */ .build(); • ConstantReconnectionPolicy • ExponentialReconnectionPolicy (Default) ©2014 DataStax. Do not distribute without consent.
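The difference between the two policies is the delay schedule between attempts. The sketch below shows the general shape in plain Java, assuming the exponential schedule doubles a base delay per attempt up to a cap; it is not the driver's code, the driver's exact schedule may differ, and the method names and example values are hypothetical.

```java
/** Delay schedules in the spirit of constant and exponential reconnection. */
final class ReconnectionDelays {

    /** Constant schedule: the same delay before every attempt. */
    static long constantDelayMs(long delayMs, int attempt) {
        return delayMs;
    }

    /** Exponential schedule: base delay doubled per attempt (attempt starts at 0), capped at maxDelayMs. */
    static long exponentialDelayMs(long baseDelayMs, long maxDelayMs, int attempt) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must not be negative");
        }
        // Compare against the shifted cap so that the doubling can never overflow a long.
        if (attempt >= 63 || baseDelayMs > (maxDelayMs >> attempt)) {
            return maxDelayMs;
        }
        return baseDelayMs << attempt;
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt <= 10; attempt++) {
            System.out.println("attempt " + attempt
                    + ": constant=" + constantDelayMs(1000, attempt)
                    + " ms, exponential=" + exponentialDelayMs(1000, 600_000, attempt) + " ms");
        }
    }
}
```

A constant schedule detects a recovered node quickly but keeps probing at the same rate during a long outage; an exponential schedule backs off, trading slower detection for less load.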
  • 37. Summary • Cassandra overview • Customer events example • DataStax Java Driver • Java Mapping API • Other features • Light weight transactions • Load balancing • Reconnection policies
  • 38. Thanks for listening • Badger me on Twitter @chbatey • https://github.com/chbatey/cassandra-customer-events • https://academy.datastax.com/ • http://christopher-batey.blogspot.co.uk/
  • 39. Training Day | December 3rd Beginner Track • Introduction to Cassandra • Introduction to Spark, Shark, Scala and Cassandra Advanced Track • Data Modeling • Performance Tuning Conference Day | December 4th Cassandra Summit Europe 2014 will be the single largest gathering of Cassandra users in Europe. Learn how the world's most successful companies are transforming their businesses and growing faster than ever using Apache Cassandra. http://bit.ly/cassandrasummit2014