SlideShare a Scribd company logo
1 of 58
Download to read offline
@chbatey
Christopher Batey

Technical Evangelist for Apache Cassandra
Cassandra deep dive
@chbatey
Who am I?
• Technical Evangelist for Apache Cassandra
•Founder of Stubbed Cassandra
•Help out Apache Cassandra users
• DataStax
•Builds enterprise ready version of Apache
Cassandra
• Previous: Cassandra backed apps at BSkyB
@chbatey
Overview
• Cassandra use cases
• Replication
• Fault tolerance
• Read and write path
• Data modelling
• Java Driver
@chbatey
Distributed databases
@chbatey
It is a big world
• Relational
- Oracle, PostgreSQL
• Graph databases
- Neo4J, InfoGrid, Titan
• Key value
- DynamoDB
• Document stores
- MongoDB, Couchbase
• Columnar aka wide row
- Cassandra, HBase
@chbatey
Building a web app
@chbatey
Running multiple copies of your app
@chbatey
Still in one DC?
@chbatey
Handling hardware failure
@chbatey
Handling hardware failure
@chbatey
Master/slave
•Master serves all writes
•Read from master and optionally slaves
@chbatey
Peer-to-Peer
• No master
• Read/write to any
• Consistency?
@chbatey
Decisions decisions… CAP theorem
Are these really that different??
Mongo, Redis, Couchbase
Highly Available Databases:
Voldermort, Cassandra
@chbatey
Cassandra use cases
@chbatey
Cassandra for Applications
APACHE
CASSANDRA
@chbatey
Common use cases
•Ordered data such as time series
-Event stores
-Financial transactions
-Sensor data e.g IoT
@chbatey
Common use cases
•Ordered data such as time series
-Event stores
-Financial transactions
-Sensor data e.g IoT
•Non functional requirements:
-Linear scalability
-High throughout durable writes
-Multi datacenter including active-active
-Analytics without ETL
@chbatey
Cassandra deep dive
@chbatey
Cassandra
Cassandra
• Distributed masterless
database (Dynamo)
• Column family data model
(Google BigTable)
@chbatey
Datacenter and rack aware
Europe
• Distributed master less
database (Dynamo)
• Column family data model
(Google BigTable)
• Multi data centre replication
built in from the start
USA
@chbatey
Cassandra
Online
• Distributed master less
database (Dynamo)
• Column family data model
(Google BigTable)
• Multi data centre replication
built in from the start
• Analytics with Apache SparkAnalytics
@chbatey
Dynamo 101
@chbatey
Dynamo 101
• The parts Cassandra took
- Consistent hashing
- Replication
- Gossip
- Hinted handoff
- Anti-entropy repair
• And the parts it left behind
- Key/Value
- Vector clocks
@chbatey
Picking the right nodes
• You don’t want a full table scan on a 1000 node cluster!
• Dynamo to the rescue: Consistent Hashing
• Then the replication strategy takes over:
- Network topology
- Simple
@chbatey
Murmer3 Example
• Data:
• Murmer3 Hash Values:
jim age: 36 car: ford gender: M
carol age: 37 car: bmw gender: F
johnny age: 12 gender: M
suzy: age: 10 gender: F
Primary Key Murmur3 hash value
jim 350
carol 998
johnny 50
suzy 600
Primary Key
Real hash range: -9223372036854775808 to 9223372036854775807
@chbatey
Murmer3 Example
Four node cluster:
Node Murmur3 start range Murmur3 end range
A 0 249
B 250 499
C 500 749
D 750 999
@chbatey
Pictures are better
A
B
C
D
999
249
499
750
749
0
250
500
B
CD
A
@chbatey
Murmer3 Example
Data is distributed as:
Node Start range End range Primary
key
Hash value
A 0 249 johnny 50
B 250 499 jim 350
C 500 749 suzy 600
D 750 999 carol 998
@chbatey
Replication
@chbatey
Replication strategy
• Simple
- Give it to the next node in the ring
- Don’t use this in production
• NetworkTopology
- Every Cassandra node knows its DC and Rack
- Replicas won’t be put on the same rack unless Replication Factor > # of racks
- Unfortunately Cassandra can’t create servers and racks on the fly to fix this :(
@chbatey
Replication
DC1 DC2
client
RF3 RF3
C
RC
WRITE
CL = 1 We have replication!
32
@chbatey
Tunable Consistency
•Data is replicated N times
•Every query that you execute you give a consistency
-ALL
-QUORUM
-LOCAL_QUORUM
-ONE
• Christos Kalantzis Eventual Consistency != Hopeful Consistency: http://
youtu.be/A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
@chbatey
Load balancing
•Data centre aware policy
•Token aware policy
•Latency aware policy
•Whitelist policy
APP APP
Async
Replication
DC1 DC2
@chbatey
Scaling shouldn’t be hard
• Throw more nodes at a cluster
• Bootstrapping + joining the ring
• For large data sets this can take some time
@chbatey
Data modelling
@chbatey
You must denormalise
@chbatey
Cassandra can not join or aggregate
Client
Where do I go for the max?
@chbatey
CQL
•Cassandra Query Language
-SQL like query language
•Keyspace – analogous to a schema
- The keyspace determines the RF (replication factor)
•Table – looks like a SQL Table CREATE TABLE scores (
name text,
score int,
date timestamp,
PRIMARY KEY (name, score)
);
INSERT INTO scores (name, score, date)
VALUES ('bob', 42, '2012-06-24');
INSERT INTO scores (name, score, date)
VALUES ('bob', 47, '2012-06-25');
SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
@chbatey
Lots of types
@chbatey
UUID
• Universal Unique ID
- 128 bit number represented in character form e.g.
99051fe9-6a9c-46c2-b949-38ef78858dd0
• Easily generated on the client
- Version 1 has a timestamp component (TIMEUUID)
- Version 4 has no timestamp component
@chbatey
Company Confidential
TIMEUUID
TIMEUUID data type supports Version 1 UUIDs
Generated using time (60 bits), a clock sequence number (14 bits), and MAC address (48 bits)
– CQL function ‘now()’ generates a new TIMEUUID
Time can be extracted from TIMEUUID
– CQL function dateOf() extracts the timestamp as a date
TIMEUUID values in clustering columns or in column names are ordered based on time
– DESC order on TIMEUUID lists most recent data first
© 2014 DataStax, All Rights Reserved.
@chbatey
Collections
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
added_date timestamp,
PRIMARY KEY (videoid)
);
@chbatey
Data Model - User Defined Types
• Complex data in one place
• No multi-gets (multi-partitions)
• Nesting!
CREATE TYPE address (
street text,
city text,
zip_code int,
country text,
cross_streets set<text>
);
Data Model - Updated
• We can embed video_metadata in videos
CREATE TYPE video_metadata (
height int,
width int,
video_bit_rate set<text>,
encoding text
);
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);
Data Model - Storing JSON
{
"productId": 2,
"name": "Kitchen Table",
"price": 249.99,
"description" : "Rectangular table with oak finish",
"dimensions": {
"units": "inches",
"length": 50.0,
"width": 66.0,
"height": 32
},
"categories": {
{
"category" : "Home Furnishings" {
"catalogPage": 45,
"url": "/home/furnishings"
},
{
"category" : "Kitchen Furnishings" {
"catalogPage": 108,
"url": "/kitchen/furnishings"
}
}
}
CREATE TYPE dimensions (
units text,
length float,
width float,
height float
);
CREATE TYPE category (
catalogPage int,
url text
);
CREATE TABLE product (
productId int,
name text,
price float,
description text,
dimensions frozen <dimensions>,
categories map <text, frozen <category>>,
PRIMARY KEY (productId)
);
@chbatey
Tuple type
• Type to represent a
group
• Up to 256 different
elements
CREATE TABLE tuple_table (
id int PRIMARY KEY,
three_tuple frozen <tuple<int, text, float>>,
four_tuple frozen <tuple<int, text, float, inet>>,
five_tuple frozen <tuple<int, text, float, inet, ascii>>
);
@chbatey
Counters
• Old has been around since .8
• Commit log replay changes counters
• Repair can change a counter
@chbatey
Time-to-Live (TTL)
TTL a row:
INSERT INTO users (id, first, last) VALUES (‘abc123’, ‘catherine’, ‘cachart’)
USING TTL 3600; // Expires data in one hour

TTL a column:
UPDATE users USING TTL 30 SET last = ‘miller’ WHERE id = ‘abc123’
– TTL in seconds
– Can also set default TTL at a table level
– Expired columns/values automatically deleted
– With no TTL specified, columns/values never expire
– TTL is useful for automatic deletion
– Re-inserting the same row before it expires will overwrite TTL
@chbatey
DevCenter
@chbatey
Example Time: Customer event store
@chbatey
An example: Customer event store
• Customer event
- customer_id e.g ChrisBatey
- event_type e.g login, logout, add_to_basket,
remove_from_basket, buy_item
• Staff
- name e.g Charlie
- favourite_colour e.g red
• Store
- name
- type e.g Website, PhoneApp, Phone, Retail
@chbatey
Requirements
• Get all events
• Get all events for a particular customer
• As above for a time slice
@chbatey
Modelling in a relational database
CREATE TABLE customer_events(
customer_id text,
staff_name text,
time timeuuid,
event_type text,
store_name text,
PRIMARY KEY (customer_id));
CREATE TABLE store(
name text,
location text,
store_type text,
PRIMARY KEY (store_name));
CREATE TABLE staff(
name text,
favourite_colour text,
job_title text,
PRIMARY KEY (name));
@chbatey
Your model should look like your queries
Modelling in Cassandra
CREATE TABLE customer_events(
customer_id text,
staff_id text,
time timeuuid,
store_type text,
event_type text,
tags map<text, text>,
PRIMARY KEY ((customer_id), time));
Partition Key
Clustering Column(s)
How it is stored on disk
customer
_id
time event_type store_type tags
charles 2014-11-18 16:52:04 basket_add online {'item': 'coffee'}
charles 2014-11-18 16:53:00 basket_add online {'item': ‘wine'}
charles 2014-11-18 16:53:09 logout online {}
chbatey 2014-11-18 16:52:21 login online {}
chbatey 2014-11-18 16:53:21 basket_add online {'item': 'coffee'}
chbatey 2014-11-18 16:54:00 basket_add online {'item': 'cheese'}
charles
event_type
basket_add
staff_id
n/a
store_type
online
tags:item
coffee
event_type
basket_add
staff_id
n/a
store_type
online
tags:item
wine
event_type
logout
staff_id
n/a
store_type
online
chbatey
event_type
login
staff_id
n/a
store_type
online
event_type
basket_add
staff_id
n/a
store_type
online
tags:item
coffee
event_type
basket_add
staff_id
n/a
store_type
online
tags:item
cheese
@chbatey
Spark Time

More Related Content

Viewers also liked

Social Computing Research with Apache Spark
Social Computing Research with Apache SparkSocial Computing Research with Apache Spark
Social Computing Research with Apache SparkMatthew Rowe
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Cassandra Day London: Building Java Applications
Cassandra Day London: Building Java ApplicationsCassandra Day London: Building Java Applications
Cassandra Day London: Building Java ApplicationsChristopher Batey
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developersChristopher Batey
 
Dublin Meetup: Cassandra anti patterns
Dublin Meetup: Cassandra anti patternsDublin Meetup: Cassandra anti patterns
Dublin Meetup: Cassandra anti patternsChristopher Batey
 
Think your software is fault-tolerant? Prove it!
Think your software is fault-tolerant? Prove it!Think your software is fault-tolerant? Prove it!
Think your software is fault-tolerant? Prove it!Christopher Batey
 
Cassandra Day NYC - Cassandra anti patterns
Cassandra Day NYC - Cassandra anti patternsCassandra Day NYC - Cassandra anti patterns
Cassandra Day NYC - Cassandra anti patternsChristopher Batey
 
Cassandra London - 2.2 and 3.0
Cassandra London - 2.2 and 3.0Cassandra London - 2.2 and 3.0
Cassandra London - 2.2 and 3.0Christopher Batey
 
Cassandra London - C* Spark Connector
Cassandra London - C* Spark ConnectorCassandra London - C* Spark Connector
Cassandra London - C* Spark ConnectorChristopher Batey
 
NYC Cassandra Day - Java Intro
NYC Cassandra Day - Java IntroNYC Cassandra Day - Java Intro
NYC Cassandra Day - Java IntroChristopher Batey
 
Manchester Hadoop Meetup: Cassandra Spark internals
Manchester Hadoop Meetup: Cassandra Spark internalsManchester Hadoop Meetup: Cassandra Spark internals
Manchester Hadoop Meetup: Cassandra Spark internalsChristopher Batey
 
LJC: Microservices in the real world
LJC: Microservices in the real worldLJC: Microservices in the real world
LJC: Microservices in the real worldChristopher Batey
 
Manchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra IntegrationManchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra IntegrationChristopher Batey
 
Devoxx France: Fault tolerant microservices on the JVM with Cassandra
Devoxx France: Fault tolerant microservices on the JVM with CassandraDevoxx France: Fault tolerant microservices on the JVM with Cassandra
Devoxx France: Fault tolerant microservices on the JVM with CassandraChristopher Batey
 
Docker and jvm. A good idea?
Docker and jvm. A good idea?Docker and jvm. A good idea?
Docker and jvm. A good idea?Christopher Batey
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!Patrick McFadin
 

Viewers also liked (20)

Social Computing Research with Apache Spark
Social Computing Research with Apache SparkSocial Computing Research with Apache Spark
Social Computing Research with Apache Spark
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Cassandra Day London: Building Java Applications
Cassandra Day London: Building Java ApplicationsCassandra Day London: Building Java Applications
Cassandra Day London: Building Java Applications
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers
 
IoT London July 2015
IoT London July 2015IoT London July 2015
IoT London July 2015
 
1 Dundee - Cassandra 101
1 Dundee - Cassandra 1011 Dundee - Cassandra 101
1 Dundee - Cassandra 101
 
Dublin Meetup: Cassandra anti patterns
Dublin Meetup: Cassandra anti patternsDublin Meetup: Cassandra anti patterns
Dublin Meetup: Cassandra anti patterns
 
Think your software is fault-tolerant? Prove it!
Think your software is fault-tolerant? Prove it!Think your software is fault-tolerant? Prove it!
Think your software is fault-tolerant? Prove it!
 
Cassandra Day NYC - Cassandra anti patterns
Cassandra Day NYC - Cassandra anti patternsCassandra Day NYC - Cassandra anti patterns
Cassandra Day NYC - Cassandra anti patterns
 
Cassandra summit LWTs
Cassandra summit  LWTsCassandra summit  LWTs
Cassandra summit LWTs
 
Cassandra London - 2.2 and 3.0
Cassandra London - 2.2 and 3.0Cassandra London - 2.2 and 3.0
Cassandra London - 2.2 and 3.0
 
Cassandra London - C* Spark Connector
Cassandra London - C* Spark ConnectorCassandra London - C* Spark Connector
Cassandra London - C* Spark Connector
 
NYC Cassandra Day - Java Intro
NYC Cassandra Day - Java IntroNYC Cassandra Day - Java Intro
NYC Cassandra Day - Java Intro
 
Manchester Hadoop Meetup: Cassandra Spark internals
Manchester Hadoop Meetup: Cassandra Spark internalsManchester Hadoop Meetup: Cassandra Spark internals
Manchester Hadoop Meetup: Cassandra Spark internals
 
LJC: Microservices in the real world
LJC: Microservices in the real worldLJC: Microservices in the real world
LJC: Microservices in the real world
 
2 Dundee - Cassandra-3
2 Dundee - Cassandra-32 Dundee - Cassandra-3
2 Dundee - Cassandra-3
 
Manchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra IntegrationManchester Hadoop Meetup: Spark Cassandra Integration
Manchester Hadoop Meetup: Spark Cassandra Integration
 
Devoxx France: Fault tolerant microservices on the JVM with Cassandra
Devoxx France: Fault tolerant microservices on the JVM with CassandraDevoxx France: Fault tolerant microservices on the JVM with Cassandra
Devoxx France: Fault tolerant microservices on the JVM with Cassandra
 
Docker and jvm. A good idea?
Docker and jvm. A good idea?Docker and jvm. A good idea?
Docker and jvm. A good idea?
 
Open source or proprietary, choose wisely!
Open source or proprietary,  choose wisely!Open source or proprietary,  choose wisely!
Open source or proprietary, choose wisely!
 

Similar to Manchester Hadoop User Group: Cassandra Intro

Vienna Feb 2015: Cassandra: How it works and what it's good for!
Vienna Feb 2015: Cassandra: How it works and what it's good for!Vienna Feb 2015: Cassandra: How it works and what it's good for!
Vienna Feb 2015: Cassandra: How it works and what it's good for!Christopher Batey
 
Jan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupJan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupChristopher Batey
 
Data Science Lab Meetup: Cassandra and Spark
Data Science Lab Meetup: Cassandra and SparkData Science Lab Meetup: Cassandra and Spark
Data Science Lab Meetup: Cassandra and SparkChristopher Batey
 
LJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache CassandraLJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache CassandraChristopher Batey
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraLuke Tillman
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingVassilis Bekiaris
 
Paris Day Cassandra: Use case
Paris Day Cassandra: Use caseParis Day Cassandra: Use case
Paris Day Cassandra: Use caseChristopher Batey
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
 
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetchDataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetchDataStax Academy
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0DataStax
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Sparknickmbailey
 
Dan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New FeaturesDan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New FeaturesEmbarcadero Technologies
 
Slide presentation pycassa_upload
Slide presentation pycassa_uploadSlide presentation pycassa_upload
Slide presentation pycassa_uploadRajini Ramesh
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...it-people
 
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...DataStax Academy
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
Sql 2016 - What's New
Sql 2016 - What's NewSql 2016 - What's New
Sql 2016 - What's Newdpcobb
 

Similar to Manchester Hadoop User Group: Cassandra Intro (20)

Vienna Feb 2015: Cassandra: How it works and what it's good for!
Vienna Feb 2015: Cassandra: How it works and what it's good for!Vienna Feb 2015: Cassandra: How it works and what it's good for!
Vienna Feb 2015: Cassandra: How it works and what it's good for!
 
Jan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester MeetupJan 2015 - Cassandra101 Manchester Meetup
Jan 2015 - Cassandra101 Manchester Meetup
 
Data Science Lab Meetup: Cassandra and Spark
Data Science Lab Meetup: Cassandra and SparkData Science Lab Meetup: Cassandra and Spark
Data Science Lab Meetup: Cassandra and Spark
 
LJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache CassandraLJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache Cassandra
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
Paris Day Cassandra: Use case
Paris Day Cassandra: Use caseParis Day Cassandra: Use case
Paris Day Cassandra: Use case
 
Apache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide DeckApache Cassandra Developer Training Slide Deck
Apache Cassandra Developer Training Slide Deck
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetchDataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
Dan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New FeaturesDan Hotka's Top 10 Oracle 12c New Features
Dan Hotka's Top 10 Oracle 12c New Features
 
Slide presentation pycassa_upload
Slide presentation pycassa_uploadSlide presentation pycassa_upload
Slide presentation pycassa_upload
 
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Выступление на Cassandra conf ...
 
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
LesFurets.com: From 0 to Cassandra on AWS in 30 days - Tsunami Alerting Syste...
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
Sql 2016 - What's New
Sql 2016 - What's NewSql 2016 - What's New
Sql 2016 - What's New
 

More from Christopher Batey

Webinar Cassandra Anti-Patterns
Webinar Cassandra Anti-PatternsWebinar Cassandra Anti-Patterns
Webinar Cassandra Anti-PatternsChristopher Batey
 
Munich March 2015 - Cassandra + Spark Overview
Munich March 2015 -  Cassandra + Spark OverviewMunich March 2015 -  Cassandra + Spark Overview
Munich March 2015 - Cassandra + Spark OverviewChristopher Batey
 
Reading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache SparkReading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache SparkChristopher Batey
 
LA Cassandra Day 2015 - Testing Cassandra
LA Cassandra Day 2015  - Testing CassandraLA Cassandra Day 2015  - Testing Cassandra
LA Cassandra Day 2015 - Testing CassandraChristopher Batey
 
LA Cassandra Day 2015 - Cassandra for developers
LA Cassandra Day 2015  - Cassandra for developersLA Cassandra Day 2015  - Cassandra for developers
LA Cassandra Day 2015 - Cassandra for developersChristopher Batey
 
Voxxed Vienna 2015 Fault tolerant microservices
Voxxed Vienna 2015 Fault tolerant microservicesVoxxed Vienna 2015 Fault tolerant microservices
Voxxed Vienna 2015 Fault tolerant microservicesChristopher Batey
 
Cassandra Summit EU 2014 Lightning talk - Paging (no animation)
Cassandra Summit EU 2014 Lightning talk - Paging (no animation)Cassandra Summit EU 2014 Lightning talk - Paging (no animation)
Cassandra Summit EU 2014 Lightning talk - Paging (no animation)Christopher Batey
 
Cassandra Summit EU 2014 - Testing Cassandra Applications
Cassandra Summit EU 2014 - Testing Cassandra ApplicationsCassandra Summit EU 2014 - Testing Cassandra Applications
Cassandra Summit EU 2014 - Testing Cassandra ApplicationsChristopher Batey
 

More from Christopher Batey (8)

Webinar Cassandra Anti-Patterns
Webinar Cassandra Anti-PatternsWebinar Cassandra Anti-Patterns
Webinar Cassandra Anti-Patterns
 
Munich March 2015 - Cassandra + Spark Overview
Munich March 2015 -  Cassandra + Spark OverviewMunich March 2015 -  Cassandra + Spark Overview
Munich March 2015 - Cassandra + Spark Overview
 
Reading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache SparkReading Cassandra Meetup Feb 2015: Apache Spark
Reading Cassandra Meetup Feb 2015: Apache Spark
 
LA Cassandra Day 2015 - Testing Cassandra
LA Cassandra Day 2015  - Testing CassandraLA Cassandra Day 2015  - Testing Cassandra
LA Cassandra Day 2015 - Testing Cassandra
 
LA Cassandra Day 2015 - Cassandra for developers
LA Cassandra Day 2015  - Cassandra for developersLA Cassandra Day 2015  - Cassandra for developers
LA Cassandra Day 2015 - Cassandra for developers
 
Voxxed Vienna 2015 Fault tolerant microservices
Voxxed Vienna 2015 Fault tolerant microservicesVoxxed Vienna 2015 Fault tolerant microservices
Voxxed Vienna 2015 Fault tolerant microservices
 
Cassandra Summit EU 2014 Lightning talk - Paging (no animation)
Cassandra Summit EU 2014 Lightning talk - Paging (no animation)Cassandra Summit EU 2014 Lightning talk - Paging (no animation)
Cassandra Summit EU 2014 Lightning talk - Paging (no animation)
 
Cassandra Summit EU 2014 - Testing Cassandra Applications
Cassandra Summit EU 2014 - Testing Cassandra ApplicationsCassandra Summit EU 2014 - Testing Cassandra Applications
Cassandra Summit EU 2014 - Testing Cassandra Applications
 

Recently uploaded

What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 

Recently uploaded (20)

What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 

Manchester Hadoop User Group: Cassandra Intro

  • 1. @chbatey Christopher Batey
 Technical Evangelist for Apache Cassandra Cassandra deep dive
  • 2. @chbatey Who am I? • Technical Evangelist for Apache Cassandra •Founder of Stubbed Cassandra •Help out Apache Cassandra users • DataStax •Builds enterprise ready version of Apache Cassandra • Previous: Cassandra backed apps at BSkyB
  • 3. @chbatey Overview • Cassandra use cases • Replication • Fault tolerance • Read and write path • Data modelling • Java Driver
  • 5. @chbatey It is a big world • Relational - Oracle, PostgreSQL • Graph databases - Neo4J, InfoGrid, Titan • Key value - DynamoDB • Document stores - MongoDB, Couchbase • Columnar aka wide row - Cassandra, HBase
  • 11. @chbatey Master/slave •Master serves all writes •Read from master and optionally slaves
  • 12. @chbatey Peer-to-Peer • No master • Read/write to any • Consistency?
  • 13. @chbatey Decisions decisions… CAP theorem Are these really that different?? Mongo, Redis, Couchbase Highly Available Databases: Voldermort, Cassandra
  • 16. @chbatey Common use cases •Ordered data such as time series -Event stores -Financial transactions -Sensor data e.g IoT
  • 17. @chbatey Common use cases •Ordered data such as time series -Event stores -Financial transactions -Sensor data e.g IoT •Non functional requirements: -Linear scalability -High throughout durable writes -Multi datacenter including active-active -Analytics without ETL
  • 19. @chbatey Cassandra Cassandra • Distributed masterless database (Dynamo) • Column family data model (Google BigTable)
  • 20. @chbatey Datacenter and rack aware Europe • Distributed master less database (Dynamo) • Column family data model (Google BigTable) • Multi data centre replication built in from the start USA
  • 21. @chbatey Cassandra Online • Distributed master less database (Dynamo) • Column family data model (Google BigTable) • Multi data centre replication built in from the start • Analytics with Apache SparkAnalytics
  • 23. @chbatey Dynamo 101 • The parts Cassandra took - Consistent hashing - Replication - Gossip - Hinted handoff - Anti-entropy repair • And the parts it left behind - Key/Value - Vector clocks
  • 24. @chbatey Picking the right nodes • You don’t want a full table scan on a 1000 node cluster! • Dynamo to the rescue: Consistent Hashing • Then the replication strategy takes over: - Network topology - Simple
  • 25. @chbatey Murmer3 Example • Data: • Murmer3 Hash Values: jim age: 36 car: ford gender: M carol age: 37 car: bmw gender: F johnny age: 12 gender: M suzy: age: 10 gender: F Primary Key Murmur3 hash value jim 350 carol 998 johnny 50 suzy 600 Primary Key Real hash range: -9223372036854775808 to 9223372036854775807
  • 26. @chbatey Murmer3 Example Four node cluster: Node Murmur3 start range Murmur3 end range A 0 249 B 250 499 C 500 749 D 750 999
  • 28. @chbatey Murmer3 Example Data is distributed as: Node Start range End range Primary key Hash value A 0 249 johnny 50 B 250 499 jim 350 C 500 749 suzy 600 D 750 999 carol 998
  • 30. @chbatey Replication strategy • Simple - Give it to the next node in the ring - Don’t use this in production • NetworkTopology - Every Cassandra node knows its DC and Rack - Replicas won’t be put on the same rack unless Replication Factor > # of racks - Unfortunately Cassandra can’t create servers and racks on the fly to fix this :(
  • 32. 32
  • 33. @chbatey Tunable Consistency •Data is replicated N times •Every query that you execute you give a consistency -ALL -QUORUM -LOCAL_QUORUM -ONE • Christos Kalantzis Eventual Consistency != Hopeful Consistency: http:// youtu.be/A6qzx_HE3EU?list=PLqcm6qE9lgKJzVvwHprow9h7KMpb5hcUU
  • 34. @chbatey Load balancing •Data centre aware policy •Token aware policy •Latency aware policy •Whitelist policy APP APP Async Replication DC1 DC2
  • 35. @chbatey Scaling shouldn’t be hard • Throw more nodes at a cluster • Bootstrapping + joining the ring • For large data sets this can take some time
  • 38. @chbatey Cassandra can not join or aggregate Client Where do I go for the max?
  • 39. @chbatey CQL •Cassandra Query Language -SQL like query language •Keyspace – analogous to a schema - The keyspace determines the RF (replication factor) •Table – looks like a SQL Table CREATE TABLE scores ( name text, score int, date timestamp, PRIMARY KEY (name, score) ); INSERT INTO scores (name, score, date) VALUES ('bob', 42, '2012-06-24'); INSERT INTO scores (name, score, date) VALUES ('bob', 47, '2012-06-25'); SELECT date, score FROM scores WHERE name='bob' AND score >= 40;
  • 41. @chbatey UUID • Universal Unique ID - 128 bit number represented in character form e.g. 99051fe9-6a9c-46c2-b949-38ef78858dd0 • Easily generated on the client - Version 1 has a timestamp component (TIMEUUID) - Version 4 has no timestamp component
  • 42. @chbatey Company Confidential TIMEUUID TIMEUUID data type supports Version 1 UUIDs Generated using time (60 bits), a clock sequence number (14 bits), and MAC address (48 bits) – CQL function ‘now()’ generates a new TIMEUUID Time can be extracted from TIMEUUID – CQL function dateOf() extracts the timestamp as a date TIMEUUID values in clustering columns or in column names are ordered based on time – DESC order on TIMEUUID lists most recent data first © 2014 DataStax, All Rights Reserved.
  • 43. @chbatey Collections CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, added_date timestamp, PRIMARY KEY (videoid) );
  • 44. @chbatey Data Model - User Defined Types • Complex data in one place • No multi-gets (multi-partitions) • Nesting! CREATE TYPE address ( street text, city text, zip_code int, country text, cross_streets set<text> );
  • 45. Data Model - Updated • We can embed video_metadata in videos CREATE TYPE video_metadata ( height int, width int, video_bit_rate set<text>, encoding text ); CREATE TABLE videos ( videoid uuid, userid uuid, name varchar, description varchar, location text, location_type int, preview_thumbnails map<text,text>, tags set<varchar>, metadata set <frozen<video_metadata>>, added_date timestamp, PRIMARY KEY (videoid) );
  • 46. Data Model - Storing JSON { "productId": 2, "name": "Kitchen Table", "price": 249.99, "description" : "Rectangular table with oak finish", "dimensions": { "units": "inches", "length": 50.0, "width": 66.0, "height": 32 }, "categories": { { "category" : "Home Furnishings" { "catalogPage": 45, "url": "/home/furnishings" }, { "category" : "Kitchen Furnishings" { "catalogPage": 108, "url": "/kitchen/furnishings" } } } CREATE TYPE dimensions ( units text, length float, width float, height float ); CREATE TYPE category ( catalogPage int, url text ); CREATE TABLE product ( productId int, name text, price float, description text, dimensions frozen <dimensions>, categories map <text, frozen <category>>, PRIMARY KEY (productId) );
  • 47. @chbatey Tuple type • Type to represent a group • Up to 256 different elements CREATE TABLE tuple_table ( id int PRIMARY KEY, three_tuple frozen <tuple<int, text, float>>, four_tuple frozen <tuple<int, text, float, inet>>, five_tuple frozen <tuple<int, text, float, inet, ascii>> );
  • 48. @chbatey Counters • Old has been around since .8 • Commit log replay changes counters • Repair can change a counter
  • 49. @chbatey Time-to-Live (TTL) TTL a row: INSERT INTO users (id, first, last) VALUES (‘abc123’, ‘catherine’, ‘cachart’) USING TTL 3600; // Expires data in one hour
 TTL a column: UPDATE users USING TTL 30 SET last = ‘miller’ WHERE id = ‘abc123’ – TTL in seconds – Can also set default TTL at a table level – Expired columns/values automatically deleted – With no TTL specified, columns/values never expire – TTL is useful for automatic deletion – Re-inserting the same row before it expires will overwrite TTL
  • 52. @chbatey An example: Customer event store • Customer event - customer_id e.g ChrisBatey - event_type e.g login, logout, add_to_basket, remove_from_basket, buy_item • Staff - name e.g Charlie - favourite_colour e.g red • Store - name - type e.g Website, PhoneApp, Phone, Retail
  • 53. @chbatey Requirements • Get all events • Get all events for a particular customer • As above for a time slice
  • 54. @chbatey Modelling in a relational database CREATE TABLE customer_events( customer_id text, staff_name text, time timeuuid, event_type text, store_name text, PRIMARY KEY (customer_id)); CREATE TABLE store( name text, location text, store_type text, PRIMARY KEY (store_name)); CREATE TABLE staff( name text, favourite_colour text, job_title text, PRIMARY KEY (name));
  • 55. @chbatey Your model should look like your queries
  • 56. Modelling in Cassandra CREATE TABLE customer_events( customer_id text, staff_id text, time timeuuid, store_type text, event_type text, tags map<text, text>, PRIMARY KEY ((customer_id), time)); Partition Key Clustering Column(s)
  • 57. How it is stored on disk customer _id time event_type store_type tags charles 2014-11-18 16:52:04 basket_add online {'item': 'coffee'} charles 2014-11-18 16:53:00 basket_add online {'item': ‘wine'} charles 2014-11-18 16:53:09 logout online {} chbatey 2014-11-18 16:52:21 login online {} chbatey 2014-11-18 16:53:21 basket_add online {'item': 'coffee'} chbatey 2014-11-18 16:54:00 basket_add online {'item': 'cheese'} charles event_type basket_add staff_id n/a store_type online tags:item coffee event_type basket_add staff_id n/a store_type online tags:item wine event_type logout staff_id n/a store_type online chbatey event_type login staff_id n/a store_type online event_type basket_add staff_id n/a store_type online tags:item coffee event_type basket_add staff_id n/a store_type online tags:item cheese