©2013 DataStax Confidential. Do not distribute without consent.
@chbatey
Christopher Batey

Cassandra 2.2 and 3.0
First comes a blog
• Each new feature has a vastly more detailed blog post:
http://christopher-batey.blogspot.co.uk/
Where did 2.2 come from?
New features
• 2.2
- JSON
- User defined functions
- User defined aggregates
- Role based authentication
- The small print
• 3.0
- New storage engine
- Materialised views
Hello JSON
• CREATE TABLE user (username text PRIMARY KEY,
first_name text, last_name text, emails set<text>,
country text);
• INSERT INTO user JSON '{"username": "chbatey",
"first_name": "Christopher", "last_name": "Batey",
"emails": ["christopher.batey@datastax.com"]}';
Goodbye JSON
JSON + User Defined Types
• CREATE TYPE movie (title text, time timestamp,
description text);
• ALTER TABLE user ADD movies set<frozen<movie>>;
• UPDATE user SET movies = { { title: 'Batman',
time: '2011-02-03T04:05:00+0000',
description: 'This film rocks' } } WHERE username = 'chbatey';
Out it comes
Cassandra HTTP Wrapper?
@RequestMapping(method = {RequestMethod.POST}, value = "/{keyspace}/{table}", consumes = "application/json")
public ResponseEntity<String> store(@PathVariable String keyspace, @PathVariable String table, @RequestBody String body) {
    // Demo only: keyspace, table and body are spliced straight into the CQL text, so this is open to injection.
    session.execute(String.format("insert into %s.%s JSON '%s'", keyspace, table, body));
    return ResponseEntity.ok("OK");
}
Keyspace, table and raw JSON all come straight from the request:
curl --header "Content-Type: application/json" -X POST -v "localhost:8080/twotwo/user" \
--data '{"username": "trev2", "country": null, "emails": ["trevor@gmail.com", "trevor@yahoo.com"], "first_name": "trevor", "last_name": "bunting", "movies": null}'
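The wrapper above builds its CQL with String.format, so anything in the URL or body ends up in the statement text. A minimal sketch of a safer way to build that text follows, assuming the driver in use accepts a bind marker after JSON. The class and method names (JsonInsertCqlSketch, requireIdentifier, insertJsonCql) are hypothetical and not part of the talk.

```java
import java.util.regex.Pattern;

class JsonInsertCqlSketch {
    // Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
    private static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]*");

    // Rejects anything that is not a plain identifier before it reaches the CQL text.
    static String requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a plain CQL identifier: " + name);
        }
        return name;
    }

    // Builds the statement text only; the JSON body stays a bind marker (assumption:
    // the caller prepares this statement and binds the request body to the marker).
    static String insertJsonCql(String keyspace, String table) {
        return "INSERT INTO " + requireIdentifier(keyspace) + "."
                + requireIdentifier(table) + " JSON ?";
    }

    public static void main(String[] args) {
        System.out.println(insertJsonCql("twotwo", "user"));
    }
}
```

Validating the two path variables and binding the body keeps request data out of the statement text, at the cost of rejecting quoted or case-sensitive identifiers.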
User defined functions
• Run code on the server !Dangerous!
• Java + JavaScript supported out of the box
UDF example
CREATE TABLE user (
username text primary key,
first_name text ,
last_name text ,
emails set<text> ,
country text);
Custom name
CREATE FUNCTION name ( first_name text, last_name text )
CALLED ON NULL INPUT
RETURNS text LANGUAGE java
AS '
return first_name + " " + last_name;
';
cqlsh:twotwo> select name(first_name, last_name) FROM user;
twotwo.name(first_name, last_name)
------------------------------------
Christopher Batey
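Because the function is declared CALLED ON NULL INPUT, its body also runs when a column is null. The plain-Java sketch below uses the same expression as the UDF body to show what string concatenation does in that case, next to a null-skipping variant. The class name and the nameSkippingNulls helper are hypothetical additions for illustration.

```java
class NameUdfBodySketch {
    // Same expression as the UDF body: a null argument is rendered as the text "null".
    static String name(String firstName, String lastName) {
        return firstName + " " + lastName;
    }

    // Hypothetical variant that leaves out whichever part is missing.
    static String nameSkippingNulls(String firstName, String lastName) {
        if (firstName == null) return lastName;
        if (lastName == null) return firstName;
        return firstName + " " + lastName;
    }

    public static void main(String[] args) {
        System.out.println(name(null, "Batey"));
        System.out.println(nameSkippingNulls(null, "Batey"));
    }
}
```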
User defined aggregates
CREATE AGGREGATE average ( int )
SFUNC averageState
STYPE tuple<int,bigint>
FINALFUNC averageFinal
INITCOND (0, 0);
Called for every row
state passed between
Initial state
Return type (CQL)
Optional function called on final state
State function
CREATE FUNCTION averageState ( state tuple<int,bigint>, val int )
CALLED ON NULL INPUT
RETURNS tuple<int,bigint>
LANGUAGE java
AS '
if (val != null) {
state.setInt(0, state.getInt(0)+1);
state.setLong(1, state.getLong(1)+val.intValue());
}
return state;
';
Final function
CREATE FUNCTION averageFinal ( state tuple<int,bigint> )
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS '
if (state.getInt(0) == 0) return null;
double r = (double) state.getLong(1) / state.getInt(0);
return Double.valueOf(r);
';
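How the state function, the final function and INITCOND fit together can be sketched outside Cassandra as a plain fold over the values. In this sketch a long[] stands in for the tuple<int,bigint> state, and the class name and the average driver method are hypothetical. It is not how Cassandra invokes the functions internally.

```java
import java.util.Arrays;
import java.util.List;

class AverageAggregateSketch {
    // Mirrors averageState: state[0] is the row count, state[1] the running sum.
    static long[] averageState(long[] state, Integer val) {
        if (val != null) {
            state[0] = state[0] + 1;
            state[1] = state[1] + val.intValue();
        }
        return state;
    }

    // Mirrors averageFinal: null when no rows were seen, otherwise sum / count.
    // The cast matters: without it, long / long division truncates before widening.
    static Double averageFinal(long[] state) {
        if (state[0] == 0) return null;
        return Double.valueOf((double) state[1] / state[0]);
    }

    // Hypothetical driver: starts from INITCOND (0, 0) and feeds every value through.
    static Double average(List<Integer> values) {
        long[] state = {0L, 0L};
        for (Integer val : values) {
            state = averageState(state, val);
        }
        return averageFinal(state);
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, 2)));
    }
}
```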
Putting it all together
Customer events
CREATE AGGREGATE count_by_type(text)
SFUNC countEventTypes
STYPE map<text, int>
INITCOND {};
CREATE FUNCTION countEventTypes( state map<text, int>, type text )
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS '
Integer count = (Integer) state.get(type);
if (count == null) count = 1;
else count = count + 1;
state.put(type, count);
return state;
';
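The same fold works with a map as the state. The sketch below runs the countEventTypes body over a list of event types in plain Java, starting from the empty map that INITCOND {} describes. The class name, the countByType driver and the event type values are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountByTypeSketch {
    // Same logic as the UDF body: bump the counter for this event type.
    static Map<String, Integer> countEventTypes(Map<String, Integer> state, String type) {
        Integer count = state.get(type);
        if (count == null) count = 1;
        else count = count + 1;
        state.put(type, count);
        return state;
    }

    // Hypothetical driver: starts from the empty map and feeds every row's type through.
    static Map<String, Integer> countByType(List<String> eventTypes) {
        Map<String, Integer> state = new HashMap<>();
        for (String type : eventTypes) {
            state = countEventTypes(state, type);
        }
        return state;
    }

    public static void main(String[] args) {
        // Event type names are made up for the example.
        System.out.println(countByType(Arrays.asList("BUY", "WATCH", "BUY")));
    }
}
```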
Customer events
Built in aggregates
• count
• max
• min
• avg
• sum
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java
Built in time functions
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java
Built in aggregates in action
“Materialised views” with Spark
Pure C*
Small print
• New types
- smallint - short
- tinyint - byte
- date
- time
• Warnings now sent back to client
- batch too large
Time
Materialised views
• Designed to stop *you* having to duplicate data
• Do we need a secondary index primer?
Customer events table
CREATE TABLE IF NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (customer_id, time));
CREATE INDEX ON customer_events (staff_id);
Indexes to the rescue?
customer_id  time                 staff_id
chbatey      2015-03-03 08:52:45  trevor
chbatey      2015-03-03 08:52:54  trevor
chbatey      2015-03-03 08:53:11  bill
chbatey      2015-03-03 08:53:18  bill
rusty        2015-03-03 08:56:57  bill
rusty        2015-03-03 08:57:02  bill
rusty        2015-03-03 08:57:20  trevor

staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey
bill      rusty
bill      rusty
trevor    rusty
Secondary indexes are local
• The staff_id partition in the secondary index is not
distributed like a normal table
• The secondary index entries are only stored on the node
that contains the customer_id partition
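A small simulation makes the consequence concrete: since each node only indexes the partitions it owns, a lookup by staff_id cannot be routed to one node and has to ask all of them. The sketch below models two nodes with the rows from the earlier slide. The Node class, the method names and the placement of chbatey on node A and rusty on node B are assumptions for illustration, not Cassandra internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocalIndexSketch {
    // One node: a local index from staff_id to the customer_ids stored on this node only.
    static class Node {
        final String name;
        final Map<String, List<String>> localIndex = new HashMap<>();

        Node(String name) { this.name = name; }

        void write(String customerId, String staffId) {
            localIndex.computeIfAbsent(staffId, k -> new ArrayList<>()).add(customerId);
        }
    }

    // A query by staff_id has to visit every node and merge the local results.
    static List<String> customersForStaff(List<Node> cluster, String staffId) {
        List<String> result = new ArrayList<>();
        for (Node node : cluster) {
            result.addAll(node.localIndex.getOrDefault(staffId, Collections.<String>emptyList()));
        }
        return result;
    }

    // Names of the nodes that actually held matching index entries.
    static List<String> nodesWithEntries(List<Node> cluster, String staffId) {
        List<String> names = new ArrayList<>();
        for (Node node : cluster) {
            if (node.localIndex.containsKey(staffId)) names.add(node.name);
        }
        return names;
    }

    // Assumed placement: node A owns the chbatey partition, node B owns rusty.
    static List<Node> exampleCluster() {
        Node a = new Node("A");
        Node b = new Node("B");
        for (String staff : Arrays.asList("trevor", "trevor", "bill", "bill")) a.write("chbatey", staff);
        for (String staff : Arrays.asList("bill", "bill", "trevor")) b.write("rusty", staff);
        return Arrays.asList(a, b);
    }

    public static void main(String[] args) {
        List<Node> cluster = exampleCluster();
        System.out.println(customersForStaff(cluster, "bill"));
        System.out.println(nodesWithEntries(cluster, "bill"));
    }
}
```

With more nodes the loop in customersForStaff grows with the cluster, which is the cost a table keyed by staff_id avoids.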
Indexes to the rescue?
Node A (holds the chbatey partition), local staff_id index:
staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey

Node B (holds the rusty partition), local staff_id index:
staff_id  customer_id
bill      rusty
bill      rusty
trevor    rusty

customer_events table:
customer_id  time                 staff_id
chbatey      2015-03-03 08:52:45  trevor
chbatey      2015-03-03 08:52:54  trevor
chbatey      2015-03-03 08:53:11  bill
chbatey      2015-03-03 08:53:18  bill
rusty        2015-03-03 08:56:57  bill
rusty        2015-03-03 08:57:02  bill
rusty        2015-03-03 08:57:20  trevor

staff_id index:
staff_id  customer_id
trevor    chbatey
trevor    chbatey
bill      chbatey
bill      chbatey
bill      rusty
bill      rusty
trevor    rusty
Do-it-yourself index
CREATE TABLE IF NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (customer_id, time));
CREATE TABLE IF NOT EXISTS customer_events_by_staff (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (staff_id, time));
KillrWeather data model
INSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature,
one_hour_precip ) values ('station1', 2012, 12, 25, 1, 'GB', 'Cumbria', 14.0, 20) ;
INSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature,
one_hour_precip ) values ('station2', 2012, 12, 25, 1, 'GB', 'Cumbria', 4.0, 2) ;
INSERT INTO raw_weather_data(wsid, year, month, day, hour, country_code, state_code, temperature,
one_hour_precip ) values ('station3', 2012, 12, 25, 1, 'GB', 'Greater London', 16.0, 10) ;
Querying by state?
Fine print
• The MV primary key takes the base table's primary key columns plus one other column
• Unused primary key columns are added to the end of your MV PK
• If any part of the MV primary key is NULL, the row won't appear in the materialised view
• This is not free!
Combining aggregates + MVs
Including the month
Conclusions
• We still denormalise and duplicate to achieve scalability
and performance
• We just let C* do it for us :)

2 Dundee - Cassandra-3
