CQL: SQL for Cassandra
      Cassandra NYC
     December 6, 2011

           Eric Evans
       eric@acunu.com
     @jericevans, @acunu
●   Overview, history, motivation
●   Performance characteristics
●   Coming soon (?)
●   Drivers status
What?
●   Cassandra Query Language
    ●   aka CQL
    ●   aka /ˈsēkwəl/
●   Exactly like SQL (except where it's not)
●   Introduced in Cassandra 0.8.0
●   Ready for production use
SQL? Almost.

-- Inserts or updates
INSERT INTO Standard1 (KEY, col0, col1)
VALUES (key, value0, value1)
                   vs.
-- Inserts or updates
UPDATE Standard1
SET col0=value0, col1=value1 WHERE KEY=key
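Both forms above are upserts: neither one checks whether the row exists before writing. Below is a minimal sketch of those semantics using a toy in-memory model. UpsertModel and its methods are hypothetical, and the sketch says nothing about how Cassandra actually stores or timestamps columns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model: row key -> (column name -> value), columns kept sorted by name.
// Hypothetical illustration of upsert semantics, not Cassandra's storage engine.
class UpsertModel {
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    // INSERT INTO cf (KEY, ...) VALUES (...)
    void insert(String key, Map<String, String> columns) {
        upsert(key, columns);
    }

    // UPDATE cf SET ... WHERE KEY=...  (does exactly the same thing)
    void update(String key, Map<String, String> columns) {
        upsert(key, columns);
    }

    private void upsert(String key, Map<String, String> columns) {
        // No existence check either way: a missing row is created,
        // named columns are overwritten, other columns are left alone.
        rows.computeIfAbsent(key, k -> new TreeMap<>()).putAll(columns);
    }

    Map<String, String> row(String key) {
        return rows.getOrDefault(key, new TreeMap<>());
    }

    public static void main(String[] args) {
        Map<String, String> cols = new HashMap<>();
        cols.put("col0", "value0");
        cols.put("col1", "value1");

        UpsertModel viaInsert = new UpsertModel();
        viaInsert.insert("key", cols);
        UpsertModel viaUpdate = new UpsertModel();
        viaUpdate.update("key", cols);

        System.out.println(viaInsert.row("key").equals(viaUpdate.row("key"))); // true
    }
}
```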
SQL? Almost.
-- Get columns for a row
SELECT col0,col1 FROM Standard1 WHERE KEY=key

-- Range of columns for a row
SELECT col0..colN
    FROM Standard1 WHERE KEY=key

-- First 10 results from a range of columns
SELECT FIRST 10 col0..colN
    FROM Standard1 WHERE KEY=key

-- Invert the sorting of results
SELECT REVERSED col0..colN
    FROM Standard1 WHERE KEY=key
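Client code could assemble the slice forms above along the following lines. This is a sketch that assumes two things: the CQL 1.0 clause order (FIRST before REVERSED), and that column names and keys are sent as quoted string literals, with an embedded ' written as ''. The SliceQuery class is hypothetical and is not part of any driver.

```java
// Hypothetical helper that assembles a CQL 1.0-style column-slice query.
// Assumes FIRST precedes REVERSED and that names/keys are quoted string
// literals in which an embedded ' is escaped by doubling it ('').
class SliceQuery {
    static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    // first <= 0 means "no FIRST clause".
    static String select(String columnFamily, String key, String start, String end,
                         int first, boolean reversed) {
        StringBuilder q = new StringBuilder("SELECT ");
        if (first > 0) {
            q.append("FIRST ").append(first).append(' ');
        }
        if (reversed) {
            q.append("REVERSED ");
        }
        q.append(quote(start)).append("..").append(quote(end))
         .append(" FROM ").append(columnFamily)
         .append(" WHERE KEY = ").append(quote(key));
        return q.toString();
    }

    public static void main(String[] args) {
        // prints: SELECT FIRST 10 'col0'..'col9' FROM Standard1 WHERE KEY = 'key'
        System.out.println(select("Standard1", "key", "col0", "col9", 10, false));
    }
}
```

Keeping all term quoting in one place (quote) means the '' escaping rule is applied the same way to every name and key.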
Why?
Interface Instability
(Un)ease of use
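import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.Mutation;

Column col = new Column(ByteBuffer.wrap("name".getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis());

// Column -> ColumnOrSuperColumn -> Mutation -> List<Mutation>
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);
Mutation mutation = new Mutation();
mutation.setColumn_or_supercolumn(cosc);
List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);
// ...keyed by column family, then by row key
Map<ByteBuffer, Map<String, List<Mutation>>> mutations_map = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
Map<String, List<Mutation>> cf_map = new HashMap<String, List<Mutation>>();
cf_map.put("Standard1", mutations);
mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map);
// ...and only then: client.batch_mutate(mutations_map, ConsistencyLevel.ONE);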
CQL
INSERT INTO Standard1 (KEY, col0)
    VALUES (key, value0)
Why? How about...
●   Better stability guarantees
●   Easier to use (you already know it)
●   Better code readability / maintainability
●   Irritates the NoSQL purists
●   (Still) irritates the SQL purists
Performance
Thrift RPC
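import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.Mutation;

Column col = new Column(ByteBuffer.wrap("name".getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis());

// Column -> ColumnOrSuperColumn -> Mutation -> List<Mutation>
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);
Mutation mutation = new Mutation();
mutation.setColumn_or_supercolumn(cosc);
List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);
// ...keyed by column family, then by row key
Map<ByteBuffer, Map<String, List<Mutation>>> mutations_map = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
Map<String, List<Mutation>> cf_map = new HashMap<String, List<Mutation>>();
cf_map.put("Standard1", mutations);
mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map);
// ...and only then: client.batch_mutate(mutations_map, ConsistencyLevel.ONE);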
Your query, it's a graph
CQL

INSERT INTO Standard1 (KEY, col0)
    VALUES (key, value0)
Hotspot
                  Quoted string literals


UPDATE table SET 'name' = 'value'
    WHERE KEY = 'somekey'


●   Anything that appears between quotes
●   Java inlined in the grammar constructs a StringBuilder
    to hold each literal's contents (not fast)
●   Incurred multiple times per statement
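The per-literal work described above can be sketched as follows. This is a hypothetical stand-in, not Cassandra's generated lexer. It assumes the CQL convention that a ' inside a literal is escaped by doubling it, and it shows a fresh StringBuilder being filled one character at a time for every literal in the statement.

```java
// Hypothetical stand-in for lexing one quoted CQL string literal:
// strip the outer quotes and collapse each doubled '' into a single '.
class LiteralUnquote {
    static String unquote(String literal) {
        int n = literal.length();
        if (n < 2 || literal.charAt(0) != '\'' || literal.charAt(n - 1) != '\'') {
            throw new IllegalArgumentException("not a quoted literal: " + literal);
        }
        // A fresh builder for every literal, filled one character at a time.
        StringBuilder b = new StringBuilder(n - 2);
        for (int i = 1; i < n - 1; i++) {
            char c = literal.charAt(i);
            if (c == '\'') {
                // Inside a literal a quote must be doubled; consume the pair.
                if (i + 1 >= n - 1 || literal.charAt(i + 1) != '\'') {
                    throw new IllegalArgumentException("unescaped quote at index " + i);
                }
                i++;
            }
            b.append(c);
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // The UPDATE above has three literals, so three builders are created.
        String[] literals = {"'name'", "'value'", "'somekey'"};
        for (String lit : literals) {
            System.out.println(unquote(lit));
        }
        System.out.println(unquote("'it''s'")); // it's
    }
}
```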
Hotspot
                        Marshalling


UPDATE table SET 'clear' = 'abffaadd10'
    WHERE KEY = 'acfe12ff'
    -- 'clear' is an ascii term; 'abffaadd10' is a blob term


●   Terms are marshalled to bytes by type
●   String.getBytes is slow (AsciiType)
●   Hex conversion is faster (BytesType)
●   Incurred multiple times per statement
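The two marshalling paths can be sketched as below. The sketch assumes an ascii term only needs to be encoded to bytes, while a blob term arrives as hex digits. TermMarshal and its methods are hypothetical illustrations, not Cassandra's AsciiType or BytesType code, and no timing claims are made for them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers mimicking "terms are marshalled to bytes by type".
class TermMarshal {
    // ascii term: validate, then encode via String.getBytes.
    static ByteBuffer ascii(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (term.charAt(i) > 0x7F) {
                throw new IllegalArgumentException("non-ascii character in: " + term);
            }
        }
        return ByteBuffer.wrap(term.getBytes(StandardCharsets.US_ASCII));
    }

    // blob term: two hex digits per output byte.
    static ByteBuffer blob(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("non-hex character in: " + hex);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return ByteBuffer.wrap(out);
    }

    public static void main(String[] args) {
        System.out.println(ascii("clear").remaining());     // 5
        System.out.println(blob("abffaadd10").remaining()); // 5
    }
}
```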
Hotspot
                   Copying / Conversion


execute_cql_query(
    ByteBuffer query, Compression compression)
●   Query is binary to support compression (is it worth it?)
●   And don't forget the String → ByteBuffer conversion on
    the client-side
●   Incurred only once per statement!
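The copies and conversions a single statement goes through can be sketched end to end. The assumptions are significant. QueryBytes and its nested Compression enum are hypothetical stand-ins for the Thrift call's parameters. zlib, via java.util.zip.Deflater and Inflater, stands in for the GZIP option. The sketch does not claim to match the bytes Cassandra expects on the wire.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of the byte shuffling around one CQL query string.
// zlib (Deflater/Inflater) stands in for the wire compression; an assumption.
class QueryBytes {
    enum Compression { NONE, GZIP }

    // Client side: String -> bytes (one copy), optionally compressed (another pass).
    static ByteBuffer encode(String query, Compression compression) {
        byte[] raw = query.getBytes(StandardCharsets.UTF_8);
        if (compression == Compression.NONE) {
            return ByteBuffer.wrap(raw);
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return ByteBuffer.wrap(out.toByteArray());
    }

    // Server side: bytes -> (decompressed) bytes -> String, before parsing starts.
    static String decode(ByteBuffer query, Compression compression) {
        byte[] in = new byte[query.remaining()];
        query.duplicate().get(in); // copy out without moving the caller's position
        if (compression == Compression.NONE) {
            return new String(in, StandardCharsets.UTF_8);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed query");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed query", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String q = "INSERT INTO Standard1 (KEY, col0) VALUES ('key', 'value0')";
        for (Compression c : Compression.values()) {
            ByteBuffer wire = encode(q, c);
            System.out.println(c + ": " + wire.remaining() + " bytes, round trip ok = "
                    + decode(wire, c).equals(q));
        }
    }
}
```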
Achtung!
             (These tests weren't perfect)

●   Unneeded String → ByteBuffer → String
●   No query compression implemented
●   Co-located client and server
Insert 20M rows, 5 columns

           Avg rate          Avg latency
     RPC   20,953/s          1.6ms
     CQL   19,176/s (-8%)    1.7ms (+9%)
Insert 10M rows, 5 columns (indexed)

           Avg rate          Avg latency
     RPC    9,850/s          5.3ms
     CQL    9,290/s (-6%)    5.5ms (+4%)
Counts, 10M rows, 5 columns

           Avg rate          Avg latency
     RPC   18,052/s          1.7ms
     CQL   17,635/s (-2%)    1.7ms
Reading 20M rows, 5 columns

           Avg rate          Avg latency
     RPC   22,726/s          2.0ms
     CQL   20,272/s (-11%)   2.3ms (+10%)
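The rate percentages in these tables are plain relative differences against RPC, (CQL - RPC) / RPC × 100. The small sketch below recomputes them. RateDelta is a hypothetical helper, and the read row uses an RPC rate of 22,726/s. Rounded to whole percents, the results give the -8%, -6%, -2% and -11% shown above. Latency deltas are not recomputed here.

```java
// Recomputes the throughput deltas in the four tables: (CQL - RPC) / RPC * 100.
// Rates are copied from the slides; latency deltas are not recomputed here.
class RateDelta {
    static double percentChange(double rpcRate, double cqlRate) {
        return (cqlRate - rpcRate) / rpcRate * 100.0;
    }

    public static void main(String[] args) {
        String[] tests = {"Insert 20M", "Insert 10M (indexed)", "Counts 10M", "Reading 20M"};
        double[][] rates = {   // {RPC ops/s, CQL ops/s}
            {20_953, 19_176},
            {9_850, 9_290},
            {18_052, 17_635},
            {22_726, 20_272},
        };
        for (int i = 0; i < tests.length; i++) {
            System.out.printf("%-22s %+.1f%%%n", tests[i], percentChange(rates[i][0], rates[i][1]));
        }
    }
}
```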
In Summary
Don't step over dollars to pick up pennies!
Coming Soon(ish)
Roadmap
●   Prepared statements (CASSANDRA-2475)
●   Compound columns (CASSANDRA-2474)
●   Custom transport / protocol (CASSANDRA-2478)
●   Performance testing (CASSANDRA-2268)
●   Schema introspection (CASSANDRA-2477)
●   Multiget support (CASSANDRA-3069)
Drivers
Drivers
●   Hosted on Apache Extras (Google Code)
●   Tagged cassandra and cql
●   Licensed using Apache License 2.0
●   Conforming to a standard for database
    connectivity (if applicable)
●   Coming soon: automated testing and
    acceptance criteria
Drivers
Driver                           Platform                 Status
cassandra-jdbc                   Java                     Good
cassandra-dbapi2                 Python                   Good
cassandra-ruby                   Ruby                     New
cassandra-pdo                    PHP                      New
cassandra-node                   Node.js                  Good

http://code.google.com/a/apache-extras.org/hosting/search?q=label%3aCassandra
The End
