CQL3 in depth
Cassandra Conference in Tokyo, 11/29/2012


Yuki Morishita
Software Engineer@DataStax / Apache Cassandra Committer


©2012 DataStax
Agenda!
   • Why CQL3?
   • CQL3 walkthrough
            • Defining Schema
            • Querying / Mutating Data
            • New features
   • Related topics
            • Native transport


Why CQL3?
Cassandra Storage
                 create column family profiles
                 with key_validation_class = UTF8Type
                 and comparator = UTF8Type
                 and column_metadata = [
                    {column_name: first_name, validation_class: UTF8Type},
                    {column_name: last_name, validation_class: UTF8Type},
                    {column_name: year, validation_class: IntegerType}
                 ];




                 row key    column        value

                 nobu       first_name    Nobunaga
                            last_name     Oda
                            year          1582

                 • Values are validated by validation_class
                 • Columns are sorted in comparator order
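
The storage model on this slide can be pictured as a sorted map per row key. The following is a minimal Python sketch of that idea, not Cassandra's implementation: the names `VALIDATORS`, `insert_column` and `columns_in_comparator_order` are hypothetical, Python types stand in for the validation classes, and plain string ordering stands in for the UTF8Type comparator.

```python
# Toy model of one row of the "profiles" column family.
# Python types stand in for validation_class (assumption for illustration).
VALIDATORS = {"first_name": str, "last_name": str, "year": int}


def insert_column(row, name, value):
    """Store a column after checking its value against the column metadata."""
    expected = VALIDATORS.get(name)
    if expected is not None and not isinstance(value, expected):
        raise TypeError(f"{name!r} expects {expected.__name__}")
    row[name] = value


def columns_in_comparator_order(row):
    """Return (name, value) pairs sorted by column name, as a comparator would."""
    return sorted(row.items(), key=lambda item: item[0])


nobu = {}
insert_column(nobu, "year", 1582)
insert_column(nobu, "last_name", "Oda")
insert_column(nobu, "first_name", "Nobunaga")
for name, value in columns_in_comparator_order(nobu):
    print(name, value)
```

Whatever order the columns are written in, they are read back sorted by name, and a value of the wrong type is rejected at write time.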




Thrift API
   • Low level: get, get_slice, mutate...
   • Directly exposes internal storage
     structure
   • Hard to change the API signatures




Inserting data with Thrift
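   import java.nio.ByteBuffer;
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;
   import org.apache.cassandra.thrift.Column;
   import org.apache.cassandra.thrift.ColumnOrSuperColumn;
   import org.apache.cassandra.thrift.Mutation;

   // `client` and `consistencyLevel` are assumed to be set up elsewhere.

   // One column: name, value, and a client-supplied timestamp
   Column col = new Column(ByteBuffer.wrap("name".getBytes()));
   col.setValue(ByteBuffer.wrap("value".getBytes()));
   col.setTimestamp(System.currentTimeMillis());

   // Wrap the column so it can be carried by a Mutation
   ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
   cosc.setColumn(col);

   Mutation mutation = new Mutation();
   mutation.setColumn_or_supercolumn(cosc);

   List<Mutation> mutations = new ArrayList<Mutation>();
   mutations.add(mutation);

   // column family name -> mutations
   Map<String, List<Mutation>> cf = new HashMap<String, List<Mutation>>();
   cf.put("Standard1", mutations);

   // row key -> (column family name -> mutations)
   Map<ByteBuffer, Map<String, List<Mutation>>> records
                        = new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
   records.put(ByteBuffer.wrap("key".getBytes()), cf);

   client.batch_mutate(records, consistencyLevel);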




... with Cassandra Query Language


                 INSERT INTO "Standard1" (key, name)
                 VALUES ('key', 'value');




    • Introduced in 0.8 (CQL), updated in
      1.0 (CQL2)
    • Syntax similar to SQL
    • More extensible than Thrift API
CQL2 Problems
   • Almost a one-to-one mapping to the Thrift API,
     so it does not compose well with the
     row-oriented parts of SQL
   • No support for CompositeType




CQL3
   • Maps storage to a more natural rows-and-columns
     representation using CompositeType
            • Wide rows are “transposed” and unpacked
              into named columns
   • Beta in 1.1, default in 1.2
   • New features
            • Collection support

CQL3 walkthrough
Defining Keyspace
   • Syntax has changed since CQL2

     CREATE KEYSPACE my_keyspace WITH replication = {
         'class': 'SimpleStrategy',
         'replication_factor': 2
     };




Defining Static Column Family
   • “Strict” schema definition (and that’s a good
     thing)
            • You cannot add columns arbitrarily
            • You need ALTER TABLE ... ADD column
              first
   • Columns are defined and sorted using a
     CompositeType comparator
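
To make the "strict schema" point concrete, here is a small Python sketch of a table that refuses writes to undeclared columns until they are added. It is a toy model only: `ToyTable`, `alter_add` and `insert` are hypothetical names chosen for illustration, and the error type is an arbitrary choice rather than what Cassandra reports.

```python
class ToyTable:
    """Toy stand-in for a CQL3 table: only declared columns may be written."""

    def __init__(self, **columns):
        self.columns = dict(columns)  # column name -> Python type
        self.rows = {}                # primary key -> {column: value}

    def alter_add(self, name, type_):
        """Rough analogue of ALTER TABLE ... ADD name type."""
        self.columns[name] = type_

    def insert(self, key, **values):
        """Reject any column that has not been declared first."""
        unknown = sorted(set(values) - set(self.columns))
        if unknown:
            raise KeyError(f"undefined column(s): {unknown}")
        self.rows.setdefault(key, {}).update(values)


profiles = ToyTable(first_name=str, last_name=str, year=int)
profiles.insert("nobu", first_name="Nobunaga", last_name="Oda", year=1582)
profiles.alter_add("email", str)  # must happen before the next line
profiles.insert("nobu", email="nobu@example.com")
```

Without the `alter_add` call, the final `insert` would fail, which mirrors the rule that a column must be added to the schema before it can be written.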


Defining Static Column Family

   CREATE TABLE profiles (
     user_id text PRIMARY KEY,
     first_name text,
     last_name text,
     year int
   )

    user_id | first_name | last_name | year
   ---------+------------+-----------+------
       nobu |   Nobunaga |       Oda | 1582

   Internal storage, with comparator CompositeType(UTF8Type):

                 user_id    column         value

                 nobu       :
                            first_name:    Nobunaga
                            last_name:     Oda
                            year:          1582

                 • Values are validated by type definition
                 • Columns are sorted in comparator order

Defining Dynamic Column Family
   • Then how can we add columns
     dynamically to our time series data, as
     we did before?
            • Use a compound key




Compound key
                        CREATE TABLE comments (
                            article_id uuid,
                            posted_at timestamp,
                            author text,
                            content text,
                            PRIMARY KEY (article_id, posted_at)
                        )

   Internal storage, with comparator CompositeType(DateType, UTF8Type):

    article_id       column                  value

    550e8400-..      1350499616:
                     1350499616:author       yukim
                     1350499616:content      blah, blah, blah
                     1368499616:
                     1368499616:author       yukim
                     1368499616:content      well, well, well
                     ...

    • Values are validated by type definition
    • Columns are sorted in comparator order,
      first by date, and then by column name
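
The "transposition" from this internal layout to CQL3 rows can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Cassandra's code: the function name `transpose` is hypothetical, a `(posted_at, column_name)` tuple stands in for the composite column name, and an empty column name stands in for the bare `1350499616:` cell.

```python
def transpose(article_id, cells):
    """Group cells named (posted_at, column) into one CQL-style row per posted_at."""
    rows = {}
    # Sort by composite name: first by date, then by column name.
    for (posted_at, column), value in sorted(cells.items(), key=lambda kv: kv[0]):
        row = rows.setdefault(
            posted_at, {"article_id": article_id, "posted_at": posted_at}
        )
        if column:  # the cell with an empty column name carries no data
            row[column] = value
    return list(rows.values())


cells = {
    (1368499616, "content"): "well, well, well",
    (1350499616, ""): "",
    (1350499616, "author"): "yukim",
    (1350499616, "content"): "blah, blah, blah",
    (1368499616, ""): "",
    (1368499616, "author"): "yukim",
}
for row in transpose("550e8400-..", cells):
    print(row)
```

Each distinct `posted_at` prefix becomes one row, and the second component of the cell name becomes the column of that row, so a single wide storage row yields two CQL3 rows here.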
Compound key

cqlsh:ks> SELECT * FROM comments;

 article_id   | posted_at                | author | content
--------------+--------------------------+--------+------------------
 550e8400-... | 1970-01-17 00:08:19+0900 |  yukim | blah, blah, blah
 550e8400-... | 1970-01-17 05:08:19+0900 |  yukim | well, well, well

cqlsh:ks> SELECT * FROM comments WHERE posted_at >= '1970-01-17 05:08:19+0900';

 article_id   | posted_at                | author | content
--------------+--------------------------+--------+------------------
 550e8400-... | 1970-01-17 05:08:19+0900 |  yukim | well, well, well
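
A note on the 1970 dates: the CQL timestamp type counts milliseconds since the Unix epoch. The cell names 1350499616 and 1368499616 look like epoch seconds, and read as milliseconds they land in January 1970, which matches the output. The Python sketch below checks that arithmetic; the helper names and the assumption that the values were entered as seconds are mine, not the slide's.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JST = timezone(timedelta(hours=9))  # the +0900 offset shown by cqlsh on the slide


def from_cql_timestamp(millis, tz=JST):
    """Interpret a value as milliseconds since the epoch, as the timestamp type does."""
    return (EPOCH + timedelta(milliseconds=millis)).astimezone(tz)


def fmt(millis):
    return from_cql_timestamp(millis).strftime("%Y-%m-%d %H:%M:%S%z")


print(fmt(1350499616))         # the value as stored in the example
print(fmt(1350499616 * 1000))  # the same value scaled from seconds to milliseconds
```

The first line reproduces the date from the cqlsh output; the second shows where the comment would land if the value had been scaled to milliseconds before insertion.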




Changes worth noting
   • Identifiers (keyspace/table/column
     names) are case-insensitive by
     default
            • Use double quotes (") to preserve case
   • Compaction settings are now a map
            CREATE TABLE test (
                 ...
            ) WITH COMPACTION = {
                 'class': 'SizeTieredCompactionStrategy',
                 'min_threshold': 2,
                 'max_threshold': 4
            };
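
The identifier rule on this slide can be sketched as a small normalization function: unquoted identifiers fold to lower case, while double-quoted identifiers keep their case. This Python sketch is an illustration only; `normalize_identifier` is a hypothetical helper, not part of any driver, and it ignores validation of which characters are legal.

```python
def normalize_identifier(raw):
    """Return the name CQL3 would use for an identifier as written in a query."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        # Quoted: keep case as written; a doubled quote stands for one quote.
        return raw[1:-1].replace('""', '"')
    # Unquoted: case-insensitive, so fold to lower case.
    return raw.lower()


for written in ["Profiles", "PROFILES", '"Standard1"']:
    print(written, "->", normalize_identifier(written))
```

Under this rule `Profiles` and `PROFILES` name the same table, while `"Standard1"` and `Standard1` do not.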
Changes worth noting
   • system.schema_*
            • All schema information is stored in the system
              keyspace
                 • schema_keyspaces, schema_columnfamilies,
                   schema_columns
            • The system tables themselves are defined as CQL3 schemas
   • CQL3 schemas are not visible through
     cassandra-cli’s ‘describe’ command
            • Use cqlsh’s ‘describe columnfamily’ instead
More on CQL3 schema
   • Thrift to CQL3 migration
            • http://www.datastax.com/dev/blog/thrift-to-cql3

   • For better understanding
            • http://www.datastax.com/dev/blog/whats-new-in-cql-3-0
            • http://www.datastax.com/dev/blog/cql3-evolutions
            • http://www.datastax.com/dev/blog/cql3-for-cassandra-experts




Mutating Data


                 INSERT INTO example (id, name) VALUES (...)

                 UPDATE example SET f = 'foo' WHERE ...

                 DELETE FROM example WHERE ...




   • No more USING CONSISTENCY
            • The consistency level is now set at the
              protocol level
Batch Mutate
                 BEGIN BATCH
                     INSERT INTO aaa (id, col) VALUES (...)
                     UPDATE bbb SET col1 = 'val1' WHERE ...
                     ...
                 APPLY BATCH;


   • Batches are atomic by default as of 1.2
            • Atomic does not mean isolated
              (mutations within a single row are isolated as of 1.1)
            • Some performance penalty because of the
              batch log
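
The difference between "atomic" and "isolated" is easier to see in a toy model of a batch log. The Python sketch below is a heavily simplified illustration, not how Cassandra implements it: the real batch log involves other nodes in the cluster, while here `ToyCoordinator` (a hypothetical name, as are its methods) keeps everything in one object and simulates a crash with a `fail_after` parameter.

```python
import uuid


class ToyCoordinator:
    """Single-process toy: log the batch, apply it, then forget the log entry."""

    def __init__(self):
        self.batchlog = {}  # batch id -> list of pending mutations
        self.tables = {}    # (table, key) -> {column: value}

    def _apply(self, mutation):
        table, key, values = mutation
        self.tables.setdefault((table, key), {}).update(values)

    def execute_batch(self, mutations, fail_after=None):
        batch_id = uuid.uuid4()
        self.batchlog[batch_id] = list(mutations)  # 1. record the whole batch first
        for i, mutation in enumerate(mutations):   # 2. apply the mutations one by one
            if fail_after is not None and i >= fail_after:
                raise RuntimeError("simulated crash mid-batch")
            self._apply(mutation)
        del self.batchlog[batch_id]                # 3. done: drop the log entry

    def replay(self):
        """Re-apply every batch still in the log; re-applying a write is harmless."""
        for batch_id in list(self.batchlog):
            for mutation in self.batchlog[batch_id]:
                self._apply(mutation)
            del self.batchlog[batch_id]


node = ToyCoordinator()
batch = [("aaa", 1, {"col": "x"}), ("bbb", 1, {"col1": "val1"})]
try:
    node.execute_batch(batch, fail_after=1)
except RuntimeError:
    pass
print(sorted(node.tables))  # only the first write is visible: not isolated
node.replay()
print(sorted(node.tables))  # both writes are present after replay: atomic
```

The extra write to the log before any mutation is applied is also where the performance penalty mentioned above comes from.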
Batch Mutate
   • Use a non-atomic (unlogged) batch if you need
     performance rather than atomicity
                 BEGIN UNLOGGED BATCH
                     ...
                 APPLY BATCH;



   • More on dev blog
            • http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2




Querying Data

                 SELECT article_id, posted_at, author
                 FROM comments
                 WHERE
                   article_id = '...'
                 ORDER BY posted_at DESC
                 LIMIT 100;




Querying Data
   • TTL/WRITETIME
            • You can query the TTL or write time of a column.

                   cqlsh:ks> SELECT WRITETIME(author) FROM comments;

                    writetime(author)
                   -------------------
                     1354146105288000
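
The value returned by WRITETIME in this output has 16 digits, which fits a count of microseconds since the Unix epoch. A minimal Python sketch for turning such a value into a readable UTC time follows; `writetime_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def writetime_to_datetime(micros):
    """Convert a WRITETIME value (microseconds since the epoch) to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


print(writetime_to_datetime(1354146105288000).isoformat())
```

Integer microseconds are added to the epoch directly rather than divided into a float of seconds, so no precision is lost in the conversion.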




Collection support
   • Collection
            • Set
                 • Unordered, no duplicates
            • List
                 • Ordered, allows duplicates
            • Map
                 • Keys and associated values




Collection support

                 CREATE TABLE example (
                    id uuid PRIMARY KEY,
                    tags set<text>,
                    points list<int>,
                    attributes map<text, text>
                 );




   • Collections are typed, but cannot be
     nested (no list<list<text>>)
   • No secondary index on collections
Collection support

           INSERT INTO example (id, tags, points, attributes)
           VALUES (
               '62c36092-82a1-3a00-93d1-46196ee77204',
               {'foo', 'bar', 'baz'}, // set
               [100, 20, 93],          // list
               {'abc': 'def'}          // map
           );




Collection support
   • Set
    UPDATE example SET tags = tags + {'qux'} WHERE ...
    UPDATE example SET tags = tags - {'foo'} WHERE ...

   • List
    UPDATE example SET points = points + [20, 30] WHERE ...
    UPDATE example SET points = points - [100] WHERE ...

   • Map
    UPDATE example SET attributes['ghi'] = 'jkl' WHERE ...
    DELETE attributes['abc'] FROM example WHERE ...
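
The effect of these statements can be mirrored with Python's built-in collections. This is a sketch of the semantics only, using the values inserted earlier in the deck; the helper names are hypothetical, and list subtraction is modeled as removing every occurrence of the listed values.

```python
def set_add(current, items):
    return current | set(items)          # tags = tags + {...}


def set_remove(current, items):
    return current - set(items)          # tags = tags - {...}


def list_append(current, items):
    return current + list(items)         # points = points + [...]


def list_remove(current, items):
    return [p for p in current if p not in items]  # points = points - [...]


def map_put(current, key, value):
    return {**current, key: value}       # attributes[key] = value


def map_delete(current, key):
    return {k: v for k, v in current.items() if k != key}  # DELETE attributes[key]


tags = set_remove(set_add({"foo", "bar", "baz"}, {"qux"}), {"foo"})
points = list_remove(list_append([100, 20, 93], [20, 30]), [100])
attributes = map_delete(map_put({"abc": "def"}, "ghi", "jkl"), "abc")
print(sorted(tags), points, attributes)
```

Each helper returns a new value instead of mutating its argument, which keeps the before and after states easy to compare.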




Collection support

           SELECT tags, points, attributes FROM example;

            tags            | points        | attributes
           -----------------+---------------+--------------
            {baz, foo, bar} | [100, 20, 93] | {abc: def}




   • You cannot retrieve an individual item
     from a collection

Collection support
   • Each element of a collection is stored
     internally as one Cassandra column
   • More on dev blog
            • http://www.datastax.com/dev/blog/cql3_collections
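
"One column per element" can be pictured with the Python sketch below. The exact cell naming is an assumption made for illustration: set elements and map keys are placed in the cell name, and a plain position index stands in for whatever ordering component lists use internally. `collection_cells` is a hypothetical helper, and the blog post above remains the reference for the real encoding.

```python
def collection_cells(column, value):
    """Expand one collection value into (cell_name, cell_value) pairs."""
    if isinstance(value, (set, frozenset)):
        # Set: the element lives in the cell name, the cell value is empty.
        return [((column, element), "") for element in sorted(value)]
    if isinstance(value, list):
        # List: an ordering component in the name (here just the index).
        return [((column, index), element) for index, element in enumerate(value)]
    if isinstance(value, dict):
        # Map: the key lives in the cell name, the map value is the cell value.
        return [((column, key), val) for key, val in sorted(value.items())]
    raise TypeError(f"not a collection: {type(value).__name__}")


row = {
    "tags": {"foo", "bar", "baz"},
    "points": [100, 20, 93],
    "attributes": {"abc": "def"},
}
for column, value in row.items():
    for cell in collection_cells(column, value):
        print(cell)
```

With the example row, three set elements, three list elements and one map entry become seven separate cells.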




Related topics
Native Transport
   • CQL3 still goes through Thrift’s
     execute_cql3_query API
   • Native transport support introduces
     Cassandra’s own binary protocol
            • Async IO, server event push, ...
            • http://www.datastax.com/dev/blog/binary-protocol

   • Try the DataStax Java native driver with C*
     1.2 beta today!
            • https://github.com/datastax/java-driver

Questions?

                 Or contact me later if you have one
                         yuki@datastax.com
                         yukim (IRC, twitter)

                 Now hiring talented engineers from all over the world!



