Speaker: Sylvain Lebresne, Software Engineer at DataStax
Video: http://www.youtube.com/watch?v=4GSfAS4nFAs&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=18
Since its inception, the Cassandra Query Language (CQL) has grown and matured, resulting in the third version of the language (CQL3) being finalized in Cassandra 1.2 and further improved in Cassandra 2.0. Compared to the legacy Thrift API, CQL3 aims to provide an API that is higher level and more user friendly, while still fully embracing the distributed nature of Cassandra and its storage engine. This talk will present CQL3, describing the reasoning and goals behind the language as well as the language itself. We will also touch on CQL's relationship with Thrift and will present the CQL binary protocol that was introduced in Cassandra 1.2. We will wrap up by discussing the future of CQL.
2. A short CQL primer
New in Cassandra 2.0
Native protocol
What's next?
3. A better API for Cassandra
Thrift is not satisfactory:
· Not user friendly, hard to use.
· Low level, very little abstraction.
· Hard to evolve (in a backward compatible way).
· Unreadable without driver abstraction.
Cassandra has often been regarded as hard to develop against.
It doesn't have to be that way!
4. Quick historical notes
· CQL1 first introduced in Cassandra 0.8, became CQL2 in Cassandra 1.0
· "These aren't the CQL you are looking for"
· CQL3 (simply "CQL" from here on) introduced in Cassandra 1.2
· Semantically, CQL1/CQL2 are closer to the Thrift API than to CQL3.
· CQL3 is the version that's here to stay: no plan for a CQL4 any time soon.
6. The Cassandra Query Language
· Syntactically, a subset of SQL (with a few extensions)
    CREATE TABLE users (
        user_id uuid,
        name text,
        password text,
        email text,
        picture_profile blob,
        PRIMARY KEY (user_id)
    )
· INSERT and UPDATE are both upserts
· No joins, no sub-queries, no aggregation, ...
· Denormalization is the norm: do the work at write time, not read time
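The upsert behaviour mentioned above can be pictured with a toy in-memory model. This is only a sketch of the semantics, not of how Cassandra stores data; the class `UpsertTable` and its methods are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of upsert semantics. Hypothetical class, not part of any Cassandra API.
class UpsertTable {
    // primary key -> (column name -> value)
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    // Both "INSERT" and "UPDATE" share one write path:
    // create the row if it is absent, then overwrite the given columns.
    private void write(String key, Map<String, Object> columns) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(columns);
    }

    void insert(String key, Map<String, Object> columns) { write(key, columns); }

    void update(String key, Map<String, Object> columns) { write(key, columns); }

    Map<String, Object> get(String key) { return rows.get(key); }

    public static void main(String[] args) {
        UpsertTable users = new UpsertTable();
        users.update("u1", Map.of("name", "Tom"));      // UPDATE on a missing row creates it
        users.insert("u1", Map.of("email", "t@x.org")); // INSERT on an existing row does not fail
        System.out.println(users.get("u1"));            // both columns are present
    }
}
```

Neither call checks whether the row already exists, which is why both can be served without a read before the write.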
7. Denormalization: Cassandra modeling 101
Efficient queries in Cassandra are based on 2 principles:
· the data queried is collocated on one replica set
· the data queried is collocated on disk on those replicas
Denormalization is the technique that makes this achievable in practice.
But this means CQL exposes:
· how to collocate data on the same replica set
· how to collocate data on disk (for a given replica)
8. This is done in CQL through the primary key
    CREATE TABLE inboxes (
        user_id uuid,
        email_id timeuuid,
        sender text,
        recipients set<text>,
        subject text,
        is_read boolean,
        PRIMARY KEY (user_id, email_id)
    )
CQL distinguishes 2 sub-parts in the PRIMARY KEY:
· partition key: decides the node on which the data is stored
· clustering columns: within the same partition key, (CQL3) rows are physically ordered following the clustering columns
This is important, because CQL only allows queries for which an explicit index exists:
    -- Get last 50 emails in user 51b-23-ab8 inbox
    SELECT * FROM inboxes WHERE user_id=51b-23-ab8 ORDER BY email_id DESC LIMIT 50;
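The roles of the two key parts can be sketched with a small in-memory model. It is an illustration under simplifying assumptions, not Cassandra's storage engine: `InboxModel` is a hypothetical class, a plain modulo stands in for the token ring, and a `long` stands in for the `timeuuid` clustering column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the inboxes table. Hypothetical class, not a Cassandra API.
class InboxModel {
    // partition key (user_id) -> rows kept sorted by clustering column (email_id)
    private final Map<String, TreeMap<Long, String>> partitions = new HashMap<>();
    private final int nodeCount;

    InboxModel(int nodeCount) { this.nodeCount = nodeCount; }

    // The partition key alone decides placement. Real clusters hash it onto a
    // token ring; a modulo is used here as a stand-in.
    int nodeFor(String userId) {
        return Math.floorMod(userId.hashCode(), nodeCount);
    }

    void insert(String userId, long emailId, String subject) {
        partitions.computeIfAbsent(userId, k -> new TreeMap<>()).put(emailId, subject);
    }

    // In the spirit of:
    //   SELECT * FROM inboxes WHERE user_id = ? ORDER BY email_id DESC LIMIT ?
    // A single partition is read, already sorted, from the newest entry backwards.
    List<String> lastEmails(String userId, int limit) {
        List<String> result = new ArrayList<>();
        TreeMap<Long, String> rows = partitions.get(userId);
        if (rows == null) return result;
        for (String subject : rows.descendingMap().values()) {
            if (result.size() == limit) break;
            result.add(subject);
        }
        return result;
    }

    public static void main(String[] args) {
        InboxModel inboxes = new InboxModel(3);
        inboxes.insert("alice", 1L, "hello");
        inboxes.insert("alice", 3L, "lunch?");
        inboxes.insert("alice", 2L, "re: hello");
        System.out.println(inboxes.lastEmails("alice", 2)); // [lunch?, re: hello]
    }
}
```

The query touches one partition and walks rows in their stored order, which is what makes it cheap. A query that filtered on `sender` alone would have no such structure to follow.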
9. CQL main features
· Collections (set, map and list)
· Secondary indexes
· Convenience functions (timeuuid, type conversions, ...)
· ...
For more details:
· http://cassandra.apache.org/doc/cql3/CQL.html
· http://www.datastax.com/documentation/cql/3.1/webhelp/index.html
11. New in Cassandra 2.0
Lightweight transactions:
    INSERT INTO test (id, name) VALUES (42, 'Tom') IF NOT EXISTS;

    UPDATE test SET password='newpass' WHERE id=42 IF password='oldpass';
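The semantics of these two conditional statements resemble compare-and-set on a concurrent map. The sketch below is a single-process analogy only: `ConditionalWrites` is a hypothetical class, and Cassandra obtains the same outcome across replicas through a Paxos round rather than a local lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Single-process analogy for conditional writes. Hypothetical class.
class ConditionalWrites {
    private final ConcurrentMap<Integer, String> passwords = new ConcurrentHashMap<>();

    // Like INSERT ... IF NOT EXISTS: returns true only if the row was created.
    boolean insertIfNotExists(int id, String password) {
        return passwords.putIfAbsent(id, password) == null;
    }

    // Like UPDATE ... IF password = expected: applied only if the current value matches.
    boolean updateIf(int id, String expected, String replacement) {
        return passwords.replace(id, expected, replacement);
    }

    String get(int id) { return passwords.get(id); }

    public static void main(String[] args) {
        ConditionalWrites test = new ConditionalWrites();
        System.out.println(test.insertIfNotExists(42, "oldpass"));   // true
        System.out.println(test.insertIfNotExists(42, "other"));     // false
        System.out.println(test.updateIf(42, "wrong", "newpass"));   // false
        System.out.println(test.updateIf(42, "oldpass", "newpass")); // true
    }
}
```

As with the CQL statements, the caller learns whether the write was applied and can react accordingly.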
Triggers:
    CREATE TRIGGER myTrigger ON test USING 'my.trigger.Class';
ALTER DROP:
    CREATE TABLE test (k int PRIMARY KEY, prop1 int, prop2 text, prop3 float);

    ALTER TABLE test DROP prop3;
Preparing TIMESTAMP, TTL and LIMIT:
    SELECT * FROM myTable LIMIT ?;

    UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';
12. New in Cassandra 2.0
Conditional DDL:
    CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);

    DROP KEYSPACE IF EXISTS ks;
Secondary indexes everywhere (almost):
    CREATE TABLE timeline (
        event_id uuid,
        created_at timeuuid,
        content blob,
        PRIMARY KEY (event_id, created_at)
    );

    CREATE INDEX ON timeline (created_at);
SELECT aliases:
    SELECT event_id,
           dateOf(created_at) AS creation_date
    FROM timeline;
13. Coming in Cassandra 2.0.2
Named bind variables:
    SELECT * FROM timeline WHERE created_at > :tlow AND created_at <= :thigh AND key = :k;
Prepared IN:
    SELECT * FROM users WHERE user_id IN ?;
Limited SELECT DISTINCT:
    CREATE TABLE test (
        event_id int,
        created_at timestamp,
        content blob,
        PRIMARY KEY (event_id, created_at)
    );

    SELECT DISTINCT event_id FROM test;
15. Native protocol
· Binary transport protocol for CQL
· Query execution, prepared statements, authentication, compression, ...
· Asynchronous (allows multiple concurrent queries per connection)
· Server notifications (Only generic cluster events currently)
· Existing drivers for Java, C#, Python, C++, Golang, ...
Example usage of the Java driver (https://github.com/datastax/java-driver):
    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect("myKeyspace");

    for (Row row : session.execute("SELECT * FROM myTable")) {
        // Do something ...
    }
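The "asynchronous" point above rests on each request frame carrying a stream id that the server echoes back in its response, so several queries can be in flight on one connection. A minimal sketch of that bookkeeping follows; `MultiplexedConnection` is a hypothetical class and not the Java driver's implementation, and the network I/O is left out.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of request multiplexing on a single connection. Hypothetical class.
class MultiplexedConnection {
    // stream id -> future completed when the matching response arrives
    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private int nextStreamId = 0;

    // Registers the request under a fresh stream id and returns without blocking.
    // A real client would also write the query frame to the socket here.
    synchronized int send(String query, CompletableFuture<String> result) {
        int id = nextStreamId++;
        pending.put(id, result);
        return id;
    }

    // Called when a response frame arrives; responses may come back in any order.
    void onResponse(int streamId, String body) {
        CompletableFuture<String> result = pending.remove(streamId);
        if (result != null) result.complete(body);
    }

    public static void main(String[] args) {
        MultiplexedConnection connection = new MultiplexedConnection();
        CompletableFuture<String> a = new CompletableFuture<>();
        CompletableFuture<String> b = new CompletableFuture<>();
        int idA = connection.send("SELECT * FROM t1", a);
        int idB = connection.send("SELECT * FROM t2", b);
        connection.onResponse(idB, "rows of t2"); // the second query answers first
        connection.onResponse(idA, "rows of t1");
        System.out.println(a.join() + " / " + b.join()); // rows of t1 / rows of t2
    }
}
```

Because responses are matched by id rather than by arrival order, a slow query does not hold up the ones sent after it.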
16. New in Cassandra 2.0: native protocol 2
Cursors:
    for (Row row : session.execute("SELECT * FROM myTable")) {
        // Do something ...
    }
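The loop looks the same as with protocol version 1; the difference is that rows can now be fetched page by page while iterating instead of all at once. The sketch below shows the idea with a lazily refilled iterator. `PagedRows` is a hypothetical class, and page numbers are a simplification: the actual protocol resumes from an opaque paging state returned by the server.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Sketch of lazy result paging. Hypothetical class, not the driver's ResultSet.
class PagedRows implements Iterable<String> {
    // page number -> rows of that page; an empty list signals the end
    private final IntFunction<List<String>> fetchPage;

    PagedRows(IntFunction<List<String>> fetchPage) { this.fetchPage = fetchPage; }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int page = 0;
            private boolean exhausted = false;
            private Iterator<String> current = fetchPage.apply(page).iterator();

            @Override
            public boolean hasNext() {
                // Fetch the next page only once the current one is fully consumed.
                while (!exhausted && !current.hasNext()) {
                    List<String> next = fetchPage.apply(++page);
                    if (next.isEmpty()) exhausted = true;
                    else current = next.iterator();
                }
                return !exhausted;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<String> all = List.of("r1", "r2", "r3", "r4", "r5");
        int pageSize = 2;
        PagedRows rows = new PagedRows(page -> {
            int from = Math.min(page * pageSize, all.size());
            int to = Math.min(from + pageSize, all.size());
            return all.subList(from, to); // empty once past the end
        });
        for (String row : rows) System.out.println(row);
    }
}
```

Only one page is held in memory at a time, so iterating over a very large table no longer requires materializing the whole result.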
Batching prepared statements:
    PreparedStatement ps = session.prepare("INSERT INTO myTable (p1, p2) VALUES (?, ?)");
    BatchStatement bs = new BatchStatement();
    bs.add(ps.bind(0, "v1"));
    bs.add(ps.bind(1, "v2"));
    bs.add(ps.bind(2, "v3"));
    session.execute(bs);
One-shot prepare and execute:
    session.execute("INSERT INTO users (id, photo) VALUES (?, ?)", someId, photoBytes);
SASL for authentication
18. CQL: some ideas
· Storage engine optimizations for CQL
· Secondary index for collections
· Server side functions
· User defined types
· ...
19. User defined types
    CREATE TYPE address (
        street text,
        zip_code int,
        state text,
        phones set<text>
    );

    CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        addresses map<text, address>
    );

    INSERT INTO users (id, name) VALUES (234-4a-761, 'Sylvain Lebresne');

    UPDATE users SET addresses['work'] = {
        street: '777 Mariners Island Blvd #510',
        zip_code: 94404,
        state: 'CA',
        phones: {'650-389-6000'}
    } WHERE id = 234-4a-761;