2. A short CQL primer
New in Cassandra 2.0
Native protocol
What's next?
3. A better API for Cassandra
Thrift is not satisfactory:
· Not user friendly, hard to use.
· Low level, very little abstraction.
· Hard to evolve (in a backward compatible way).
· Unreadable without driver abstraction.
Cassandra has often been regarded as hard to develop against.
It doesn't have to be that way!
4. Quick historical notes
· CQL1 first introduced in Cassandra 0.8, became CQL2 in Cassandra 1.0
· "These aren't the CQL you are looking for"
· CQL3 (simply "CQL" from here on) was introduced in Cassandra 1.2
· Semantically, CQL1/CQL2 are closer to the Thrift API than to CQL3.
· CQL3 is the version that's here to stay: there are no plans for a CQL4 any time soon.
6. The Cassandra Query Language
· Syntactically, a subset of SQL (with a few extensions)
    CREATE TABLE users (
        user_id uuid,
        name text,
        password text,
        email text,
        picture_profile blob,
        PRIMARY KEY (user_id)
    )
· INSERT and UPDATE are both upserts
· No joins, no sub-queries, no aggregation, ...
· Denormalization is the norm: do the work at write time, not read time
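The upsert bullet above can be illustrated with a small in-memory model. This is only a sketch of the semantics, not of how Cassandra stores data: the class `UpsertModel` and its methods are hypothetical names, and rows are modelled as nested maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (hypothetical) of a table in which every write is an upsert. */
class UpsertModel {
    // primary key -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();

    /** Models both INSERT and UPDATE: create the row if it is missing, then overwrite the column. */
    void write(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(column, value);
    }

    /** Returns the stored value, or null if the row or column does not exist. */
    String read(String key, String column) {
        Map<String, String> row = rows.get(key);
        return row == null ? null : row.get(column);
    }

    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        UpsertModel users = new UpsertModel();
        users.write("u1", "name", "Tom");     // "INSERT": the row did not exist yet
        users.write("u1", "name", "Thomas");  // "INSERT" again: overwrites, no error
        users.write("u2", "email", "a@b.c");  // "UPDATE" of a missing row: creates it
        System.out.println(users.read("u1", "name") + ", rows = " + users.rowCount());
    }
}
```

Neither call checks whether the row already exists, which is why the two statement kinds behave the same way in this model.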
7. Denormalization: Cassandra modeling 101
Efficient queries in Cassandra are based on 2 principles:
· the data queried is collocated on one replica set
· the data queried is collocated on disk on those replicas
Denormalization is the technique that makes this achievable in practice.
But this means CQL exposes:
· how to collocate data on the same replica set
· how to collocate data on disk (for a given replica)
8. This is done in CQL through the primary key
    CREATE TABLE inboxes (
        user_id uuid,
        email_id timeuuid,
        sender text,
        recipients set<text>,
        subject text,
        is_read boolean,
        PRIMARY KEY (user_id, email_id)
    )
CQL distinguishes 2 sub-parts in the PRIMARY KEY:
· partition key: decides the node on which the data is stored
· clustering columns: within the same partition key, (CQL3) rows are physically ordered following the clustering columns
This is important, because CQL only allows queries for which an explicit index exists:
    -- Get last 50 emails in user 51b-23-ab8 inbox
    SELECT * FROM inboxes WHERE user_id = 51b-23-ab8 ORDER BY email_id DESC LIMIT 50;
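To make the roles of the partition key and the clustering columns concrete, here is a minimal in-memory sketch. It is an illustration under assumptions, not Cassandra's storage engine: `InboxModel` is a hypothetical class, a `long` stands in for the `timeuuid` clustering column, and a `HashMap` lookup stands in for routing a query to the replicas that own a partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model (hypothetical) of the inboxes table: one sorted map per partition. */
class InboxModel {
    // partition key (user_id) -> rows kept sorted by clustering column (email_id)
    private final Map<String, TreeMap<Long, String>> partitions = new HashMap<>();

    void insert(String userId, long emailId, String subject) {
        partitions.computeIfAbsent(userId, k -> new TreeMap<>()).put(emailId, subject);
    }

    /** Models: SELECT ... WHERE user_id = ? ORDER BY email_id DESC LIMIT ? */
    List<String> lastEmails(String userId, int limit) {
        TreeMap<Long, String> partition = partitions.get(userId);
        if (partition == null) {
            return Collections.emptyList();
        }
        List<String> result = new ArrayList<>();
        // Rows are already ordered, so the query is a bounded reverse scan of one partition.
        for (String subject : partition.descendingMap().values()) {
            if (result.size() == limit) {
                break;
            }
            result.add(subject);
        }
        return result;
    }

    public static void main(String[] args) {
        InboxModel inboxes = new InboxModel();
        inboxes.insert("alice", 2L, "second");
        inboxes.insert("alice", 1L, "first");
        inboxes.insert("alice", 3L, "third");
        inboxes.insert("bob", 1L, "other inbox");
        System.out.println(inboxes.lastEmails("alice", 2));
    }
}
```

The query touches a single partition and reads rows in the order they are already kept, which is the property the two collocation principles are after.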
9. CQL main features
· Collections (set, map and list)
· Secondary indexes
· Convenience functions (timeuuid, type conversions, ...)
· ...
For more details:
· http://cassandra.apache.org/doc/cql3/CQL.html
· http://www.datastax.com/documentation/cql/3.1/webhelp/index.html
11. New in Cassandra 2.0
Lightweight transactions:
    INSERT INTO test (id, name) VALUES (42, 'Tom') IF NOT EXISTS;
    UPDATE test SET password='newpass' WHERE id=42 IF password='oldpass';
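The two conditional statements above behave like compare-and-set operations. The sketch below models only that observable behaviour on a single JVM, using `ConcurrentMap.putIfAbsent` and `ConcurrentMap.replace`. It says nothing about how Cassandra reaches agreement between replicas, and the class name `ConditionalWriteModel` is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy model (hypothetical) of conditional writes on an id -> password table. */
class ConditionalWriteModel {
    private final ConcurrentMap<Integer, String> passwords = new ConcurrentHashMap<>();

    /** Models INSERT ... IF NOT EXISTS: true only if the row was created by this call. */
    boolean insertIfNotExists(int id, String password) {
        return passwords.putIfAbsent(id, password) == null;
    }

    /** Models UPDATE ... IF password = expected: true only if the condition held. */
    boolean updateIf(int id, String expected, String replacement) {
        return passwords.replace(id, expected, replacement);
    }

    String get(int id) {
        return passwords.get(id);
    }

    public static void main(String[] args) {
        ConditionalWriteModel test = new ConditionalWriteModel();
        System.out.println(test.insertIfNotExists(42, "oldpass"));   // row created
        System.out.println(test.insertIfNotExists(42, "intruder"));  // rejected, row exists
        System.out.println(test.updateIf(42, "oldpass", "newpass")); // condition holds
        System.out.println(test.get(42));
    }
}
```

Each call reports whether the write was applied, which mirrors the applied/not-applied result a client gets back from a conditional statement.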
Triggers:
    CREATE TRIGGER myTrigger ON test USING 'my.trigger.Class';
ALTER DROP:
    CREATE TABLE test (k int PRIMARY KEY, prop1 int, prop2 text, prop3 float);
    ALTER TABLE test DROP prop3;
Preparing TIMESTAMP, TTL and LIMIT:
    SELECT * FROM myTable LIMIT ?;
    UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';
12. New in Cassandra 2.0
Conditional DDL:
    CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);
    DROP KEYSPACE IF EXISTS ks;
Secondary indexes everywhere (almost):
    CREATE TABLE timeline (
        event_id uuid,
        created_at timeuuid,
        content blob,
        PRIMARY KEY (event_id, created_at)
    );
    CREATE INDEX ON timeline (created_at);
SELECT aliases:
    SELECT event_id,
           dateOf(created_at) AS creation_date
    FROM timeline;
13. Coming in Cassandra 2.0.2
Named bind variables:
    SELECT * FROM timeline WHERE created_at > :tlow AND created_at <= :thigh AND key = :k;
Prepared IN:
    SELECT * FROM users WHERE user_id IN ?;
Limited SELECT DISTINCT:
    CREATE TABLE test (
        event_id int,
        created_at timestamp,
        content blob,
        PRIMARY KEY (event_id, created_at)
    );
    SELECT DISTINCT event_id FROM test;
15. Native protocol
· Binary transport protocol for CQL
· Query execution, prepared statements, authentication, compression, ...
· Asynchronous (allows multiple concurrent queries per connection)
· Server notifications (currently only generic cluster events)
· Existing drivers for Java, C#, Python, C++, Golang, ...
Example usage of the Java driver (https://github.com/datastax/java-driver):
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
    Session session = cluster.connect("myKeyspace");
    for (Row row : session.execute("SELECT * FROM myTable")) {
        // Do something ...
    }
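The "asynchronous" bullet on this slide (multiple concurrent queries per connection) relies on each request carrying an identifier so that responses can be matched to requests in any order. The sketch below models that bookkeeping only. `MultiplexedConnection`, `InFlight`, and `onResponse` are hypothetical names, no socket is involved, and this is not the driver's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model (hypothetical) of one connection carrying several in-flight requests. */
class MultiplexedConnection {
    /** A request that has been sent and is waiting for its response. */
    static final class InFlight {
        final int streamId;
        final CompletableFuture<String> response = new CompletableFuture<>();

        InFlight(int streamId) {
            this.streamId = streamId;
        }
    }

    private final AtomicInteger nextStreamId = new AtomicInteger();
    private final Map<Integer, InFlight> pending = new ConcurrentHashMap<>();

    /** Registers the request under a fresh stream id and returns without waiting. */
    InFlight send(String query) {
        InFlight request = new InFlight(nextStreamId.getAndIncrement());
        pending.put(request.streamId, request);
        // A real client would write a frame carrying the stream id and the query here.
        return request;
    }

    /** Called when a response frame arrives; frames may arrive in any order. */
    void onResponse(int streamId, String body) {
        InFlight request = pending.remove(streamId);
        if (request != null) {
            request.response.complete(body);
        }
    }

    int inFlightCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        MultiplexedConnection connection = new MultiplexedConnection();
        InFlight first = connection.send("SELECT * FROM a");
        InFlight second = connection.send("SELECT * FROM b");
        connection.onResponse(second.streamId, "rows of b"); // answered out of order
        connection.onResponse(first.streamId, "rows of a");
        System.out.println(first.response.join() + " / " + second.response.join());
    }
}
```

Because the caller gets a future back immediately, a slow query does not block faster ones sharing the same connection.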
16. New in Cassandra 2.0: native protocol 2
Cursors:
    for (Row row : session.execute("SELECT * FROM myTable")) {
        // Do something ...
    }
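The loop above looks the same as before, but with cursors the rows can be fetched page by page while the caller iterates. A minimal sketch of that idea follows. It is a simplification under stated assumptions: `PagedResult` is a hypothetical class, an in-memory list stands in for the server, and a numeric offset stands in for whatever paging state the real protocol exchanges.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model (hypothetical) of a result set that fetches rows one page at a time. */
class PagedResult implements Iterable<Integer> {
    private final List<Integer> serverRows; // stands in for the table on the server
    private final int pageSize;
    int pagesFetched = 0;                   // number of simulated round trips

    PagedResult(List<Integer> serverRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.serverRows = serverRows;
        this.pageSize = pageSize;
    }

    /** Simulates one round trip returning at most pageSize rows. */
    private List<Integer> fetchPage(int offset) {
        pagesFetched++;
        int end = Math.min(offset + pageSize, serverRows.size());
        return new ArrayList<>(serverRows.subList(offset, end));
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private List<Integer> page = new ArrayList<>();
            private int indexInPage = 0;
            private int nextOffset = 0;

            @Override
            public boolean hasNext() {
                if (indexInPage < page.size()) {
                    return true;
                }
                if (nextOffset >= serverRows.size()) {
                    return false;
                }
                // Current page is exhausted: fetch the next one only now.
                page = fetchPage(nextOffset);
                nextOffset += page.size();
                indexInPage = 0;
                return !page.isEmpty();
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.get(indexInPage++);
            }
        };
    }
}
```

Iterating with `for (int row : new PagedResult(rows, 2))` only triggers a fetch when the current page runs out, so a caller that stops early never pays for the remaining pages.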
Batching prepared statements:
    PreparedStatement ps = session.prepare("INSERT INTO myTable (p1, p2) VALUES (?, ?)");
    BatchStatement bs = new BatchStatement();
    bs.add(ps.bind(0, "v1"));
    bs.add(ps.bind(1, "v2"));
    bs.add(ps.bind(2, "v3"));
    session.execute(bs);
One-shot prepare and execute:
    session.execute("INSERT INTO users (id, photo) VALUES (?, ?)", someId, photoBytes);
SASL for authentication
18. CQL: some ideas
· Storage engine optimizations for CQL
· Secondary index for collections
· Server side functions
· User defined types
· ...
19. User defined types
    CREATE TYPE address (
        street text,
        zip_code int,
        state text,
        phones set<text>
    );
    CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        addresses map<text, address>
    );
    INSERT INTO users (id, name) VALUES (234-4a-761, 'Sylvain Lebresne');
    UPDATE users SET addresses['work'] = {
        street: '777 Mariners Island Blvd #510',
        zip_code: 94404,
        state: 'CA',
        phones: {'650-389-6000'}
    } WHERE id = 234-4a-761;