State of Cassandra 2011
Jonathan Ellis
Apache Chair
CTO
DataStax
Job Trends from Indeed.com




Customers and Verticals
•    Financial
•    Social Media
•    Advertising
•    Entertainment
•    Energy
•    E-tail
•    Health care
•    Government



Why?
Why Cassandra?
Better technology
•    Multi-master, multi-DC
•    Linearly scalable
•    Larger-than-memory datasets
•    Best-in-class performance (not just writes!)
•    Fully durable
•    Integrated caching
•    Tunable consistency




Tunable Consistency
 WRITE level          READ level

 ANY
 ONE                  ONE
 LOCAL_QUORUM         LOCAL_QUORUM
 QUORUM               QUORUM
 ALL                  ALL
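One common way to reason about these levels is the overlap rule: with replication factor N, a read that waits for R replicas and a write that waits for W replicas must share at least one replica when R + W > N. The following is a minimal sketch of that arithmetic, assuming a single data center. The function names are hypothetical and not part of any Cassandra client API, and ANY is modeled as needing zero replica acknowledgements on the assumption that a stored hint is enough to satisfy it.

```python
# Sketch only: replica counts per consistency level for one data center.
# Function names are illustrative, not part of any Cassandra client API.

def replicas_required(level, replication_factor):
    """Number of replicas that must acknowledge an operation at `level`."""
    quorum = replication_factor // 2 + 1
    table = {
        "ANY": 0,                # assumption: a stored hint suffices, no replica ack
        "ONE": 1,
        "LOCAL_QUORUM": quorum,  # same as QUORUM when there is only one data center
        "QUORUM": quorum,
        "ALL": replication_factor,
    }
    return table[level]


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when every read replica set must overlap every write replica set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    pairs = [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ALL", "ONE"), ("ANY", "ALL")]
    for w, r in pairs:
        print(w, r, reads_see_latest_write(w, r, replication_factor=3))
```

With a replication factor of 3, QUORUM writes paired with QUORUM reads overlap (2 + 2 > 3), while ONE paired with ONE does not (1 + 1 is not greater than 3).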
Generalizes Easily to Multi-DC




0.7
•    CREATE COLUMN FAMILY
•    Expiring columns (TTL)
•    Secondary (column) indexes
•    Efficient streaming




0.8
•    CQL
•    Counters
•    Automatic memtable tuning
•    New bulk load interface




A performance retrospective




October 8, 2011

Road to 1.0

Theme: polish
•    Repair
•    Compaction
•    Optimize reads for update-heavy workloads
•    CQL 1.1




Repair
•  Consistency is checked per-ColumnFamily but data
   is transferred per-Keyspace
•  Merkle tree requests are sent en masse, but may
   not start executing at the same time
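For readers unfamiliar with Merkle trees, the idea behind the comparison step can be sketched generically: each replica hashes its data into a tree, and two trees are compared from the root down so that only the ranges whose hashes differ need to be transferred. This is an illustrative sketch, not Cassandra's implementation. The helper names, the use of SHA-256, and the power-of-two leaf count are all assumptions made to keep the example short.

```python
import hashlib


def _digest(data):
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return tree levels, leaf hashes first and the root last.

    `leaves` is a list of bytes, one entry per data range.
    The leaf count must be a power of two to keep the sketch simple.
    """
    assert leaves and len(leaves) & (len(leaves) - 1) == 0
    levels = [[_digest(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_digest(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_leaves(tree_a, tree_b):
    """Walk both trees from the root, descending only where hashes differ."""
    assert len(tree_a) == len(tree_b)
    stack = [(len(tree_a) - 1, 0)]  # (level, index), starting at the root
    mismatched = []
    while stack:
        level, idx = stack.pop()
        if tree_a[level][idx] == tree_b[level][idx]:
            continue  # identical subtree: nothing to repair here
        if level == 0:
            mismatched.append(idx)
        else:
            stack.append((level - 1, 2 * idx))
            stack.append((level - 1, 2 * idx + 1))
    return sorted(mismatched)


if __name__ == "__main__":
    replica_a = [b"range0", b"range1", b"range2", b"range3"]
    replica_b = [b"range0", b"range1", b"CHANGED", b"range3"]
    print(differing_leaves(build_tree(replica_a), build_tree(replica_b)))
```

Only the mismatched ranges would then need to be streamed between replicas.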




Compression
•  Rows-per-block or blocks-per-row




Read Performance: Compaction




Level-based Compaction
•  SSTables are non-overlapping within a level
•  Bounds the number that can contain a given row




                                  L2: 1000 MB

                                  L1: 100 MB


                                  L0: newly flushed
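The bound on SSTables per row follows from the non-overlap property: within any level above L0, at most one SSTable can cover a given key, so a read touches at most one SSTable per level plus whatever sits in L0. Here is a minimal sketch of that lookup, assuming each SSTable is modeled as an inclusive (first_key, last_key) range and that levels above L0 are kept sorted. The function name and data layout are hypothetical.

```python
from bisect import bisect_right

# Each sstable is modeled as an inclusive (first_key, last_key) range.
# Levels >= 1 hold sorted, non-overlapping ranges; L0 may overlap.


def candidates_for_key(levels, key):
    """Return (level, sstable_range) pairs for sstables that may contain `key`."""
    hits = []
    for level_no, tables in enumerate(levels):
        if level_no == 0:
            # Newly flushed sstables may overlap, so every one must be checked.
            hits.extend((0, t) for t in tables if t[0] <= key <= t[1])
            continue
        # Sorted and non-overlapping: binary search finds the only possible sstable.
        starts = [t[0] for t in tables]
        i = bisect_right(starts, key) - 1
        if i >= 0 and tables[i][0] <= key <= tables[i][1]:
            hits.append((level_no, tables[i]))
    return hits


if __name__ == "__main__":
    levels = [
        [("a", "z"), ("m", "q")],                          # L0: overlapping
        [("a", "f"), ("g", "p"), ("q", "z")],              # L1
        [("a", "c"), ("d", "h"), ("i", "n"), ("o", "z")],  # L2
    ]
    for level_no, table in candidates_for_key(levels, "n"):
        print(level_no, table)
```

In this example the key can appear in both L0 SSTables but in only one SSTable each in L1 and L2.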
Read performance: maxtimestamp
•  Sort sstables by maximum (client-provided)
   timestamp
•  Only merge sstables until we have the columns
   requested
•  Allows pre-merging highly fragmented rows without
   waiting for compaction
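The idea can be sketched as an early-terminating merge: visit sstables newest-first by maximum timestamp and stop once every requested column has been found with a timestamp newer than anything the remaining sstables could hold. This is a sketch under stated assumptions rather than the actual read path. The dictionary layout, the function name, and the exact stopping condition are illustrative, and tombstones and timestamp ties are ignored.

```python
def read_columns(sstables, requested):
    """Merge sstables newest-first, stopping as soon as the result is final.

    sstables: list of dicts with 'max_timestamp' and
              'columns' ({name: (timestamp, value)}) for a single row.
    Returns ({name: value}, number_of_sstables_scanned).
    """
    requested = set(requested)
    ordered = sorted(sstables, key=lambda s: s["max_timestamp"], reverse=True)
    found = {}
    scanned = 0
    for table in ordered:
        # Stop once all requested columns are present and this (older)
        # sstable cannot hold a newer version of any of them.
        if len(found) == len(requested) and all(
            ts > table["max_timestamp"] for ts, _ in found.values()
        ):
            break
        scanned += 1
        for name in requested:
            cell = table["columns"].get(name)
            if cell is not None and (name not in found or cell[0] > found[name][0]):
                found[name] = cell
    return {name: value for name, (_, value) in found.items()}, scanned


if __name__ == "__main__":
    sstables = [
        {"max_timestamp": 10, "columns": {"name": (10, "old")}},
        {"max_timestamp": 30, "columns": {"name": (30, "new"), "email": (25, "e@x")}},
        {"max_timestamp": 20, "columns": {"email": (20, "stale")}},
    ]
    print(read_columns(sstables, ["name", "email"]))
```

In the example, the newest sstable already satisfies both columns, so the two older sstables are never read.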




CQL

cqlsh> SELECT * FROM users WHERE state='UT' AND birth_date > 1970;

        KEY | birth_date |         full_name | state |
 bsanderson |       1975 | Brandon Sanderson |    UT |




CQL 1.1
•    ALTER
•    Counter support
•    TTL support
•    Compound columns
•    Prepared statements




Post-1.0
•  Ease of use

• Ease of use

• Ease of use




Post-1.0 features
•    “Native” CQL transport
•    Triggers
•    Entity groups
•    Smarter range queries




Brisk
•  Analytics for your
   realtime data
   without ETL
•  Widens scope of
   Cassandra’s
   applicability
•  Also: Solandra




Questions?