State of Cassandra, 2011
Published in: Technology, Business

  1. State of Cassandra 2011
     Jonathan Ellis
     Apache Chair
     CTO, DataStax
  2. Job Trends from Indeed.com
  3. Customers and Verticals
     • Financial
     • Social Media
     • Advertising
     • Entertainment
     • Energy
     • E-tail
     • Health care
     • Government
  4. Why?
  5. (no text on this slide)
  6. Why Cassandra?
  7. Better technology
     • Multi-master, multi-DC
     • Linearly scalable
     • Larger-than-memory datasets
     • Best-in-class performance (not just writes!)
     • Fully durable
     • Integrated caching
     • Tunable consistency
  8. Tunable Consistency
     WRITE level     READ level
     ANY             -
     ONE             ONE
     LOCAL_QUORUM    LOCAL_QUORUM
     QUORUM          QUORUM
     ALL             ALL
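
One way to read the consistency levels on this slide: a read is guaranteed to see the latest acknowledged write when the replicas contacted for the read and for the write must overlap, that is, when R + W > N for replication factor N. The following is a minimal Python sketch of that arithmetic for a single datacenter. The function names are illustrative and not part of any Cassandra API, LOCAL_QUORUM is left out because it depends on the per-datacenter replica count, and ANY is assumed to guarantee zero replicas because a stored hint can satisfy it.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request at this level (single DC)."""
    if level == "ANY":
        # Write-only level; a stored hint is enough, so no replica is guaranteed.
        return 0
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when the read and write replica sets are forced to overlap."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM writes paired with QUORUM reads overlap (2 + 2 > 3), while ONE paired with ONE does not (1 + 1 ≤ 3), which is the trade-off between latency and consistency that the levels expose.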
  9. Generalizes Easily to Multi-DC
  10. 0.7
      • CREATE COLUMN FAMILY
      • Expiring columns (TTL)
      • Secondary (column) indexes
      • Efficient streaming
  11. 0.8
      • CQL
      • Counters
      • Automatic memtable tuning
      • New bulk load interface
  12. A performance retrospective
  13. October 8, 2011
      Road to 1.0
  14. Theme: polish
      • Repair
      • Compaction
      • Optimize reads for update-heavy workloads
      • CQL 1.1
  15. Repair
      • Consistency is checked per-ColumnFamily but data is transferred per-Keyspace
      • Merkle tree requests are sent en masse, but may not start executing at the same time
  16. Compression
      • Rows-per-block or blocks-per-row
  17. Read Performance: Compaction
  18. Level-based Compaction
      • SSTables are non-overlapping within a level
      • Bounds the number that can contain a given row
      Diagram labels: L0: newly flushed; L1: 100 MB; L2: 1000 MB
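
The bound on this slide follows from the non-overlap property: in every level above L0, at most one SSTable can cover a given row key, so a read touches at most one SSTable per level plus whatever sits in L0. The sketch below illustrates the lookup in Python. It assumes each SSTable is described only by an inclusive (first_key, last_key) range and that the ranges in L1 and above are already sorted by first key; these structures are hypothetical and do not mirror Cassandra's internals.

```python
from bisect import bisect_right
from typing import List, Tuple

KeyRange = Tuple[str, str]  # (first_key, last_key), inclusive; illustrative only


def candidate_sstables(levels: List[List[KeyRange]],
                       key: str) -> List[Tuple[int, KeyRange]]:
    """Return (level, range) pairs for every SSTable that may contain key."""
    found: List[Tuple[int, KeyRange]] = []
    for depth, sstables in enumerate(levels):
        if depth == 0:
            # L0 holds newly flushed SSTables, which may overlap: check them all.
            found += [(0, r) for r in sstables if r[0] <= key <= r[1]]
            continue
        # L1 and above are sorted and non-overlapping: at most one range matches.
        starts = [r[0] for r in sstables]
        i = bisect_right(starts, key) - 1
        if i >= 0 and sstables[i][0] <= key <= sstables[i][1]:
            found.append((depth, sstables[i]))
    return found


levels = [
    [("a", "m"), ("g", "z")],              # L0: overlapping
    [("a", "f"), ("g", "p"), ("q", "z")],  # L1: non-overlapping
    [("a", "c"), ("d", "k"), ("l", "z")],  # L2: non-overlapping
]
hits = candidate_sstables(levels, "h")
```

For the key "h" the lookup returns both L0 SSTables but only one SSTable from each of L1 and L2, no matter how many SSTables those levels hold.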
  19. Read performance: maxtimestamp
      • Sort sstables by maximum (client-provided) timestamp
      • Only merge sstables until we have the columns requested
      • Allows pre-merging highly fragmented rows without waiting for compaction
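
The idea on this slide can be sketched as follows: scan SSTables from the newest maximum timestamp to the oldest and stop as soon as no remaining SSTable could hold a newer version of any requested column. The Python below is an illustration under stated assumptions, not Cassandra's implementation. The SSTable class, its fields, and the exact stopping rule (every requested column already found with a timestamp at least as new as the next SSTable's maximum) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class SSTable:
    max_timestamp: int                   # newest cell timestamp in this SSTable
    columns: Dict[str, Tuple[int, str]]  # column name -> (timestamp, value)


def read_columns(sstables: Iterable[SSTable],
                 wanted: List[str]) -> Tuple[Dict[str, str], int]:
    """Return the newest value of each wanted column and the SSTables scanned."""
    newest: Dict[str, Tuple[int, str]] = {}
    scanned = 0
    for table in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        have_all = all(name in newest for name in wanted)
        # Remaining SSTables cannot contain anything newer than max_timestamp.
        if have_all and all(ts >= table.max_timestamp for ts, _ in newest.values()):
            break
        scanned += 1
        for name in wanted:
            cell = table.columns.get(name)
            if cell is not None and (name not in newest or cell[0] > newest[name][0]):
                newest[name] = cell
    return {name: value for name, (_, value) in newest.items()}, scanned


tables = [
    SSTable(5, {"state": (1, "NY"), "full_name": (5, "B. S.")}),
    SSTable(30, {"state": (30, "UT")}),
    SSTable(20, {"state": (10, "CA"), "full_name": (20, "Brandon Sanderson")}),
]
result, scanned = read_columns(tables, ["state", "full_name"])
```

In this example the oldest SSTable is never read, because both requested columns were already found with timestamps newer than anything it could contain.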
  20. CQL
      cqlsh> SELECT * FROM users WHERE state=UT AND birth_date > 1970;
             KEY | birth_date |         full_name | state |
      bsanderson |       1975 | Brandon Sanderson |    UT |
  21. CQL 1.1
      • ALTER
      • Counter support
      • TTL support
      • Compound columns
      • Prepared statements
  22. Post-1.0
      • Ease of use
      • Ease of use
      • Ease of use
  23. Post-1.0 features
      • “Native” CQL transport
      • Triggers
      • Entity groups
      • Smarter range queries
  24. Brisk
      • Analytics for your realtime data without ETL
      • Widens scope of Cassandra’s applicability
      • Also: Solandra
  25. Questions?