Cassandra concepts, patterns and anti-patterns

An introduction to the fundamental concepts behind Apache Cassandra. This talk explains the engineering principles that make Cassandra such an attractive choice for building highly resilient and available systems and then goes on to explain how to use it - covering basic data modelling patterns and anti-patterns.

  • How should we go about choosing a NoSQL solution? This is the way that NoSQL is often approached: a light-hearted take on both how people approach NoSQL and, to some extent, the tools themselves.
  • A good way of choosing NoSQL is by considering the tradeoffs
  • So we are going to be considering tradeoffs when we make our choice; but how do we know these tradeoffs are worth making?
  • Ben Black suggests that a telltale sign that you don’t _need_ NoSQL is when you cannot decide between projects
  • I would argue that you probably cannot decide because you haven’t learned enough about the solutions and how they will fit your needs
  • Learn about how you would model your application; learn about the key design decisions of the project itself; learn about the algorithms it uses.
  • Cassandra is based on the distribution model of Amazon Dynamo with the data model of Google Big Table. We drop vector clocks from Dynamo in favour of column-based “last write wins” conflict resolution from Big Table.
  • DHT. The usual way of visualising this is to draw a ring where we start at hash 0 and move round to our largest number (here 2^127). We pick “tokens” within this space (here we have 6 nodes and hence 6 tokens).
  • When we write or read data, we calculate a hash of the row key to decide which node should be responsible for the data. We can connect to _any_ node in the cluster to issue our command; this node will act as a coordinator and store/get the data for us.
  • Replicas are picked by traversing round the tokens.
  • Here with RF=3 we need two of our 3 replicas to respond to consider the operation a success. When writing, all 3 replicas will still receive the write immediately; we just won’t _require_ all of them to respond to consider an operation successful (for CL < ALL).
  • Before 1.0, hints were written to a live replica; they are now written to the coordinator. The hint tells the coordinator to update the unreachable replica when it comes back online.
  • Reading at CL:ONE, we get our result from #1. There is then a percentage chance that we’ll also query #2 and #3 for their values and update any out-of-date replicas.
  • I work for Hailo, these are patterns I find myself using frequently and anti-patterns I try to avoid.
  • Transcript of "Cassandra concepts, patterns and anti-patterns"

    1. Cassandra concepts, patterns and anti-patterns. Dave Gardner, @davegardnerisme. ApacheCon EU 2012
    2. Agenda
         • Choosing NoSQL
         • Cassandra concepts (Dynamo and Big Table)
         • Patterns and anti-patterns of use
    3. Choosing NoSQL...
    4. http://www.slideshare.net/rbranson/how-do-i-cassandra/4
         1. Find data store that doesn’t use SQL
         2. Anything
         3. Cram all the things into it
         4. Triumphantly blog this success
         5. Complain a month later when it bursts into flames
    5. “NoSQL DBs trade off traditional features to better support new and emerging use cases” http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems
    6. More widely used, tested and documented software (MySQL first OS release 1998) ... for a relatively immature product (Cassandra first open-sourced in 2008)
    7. Ad-hoc querying (SQL join, group by, having, order) ... for a rich data model with limited ad-hoc querying ability (Cassandra makes you denormalise)
    8. What do we get in return?
    9. Proven horizontal scalability: Cassandra scales reads and writes linearly as new nodes are added
    10. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
    11. High availability: Cassandra is fault-resistant with tunable consistency levels
    12. Low latency, solid performance: Cassandra has very good write performance
    13. * Add pinch of salt. http://blog.cubrid.org/dev-platform/nosql-benchmarking/
    14. Operational simplicity: homogenous cluster, no “master” node, no SPOF
    15. Rich data model: Cassandra is more than simple key-value: columns, composites, counters, secondary indexes
    16. Choosing NoSQL...
    17. “they say … I can’t decide between this project and this project even though they look nothing like each other. And the fact that you can’t decide indicates that you don’t actually have a problem that requires them.” http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip (at 30:15)
    18. Or you haven’t learned enough about them..
    19. http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html
         • What tradeoffs are you making?
         • How is it designed?
         • What algorithms does it use?
         • Are the fundamental design decisions sane?
    20. Concepts...
    21. Amazon Dynamo + Google Big Table
         • From Amazon Dynamo: consistent hashing, vector clocks*, gossip protocol, hinted handoff, read repair. http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
         • From Google Big Table: columnar, SSTable storage, append-only, memtable, compaction. http://labs.google.com/papers/bigtable-osdi06.pdf
         * not in Cassandra
    22. [Ring diagram: nodes 1 to 6 and a client] Distributed Hash Table (DHT); tokens are integers from 0 to 2^127
    23. [Ring diagram: nodes 1 to 6, a client and a coordinator node] Consistent hashing
    24. [Ring diagram: nodes 1 to 6, a client and a coordinator node] Replication factor (RF) 3
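
The ring on slides 22 to 24 can be sketched in a few lines of Python. This is only an illustration of the idea, not Cassandra's implementation: the evenly spaced tokens, the MD5-based `token_for` helper, the `Ring` class and the node names are all assumptions made for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space from slide 22: integers from 0 to 2^127


def token_for(row_key: str) -> int:
    """Hash a row key onto the ring (MD5 reduced into the token space)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    """Hypothetical six-node ring with one evenly spaced token per node."""

    def __init__(self, node_names):
        step = RING_SIZE // len(node_names)
        self.tokens = [i * step for i in range(len(node_names))]
        self.nodes = list(node_names)

    def replicas(self, row_key: str, rf: int = 3):
        """Return the rf nodes that hold row_key."""
        # First node whose token is >= the key's token, wrapping round to node 0.
        start = bisect.bisect_left(self.tokens, token_for(row_key)) % len(self.nodes)
        # The remaining replicas are picked by traversing round the tokens.
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(rf)]


ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"])
owners = ring.replicas("USERID1234", rf=3)
print(owners)
```

Any node can run this calculation, which is why a client can connect to any node and have it act as coordinator.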
    25. Consistency Level (CL): how many replicas must respond to declare success?
    26. For read operations (http://wiki.apache.org/cassandra/API#Read)
          Level          Description
          ONE            1st response
          QUORUM         N/2 + 1 replicas
          LOCAL_QUORUM   N/2 + 1 replicas in local data centre
          EACH_QUORUM    N/2 + 1 replicas in each data centre
          ALL            All replicas
    27. For write operations (http://wiki.apache.org/cassandra/API#Write)
          Level          Description
          ANY            One node, including hinted handoff
          ONE            One node
          QUORUM         N/2 + 1 replicas
          LOCAL_QUORUM   N/2 + 1 replicas in local data centre
          EACH_QUORUM    N/2 + 1 replicas in each data centre
          ALL            All replicas
    28. [Ring diagram: nodes 1 to 6, a client and a coordinator node] RF = 3, CL = Quorum
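
The quorum arithmetic from the two tables is easy to check by hand. The sketch below covers only the single-data-centre levels, and the function names are made up for illustration. The `r + w > n` overlap rule is the standard quorum argument rather than something stated on the slides.

```python
def quorum(replication_factor: int) -> int:
    """N/2 + 1 using integer division, as in the consistency level tables."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond at a given level (single data centre only)."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def reads_overlap_writes(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set must include a replica that took the write."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


print(quorum(3))                                     # RF = 3 needs 2 responses
print(reads_overlap_writes("QUORUM", "QUORUM", 3))   # 2 + 2 > 3
print(reads_overlap_writes("ONE", "ONE", 3))         # 1 + 1 is not > 3
```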
    29. Hinted Handoff: a hint is written to the coordinator node when a replica is down. http://wiki.apache.org/cassandra/HintedHandoff
    30. [Ring diagram: nodes 1 to 6, a client and a coordinator node; one replica is offline and the coordinator stores a hint] RF = 3, CL = Quorum
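
A toy model of slide 30, assuming in-memory `Replica` and `Coordinator` classes invented for this example. It shows only the bookkeeping: the write succeeds at quorum with one replica down, and the stored hint is replayed once the replica returns.

```python
class Replica:
    """Hypothetical replica holding data in a dict."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = {}

    def write(self, key, value):
        self.data[key] = value


class Coordinator:
    """Hypothetical coordinator that keeps hints for unreachable replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) for each write a replica missed

    def write(self, key, value, required):
        acks = 0
        for replica in self.replicas:
            if replica.online:
                replica.write(key, value)
                acks += 1
            else:
                # The hint does not count as a response here.
                self.hints.append((replica, key, value))
        return acks >= required

    def replay_hints(self):
        """Deliver hints to replicas that are back online; keep the rest."""
        remaining = []
        for replica, key, value in self.hints:
            if replica.online:
                replica.write(key, value)
            else:
                remaining.append((replica, key, value))
        self.hints = remaining


replicas = [Replica("node1"), Replica("node2"), Replica("node3")]
replicas[2].online = False
coordinator = Coordinator(replicas)
succeeded = coordinator.write("USERID1234", "Dave", required=2)  # RF = 3, CL = Quorum

replicas[2].online = True
coordinator.replay_hints()
```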
    31. Read Repair: background digest query on-read to find and update out-of-date replicas* (* carried out in the background unless CL:ALL). http://wiki.apache.org/cassandra/ReadRepair
    32. [Ring diagram: nodes 1 to 6, a client and a coordinator node] RF = 3, CL = One; background digest query, then update out-of-date replicas
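
Read repair at CL = One can be sketched as follows. The replicas are plain dicts mapping a key to a `(value, timestamp)` pair, and the `digest` and `read_with_repair` helpers are assumptions for this example. Unlike the real thing, the sketch repairs on every read instead of with some probability, and does it inline instead of in the background.

```python
import hashlib


def digest(cell) -> str:
    """Short fingerprint of a replica's answer, so replicas can be compared cheaply."""
    return hashlib.md5(repr(cell).encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Return the first replica's answer, then bring stale replicas up to date."""
    result = replicas[0].get(key)  # CL = One: the first response is returned

    # Digest query: do all replicas agree?
    digests = {digest(replica.get(key)) for replica in replicas}
    if len(digests) > 1:
        # They disagree, so the cell with the highest timestamp wins.
        newest = max(
            (replica[key] for replica in replicas if key in replica),
            key=lambda cell: cell[1],
        )
        for replica in replicas:
            replica[key] = newest
    return result


replicas = [{"job": ("Developer", 1)}, {"job": ("Cruft", 2)}, {}]
first_read = read_with_repair(replicas, "job")   # stale answer from replica #1
second_read = read_with_repair(replicas, "job")  # repaired by the first read
```

The first read can still return a stale value; the repair only benefits later reads.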
    33. Big Table...
    34. http://research.google.com/archive/bigtable-osdi06.pdf http://www.slideshare.net/geminimobile/bigtable-4820829
         • Sparse column based data model
         • SSTable disk storage
         • Append-only commit log
         • Memtable (buffer and sort)
         • Immutable SSTable files
         • Compaction
    35. Column: name + value + timestamp. The timestamp is used for conflict resolution (last write wins)
    36. [Diagram: a series of columns, each a name and a value] We can have millions of columns* (* theoretically up to 2 billion)
    37. Row: a row key followed by columns (each a name and a value)
    38. Column Family: a set of rows, each a row key followed by columns. We can have billions of rows
    39. Write path. [Diagram: a write goes to the commit log on disk and to the memtable in memory, which buffers writes and sorts data; the memtable is flushed on a time or size trigger to SSTables on disk, which are immutable]
    40. Sorted data written to disk in blocks. Each “query” can be answered from a single slice of disk. Therefore start from your queries and work backwards
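
The write path on slide 39 can be modelled with a small in-memory class. `ColumnFamilyStore` and its size-only flush trigger are assumptions for this example; real SSTables live on disk and the time trigger is left out.

```python
class ColumnFamilyStore:
    """Toy write path: commit log, memtable, immutable sorted SSTables."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable sorted tuples, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:  # size trigger only
            self.flush()

    def flush(self):
        """Sort the buffered writes and freeze them as a new SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        merged = {}
        for sstable in self.sstables:  # oldest first, so newer values overwrite
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]


store = ColumnFamilyStore(flush_threshold=2)
for key, value in [("b", 1), ("a", 2), ("a", 3), ("c", 4)]:
    store.write(key, value)
store.compact()
```

Existing SSTables are never modified: an update becomes a new entry in a later SSTable, and compaction is what eventually discards the old one.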
    41. Patterns and anti-patterns...
    42. (no text on this slide)
    43. Pattern: storing entities as individual columns under one row
    44. Pattern: one row per user
          row: USERID1234
          name: Dave
          email: dave@cruft.co
          job: Developer
        We can use C* secondary indexes to fetch all users with job=developer
    45. Anti-pattern: storing whole entity as single column blob
    46. Anti-pattern: one row per user
          row: USERID1234
          data: {"name":"Dave", "email":"dave@cruft.co", "job":"Developer"}
        Now we can’t use secondary indexes nor easily update safely
    47. Pattern: mutate just the changes to entities, make use of C* conflict resolution
    48. Pattern: $userCf->insert("USER1234", array("job" => "Cruft")); We only update the “job” column, avoiding any race conditions on reading all properties and then writing all, having only updated one
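
To see why mutating a single column avoids the race, here is a small model of per-column “last write wins”. The `merge_row` helper, the integer timestamps and the `dave@example.com` address are made up for the example. It is a sketch of the conflict resolution idea, not of any client library.

```python
def merge_row(current, mutation):
    """Apply column mutations; each column maps name -> (value, timestamp)."""
    merged = dict(current)
    for name, (value, timestamp) in mutation.items():
        if name not in merged or timestamp > merged[name][1]:
            merged[name] = (value, timestamp)
    return merged


row = {"name": ("Dave", 1), "email": ("dave@cruft.co", 1), "job": ("Developer", 1)}

# Two clients change different columns at about the same time.
job_update = {"job": ("Cruft", 2)}
email_update = {"email": ("dave@example.com", 3)}

# The result is the same whichever mutation arrives first.
job_first = merge_row(merge_row(row, job_update), email_update)
email_first = merge_row(merge_row(row, email_update), job_update)

# With the blob anti-pattern both clients read the whole entity, change one
# field and write everything back, so the later write discards the earlier one.
blob = {"name": "Dave", "email": "dave@cruft.co", "job": "Developer"}
client_one = dict(blob, job="Cruft")
client_two = dict(blob, email="dave@example.com")  # based on the same old copy
stored_blob = client_two  # the last whole-blob write wins
```

In the column-per-attribute case both changes survive; in the blob case `stored_blob["job"]` is still `"Developer"`.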
    49. Anti-pattern: lock, read, update
    50. Pattern: don’t overwrite anything; store as time series data
    51. Pattern: one row per user; many columns (wide row)
          row: USERID1234
          a384cff0-26c1-11e2-81c1-0800200c9a66: {"action":"create", "name":"Dave"}
          10dc4c40-26c2-11e2-81c1-0800200c9a66: {"action":"update", "name":"foo"}
        Column name is a type 1 UUID (time based). http://www.famkruithof.net/guid-uuid-timebased.html
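
The two column names on slide 51 carry their own timestamps, which Python's standard `uuid` module can decode. The `uuid1_datetime` helper is written for this example. The point of the sketch is that time-based ordering differs from plain text ordering of the UUID strings.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-nanosecond intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_datetime(u: uuid.UUID) -> datetime:
    """Decode the timestamp embedded in a type 1 (time based) UUID."""
    microseconds = (u.time - UUID_EPOCH_OFFSET) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=microseconds)


# The wide row from slide 51, deliberately listed out of order.
columns = {
    uuid.UUID("10dc4c40-26c2-11e2-81c1-0800200c9a66"): {"action": "update", "name": "foo"},
    uuid.UUID("a384cff0-26c1-11e2-81c1-0800200c9a66"): {"action": "create", "name": "Dave"},
}

by_time = sorted(columns, key=lambda u: u.time)  # chronological order
by_text = sorted(columns, key=str)               # alphabetical order of the strings

for column_name in by_time:
    print(uuid1_datetime(column_name).isoformat(), columns[column_name])
```

Sorting by the embedded time puts the `create` event before the `update`, whereas sorting the strings alphabetically would reverse them, so the column comparator has to understand time-based UUIDs.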
    52. Pattern: we can store all sorts of stuff as time series
    53. Anti-pattern: Order Preserving Partitioner (OPP). http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
    54. Pattern: distributed counters
    55. Anti-pattern: Super Columns (a trap for the unwary). http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/
    56. In conclusion...
    57. Cassandra is founded on sound design principles
    58. The data model is incredibly powerful
    59. CQL and a new breed of clients are making it easier to use
    60. Lots of tools and integrations exist to expand the feature set
    61. There is a strong community and multiple companies offering professional support
    62. Thanks (looking for a job?)
         • Learn more about Cassandra (if you’re ever in London): meetup.com/Cassandra-London
         • Learn more about the fundamentals: http://nosqlsummer.org/
         • Watch videos from Cassandra SF 2011: http://www.datastax.com/events/cassandrasf2011/presentations
    63. Extending functionality
         • Search via Apache Solr and DataStax Enterprise: http://www.datastax.com/technologies/solr
         • Batch processing via Apache Hadoop and DataStax Enterprise: http://www.datastax.com/technologies/hadoop
         • Real-time analytics via Acunu Reflex: http://www.acunu.com/acunu-analytics.html