Cassandra overview


Published in: Technology

  1. Overview of Cassandra
  2. Outline
     ● History/motivation
     ● Semi-structured data in Cassandra
       ○ CFs and SuperCFs
     ● Architecture of Cassandra system
       ○ Distribution of content
       ○ Replication of content
       ○ Consistency level
       ○ Node internals
       ○ Gossip
     ● Thrift API
     ● Design patterns - denormalization
  3. History/motivation
     ● Initially developed by Facebook for Inbox Search
       ○ in late 2007/early 2008
     ● Designed for
       ○ node failure - commodity hardware
       ○ scale - the number of nodes can be increased easily to accommodate growing demand
       ○ fast write access while delivering good read performance
     ● Combination of Bigtable and Dynamo
     ● Was operational for over 2 years
       ○ Dropped in favour of HBase
  4. History/motivation
     ● Released as open source in July 2008
     ● Apache liked it
       ○ Became Apache Incubator project in March 2009
       ○ Became Apache top level project in Feb 2010
     ● Active project with releases every few months
       ○ currently on version 1.1
         ■ production ready, but still evolving
  5. Why it's interesting (in this context)...
     ● Has seen significant growth in the last couple of years
     ● Enough deployments to be credible
       ○ Netflix, Ooyala, Digg, Cisco
     ● Is scalable and robust enough for big data problems
       ○ no single point of failure
     ● Complex system
       ○ perhaps excessively complex today
  6. Cassandra - semi-structured data
     ● Column-based database
       ○ has similarities to standard RDBMS
     ● Terminology:
       ○ Keyspace -> database
       ○ ColumnFamily -> table
  7. Cassandra - semi-structured data
     ● No specific schema is required
       ○ although it is possible to define a schema
         ■ can include typing information for parts of the schema to minimize data integrity problems
     ● Rows can have large numbers of columns
       ○ limit on number of columns is 2 billion
     ● Column values should not exceed a few MB
     ● SuperColumns are columns embedded within columns
       ○ third level in a map
       ○ little discussion of SuperColumns here
  8. Supercolumns depicted
  9. Cassandra - secondary indexing
     ● Columns can be indexed
       ○ so-called secondary indexing
         ■ row keys form the primary index
     ● Some debate about the merits of secondary indexing in Cassandra
       ○ secondary indexing is an atomic operation
         ■ unlike the alternative manual indexing approach
       ○ causes a change in thinking regarding NoSQL design
         ■ very similar to classical RDBMS thinking
  10. Cassandra Architecture
      ● Cluster configuration is typical
      ● All nodes are peers
        ○ although there are some seeds, which should be more reliable, larger nodes
      ● Peers have a common view of the tokenspace
        ○ tokenspace is a ring
          ■ of size 2^127
        ○ peers have responsibility for some part of the ring
          ■ i.e. some range of tokens within the ring
      ● Row key/keyspace mapped to token
        ○ used to determine which node is responsible for row data
  11. Cassandra - Cluster and Tokenspace
  12. Cassandra - Data Distribution
      ● Map from RowKey to token determines data distribution
      ● RandomPartitioner is the most important map
        ○ generates MD5 hash of rowkey
        ○ distributes data evenly over nodes in cluster
        ○ highly preferred solution
        ○ constraint: it is not possible to iterate over rows in key order
      ● OrderedPartitioner
        ○ generates token based on a simple byte mapping of the row key
        ○ most probably results in uneven distribution of data
        ○ can be used to iterate over rows in key order
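
The mapping from row key to token to responsible node can be sketched in a few lines of Python. This is a toy illustration only, not Cassandra's code: the node names, the evenly spaced tokens and the way the 128-bit MD5 digest is folded into the 2^127 ring are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # size of the token ring quoted on the architecture slide


def random_partitioner_token(row_key):
    """Toy RandomPartitioner: derive a token from the MD5 hash of the row key."""
    digest = hashlib.md5(row_key).digest()
    # Assumption: fold the 128-bit digest into the ring with a modulo.
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(token, node_tokens):
    """Return the node holding the first token at or after `token`, wrapping around."""
    ring = sorted((t, name) for name, t in node_tokens.items())
    tokens = [t for t, _ in ring]
    idx = bisect_left(tokens, token) % len(ring)
    return ring[idx][1]


# Hypothetical four-node cluster with evenly spaced tokens.
nodes = {"node%d" % i: i * RING_SIZE // 4 for i in range(4)}

for key in (b"alice", b"bob", b"carol"):
    print(key.decode(), "->", owner(random_partitioner_token(key), nodes))
```

Because the hash scatters keys over the ring, neighbouring row keys usually land on unrelated nodes, which is why even distribution and ordered iteration pull in opposite directions.
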
  13. Cassandra - Data Replication
      ● Multiple levels of replication supported
        ○ can support arbitrary level of replication
        ○ replication factors specified per keyspace
      ● Two replication strategies
        ○ RackUnaware
          ■ Make replicas in next n nodes along token ring
        ○ RackAware
          ■ Make one replica in remote data centre
          ■ Make remaining replicas in next nodes along token ring
            ● good ring configuration should result in diversity over data centres
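
A minimal sketch of the RackUnaware placement described above: start at the node that owns the token and walk clockwise around the ring until enough replicas have been chosen. The node names and the small integer tokens are hypothetical, and the function is an illustration rather than Cassandra's implementation.

```python
from bisect import bisect_left


def replicas_rack_unaware(token, node_tokens, replication_factor):
    """Pick the owning node, then the next nodes along the ring, up to the replication factor."""
    ring = sorted((t, name) for name, t in node_tokens.items())
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))  # cannot place more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical ring with four nodes.
ring_tokens = {"A": 0, "B": 25, "C": 50, "D": 75}
print(replicas_rack_unaware(60, ring_tokens, 3))  # ['D', 'A', 'B']
```
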
  14. Cassandra - Consistency Level
      ● A mechanism to trade off latency against data consistency
        ○ Write case:
          ■ Faster response <-> less sure data written properly
        ○ Read case:
          ■ Faster response <-> less sure most recent data read
      ● Related to data replication above
        ○ replication factor determines meaningful levels for consistency level
  15. Cassandra - Consistency Level - Write
      Level         Behavior
      ANY           Ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
      ONE           Ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
      TWO           Ensure that the write has been written to at least 2 replicas before responding to the client.
      THREE         Ensure that the write has been written to at least 3 replicas before responding to the client.
      QUORUM        Ensure that the write has been written to N / 2 + 1 replicas before responding to the client.
      LOCAL_QUORUM  Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes, within the local datacenter (requires NetworkTopologyStrategy).
      EACH_QUORUM   Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy).
      ALL           Ensure that the write is written to all N replicas before responding to the client. Any unresponsive replicas will fail the operation.
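
The quorum arithmetic in the table uses integer division, which is easy to misread. The sketch below computes the number of acknowledgements each level waits for, given a replication factor N. It leaves out ANY and the per-datacenter levels for brevity. The `overlaps` helper applies the commonly cited rule R + W > N, which is not stated on the slides: when it holds, every read set shares at least one replica with every write set.

```python
def quorum(replication_factor):
    """N / 2 + 1 with integer division, as in the table above."""
    return replication_factor // 2 + 1


def required_acks(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    table = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return table[level]


def overlaps(read_level, write_level, replication_factor):
    """True when R + W > N, so a read always touches a replica holding the latest write."""
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor


for n in (3, 4, 5):
    print("N =", n, "quorum =", quorum(n))
print(overlaps("QUORUM", "QUORUM", 3))  # True: 2 + 2 > 3
print(overlaps("ONE", "ONE", 3))        # False: 1 + 1 <= 3
```
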
  16. Cassandra - Consistency Level - Read
      Level         Behavior
      ANY           Not supported. You probably want ONE instead.
      ONE           Will return the record returned by the first replica to respond. A consistency check is always done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used. This means subsequent calls will have correct data even if the initial read gets an older value. (This is called ReadRepair.)
      TWO           Will query 2 replicas and return the record with the most recent timestamp. Again, the remaining replicas will be checked in the background.
      THREE         Will query 3 replicas and return the record with the most recent timestamp.
      QUORUM        Will query all replicas and return the record with the most recent timestamp once it has at least a majority of replicas (N / 2 + 1) reported. Again, the remaining replicas will be checked in the background.
      LOCAL_QUORUM  Returns the record with the most recent timestamp once a majority of replicas within the local datacenter have replied.
      EACH_QUORUM   Returns the record with the most recent timestamp once a majority of replicas within each datacenter have replied.
      ALL           Will query all replicas and return the record with the most recent timestamp once all replicas have replied. Any unresponsive replicas will fail the operation.
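
The read behaviour in the table boils down to two steps: return the newest record among the replicas that answered, then bring stale replicas up to date. A toy sketch follows, with replicas modelled as dictionaries of `(timestamp, value)` pairs. Two simplifications are assumed: the first `r` replicas in the list stand in for "the first to respond", and the repair runs inline rather than in a background thread.

```python
def read(replicas, key, r):
    """Return the newest (timestamp, value) among the first r replicas, then repair all replicas."""
    answered = [rep[key] for rep in replicas[:r] if key in rep]
    result = max(answered, key=lambda cell: cell[0]) if answered else None

    # Read repair (inline here): push the newest known cell to every replica.
    known = [rep[key] for rep in replicas if key in rep]
    if known:
        newest = max(known, key=lambda cell: cell[0])
        for rep in replicas:
            rep[key] = newest
    return result


replicas = [{"k": (1, "old")}, {"k": (2, "new")}, {"k": (1, "old")}]
first = read(replicas, "k", 1)   # consistency level ONE: may see the older value
second = read(replicas, "k", 1)  # after repair, the same read sees the newer value
print(first, second)  # (1, 'old') (2, 'new')
```
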
  17. Cassandra - Node Internals
      ● Node comprises
        ○ commit log
          ■ list of pending writes
        ○ memtable
          ■ data written to system, resident in memory
        ○ SSTables
          ■ per-CF file containing persistent data
      ● Memtable is flushed to disk when it runs out of space, holds too many keys, or after a time period
      ● SSTables comprise
        ○ Data - sorted strings
        ○ Index, Bloom Filter
  18. Cassandra - Node Internals
      ● Compaction occurs from time to time
        ○ cleans up SSTable
        ○ removes redundant rows
        ○ regenerates indexes
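
The interplay of commit log, memtable, SSTables and compaction can be illustrated with a toy in-memory node. Everything here is an assumption made for the sketch: the class name `ToyNode`, the flush trigger (a key-count limit only) and the representation of an SSTable as a sorted list of `(key, timestamp, value)` tuples. Bloom filters and indexes are omitted.

```python
class ToyNode:
    """Toy model of a node's write path; not Cassandra's implementation."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # pending writes not yet flushed to an SSTable
        self.memtable = {}     # key -> (timestamp, value), held in memory
        self.sstables = []     # flushed, sorted, never modified in place
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        # A write only appends to the log and updates memory: no reads, no seeks.
        self.commit_log.append((key, timestamp, value))
        self.memtable[key] = (timestamp, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new sorted SSTable and drop the pending log."""
        table = sorted((k, ts, v) for k, (ts, v) in self.memtable.items())
        self.sstables.append(table)
        self.memtable = {}
        self.commit_log = []

    def compact(self):
        """Merge all SSTables into one, keeping only the newest version of each key."""
        newest = {}
        for table in self.sstables:
            for key, ts, value in table:
                if key not in newest or ts > newest[key][0]:
                    newest[key] = (ts, value)
        self.sstables = [sorted((k, ts, v) for k, (ts, v) in newest.items())]


node = ToyNode(memtable_limit=2)
node.write("a", "1", timestamp=1)
node.write("b", "1", timestamp=2)  # second key triggers a flush
node.write("a", "2", timestamp=3)
node.write("c", "1", timestamp=4)  # triggers another flush
print(len(node.sstables))          # 2
node.compact()
print(node.sstables)               # [[('a', 3, '2'), ('b', 2, '1'), ('c', 4, '1')]]
```

Before compaction, a read of key "a" would have to consult both SSTables; afterwards the redundant older version is gone.
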
  19. Cassandra - Behaviour - Write
      ● Write properties:
        ○ No reads
        ○ No seeks
        ○ Fast!
        ○ Atomic within CF
        ○ Always writable
  20. Cassandra - Behaviour - Read
      ● Read Path:
        ○ Any node
        ○ Partitioner
        ○ Wait for R responses
        ○ Wait for N-R responses in background and perform read repair
      ● Read Properties:
        ○ Read multiple SSTables
        ○ Slower than writes (but still fast)
        ○ Seeks can be mitigated with more RAM
        ○ Scales to billions of rows
  21. Cassandra - Gossip
      ● Gossip protocol used to relay information between nodes in cluster
      ● Proactive communications mechanism to share information
        ○ nodes proactively share what they know with random other nodes
      ● Token space information exchanged via gossip
      ● Failure detection based on gossip
        ○ heartbeat mechanism
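
A toy simulation of the gossip idea: in each round every node increments its own heartbeat and exchanges its view with one randomly chosen other node, keeping the highest heartbeat seen for each member. The two-way merge, the round-based heartbeat counter and the `suspects` lag threshold are assumptions chosen to keep the sketch short. They are not a description of Cassandra's actual failure detector.

```python
import random


def gossip_round(views, rng):
    """One round: each node bumps its heartbeat and syncs with one random other node.

    views maps node name -> {known node name: highest heartbeat seen}.
    """
    names = sorted(views)
    for name in names:
        views[name][name] += 1  # heartbeat: evidence that this node is alive
        peer = rng.choice([n for n in names if n != name])
        merged = dict(views[peer])
        for known, beat in views[name].items():
            merged[known] = max(beat, merged.get(known, 0))
        # Assumption: the exchange is two-way, so both ends keep the merged view.
        views[name] = dict(merged)
        views[peer] = dict(merged)


def suspects(view, current_round, max_lag):
    """Nodes whose last known heartbeat lags too far behind are suspected to be down."""
    return sorted(n for n, beat in view.items() if current_round - beat > max_lag)


# Hypothetical five-node cluster where each node initially knows only itself.
cluster = {name: {name: 0} for name in ("n1", "n2", "n3", "n4", "n5")}
rng = random.Random(7)
rounds = 0
while rounds < 1000 and any(len(v) < len(cluster) for v in cluster.values()):
    gossip_round(cluster, rng)
    rounds += 1
print("full membership known everywhere after", rounds, "round(s)")
```
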
  22. Thrift API - basic calls
      ● insert(key, column_parent, column, consistency_level)
        ○ key is row/keyspace identifier
        ○ column_parent identifies the column's parent
          ■ a column family, optionally with a super column identifier
        ○ column is column data
      ● get(key, column_path, consistency_level)
        ○ returns a column corresponding to the key
      ● get_slice(key, column_parent, slice_predicate, consistency_level)
        ○ typically returns set of columns corresponding to key
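
To make the shape of these three calls concrete, here is a toy in-memory model that mirrors their parameter names. It is not the Thrift client and does not use Cassandra's real types. `ToyKeyspace` is a hypothetical class, `column_parent` is reduced to a column family name, `column` to a `(name, value, timestamp)` tuple, `column_path` to a `(column_family, column_name)` pair and `slice_predicate` to a `(start, finish)` range of column names. The consistency level is accepted and ignored.

```python
class ToyKeyspace:
    """In-memory stand-in: row key -> column family -> column name -> (value, timestamp)."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, column_parent, column, consistency_level="ONE"):
        name, value, timestamp = column
        columns = self.rows.setdefault(key, {}).setdefault(column_parent, {})
        current = columns.get(name)
        # Keep the write with the latest timestamp.
        if current is None or timestamp >= current[1]:
            columns[name] = (value, timestamp)

    def get(self, key, column_path, consistency_level="ONE"):
        column_family, name = column_path
        return self.rows[key][column_family][name]

    def get_slice(self, key, column_parent, slice_predicate, consistency_level="ONE"):
        start, finish = slice_predicate
        columns = self.rows.get(key, {}).get(column_parent, {})
        # Columns come back sorted by name, restricted to the requested range.
        return [(n, columns[n][0]) for n in sorted(columns) if start <= n <= finish]


ks = ToyKeyspace()
ks.insert("user42", "Users", ("email", "a@example.org", 1))
ks.insert("user42", "Users", ("name", "Aoife", 1))
ks.insert("user42", "Users", ("zip", "D02", 1))
print(ks.get("user42", ("Users", "name")))            # ('Aoife', 1)
print(ks.get_slice("user42", "Users", ("a", "n~")))   # [('email', 'a@example.org'), ('name', 'Aoife')]
```
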
  23. Thrift API - other operations
      ● get multiple rows
      ● delete row
      ● batch operations
        ○ important for speeding up system
        ○ can batch up mix of add, insert and delete operations
      ● keyspace and cluster management
  24. Denormalization
      ● Cassandra requires query-oriented design
        ○ determine queries first, design data models accordingly
        ○ in contrast to standard RDBMS
          ■ normalize data at design time
          ■ construct arbitrary queries usually based on joins
      ● Quite fundamental difference in approach
        ○ typically results in quite different data models
      ● Common use of valueless columns
        ○ column name contains data
          ■ good for time series data
        ○ can have very many columns in given row
  25. Denormalization
      ● Standard SQL
        ○ SELECT * FROM USER WHERE CITY = 'Dublin'
      ● Typically create CF which groups users by city
        ○ row key is city identifier
        ○ columns are user IDs
      ● Can get UID of all users in given city by querying this CF
        ○ give city as row key
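
The users-by-city pattern can be sketched with plain dictionaries standing in for column families. The names `users`, `users_by_city`, `add_user` and `users_in` are hypothetical. The point is that every write updates both the primary CF and the query-specific CF, and that the second CF uses valueless columns: the column name (the user ID) is the data.

```python
users = {}          # CF "Users": row key = user ID, columns = user attributes
users_by_city = {}  # CF "UsersByCity": row key = city, column names = user IDs


def add_user(uid, name, city):
    """Write the user row and maintain the denormalized by-city row in the same step."""
    users[uid] = {"name": name, "city": city}
    users_by_city.setdefault(city, {})[uid] = ""  # valueless column


def users_in(city):
    """Equivalent of SELECT * FROM USER WHERE CITY = ...: one row lookup, no scan, no join."""
    return [(uid, users[uid]["name"]) for uid in sorted(users_by_city.get(city, {}))]


add_user("u1", "Aoife", "Dublin")
add_user("u2", "Liam", "Cork")
add_user("u3", "Niamh", "Dublin")
print(users_in("Dublin"))  # [('u1', 'Aoife'), ('u3', 'Niamh')]
```

The cost of this design is that the application, not the database, is responsible for keeping the two column families in step, for example when a user moves city.
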
  26. Other considerations...
      ● SuperColumnFamily
        ○ when is it useful?
      ● Multi data centre deployments
        ○ Cassandra can leverage topology to maximize resiliency
      ● Reaction to node failure
      ● Reconfiguration of system
        ○ introduction of new nodes into existing system
      ● It is a complex system with many working parts