
Cassandra vs HBase


Cassandra Moscow, April 2013 meetup



  1. Cassandra vs HBase: Similarities and differences in the architectural approaches
  2. Foundation papers
     ● The Google File System; Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
     ● Bigtable: A Distributed Storage System for Structured Data; Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
     ● Dynamo: Amazon’s Highly Available Key-value Store; Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
  3. Agenda
     ● Storage: LSM trees
     ● Data distribution in cluster
     ● Fault-tolerance
  4. Log-structured merge tree layout
  5. Log-structured merge tree
     ● Writes are aggregated in memory and then flushed to disk in one batch
       ○ The memtable is effectively a write-behind cache
       ○ A write-ahead log (disk commit log) is used to protect in-memory data from node failures
     ● In-memory entries are asynchronously persisted as a single segment (file) of records sorted by key
       ○ Segments are asynchronously merged together in order to keep the segment count at about log(number of records)
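The write path above can be sketched in a few lines. This is a toy model, not Cassandra's or HBase's actual code: the class name and the in-memory lists standing in for on-disk files are illustrative assumptions.

```python
class LSMStore:
    """Toy LSM-tree write path: WAL + memtable + sorted segment flushes."""

    def __init__(self, memtable_limit=3):
        self.memtable = {}          # write-behind cache for recent writes
        self.wal = []               # stands in for the on-disk commit log
        self.segments = []          # flushed, immutable, key-sorted segments
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))   # durability first: append to the WAL
        self.memtable[key] = value      # then update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the whole memtable as a single segment sorted by key.
        self.segments.append(sorted(self.memtable.items()))
        self.memtable.clear()
        self.wal.clear()            # flushed data no longer needs the log

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest segments first: later writes shadow earlier ones.
        for segment in reversed(self.segments):
            for k, v in segment:
                if k == key:
                    return v
        return None

store = LSMStore()
for i, k in enumerate(["b", "a", "c", "d"]):
    store.put(k, i)
print(store.segments)   # one flushed segment, sorted by key
print(store.get("d"))   # "d" is still only in the memtable
```

Note that reads in `get` already hint at the problem the next slide covers: without extra structures, every segment is a candidate for the requested key.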
  6. Why the LSM tree is good for HBase
     ● The LSM tree suits HDFS well
       ○ An LSM tree writes data in large batches
       ○ SSTables are immutable
  7. LSM tree problems
     ● Relatively slow reads
       ○ The requested key can be in any segment, hence all of them should be checked
         ■ Key cache (Cassandra)
         ■ Bloom filters can be used to skip some of the files
           ● They are prone to false positives
     ● Early versions of HDFS had no support for an append operation
       ○ Append is required for the write-ahead log
       ○ hflush in HDFS 0.21 allows flushing written data without closing the file
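A minimal Bloom filter makes the trade-off concrete. This is an illustrative sketch, not the filter either database actually ships: a key that was added is always reported as possibly present, while an absent key is usually, but not always, rejected — hence false positives still cost a segment read.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over a fixed-size bit array."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive k positions from salted MD5 digests of the key.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for key in ["row1", "row2", "row3"]:
    bf.add(key)
print(bf.might_contain("row2"))    # True: added keys are always reported
print(bf.might_contain("absent"))  # usually False, but may be a false positive
```

Per-segment, a filter like this lets a read skip every file whose filter answers "definitely absent" before touching disk.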
  8. Agenda
     ● Storage: LSM trees
     ● Data distribution in cluster
     ● Fault-tolerance
  9. Shared-nothing architecture
     ● Each node processes requests for its own shard of data
     ● It is always known which node is responsible for a particular key
  10. Cassandra entry distribution
  11. Cassandra distributed storage
     ● Consistent hashing (a node ring) is used to distribute a column family across the cluster nodes
     ● A node is responsible for storing the key range that hashes at or below its own number (token)
       ○ Node tokens are set explicitly in the config
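A token ring can be sketched as a sorted list of (token, node) pairs; the node names, the one-byte token space, and the explicit token values below are made up for illustration.

```python
import bisect
import hashlib

# Explicitly assigned tokens, as in a pre-vnodes Cassandra config.
ring = sorted([(85, "nodeA"), (170, "nodeB"), (255, "nodeC")])
tokens = [t for t, _ in ring]

def token_for(key):
    # A one-byte token space keeps the example readable.
    return hashlib.md5(key.encode()).digest()[0]

def node_for(key):
    # First node whose token is >= the key's token; wrap to the first node.
    idx = bisect.bisect_left(tokens, token_for(key))
    return ring[idx % len(ring)][1]

for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", node_for(key))
```

The lookup is purely local arithmetic over the shared ring description, which is why any node can act as a coordinator and route a request for any key.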
  12. Virtual nodes
     ● Virtual nodes are available in Cassandra from v1.2
     ● No need for manual token assignment
       ○ Data is distributed evenly across the physical nodes
       ○ It is simpler to set what proportion of the data is stored on a particular node
  13. Cassandra partition strategies
     ● Random partitioner
       ○ The node is determined by the MD5 hash of the key
     ● Byte-ordered partitioner
       ○ The node is determined by a number constructed from the first bytes of the key
       ○ Allows range queries
       ○ Prone to uneven data distribution
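The contrast between the two partitioner families can be shown with simplified token functions (both are sketches, not Cassandra's real token math): a hash-based token destroys key order, while a byte-ordered token preserves it — which is exactly what enables range scans and also what invites skew when keys share prefixes.

```python
import hashlib

def random_token(key):
    # Random partitioner: the token is a hash of the key (MD5 here).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def byte_ordered_token(key):
    # Byte-ordered partitioner: a token built from the first key bytes.
    return int.from_bytes(key.encode()[:4].ljust(4, b"\x00"), "big")

keys = ["apple", "banana", "cherry"]
print([byte_ordered_token(k) for k in keys])  # increases with key order
print([random_token(k) for k in keys])        # no relation to key order
```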
  14. Cassandra secondary indexes placement
  15. HBase region distribution
  16. HBase distributed storage
     ● Region meta table
       ○ A continuous range of keys is a region
       ○ The root table stores the regions of the meta table itself
       ○ The master tries to evenly distribute regions across RegionServers
         ■ Regions can be moved between region servers in order to achieve better distribution
           ● Since the actual data is in HDFS, no data is moved during the process
     ● Secondary attribute queries
       ○ DIY indexes: coprocessors
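The region lookup itself is just a range search over sorted start keys. The region boundaries and server names below are hypothetical, and the real meta table lives in HBase rather than in a Python list, but the principle is the same: a row key belongs to the region with the greatest start key not exceeding it.

```python
import bisect

# (start_key, region_server) pairs, sorted by start key;
# "" marks the first region, which begins at the lowest possible key.
meta = [("", "rs1"), ("g", "rs2"), ("p", "rs3")]
starts = [s for s, _ in meta]

def region_server_for(row_key):
    # The region whose start key is the greatest one <= row_key.
    idx = bisect.bisect_right(starts, row_key) - 1
    return meta[idx][1]

print(region_server_for("cat"))   # first region: "" <= "cat" < "g"
print(region_server_for("zebra")) # last region: starts at "p"
```

Because each key maps to exactly one region, a client can cache this table and route requests directly to the owning RegionServer.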
  17. Region splits/merges
     ● Initially only one region is allocated for a table
     ● Region sizes become uneven over time
     ● Online region splitting
       ○ Data is not copied; the new regions' files just hold links to the data in the old regions' files
     ● Region merging is still unstable
  18. Agenda
     ● Storage: LSM trees
     ● Data distribution in cluster
     ● Fault-tolerance
  19. HBase cluster nodes
  20. HDFS and the CAP theorem
     ● CP
       ○ HDFS replicates data synchronously on write
       ○ A DataNode is considered dead if it is not visible to the NameNode
         ■ Lost block replicas will be restored automatically on live nodes
       ○ A DataNode stops serving requests if the NameNode is lost
  21. HDFS block replication in cluster
  22. HDFS block replication
     ● HDFS tends to store one copy of a block on the same server as the client
       ○ if there is a DataNode on that server
     ● HDFS Rack Awareness
       ○ one copy on the client's server
       ○ one on the same rack
       ○ one on a different rack
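The rack-aware policy described above can be sketched for a small hypothetical cluster (node names, rack layout, and the `place_replicas` helper are all assumptions made for this example; HDFS implements this inside the NameNode's placement policy):

```python
# Hypothetical cluster layout: DataNode -> rack.
node_racks = {
    "dn1": "rack1", "dn2": "rack1",
    "dn3": "rack2", "dn4": "rack2",
}

def place_replicas(client_node):
    # First replica: the writer's own node, if it runs a DataNode.
    first = client_node if client_node in node_racks else "dn1"
    local_rack = node_racks[first]
    # Second replica: a different node on the same rack as the first.
    second = next(n for n, r in node_racks.items()
                  if r == local_rack and n != first)
    # Third replica: any node on a different rack.
    third = next(n for n, r in node_racks.items() if r != local_rack)
    return [first, second, third]

print(place_replicas("dn3"))
```

This gives write locality and same-rack bandwidth for two copies while still surviving the loss of a whole rack.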
  23. HDFS disadvantages (if used as storage for HBase)
     ● HDFS requires an additional request to the NameNode in order to find a DataNode storing the required block
     ● In some cases data has to be transferred from the DataNode to the RegionServer on reads
       ○ HBase does not take region file block locations into account when it assigns regions to RegionServers
  24. HBase inter-cluster replication
     ● Master-slave inter-cluster asynchronous replication
     ● A request to a region server is replicated to the slave HBase cluster
  25. Cassandra and the CAP theorem
     ● AP
       ○ Gossip-style failure detection
         ■ A failed node is still in the ring
           ● A new replica for a data range will be assigned only if the failed node is manually removed
       ○ Async write
         ■ A node will replicate the write to the appropriate nodes but return to the client immediately
           ● Can also be "eventually" consistent
       ○ Quorum write
         ■ Blocks until a certain number of writes is reached
         ■ But there is no distributed commit protocol
       ○ Quorum reads
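A quorum write, and the consequence of having no distributed commit protocol, can be sketched as follows. The coordinator, replica set, and the `write_ok` failure injector are all hypothetical; the point is that replicas which applied the write keep it even when the operation is reported to the client as failed.

```python
def quorum_write(replicas, key, value, write_ok, quorum):
    """Apply the write on every reachable replica; succeed on a quorum of acks."""
    acks = 0
    for name, store in replicas.items():
        if write_ok(name):
            store[key] = value   # applied locally, never rolled back
            acks += 1
    return acks >= quorum

replicas = {"r1": {}, "r2": {}, "r3": {}}

# All replicas healthy: the quorum (2 of 3) is easily reached.
ok = quorum_write(replicas, "k", "v1", write_ok=lambda n: True, quorum=2)
print(ok)                       # True

# Two replicas fail: the write is reported as failed to the client,
# yet r1 has still persisted the value locally.
ok = quorum_write(replicas, "k", "v2",
                  write_ok=lambda n: n == "r1", quorum=2)
print(ok, replicas["r1"]["k"])  # False v2
```

The divergence left behind by the second call is exactly the inconsistency that read repair, hinted handoff, and anti-entropy exist to clean up.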
  26. Lack of a distributed commit protocol: the issue
     1. The client writes to all replicas
     2. The write fails on one of the replicas
     3. The write operation is reported as failed
     4. All of the replicas except one have persisted the "failed" write
  27. Inconsistent write repair measures
     ● Read repair
       ○ Differences in results are detected on reads from multiple replicas
     ● Hinted handoff
       ○ A failed write is remembered and retried by the coordinator node
     ● Anti-entropy
       ○ Manually started replica reconciliation
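Read repair can be sketched with timestamped values (a simplification of the real conflict-resolution rules; the replica contents below are invented): the newest version wins and is written back to the stale replicas as a side effect of the read.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and repair replicas holding stale data."""
    # Gather (timestamp, value) from every replica that has the key.
    versions = {n: s[key] for n, s in replicas.items() if key in s}
    newest = max(versions.values())          # tuples compare by timestamp first
    for name, store in replicas.items():
        if store.get(key) != newest:
            store[key] = newest              # repair the stale replica
    return newest[1]

replicas = {
    "r1": {"k": (2, "new")},
    "r2": {"k": (1, "old")},   # missed the latest write
    "r3": {"k": (2, "new")},
}
print(read_with_repair(replicas, "k"))   # "new"
print(replicas["r2"]["k"])               # r2 has been repaired
```

Unlike hinted handoff, this only heals keys that are actually read; anti-entropy covers the rest by comparing replicas wholesale.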
  28. Cassandra simple replication
  29. Cassandra network topology based replication
  30. Cassandra replica placement strategies
     ● Simple
       ○ The closest neighbor down the ring is selected as a replica
     ● Network topology based
       ○ Additional replicas are placed by walking the ring clockwise until a node in a different rack is found
         ■ If no such node exists, additional replicas are placed on different nodes in the same rack
       ○ Server-to-DC:Rack mappings are set explicitly in the config
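The two strategies can be contrasted on a hypothetical four-node ring (node names, rack layout, and both helper functions are assumptions for this sketch, not Cassandra's actual strategy classes):

```python
# Ring order with rack assignments.
ring = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"), ("n4", "rack2")]

def simple_replicas(primary_idx, count):
    # Simple strategy: closest neighbors down the ring, wrapping around.
    return [ring[(primary_idx + i) % len(ring)][0] for i in range(count)]

def topology_replicas(primary_idx, count):
    # Topology-aware strategy: prefer nodes in a different rack.
    primary, primary_rack = ring[primary_idx]
    chosen = [primary]
    for i in range(1, len(ring)):
        node, rack = ring[(primary_idx + i) % len(ring)]
        if len(chosen) < count and rack != primary_rack:
            chosen.append(node)
    # If other racks did not suffice, fall back to same-rack nodes.
    for i in range(1, len(ring)):
        node, _ = ring[(primary_idx + i) % len(ring)]
        if len(chosen) < count and node not in chosen:
            chosen.append(node)
    return chosen

print(simple_replicas(0, 2))     # ['n1', 'n2']: both replicas share rack1
print(topology_replicas(0, 2))   # ['n1', 'n3']: n3 is on a different rack
```

With the simple strategy, both copies of a range owned by n1 land in rack1, so one rack failure can lose the range; the topology-aware walk avoids that whenever another rack is available.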
  31. Links
     ● File appends in HDFS:
     ● HBase file locality in HDFS:
     ● HBase Coprocessors:
     ● HBase Region Splitting: