
Cassandra

  1. Cassandra, by Diego Pacheco (diegopacheco, @diego_pacheco)
  2. http://cassandra.apache.org/
  3. Why Apache Cassandra?
     ❏ Open source
     ❏ Written in Java
     ❏ Scalability and high availability
     ❏ Fault tolerance
     ❏ Replication across multiple datacenters
     ❏ Asynchronous, masterless replication
     ❏ No single point of failure
     ❏ Based on the Amazon Dynamo paper
     ❏ Created at Facebook, open-sourced in 2008, later donated to the Apache Software Foundation
  4. Battle tested in production: http://planetcassandra.org/apache-cassandra-use-cases/
  5. Benchmark (Netflix, 2011): 1 million writes per second http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  6. CAP: Consistency vs. Availability
  7. Cluster
  8. Partitioners
     ❏ Murmur3Partitioner
        ❏ Default
        ❏ 3-5x faster than RandomPartitioner
        ❏ Hashes partition keys to 64-bit token values (-2^63 to 2^63 - 1)
        ❏ Uniform distribution
     ❏ RandomPartitioner
        ❏ Uniform distribution
        ❏ MD5 hash
     ❏ ByteOrderedPartitioner
        ❏ Lexically ordered by key bytes
        ❏ Ordered partitioning
        ❏ Not recommended: difficult load balancing, hot spots, uneven load across multiple tables
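
To make the token-ring idea concrete, here is a minimal Java sketch under stated assumptions: the class `TokenRing` and its methods are hypothetical, the hashing step mimics RandomPartitioner (an MD5 digest read as a non-negative `BigInteger`) rather than Murmur3, and each node holds a single token. A node owns the range from the previous node's token (exclusive) up to its own token (inclusive); keys beyond the highest token wrap around to the lowest one.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy token ring: each node owns (previous token, its token].
class TokenRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(String node, BigInteger token) {
        ring.put(token, node);
    }

    // RandomPartitioner-style token: MD5 of the key, read as a non-negative integer.
    static BigInteger md5Token(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Owner = first node whose token is >= the key's token, wrapping to the lowest token.
    String ownerOf(BigInteger keyToken) {
        if (ring.isEmpty()) throw new IllegalStateException("ring has no nodes");
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(keyToken);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    String ownerOf(String partitionKey) {
        return ownerOf(md5Token(partitionKey));
    }
}
```

With Murmur3Partitioner the tokens would be signed 64-bit values instead, but the ownership lookup (first token at or after the key's token, with wrap-around) is the same idea.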
  9. Replication
     ❏ Concepts:
        ❏ Virtual nodes: each machine owns many small token ranges
        ❏ Partitioner: partitions data across the cluster
        ❏ Replication strategy: determines the replicas for each row of data
        ❏ Snitch: describes the topology (racks, datacenters) that the replication strategy uses to place replicas
     ❏ A client can write to any node
     ❏ That node acts as coordinator and forwards the write to the replicas
     ❏ Replication happens in parallel
     ❏ Replication factor = how many nodes hold a copy of the same data, e.g. 3
     ❏ SimpleStrategy (single datacenter) vs. NetworkTopologyStrategy (multiple datacenters, rack aware)
     ❏ Designed around nodes, racks, and datacenters: a good fit for cloud computing
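
A simplified Java sketch of SimpleStrategy-style placement, assuming a ring of `long` tokens and a hypothetical `replicasFor` helper: the first replica is the node that owns the key's token, and the remaining RF - 1 replicas are the next distinct nodes found by walking the ring clockwise. Skipping nodes already chosen stands in for virtual nodes, where one machine appears at several tokens. NetworkTopologyStrategy additionally uses the snitch's rack and datacenter information, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper illustrating SimpleStrategy-like replica placement.
class SimpleReplication {
    // ring: token -> node name; a node may appear under several tokens (vnodes).
    // Returns up to rf distinct nodes (fewer if the ring has fewer distinct nodes).
    static List<String> replicasFor(TreeMap<Long, String> ring, long keyToken, int rf) {
        if (rf < 1) throw new IllegalArgumentException("RF must be >= 1");
        // Clockwise order starting at the owner: tokens >= keyToken, then wrap to the start.
        List<String> clockwise = new ArrayList<>(ring.tailMap(keyToken, true).values());
        clockwise.addAll(ring.headMap(keyToken, false).values());

        List<String> replicas = new ArrayList<>();
        for (String node : clockwise) {
            if (replicas.size() == rf) break;
            if (!replicas.contains(node)) replicas.add(node); // skip other vnodes of a chosen machine
        }
        return replicas;
    }
}
```

For example, on a ring with tokens 100 (A), 200 (B), 300 (C), 400 (D), a key with token 250 and RF = 3 is stored on C, D, and A.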
  10. Replication
  11. Consistency
     ❏ Tunable consistency for reads and writes
     ❏ Consistency vs. availability trade-offs:
        ❏ ONE, TWO, THREE
        ❏ QUORUM: majority of all replicas, floor(N / 2) + 1, where N is the total replication factor
        ❏ LOCAL_QUORUM: majority of the replicas in the local datacenter
        ❏ EACH_QUORUM: majority of the replicas in every datacenter
        ❏ LOCAL_ONE
        ❏ ALL
        ❏ ANY (writes only)
     ❏ Serial consistency for lightweight transactions:
        ❏ SERIAL
        ❏ LOCAL_SERIAL
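
The quorum arithmetic behind these levels fits in a few lines. This is a simplified sketch with a hypothetical `Quorum` class: a majority is floor(N / 2) + 1, QUORUM in a multi-datacenter cluster counts the replicas of all datacenters together, and if the number of replicas acknowledging a read plus those acknowledging a write exceeds N, the two sets must overlap in at least one replica.

```java
// Hypothetical helper showing quorum arithmetic for tunable consistency.
class Quorum {
    // Majority of N replicas; integer division rounds down.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // QUORUM across datacenters: majority of the sum of all per-DC replication factors.
    static int globalQuorum(int... rfPerDc) {
        int total = 0;
        for (int rf : rfPerDc) total += rf;
        return quorum(total);
    }

    // R + W > N means every read replica set overlaps every write replica set,
    // so at least one replica consulted by the read holds the acknowledged write.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With RF = 3, writing and reading at QUORUM (2 + 2 > 3) overlap, while ONE/ONE (1 + 1) does not; that is the consistency vs. availability trade-off the slide refers to.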
  12. Reads and Indexes
     ❏ Partition key cache
        ❏ Off heap
        ❏ Configurable
     ❏ Row cache
        ❏ Off heap
        ❏ Configurable
     ❏ Secondary indexes
        ❏ Filter data in a table by a non-primary-key column
        ❏ ALLOW FILTERING can be problematic: the query may have to scan many partitions
     ❏ Cassandra 3.4: SASI secondary indexes
        ❏ Better performance
        ❏ Memory-mapped B+ trees
        ❏ Can't be used with collections
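
Why ALLOW FILTERING "could be problematic" can be shown with a toy model. In this hypothetical `ToyTable` (not Cassandra's implementation), a table is a map from partition key to row: a lookup by partition key touches one partition, filtering on a non-key column visits every partition, and a hand-built value-to-keys map plays the role of a secondary index. One real-world caveat the sketch ignores: Cassandra's built-in secondary indexes are local to each node, so an indexed query may still contact many nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy table counting how many partitions each query touches.
class ToyTable {
    final Map<String, Map<String, String>> partitions = new HashMap<>(); // key -> (column -> value)
    final Map<String, List<String>> cityIndex = new HashMap<>();          // city -> partition keys
    int partitionsVisited = 0;

    void insert(String key, String city) {
        Map<String, String> row = new HashMap<>();
        row.put("city", city);
        Map<String, String> old = partitions.put(key, row);
        if (old != null) cityIndex.get(old.get("city")).remove(key); // keep the index in sync on update
        cityIndex.computeIfAbsent(city, c -> new ArrayList<>()).add(key);
    }

    // Like "WHERE key = ?": one partition.
    Map<String, String> getByKey(String key) {
        partitionsVisited++;
        return partitions.get(key);
    }

    // Like "WHERE city = ? ALLOW FILTERING" without an index: every partition is read.
    List<String> filterByCity(String city) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : partitions.entrySet()) {
            partitionsVisited++;
            if (city.equals(e.getValue().get("city"))) keys.add(e.getKey());
        }
        return keys;
    }

    // Like a secondary-index lookup: only the matching partitions are read.
    List<String> indexedByCity(String city) {
        List<String> keys = new ArrayList<>(cityIndex.getOrDefault(city, new ArrayList<>()));
        partitionsVisited += keys.size();
        return keys;
    }
}
```

With 100 rows of which 10 match, the filtering scan visits all 100 partitions while the indexed lookup visits only the 10 matches.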
  13. Storage
     ❏ Log-Structured Merge tree (not a B-tree)
        ❏ Avoids read-before-write
        ❏ Favors low write latency
        ❏ Cassandra groups inserts/updates in memory and periodically flushes them to disk (sequential append)
     ❏ Immutable data: SSTables are never modified in place
        ❏ Need to check before writing? Use lightweight transactions.
     ❏ Writes: commit log -> memtable -> flush -> SSTable on disk
     ❏ All writes are versioned (timestamped)
     ❏ No in-place deletes: a delete writes a tombstone
     ❏ Reads: Bloom filter
        ❏ Off-heap structure per SSTable
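
A toy Java model of this write and read path, under loud assumptions: `ToyLsm` is hypothetical, "disk" is just a list of in-memory maps, there is no compaction and no Bloom filter, and conflicts are resolved by last-write-wins on the write timestamp. It shows the commit log append, the in-memory memtable, flushing to immutable sorted SSTables, tombstones for deletes, and reads that merge versions by timestamp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, greatly simplified LSM-style store.
class ToyLsm {
    // One versioned cell; value == null marks a tombstone (deletion marker).
    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    final List<String> commitLog = new ArrayList<>();           // append-only, for durability
    private TreeMap<String, Cell> memtable = new TreeMap<>();   // sorted, in memory
    final List<Map<String, Cell>> sstables = new ArrayList<>(); // immutable once flushed
    private final int flushThreshold;

    ToyLsm(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String key, String value, long ts) { write(key, new Cell(value, ts)); }

    void delete(String key, long ts) { write(key, new Cell(null, ts)); } // writes a tombstone

    private void write(String key, Cell cell) {
        commitLog.add(key + "@" + cell.timestamp);              // 1. append to the commit log
        Cell existing = memtable.get(key);                      // memtable only: no read from disk
        if (existing == null || cell.timestamp >= existing.timestamp) {
            memtable.put(key, cell);                            // 2. update the memtable
        }
        if (memtable.size() >= flushThreshold) flush();
    }

    void flush() {
        if (memtable.isEmpty()) return;
        sstables.add(Collections.unmodifiableMap(memtable));   // 3. flush as an immutable SSTable
        memtable = new TreeMap<>();
        commitLog.clear();                                      // flushed entries are no longer needed
    }

    // Read: gather versions from the memtable and every SSTable, keep the newest timestamp.
    String get(String key) {
        Cell newest = memtable.get(key);
        for (Map<String, Cell> sstable : sstables) {
            Cell c = sstable.get(key);
            if (c != null && (newest == null || c.timestamp > newest.timestamp)) newest = c;
        }
        return newest == null ? null : newest.value;            // a tombstone reads as "deleted"
    }
}
```

The read path is why Bloom filters matter in the real system: without one, every read would have to probe every SSTable, as this toy does.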
  14. Java Driver
     ❏ Specific to Cassandra
     ❏ Prepared statements
     ❏ Connection pooling
     ❏ Reconnection policies
     ❏ Load balancing policies
     ❏ Retry policies
     ❏ Asynchronous I/O (Netty)
     ❏ Native protocol
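
The driver's policies are configured through its own classes, whose exact APIs differ between driver versions, so instead of guessing at them here is a self-contained sketch of only the idea behind an exponential reconnection policy. `ReconnectionSchedule` is hypothetical and is not the driver's code: the delay doubles after each failed attempt to reach a node and is capped at a maximum.

```java
// Hypothetical exponential backoff schedule for reconnecting to a downed node.
class ReconnectionSchedule {
    private final long baseDelayMs;
    private final long maxDelayMs;

    ReconnectionSchedule(long baseDelayMs, long maxDelayMs) {
        if (baseDelayMs <= 0 || maxDelayMs < baseDelayMs)
            throw new IllegalArgumentException("need 0 < baseDelayMs <= maxDelayMs");
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Delay before reconnection attempt number `attempt` (0-based): min(base * 2^attempt, max).
    long delayMs(int attempt) {
        if (attempt < 0) throw new IllegalArgumentException("attempt must be >= 0");
        if (attempt >= 62) return maxDelayMs;                 // keep 1L << attempt positive
        long factor = 1L << attempt;
        if (baseDelayMs > maxDelayMs / factor) return maxDelayMs; // base * factor would exceed max
        return baseDelayMs * factor;
    }
}
```

With a 1 s base and a 60 s cap, the delays run 1 s, 2 s, 4 s, 8 s, 16 s, 32 s and then stay at 60 s, so a recovering node is not flooded with connection attempts.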