Cassandra

 Carbo Kuo
byvoid@byvoid.com
    2012-07-18
Why talk about Cassandra
●
    Audience Platform
●
    We use Hadoop and HBase for batch processing
●
    We need a real-time database for online
    random access.
    –   HBase
    –   MySQL
    –   MongoDB
    –   Cassandra
Our needs
●
    Provide an interface which supports random
    access.
●
    Periodically synchronize data from HBase.
●
    Eliminate single points of failure.
●
    Store data in multiple data centers for data
    locality and disaster recovery.
Our selections
●
    Hadoop and HBase for processing prediction
    data in batch.
●
    Cassandra for real time query.
Apache Cassandra
●
    Inspired by Amazon Dynamo and Google
    BigTable
●
    Originally developed by Facebook
●
    Apache top level project
●
    Used by Twitter, Netflix, Rackspace, ...
What defines Cassandra
●
    High availability
●
    Eventually consistent
●
    Decentralized cluster
●
    BigTable-like data model
●
    High write throughput
Why we choose Cassandra
●
    Great write performance; writes are faster
    than reads.
●
    Decentralized architecture, no single point of
    failure.
●
    Managing a Cassandra cluster is simple.
●
    Replication configurations are flexible,
    supporting clusters that span multiple data
    centers.
●
    Occasional inconsistency can be tolerated.
Cassandra vs other DBs
            Cassandra        MongoDB              HBase             MySQL
Data Model  BigTable-like    Document             BigTable-like     Table and row
CAP         AP               CP                   CP                CA
Cluster     P2P              M-S replication &    Hadoop            M-S replication
                             built-in sharding
Optimized   Write            Read                 Batch job         Read
for
Query       By key or scan   Multi-indexed        By key or scan    Multi-indexed
Protocol    Thrift           Client               REST or Thrift    Client
CAP theorem
Data model
●
    Cassandra has a BigTable-like data model, but not identical

     Keyspace
       └ Column Family
           └ Row (identified by a row key)
               └ Column → Value

     A Super Column Family adds one more level of nesting:
       Row → Super Column → Column → Value
Comparison with MySQL
Cassandra             MySQL
Keyspace              Database (or schema)
Column Family         Table
Row                   Row
Column                Field
Value                 Value
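The mapping above can be sketched with nested dictionaries. This is an illustrative model only, not Cassandra's storage format; the keyspace name, column family, and row keys are made up for the example:

```python
# Illustrative only: Keyspace -> Column Family -> Row -> Column -> Value,
# roughly analogous to Database -> Table -> Row -> Field -> Value in MySQL.
keyspace = {
    "Users": {                       # column family (~ table)
        "row1": {                    # row, keyed by row key
            "name": "Alice",         # column (~ field) -> value
            "email": "alice@example.com",
        },
        "row2": {
            "name": "Bob",           # rows need not share the same columns
        },
    },
}

def get_column(ks, cf, row_key, column):
    """Look up a single column value, or None if absent."""
    return ks.get(cf, {}).get(row_key, {}).get(column)
```

Note that unlike a MySQL table, each row carries its own set of columns, so "row2" can simply omit "email".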
Cluster Communication
●
    Cassandra uses a gossip protocol to discover
    the location and state of the other nodes in
    the cluster.
●
    When a node first starts up, it contacts seed
    nodes to join the gossip. A node remembers the
    other nodes it has gossiped with.
●
    Failure states are tracked automatically
    through gossip heartbeats.
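The core of gossip is a pairwise exchange in which two nodes merge their views of the cluster. A minimal sketch, assuming a toy "view" that maps node names to the highest heartbeat counter seen (not Cassandra's actual message format):

```python
import random

def gossip_round(state, peers, node):
    """One gossip exchange: 'node' picks a random peer and the two merge
    their views, keeping the highest heartbeat seen for every node.
    Illustrative sketch only, not Cassandra's actual protocol."""
    peer = random.choice([p for p in peers if p != node])
    merged = {}
    for n in set(state[node]) | set(state[peer]):
        merged[n] = max(state[node].get(n, 0), state[peer].get(n, 0))
    state[node] = dict(merged)   # both participants now share the view
    state[peer] = dict(merged)
```

Repeated rounds spread every node's latest heartbeat epidemically, which is why information (including failure suspicion) reaches the whole cluster without any central coordinator.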
Data Partitioning
●
    Data in the cluster is represented as a ring.
●
    The ring is divided into as many ranges as
    there are nodes. Each node is responsible
    for one (or more) ranges of the overall data.
●
    Each node has a token. The token determines
    the node’s position on the ring.
●
    In the configuration file you can set each
    node's initial token, or let it be generated
    automatically.
Data Partitioning
●
    For example, a 4-node cluster over the token
    range 0 to 100.
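The example can be sketched as follows. The tokens 0, 25, 50, 75 and the node names are assumptions matching the 4-node example; each node owns the range ending at its own token, wrapping around the ring:

```python
import bisect

# Hypothetical 4-node ring over the token range 0..100.
tokens = [0, 25, 50, 75]                     # sorted node tokens
nodes  = ["node1", "node2", "node3", "node4"]

def owner(key_token, ring_size=100):
    """Return the node responsible for a key's token: the first node
    whose token is >= the key's token, wrapping around the ring.
    Illustrative sketch only."""
    i = bisect.bisect_left(tokens, key_token % ring_size)
    return nodes[i % len(nodes)]
```

So a key hashing to 10 lands on node2 (which owns the range 1..25), and a key hashing to 80 wraps around to node1 (which owns 76..100 and 0).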
Multiple Datacenters
●
    In multi-datacenter deployments, each data
    center can hold a complete copy of the data.
Partitioning with Replication
●
    The total number of replicas across the cluster
    is referred to as the replication factor.




    replication_factor = 3
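With replication, each key is stored on several nodes rather than one. A minimal sketch in the style of the SimpleStrategy placement (ring tokens and node names are assumptions for illustration):

```python
import bisect

ring = [(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")]  # (token, node), sorted

def replicas(tokens_nodes, key_token, rf=3):
    """SimpleStrategy-style sketch: the first replica is the node that
    owns the key's token range; the remaining rf-1 replicas are the next
    nodes walking clockwise around the ring. Illustrative only."""
    tokens = [t for t, _ in tokens_nodes]
    i = bisect.bisect_left(tokens, key_token)
    return [tokens_nodes[(i + k) % len(tokens_nodes)][1] for k in range(rf)]
```

With replication_factor = 3 on a 4-node ring, every key lives on three of the four nodes, so the cluster tolerates the loss of any single node without losing data.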
Snitch
●
    A snitch defines how nodes are grouped
    together within the network topology.
●
    Cassandra uses this information to route
    inter-node requests.
●
    The snitch does not affect requests between
    the client application and Cassandra, and it
    does not control which node a client connects to.
Write in Cassandra
●
    Consistency level: ZERO, ANY, ONE, QUORUM,
    ALL
Read in Cassandra
●
    Consistency level: ONE, QUORUM, ALL
●
    Lazy repair (read repair) performed during reads.
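The consistency levels above are tunable per operation. With replication factor RF, a QUORUM operation must reach a majority of replicas, and choosing levels so that R + W > RF guarantees that every read overlaps at least one replica touched by the latest write. A small sketch of that arithmetic:

```python
def quorum(rf):
    """Number of replicas a QUORUM operation must reach: a majority."""
    return rf // 2 + 1

def overlap_guaranteed(r, w, rf):
    """True if every read at level r must see at least one replica
    written at level w, i.e. R + W > RF."""
    return r + w > rf
```

For RF = 3, QUORUM is 2, so QUORUM reads plus QUORUM writes give strongly consistent results, while ONE + ONE (1 + 1 = 2, not > 3) gives only eventual consistency.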
Underlying Storage
●
    Cassandra is optimized for write throughput. Writes
    are first written to a commit log, and then to a
    memtable.
●
    Writes are batched in memory and periodically
    written to disk to a persistent table structure called
    an SSTable.
●
    A row may be stored across multiple SSTable files. At
    read time, a row must be combined from all SSTables
    on disk to produce the requested data.
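The write path above can be sketched as a toy store: append to a commit log, update an in-memory memtable, and flush the memtable to an immutable sorted "SSTable" when it grows past a threshold. This is an illustration of the idea only, with in-memory lists standing in for files:

```python
class TinyStore:
    """Toy log-structured store sketching Cassandra's write path."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []         # durable append-only log (here: a list)
        self.memtable = {}           # recent writes, in memory
        self.sstables = []           # immutable sorted tables, newest last
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))    # 1. log first (durability)
        self.memtable[key] = value              # 2. then the memtable
        if len(self.memtable) >= self.flush_threshold:
            # 3. flush: an immutable snapshot sorted by key
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def read(self, key):
        # A row may live in the memtable and/or several SSTables;
        # the newest value wins, so search newest-first.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

Writes are sequential appends (fast); reads may have to consult several SSTables, which is why writes outpace reads.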
Deletes
●
    Deleted data is not immediately removed
    from disk.
●
    Instead a marker called a tombstone is written
    to indicate the new column status.
●
    Columns marked with a tombstone exist for a
    configured grace period, after which they are
    permanently removed by the compaction process.
Compaction
●
    Since SSTables are immutable, Cassandra
    periodically merges SSTables together using a
    process called compaction.
●
    Compaction merges row fragments together,
    removes expired tombstones. SSTables are
    sorted by row key, so this merge is efficient.
●
    During compaction, there is a temporary spike
    in disk space usage and disk I/O.
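The merge step can be sketched as follows, including the removal of tombstones whose grace period is assumed to have expired (SSTables modeled as key-sorted lists, oldest first):

```python
TOMBSTONE = object()   # deletion marker written in place of a value

def compact(sstables):
    """Merge several SSTables (lists of (key, value) pairs sorted by key)
    into one: the newest value per key wins, and expired tombstones are
    dropped. Illustrative sketch only."""
    merged = {}
    for table in sstables:             # oldest first; later tables overwrite
        for key, value in table:
            merged[key] = value
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
```

Because each input is already sorted by row key, a real implementation can stream the merge without holding everything in memory; the dict here keeps the sketch short.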
Limitations of Cassandra
●
    All data for a single row is stored on one
    machine in the cluster, so the amount of data
    associated with a key is bounded by a single
    node's capacity.
●
    A single column value may not be larger than
    2GB.
●
    The maximum number of columns per row is
    2 billion.
●
    Keys and column names must be under 64 KB.
Other features
●
    Secondary index
●
    Load balancer
●
    Column TTL
●
    Thrift, Avro and CQL
●
    Hadoop integration
Links
●
    http://www.datastax.com/docs/1.1/index
●
    http://wiki.apache.org/cassandra/
●
    http://www.cs.cornell.edu/projects/ladis2009/
    papers/lakshman-ladis2009.pdf
Thank you.
        郭家寶
http://www.byvoid.com/
