
Cassandra


Presentation on Cassandra

Published in: Technology
  • Disc space is cheap and scaling MySQL is hard.
  • Cassandra

What is Cassandra?
  • A database system.
  • Uses the fully distributed design of Amazon’s Dynamo.
  • Uses the ColumnFamily-based data model of Google’s BigTable.
  • Developed by Facebook.
  • Open sourced in 2008.

Why Cassandra?
  • Proven
  • Fault Tolerant
  • Decentralized
  • Eventually Consistent
  • Rich Data Model
  • Elastic
  • Highly Available

Why not Cassandra?
  • Thrift
  • Ugly
  • No streaming
  • Disc vs CPU Tradeoff
  • Denormalized data
  • Heavy write

Proven

Fault Tolerant
  • Data is automatically replicated to multiple nodes for fault tolerance.
  • Replication across multiple data centers is supported.
  • Failed nodes can be replaced with no downtime.

Decentralized
  • Every node in the cluster is identical.
  • There are no network bottlenecks.
  • There are no single points of failure.
  • Ordered partitioners can be used for sorted data.

Eventually Consistent
  • Uses BASE (Basically Available, Soft state, Eventual consistency).
  • As data is replicated, the latest version of a value sits on at least one node in the cluster, but older versions may still be on other nodes.
  • Eventually all nodes will see the latest version.

Rich Data Model
  • Keyspace
  • Column Family
  • Rows
  • Columns
  • Super Columns

Keyspace
  • A keyspace is the first dimension of the Cassandra hash, and is the container for column families.
  • Keyspaces are of roughly the same granularity as a schema or database (i.e. a logical collection of tables) in the RDBMS world.

Column Family
  • A column family is a container for columns, analogous to the table in a relational system.
  • A column family holds an ordered list of columns, which you can reference by the column name.

Rows
  • Each column family is stored in a separate file, and the file is sorted in row (i.e. key) major order.
  • A row can have a virtually unlimited number of columns.
  • The key determines which machine the data is stored on. Keys should be well distributed when using order-preserving partitioners.

Columns
  • The column is the smallest increment of data. It is a tuple that contains a name, a value, and a timestamp.
  • All values are supplied by the client, including the timestamp. Client clocks should therefore be synchronized, because these timestamps are used for conflict resolution.

Super Columns
  • A super column is a column whose value is another set of columns.
  • A column that is part of a super column cannot itself be a super column.
  • There is no index on super column values.

Elastic
  • Read and write throughput both increase linearly as new machines are added.
  • Machines can be added with no downtime or interruption to applications.

Highly Available
  • Writes and reads offer a tunable ConsistencyLevel.
  • ConsistencyLevels for reads and writes can be guaranteed anywhere from one node to all nodes.
  • A Quorum ConsistencyLevel is available.

Write ConsistencyLevels

Read ConsistencyLevels
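
The tunable ConsistencyLevels can be illustrated with a little arithmetic. What follows is a minimal sketch, not Cassandra's client API: the function names are hypothetical, only the levels ONE, QUORUM and ALL are modelled, and it assumes the usual rule of thumb that a read is guaranteed to overlap the most recent write when the number of replicas read plus the number of replicas written exceeds the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replica responses needed for a level (illustrative subset of levels)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown level: {level}")


def read_sees_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when the read set and the write set must share at least one replica."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    # Compare a few read/write level combinations at replication factor 3.
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read_level, write_level, read_sees_latest_write(read_level, write_level, 3))
```

With three replicas, reading and writing at QUORUM touches two replicas each, so the sets must overlap. Reading and writing at ONE is faster but can return an older version until the replicas converge.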

Simple Benchmark
  • Simple database that just logs IPs and dates.
  • Start with an empty database.
  • Insert 1 year of data with 100 records per day.
  • 36,500 records in total.

Simple Benchmark - Results
  • MySQL: 3.68 ms / insert (random write)
  • Cassandra: 1.27 ms / insert (sequential write)
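
The benchmark figures can be checked with a few lines of arithmetic. This is only a sketch built on the numbers quoted in the slides. It assumes the per-insert times are averages, so that total load time is roughly the per-insert time multiplied by the record count. The helper names are made up for illustration.

```python
DAYS = 365
RECORDS_PER_DAY = 100

# Per-insert latencies quoted in the slides, in milliseconds.
MS_PER_INSERT = {"MySQL": 3.68, "Cassandra": 1.27}


def total_records(days: int = DAYS, per_day: int = RECORDS_PER_DAY) -> int:
    """Number of rows inserted over the whole benchmark."""
    return days * per_day


def total_seconds(ms_per_insert: float, records: int) -> float:
    """Estimated wall-clock time, assuming ms_per_insert is an average."""
    return ms_per_insert * records / 1000.0


def speedup(slow_ms: float, fast_ms: float) -> float:
    """How many times faster the second system is per insert."""
    return slow_ms / fast_ms


if __name__ == "__main__":
    n = total_records()
    for name, ms in MS_PER_INSERT.items():
        print(f"{name}: {total_seconds(ms, n):.1f} s for {n} inserts")
    ratio = speedup(MS_PER_INSERT["MySQL"], MS_PER_INSERT["Cassandra"])
    print(f"Per-insert ratio: {ratio:.1f}x")
```

Under that assumption the full load takes a little over two minutes on MySQL and well under one minute on Cassandra, a per-insert ratio of a bit under three.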

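
Returning to the data model slides, the nesting of keyspace, column family, row and column, together with timestamp-based conflict resolution, can be sketched with plain dictionaries. This is a simplified illustration and not how Cassandra stores data: the function names and the sample names ("Users", "alice", "email") are hypothetical, timestamps are plain integers supplied by the caller, and ties keep the existing value, which is an assumption of this sketch.

```python
# keyspace -> column family -> row key -> column name -> (value, timestamp)
Keyspace = dict


def insert(keyspace: Keyspace, column_family: str, row_key: str,
           name: str, value: str, timestamp: int) -> None:
    """Write one column; on conflict the column with the newer timestamp wins."""
    row = keyspace.setdefault(column_family, {}).setdefault(row_key, {})
    current = row.get(name)
    if current is None or timestamp > current[1]:
        row[name] = (value, timestamp)


def get(keyspace: Keyspace, column_family: str, row_key: str, name: str) -> str:
    """Return the value of a single column (raises KeyError if it is absent)."""
    return keyspace[column_family][row_key][name][0]


if __name__ == "__main__":
    ks: Keyspace = {}
    insert(ks, "Users", "alice", "email", "alice@old.example", timestamp=1)
    insert(ks, "Users", "alice", "email", "alice@new.example", timestamp=2)
    # A delayed write carrying an older client timestamp does not overwrite the newer one.
    insert(ks, "Users", "alice", "email", "alice@stale.example", timestamp=1)
    print(get(ks, "Users", "alice", "email"))
```

Because the timestamp comes from the client, a client with a clock running behind would lose conflicts it should win, which is why the slides ask for synchronized client clocks.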