Evaluating Apache Cassandra as a Cloud Database


This presentation examines how Apache Cassandra stacks up as a cloud database.

Published in: Technology

  1. Evaluating Apache Cassandra as a Cloud Database
  2. Overview of DataStax
     • Founded in April 2010
     • Commercial leader in Apache Cassandra™, the popular open-source “big data” database
     • 100+ customers
     • 30+ employees
     • Home to Apache Cassandra Chair & most committers
     • Headquartered in San Francisco Bay area
     • Secured $11M in Series B funding in Sep 2011
  3. Why DataStax?
     DataStax delivers database products and services based on Apache Cassandra from experts who are at the forefront of today's data revolution.
     • Database Software & Tools: DataStax Enterprise, DataStax Community, DataStax OpsCenter, Drivers & Connectors
     • Support & Services: Production Support, Consultative Help, Professional Training, Online Documentation
  4. The Company We Keep
  5. What Constitutes a Cloud Database?
  6. What a Cloud Database Is Not
     A Cloud database is not simply a traditional RDBMS running in a Cloud provider’s environment.
  7. Key Attributes of a Cloud Database
     • Transparent elasticity – can add and subtract nodes online with load balancing
     • Transparent scalability – addition of nodes increases both (1) performance throughput and (2) the ability to handle Big Data while maintaining high performance
     • High availability – always up; no single point of failure
     • Multi-geography/zone aware – able to span multiple geographies, data centers, and cloud provider zones; can read/write to any node
     • Data redundancy – data is protected via multiple copies held at different physical locations
     • Dynamic schema – able to manage structured, semi-structured, and unstructured data
     • Simple manageability – easy to administer a logical database across many nodes
     • Software support – supports popular public and private Cloud providers
     • Low cost – won’t break the bank
  8. How does Apache Cassandra stack up?
  9. What is Cassandra?
     Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, fault-tolerant (i.e. no single point of failure) post-relational database solution. Cassandra can serve both as a real-time datastore for online/transactional applications and as a read-intensive database for business intelligence systems.
  10. The History of Cassandra
      (Diagram: Bigtable and Dynamo.)
  11. Cassandra Technical Advantages
      Key technical attributes of Cassandra include:
      • Big Data scalability
      • Fast / linear-scale performance
      • No single point of failure
      • Enterprise / multi-data center / Cloud data distribution
      • Read/Write Anywhere capable
      • Flexible schema
      • Tunable data consistency
      • Data compression
      • Familiar SQL-like language – CQL
      • Easy setup
      • No special hardware needed
      • No special caching layer needed
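
The "tunable data consistency" item above is usually explained with replica-overlap arithmetic: with N replicas of a row, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below only illustrates that arithmetic. The function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_overlap_writes(n: int, w: int, r: int) -> bool:
    """True when any r replicas must share at least one member with any w replicas."""
    return r + w > n


if __name__ == "__main__":
    rf = 3  # assumed replication factor, chosen for illustration
    q = quorum(rf)
    # Quorum writes plus quorum reads always overlap.
    print(q, reads_overlap_writes(rf, w=q, r=q))
    # Writing and reading a single replica trades that guarantee for lower latency.
    print(reads_overlap_writes(rf, w=1, r=1))
```

Choosing smaller R and W favors latency and availability, while larger values favor consistency. That trade-off is what "tunable" refers to.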
  12. Cassandra Architecture Overview
      • Cassandra was designed with the understanding that system/hardware failures can and do occur
      • Peer-to-peer, distributed system
      • All nodes the same
      • Data partitioned among all nodes in the cluster
      • Custom data replication to ensure fault tolerance
      • Read/Write-anywhere design
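
The partitioning and replication points above can be sketched as a hash ring. This is a minimal illustration under stated assumptions: each node gets one ring position derived from its name, MD5 is used only as a convenient stable hash, and replicas are simply the next distinct nodes clockwise from the key's position. The `Ring` class and `token` function are hypothetical names, and the slides do not specify the actual placement strategy.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 here is just a stable hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Peer-to-peer ring: every node is equal and owns one slice of the key space."""

    def __init__(self, nodes, replication_factor=3):
        if not nodes:
            raise ValueError("a ring needs at least one node")
        # Never ask for more replicas than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, row_key: str):
        """Return the nodes holding copies of row_key, primary owner first."""
        start = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = Ring([f"node{i}" for i in range(1, 7)], replication_factor=3)
    print(ring.replicas("customer:1001"))
```

Because any node can compute `replicas()` for any key, a client can send a read or write to whichever node it likes and that node can forward it, which is the idea behind the read/write-anywhere design.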
  13. Cassandra Architecture Overview
      • Nodes communicate with each other through the Gossip protocol, which exchanges information across the cluster every second
      • A commit log is used on each node to capture write activity, which ensures data durability
      • Data is also written to an in-memory structure (a memtable) and then to disk (as an SSTable) once the memory structure is full
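
The write path described above (commit log first, then memtable, then a flush to an SSTable when the memtable fills) can be sketched as follows. This is a simplified, in-memory illustration: the `Node` class, the `memtable_limit` parameter, and the read method are assumptions made for the example, the commit log is a plain list rather than a file on disk, and the memtable counts entries rather than bytes.

```python
class Node:
    """Toy model of one node's write path."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # assumed threshold, counted in entries
        self.commit_log = []   # stand-in for the append-only on-disk log
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # immutable, sorted snapshots, oldest first

    def write(self, key, value):
        # 1. Record the write in the commit log so it can be replayed after a crash.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a sorted, immutable SSTable."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # The flushed writes are now persisted, so this sketch discards their log entries.
        self.commit_log.clear()

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    node = Node(memtable_limit=2)
    node.write("a", 1)
    node.write("b", 2)   # triggers a flush
    node.write("a", 3)   # newer value lives in the memtable
    print(node.read("a"), node.read("b"), len(node.sstables))
```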
  14. Cassandra Architecture Overview
      • The schema used in Cassandra is modeled after Google Bigtable. It is a row-oriented, column-based structure that can store structured, semi-structured, and unstructured data
      • A keyspace is akin to a database in the RDBMS world
      • A column family is similar to an RDBMS table but is more flexible/dynamic
      • A row in a column family is indexed by its key. Other columns may be indexed as well
      (Diagram: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB.)
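
One way to picture the keyspace, column family, row, and column hierarchy above is as nested maps. The sketch below uses plain Python dictionaries and reuses the Portfolio/Customer names from the slide. The helper functions and all sample values are invented for illustration and say nothing about how Cassandra stores data internally.

```python
from collections import defaultdict


def new_keyspace():
    """keyspace -> column family -> row key -> {column name: value}."""
    return defaultdict(lambda: defaultdict(dict))


def build_index(column_family, column):
    """Secondary index sketch: map each value of `column` to the row keys holding it."""
    index = defaultdict(list)
    for row_key, columns in column_family.items():
        if column in columns:
            index[columns[column]].append(row_key)
    return dict(index)


if __name__ == "__main__":
    portfolio = new_keyspace()        # the "Portfolio" keyspace
    customer = portfolio["Customer"]  # the "Customer" column family

    # Rows are addressed by key; each row carries its own set of columns.
    customer["1001"].update({"Name": "Ada", "SSN": "000-00-0000", "DOB": "1990-01-01"})
    customer["1002"].update({"Name": "Grace", "Email": "grace@example.com"})

    print(sorted(customer["1001"]))
    print(sorted(customer["1002"]))
    print(build_index(customer, "Name"))
```

The second row omits SSN and DOB and adds an Email column, which is the sense in which a column family is more flexible than a fixed RDBMS table.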
  15. Transparent Elasticity
      Nodes can be added to and removed from Cassandra online, with no downtime.
      (Diagram: a 6-node ring growing to a 12-node ring.)
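
A hash ring helps explain why nodes can join or leave without disturbing the whole cluster: when a node is added, the only keys that change owner are the ones the new node takes over. The sketch below demonstrates that property under simplifying assumptions (one ring position per node, MD5 as a stable hash, no replication, no load balancing). The function names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 here is just a stable hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def owner(nodes, row_key: str) -> str:
    """Return the first node at or after the key's position, wrapping around the ring."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token(row_key)) % len(ring)
    return ring[index][1]


def moved_keys(old_nodes, new_nodes, keys):
    """Map each key whose owner changed to an (old owner, new owner) pair."""
    moves = {}
    for key in keys:
        before, after = owner(old_nodes, key), owner(new_nodes, key)
        if before != after:
            moves[key] = (before, after)
    return moves


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(1000)]
    six = [f"node{i}" for i in range(1, 7)]
    seven = six + ["node7"]

    moves = moved_keys(six, seven, keys)
    print(len(moves), "of", len(keys), "keys changed owner")
    # Every key that moved went to the newly added node.
    print(all(after == "node7" for _, after in moves.values()))
```

Removing a node is the mirror image: only the keys that the departing node owned need a new home.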
  16. Transparent Scalability
      Adding Cassandra nodes increases performance linearly, along with the ability to manage terabytes to petabytes of data.
      (Diagram: a 6-node ring with performance throughput = N; a 12-node ring with performance throughput = N x 2.)
  17. Transparent Scalability
      Over 1 million writes/sec!
      http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  18. High Availability
      Cassandra, with its peer-to-peer architecture, has no single point of failure.
  19. Multi-Geography/Zone Aware
      Cassandra allows a single logical database to span 1-N datacenters that are geographically dispersed. It also supports a hybrid on-premise/Cloud implementation.
  20. Data Redundancy
      Cassandra allows for customizable data redundancy so that data is completely protected. It also supports rack awareness (data can be replicated between different racks to guard against machine/rack failures).
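
Rack awareness, as described above, amounts to preferring replicas on different racks so that one rack failure cannot take out every copy. The sketch below is a simplified illustration of that preference and is not Cassandra's actual placement algorithm. The `place_replicas` function, the node names, and the rack labels are all hypothetical.

```python
def place_replicas(ring_order, rack_of, replication_factor):
    """Pick replicas by walking the ring, preferring nodes on racks not yet used.

    ring_order: nodes in ring order, starting at the key's position.
    rack_of: mapping from node name to rack label.
    """
    chosen, used_racks, skipped = [], set(), []
    for node in ring_order:
        if len(chosen) == replication_factor:
            break
        if rack_of[node] not in used_racks:
            chosen.append(node)
            used_racks.add(rack_of[node])
        else:
            skipped.append(node)  # same rack as an earlier replica; keep as a fallback
    # If there are fewer racks than replicas, fill up from the skipped nodes.
    for node in skipped:
        if len(chosen) == replication_factor:
            break
        chosen.append(node)
    return chosen


if __name__ == "__main__":
    rack_of = {"a": "rack1", "b": "rack1", "c": "rack2", "d": "rack3"}
    # Node "b" shares a rack with "a", so it is passed over in favor of "c" and "d".
    print(place_replicas(["a", "b", "c", "d"], rack_of, replication_factor=3))
```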
  21. Dynamic Schema
      Cassandra’s data model – based on Google’s Bigtable – allows a user to store structured, semi-structured, and unstructured data with ease.
      (Diagram: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB.)
  22. Simple Manageability
      AMI installers install and configure an entire multi-node Cloud implementation in minutes. Everything can be managed and monitored via a Web-based console.
  23. Cloud Provider/Software Support
      Cassandra is supported on popular Cloud provider platforms and operating systems.
  24. Low Cost
      Cassandra is open source software and is freely available. Commercial/advanced versions of Cassandra are available from DataStax along with support and other services.
  25. How Does Cassandra Stack Up?

      | Cloud Database Attribute | Meet? | Info |
      |---|---|---|
      | Transparent elasticity | | Nodes can be added/removed online with auto load balancing |
      | Transparent scalability | | Performance increases linearly with node additions. Big Data capable |
      | High availability | | No single point of failure. Offers high degree of availability |
      | Multi-geo/zone | | Supports multiple data centers, geos, Cloud zones, read-write anywhere |
      | Data redundancy | | Customizable data replication / redundancy |
      | Dynamic schema | | Able to manage all key types of data |
      | Simple manageability | | Easy install and setup; managed via Web console |
      | Cloud provider/software support | | Support for all key providers and operating systems |
      | Low cost | | Free when using the community version; very low cost if using DataStax for advanced functionality and/or support |

  26. Next Steps
      Download Cassandra and try it in your own environment or on your Cloud provider’s platform.
      • Go to www.datastax.com/download
      • Downloads are available both for on-premise Cassandra installs and as an AMI for Amazon EC2
  27. For More Information
  28. Questions?