Evaluating Apache Cassandra as a Cloud Database
Overview of DataStax
- Founded in April 2010
- Commercial leader in Apache Cassandra™, the popular open-source “big data” database
- 100+ customers
- 30+ employees
- Home to the Apache Cassandra Chair and most committers
- Headquartered in the San Francisco Bay Area
- Secured $11M in Series B funding in September 2011
Why DataStax?
DataStax delivers database products and services based on Apache Cassandra from experts who are at the forefront of today's data revolution.
Database Software & Tools:
- DataStax Enterprise
- DataStax Community
- DataStax OpsCenter
- Drivers & Connectors
Support & Services:
- Production Support
- Consultative Help
- Professional Training
- Online Documentation
What a Cloud Database is not
A Cloud database is not simply a traditional RDBMS running in a Cloud provider’s environment.
Key Attributes of a Cloud Database
- Transparent elasticity: can add and remove nodes online, with load balancing
- Transparent scalability: adding nodes increases both (1) performance throughput and (2) the ability to handle Big Data while maintaining high performance
- High availability: always up; no single point of failure
- Multi-geography/zone aware: able to span multiple geographies, data centers, and cloud provider zones; can read/write to any node
- Data redundancy: data is protected via multiple copies held at different physical locations
- Dynamic schema: able to manage structured, semi-structured, and unstructured data
- Simple manageability: easy to administer a logical database across many nodes
- Software support: supports popular public and private Cloud providers
- Low cost: won’t break the bank
What is Cassandra?
Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, fault-tolerant (i.e., no single point of failure) post-relational database solution. Cassandra can serve both as a real-time datastore for online/transactional applications and as a read-intensive database for business intelligence systems.
Cassandra Technical Advantages
Key technical attributes of Cassandra include:
- Big Data scalability
- Fast, linearly scaling performance
- No single point of failure
- Enterprise / multi-data center / Cloud data distribution
- Read/write-anywhere capable
- Flexible schema
- Tunable data consistency
- Data compression
- Familiar SQL-like language (CQL)
- Easy setup
- No special hardware needed
- No special caching layer needed
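"Tunable data consistency" is usually explained with a replica-overlap rule: with a replication factor RF, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > RF. The Python sketch below only does that arithmetic. The function names are hypothetical and are not part of any Cassandra driver API.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(read_replicas: int, write_replicas: int,
                        replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
# (label, replicas contacted on read, replicas acknowledged on write)
combinations = [("ONE / ONE", 1, 1), ("QUORUM / QUORUM", q, q), ("ONE / ALL", 1, rf)]
for label, r, w in combinations:
    print(f"read/write {label}: overlap guaranteed = {read_overlaps_write(r, w, rf)}")
```

Lower levels trade this guarantee for lower latency and higher availability, which is the sense in which consistency is tunable per request.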
Cassandra Architecture Overview
- Cassandra was designed with the understanding that system/hardware failures can and do occur
- Peer-to-peer, distributed system
- All nodes are the same
- Data is partitioned among all nodes in the cluster
- Custom data replication to ensure fault tolerance
- Read/write-anywhere design
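One way to picture "data partitioned among all nodes" is a hash ring: each row key is hashed to a token, and the first node clockwise from that token owns the row. The sketch below is a simplified illustration, not Cassandra's partitioner. The `TokenRing` class, the node names, and the choice to derive a node's token from its name are all assumptions made for the example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (SHA-256 used only as a stable hash)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical ring: a key belongs to the first node at or after its token."""

    def __init__(self, nodes):
        # Assumption: a node's token is simply the hash of its name.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner(self, row_key: str) -> str:
        # Walk clockwise from the key's token; wrap around past the last node.
        index = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return self._ring[index][1]


node_names = [f"node{i}" for i in range(1, 7)]
ring = TokenRing(node_names)
for key in ("customer:1001", "customer:1002", "customer:1003"):
    print(key, "->", ring.owner(key))
```

Because ownership is a pure function of the key and the node list, any peer holding the same node list can work out where a row lives, which fits the read/write-anywhere design listed above.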
Cassandra Architecture Overview
- Nodes communicate with one another through the gossip protocol, which exchanges information across the cluster every second
- A commit log on each node captures write activity, which assures data durability
- Data is also written to an in-memory structure (a memtable) and then flushed to disk as an SSTable once the memtable is full
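The write path described above (commit log first, then memtable, then a flush to an SSTable when the memtable fills) can be sketched as a toy in-memory model. Everything here is an assumption for illustration: the `Node` class, the entry-count threshold standing in for "full", Python lists and dicts standing in for on-disk files, and the simplified newest-first read path.

```python
class Node:
    """Toy model of one node's write path; not Cassandra's storage engine."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []   # append-only record of every write (stand-in for a disk log)
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # flushed tables, oldest first; never modified after flush
        self.memtable_limit = memtable_limit  # assumption: "full" means this many keys

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write for durability
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. memtable is full: write it out

    def flush(self):
        if self.memtable:
            # Store the rows sorted by key, then start a fresh memtable.
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        # Simplified read: check the memtable, then the newest SSTable first.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # second key fills the memtable and triggers a flush
node.write("a", 3)   # newer value for "a" lives in the new memtable
print(len(node.sstables), node.memtable, node.read("a"), node.read("b"))
```

The sketch never trims the commit log. It keeps every write so the log can be inspected after the run.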
Cassandra Architecture Overview
- The schema used in Cassandra is modeled after Google Bigtable. It is a row-oriented, column structure that can store structured, semi-structured, and unstructured data
- A keyspace is akin to a database in the RDBMS world
- A column family is similar to an RDBMS table but is more flexible/dynamic
- A row in a column family is indexed by its key. Other columns may be indexed as well
[Figure: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB]
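The keyspace / column family / row hierarchy above can be pictured as nested maps. The sketch below reuses the Portfolio keyspace and Customer column family from the figure. The `upsert` helper, the sample values, and the "Email" column are invented for illustration and do not come from the slides.

```python
# keyspace -> column family -> row key -> {column name: value}
portfolio = {"Customer": {}}


def upsert(keyspace, column_family, row_key, columns):
    """Insert or update columns for one row; new column names need no schema change."""
    rows = keyspace.setdefault(column_family, {})
    rows.setdefault(row_key, {}).update(columns)


# Placeholder values, invented for illustration only.
upsert(portfolio, "Customer", "1001",
       {"Name": "Alice Example", "SSN": "000-00-0000", "DOB": "1970-01-01"})
# A second row may carry a different set of columns ("Email" is hypothetical).
upsert(portfolio, "Customer", "1002",
       {"Name": "Bob Example", "Email": "bob@example.com"})

# Rows are looked up by their key.
print(portfolio["Customer"]["1001"]["Name"])
print(sorted(portfolio["Customer"]["1002"]))
```

The two rows hold different column sets, which is the sense in which a column family is "more flexible/dynamic" than an RDBMS table.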
Transparent Elasticity
Nodes can be added to and removed from Cassandra online, with no downtime.
[Figure: two node-ring diagrams, one with 6 nodes and one with 12 nodes]
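A hash-ring layout helps explain why nodes can join online: when a node is added, only the keys that fall into its slice of the ring change owner, and every other key stays put. The sketch below checks that property on a toy ring. The hashing scheme, node names, and row keys are assumptions for the example, not Cassandra's actual token assignment.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Stable hash used as a ring position (an illustrative choice)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


def owner(nodes, row_key: str) -> str:
    """First node clockwise from the key's token, wrapping around the ring."""
    ring = sorted((token(name), name) for name in nodes)
    tokens = [t for t, _ in ring]
    return ring[bisect.bisect_left(tokens, token(row_key)) % len(ring)][1]


old_nodes = [f"node{i}" for i in range(1, 7)]   # six-node ring
new_nodes = old_nodes + ["node7"]               # one node joins
keys = [f"row{i}" for i in range(1000)]

moved = [k for k in keys if owner(old_nodes, k) != owner(new_nodes, k)]
print(f"{len(moved)} of {len(keys)} keys changed owner")
# Every key that moved went to the new node; no data shuffles between old nodes.
print(all(owner(new_nodes, k) == "node7" for k in moved))
```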
Transparent Scalability
Adding Cassandra nodes increases performance linearly, along with the ability to manage terabytes to petabytes of data.
[Figure: a 6-node ring labeled "Performance throughput = N" beside a 12-node ring labeled "Performance throughput = N x 2"]
Transparent Scalability
Over 1 million writes/sec!
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
High Availability
Cassandra, with its peer-to-peer architecture, has no single point of failure.
Multi-Geography/Zone Aware
Cassandra allows a single logical database to span 1 to N data centers that are geographically dispersed. It also supports a hybrid on-premise/Cloud implementation.
Data Redundancy
Cassandra allows for customizable data redundancy so that data is protected by multiple copies. It also supports rack awareness (data can be replicated between different racks to guard against machine/rack failures).
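Rack awareness can be sketched as a placement rule: starting from the node that owns a row, walk the ring and prefer nodes on racks that do not yet hold a copy, falling back to any remaining node only when the racks run out. This is a simplified illustration of the idea, not Cassandra's replication strategy code. The `place_replicas` function and the node and rack names are hypothetical.

```python
def place_replicas(ring, start, replication_factor):
    """Choose replica nodes for a row.

    ring: list of (node, rack) pairs in ring order.
    start: index of the node that owns the row.
    """
    # Visit every node once, clockwise from the owner.
    ordered = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    chosen, racks_used = [], set()

    # First pass: at most one replica per rack.
    for node, rack in ordered:
        if len(chosen) == replication_factor:
            break
        if rack not in racks_used:
            chosen.append(node)
            racks_used.add(rack)

    # Second pass: if there are fewer racks than replicas, fill from remaining nodes.
    for node, _rack in ordered:
        if len(chosen) == replication_factor:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


example_ring = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"),
                ("n4", "rack2"), ("n5", "rack3"), ("n6", "rack3")]
print(place_replicas(example_ring, start=0, replication_factor=3))
```

With three racks and three replicas, each copy lands on a different rack, so losing one machine or one whole rack still leaves two copies.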
Dynamic Schema
Cassandra’s data model, based on Google’s Bigtable, allows a user to store structured, semi-structured, and unstructured data with ease.
[Figure: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB]
Simple Manageability
AMI installers install and configure an entire multi-node Cloud implementation in minutes. Everything can be managed and monitored via a Web-based console.
Cloud Provider/Software Support
Cassandra is supported on popular Cloud provider platforms and operating systems.
Low Cost
Cassandra is open source software and is freely available. Commercial/advanced versions of Cassandra are available from DataStax, along with support and other services.
How Does Cassandra Stack Up?
Cloud Database Attribute | Meet? | Info
Transparent elasticity | Yes | Nodes can be added/removed online with auto load balancing
Transparent scalability | Yes | Performance increases linearly with node additions. Big Data capable
High availability | Yes | No single point of failure. Offers high degree of availability
Multi-geo/zone | Yes | Supports multiple data centers, geos, and Cloud zones; read/write anywhere
Data redundancy | Yes | Customizable data replication / redundancy
Dynamic schema | Yes | Able to manage all key types of data
Simple manageability | Yes | Easy install and setup; managed via Web console
Cloud provider/software support | Yes | Support for all key providers and operating systems
Low cost | Yes | Free with the community edition; very low cost if using DataStax for advanced functionality and/or support
Next Steps
Download Cassandra and try it in your own environment or on your Cloud provider’s platform.
- Go to www.datastax.com/download
- Downloads are available both for on-premise Cassandra installs and as an AMI for Amazon EC2