Evaluating Apache Cassandra as a Cloud Database

This presentation examines how Apache Cassandra stacks up as a cloud database.

Published in: Technology

Transcript

  • 1. Evaluating Apache Cassandra as a Cloud Database
  • 2. Overview of DataStax
      • Founded in April 2010
      • Commercial leader in Apache Cassandra™, the popular open-source “big data” database
      • 100+ customers
      • 30+ employees
      • Home to the Apache Cassandra Chair and most committers
      • Headquartered in the San Francisco Bay Area
      • Secured $11M in Series B funding in Sep 2011
  • 3. Why DataStax? DataStax delivers database products and services based on Apache Cassandra from experts who are at the forefront of today's data revolution.
      • Database Software & Tools: DataStax Enterprise, DataStax Community, DataStax OpsCenter, Drivers & Connectors
      • Support & Services: Production Support, Consultative Help, Professional Training, Online Documentation
  • 4. The Company We Keep
  • 5. What Constitutes a Cloud Database?
  • 6. What a Cloud Database Is Not: a Cloud database is not simply a traditional RDBMS running in a Cloud provider’s environment.
  • 7. Key Attributes of a Cloud Database
      • Transparent elasticity: can add and subtract nodes online with load balancing
      • Transparent scalability: adding nodes increases both (1) performance throughput and (2) the ability to handle Big Data while maintaining high performance
      • High availability: always up; no single point of failure
      • Multi-geography/zone aware: able to span multiple geographies, data centers, and cloud provider zones; can read/write to any node
      • Data redundancy: data is protected via multiple copies held at different physical locations
      • Dynamic schema: able to manage structured, semi-structured, and unstructured data
      • Simple manageability: easy to administer a logical database across many nodes
      • Software support: supports popular public and private Cloud providers
      • Low cost: won’t break the bank
  • 8. How does Apache Cassandra stack up?
  • 9. What is Cassandra? Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, fault-tolerant (i.e. no single point of failure) post-relational database solution. Cassandra can serve both as a real-time datastore for online/transactional applications and as a read-intensive database for business intelligence systems.
  • 10. The History of Cassandra: Bigtable and Dynamo
  • 11. Cassandra Technical Advantages. Key technical attributes of Cassandra include:
      • Big Data scalability
      • Fast, linear-scale performance
      • No single point of failure
      • Enterprise / multi-data center / Cloud data distribution
      • Read/write-anywhere capable
      • Flexible schema
      • Tunable data consistency
      • Data compression
      • Familiar SQL-like language (CQL)
      • Easy setup
      • No special hardware needed
      • No special caching layer needed
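
"Tunable data consistency" is usually explained with a simple overlap rule: with N replicas of a row, a read that contacts R replicas is guaranteed to meet the latest acknowledged write to W replicas when R + W > N. The Python sketch below only illustrates that arithmetic. The function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3                      # assumed replication factor for the example
    q = quorum(n)
    # Quorum writes combined with quorum reads always overlap.
    print(reads_see_latest_write(n, w=q, r=q))
    # One-replica writes and reads are fastest but may miss the latest value.
    print(reads_see_latest_write(n, w=1, r=1))
```

Lower R and W favor latency and availability; higher values favor consistency. That trade-off is what "tunable" refers to.
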
  • 12. Cassandra Architecture Overview
      • Cassandra was designed with the understanding that system/hardware failures can and do occur
      • Peer-to-peer, distributed system
      • All nodes are the same
      • Data is partitioned among all nodes in the cluster
      • Custom data replication ensures fault tolerance
      • Read/write-anywhere design
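
The partitioning and replication points above can be sketched as a token ring: each node owns a position on the ring, a row key hashes to a position, and the row is stored on the next few nodes clockwise. This is a minimal illustration, assuming one token per node and MD5 as the hash. The `Ring` class, the node names, and the sample key are hypothetical, and real replica placement is more involved.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node, replicas placed clockwise."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        self.rf = replication_factor
        # Sort nodes by their token so the list order is the ring order.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, row_key: str):
        """Return the nodes that hold a copy of the row for row_key."""
        # First node whose token is >= the key's token, wrapping around.
        start = bisect.bisect_left(self.tokens, token(row_key)) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(self.rf)
        ]


if __name__ == "__main__":
    ring = Ring([f"node{i}" for i in range(1, 7)], replication_factor=3)
    print(ring.replicas("customer:1001"))
```

Because every node can compute the same placement, any node can accept a request and route it to the right replicas, which is the idea behind the read/write-anywhere design.
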
  • 13. Cassandra Architecture Overview
      • Nodes communicate with one another through the gossip protocol, which exchanges information across the cluster every second
      • Each node uses a commit log to capture write activity, which assures data durability
      • Data is also written to an in-memory structure (a memtable) and then flushed to disk as an SSTable once the memtable is full
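
The write path described above (commit log, then memtable, then SSTable) can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Node` class and `memtable_limit` parameter are hypothetical, "disk" is just a list of immutable tuples, and clearing the log after a flush stands in for the way flushed data no longer needs replay.

```python
class Node:
    """Toy single-node write path: commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # in-memory, latest value per key
        self.sstables = []     # immutable, sorted snapshots ("on disk")

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Persist the memtable as a sorted, immutable SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed writes need no replay

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            found = dict(table)
            if key in found:
                return found[key]
        return None


if __name__ == "__main__":
    node = Node(memtable_limit=2)
    node.write("a", 1)
    node.write("b", 2)   # memtable is full, so it is flushed to an SSTable
    node.write("a", 3)   # the newer value lives in the fresh memtable
    print(node.read("a"), node.read("b"), len(node.sstables))
```

Writes never update an SSTable in place. Newer values shadow older ones at read time, which keeps the write path sequential.
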
  • 14. Cassandra Architecture Overview
      • The schema used in Cassandra is modeled after Google Bigtable. It is a row-oriented, column structure that can store structured, semi-structured, and unstructured data
      • A keyspace is akin to a database in the RDBMS world
      • A column family is similar to an RDBMS table but is more flexible/dynamic
      • A row in a column family is indexed by its key. Other columns may be indexed as well
      • [Diagram: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB]
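
One way to picture the keyspace / column family / row hierarchy is as nested maps. The Python sketch below reuses the Portfolio keyspace and Customer column family from the slide. The row keys and column values are invented sample data, and the `insert` and `build_index` helpers are hypothetical illustrations rather than Cassandra calls.

```python
# keyspace -> column family -> row key -> {column name: value}
portfolio = {"Customer": {}}


def insert(column_family, row_key, **columns):
    """Add or update columns on a row; rows need not share the same columns."""
    column_family.setdefault(row_key, {}).update(columns)


def build_index(column_family, column):
    """Secondary index sketch: column value -> list of row keys."""
    index = {}
    for row_key, cols in column_family.items():
        if column in cols:
            index.setdefault(cols[column], []).append(row_key)
    return index


customer = portfolio["Customer"]
insert(customer, "1001", Name="Ann Example", SSN="000-00-0000", DOB="1970-01-01")
insert(customer, "1002", Name="Bob Example")       # no SSN or DOB on this row
insert(customer, "1002", Twitter="@bob_example")   # a column no other row has

if __name__ == "__main__":
    print(customer["1002"])
    print(build_index(customer, "Name"))
```

The second row shows the "more flexible/dynamic" point: columns can be added per row without altering a table-wide schema.
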
  • 15. Transparent Elasticity: nodes can be added to and removed from Cassandra online, with no downtime. [Diagram: two node rings, one with 6 nodes and one with 12 nodes]
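
Part of what makes online node addition practical in a ring-partitioned system is that a new node takes over only one slice of the key space while every other key stays where it was. The Python sketch below illustrates that property under simplifying assumptions: one token per node, MD5 as the hash, and hypothetical node and key names. It does not model the actual streaming of data between nodes.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def owner(nodes, row_key: str) -> str:
    """Return the node that owns row_key: first node clockwise from its token."""
    entries = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in entries]
    index = bisect.bisect_left(tokens, token(row_key)) % len(entries)
    return entries[index][1]


def moved_keys(old_nodes, new_nodes, keys):
    """Keys whose owner differs between the two cluster layouts."""
    return [k for k in keys if owner(old_nodes, k) != owner(new_nodes, k)]


if __name__ == "__main__":
    old = [f"node{i}" for i in range(1, 7)]
    new = old + ["node7"]                      # add one node online
    keys = [f"row{i}" for i in range(1000)]
    moved = moved_keys(old, new, keys)
    print(f"{len(moved)} of {len(keys)} keys changed owner")
    # Every key that moved now belongs to the newly added node.
    print(all(owner(new, k) == "node7" for k in moved))
```
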
  • 16. Transparent Scalability: adding Cassandra nodes increases performance linearly and extends the ability to manage terabytes to petabytes of data. [Diagram: two node rings, one with 6 nodes and one with 12 nodes, labeled "Performance throughput = N" and "Performance throughput = N x 2"]
  • 17. Transparent Scalability: Over 1 million writes/sec! http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  • 18. High Availability: Cassandra, with its peer-to-peer architecture, has no single point of failure.
  • 19. Multi-Geography/Zone Aware: Cassandra allows a single logical database to span 1-N datacenters that are geographically dispersed. It also supports a hybrid on-premise/Cloud implementation.
  • 20. Data Redundancy: Cassandra allows for customizable data redundancy so that data is completely protected. It also supports rack awareness (data can be replicated between different racks to guard against machine/rack failures).
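
Rack awareness can be pictured as a placement rule: when choosing replicas, prefer nodes on racks that do not yet hold a copy. The Python sketch below is a simplified illustration of that idea and not Cassandra's actual replication strategy code. The function name, node names, and rack layout are all hypothetical.

```python
def rack_aware_replicas(ring_order, rack_of, replication_factor):
    """Pick replicas in ring order, spreading copies across racks first.

    ring_order: nodes in ring order, starting at the row's primary node.
    rack_of:    mapping of node name -> rack name.
    """
    if replication_factor > len(ring_order):
        raise ValueError("replication factor exceeds node count")
    chosen, used_racks = [], set()
    # First pass: take at most one node from each rack.
    for node in ring_order:
        if len(chosen) == replication_factor:
            break
        if rack_of[node] not in used_racks:
            chosen.append(node)
            used_racks.add(rack_of[node])
    # Second pass: if there are fewer racks than replicas, fill in ring order.
    for node in ring_order:
        if len(chosen) == replication_factor:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


if __name__ == "__main__":
    order = ["n1", "n2", "n3", "n4", "n5", "n6"]
    racks = {"n1": "r1", "n2": "r1", "n3": "r2",
             "n4": "r2", "n5": "r3", "n6": "r3"}
    print(rack_aware_replicas(order, racks, 3))  # ['n1', 'n3', 'n5']
```

With three racks and three replicas, losing any single rack still leaves two copies of every row available.
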
  • 21. Dynamic Schema: Cassandra’s data model, based on Google’s Bigtable, allows a user to store structured, semi-structured, and unstructured data with ease. [Diagram: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB]
  • 22. Simple Manageability: AMI installers install and configure an entire multi-node Cloud implementation in minutes. Everything can be managed and monitored via a Web-based console.
  • 23. Cloud Provider/Software Support: Cassandra is supported on popular Cloud provider platforms and operating systems.
  • 24. Low Cost: Cassandra is open source software and is freely available. Commercial/advanced versions of Cassandra are available from DataStax along with support and other services.
  • 25. How Does Cassandra Stack Up? (Columns: Cloud Database Attribute, Meet?, Info)
      • Transparent elasticity: nodes can be added/removed online with auto load balancing
      • Transparent scalability: performance increases linearly with node additions; Big Data capable
      • High availability: no single point of failure; offers a high degree of availability
      • Multi-geo/zone: supports multiple data centers, geos, and Cloud zones; read-write anywhere
      • Data redundancy: customizable data replication / redundancy
      • Dynamic schema: able to manage all key types of data
      • Simple manageability: easy install and setup; managed via Web console
      • Cloud provider/software support: support for all key providers and operating systems
      • Low cost: free if using the community version; very low cost if using DataStax for advanced functionality and/or support
  • 26. Next Steps: download Cassandra and try it in your own environment or on your Cloud provider’s platform. Go to www.datastax.com/download. Downloads are available both for on-premise Cassandra installs and as an AMI for Amazon EC2.
  • 27. For More Information
  • 28. Questions?
