Evaluating Apache Cassandra as a Cloud Database
Overview of DataStax
• Founded in April 2010
• Commercial leader in Apache Cassandra™, the popular open-source “big data” database
• 100+ customers
• 30+ employees
• Home to the Apache Cassandra project chair and most committers
• Headquartered in the San Francisco Bay Area
• Secured $11M in Series B funding in September 2011
Why DataStax?
DataStax delivers database products and services
based on Apache Cassandra from experts who
are at the forefront of today's data revolution.


Database Software & Tools:
• DataStax Enterprise
• DataStax Community
• DataStax OpsCenter
• Drivers & Connectors

Support & Services:
• Production Support
• Consultative Help
• Professional Training
• Online Documentation
The Company We Keep
What Constitutes a Cloud Database?
What a Cloud Database is not
A Cloud database is not simply taking a traditional RDBMS
and running it in a Cloud provider’s environment.
Key Attributes of a Cloud Database
• Transparent elasticity: nodes can be added and removed online, with load balancing
• Transparent scalability: adding nodes increases both (1) performance throughput and (2) the ability to handle Big Data while maintaining high performance
• High availability: always up, with no single point of failure
• Multi-geography/zone aware: able to span multiple geographies, data centers, and cloud provider zones, with reads and writes accepted at any node
• Data redundancy: data is protected by multiple copies held in different physical locations
• Dynamic schema: able to manage structured, semi-structured, and unstructured data
• Simple manageability: easy to administer a logical database across many nodes
• Software support: supports popular public and private Cloud providers
• Low cost: won’t break the bank
How does Apache Cassandra stack up?
What is Cassandra?
Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, and fault-tolerant (i.e., no single point of failure) post-relational database. Cassandra can serve both as a real-time datastore for online/transactional applications and as a read-intensive database for business intelligence systems.
The History of Cassandra
[Figure: Cassandra's lineage, drawing on Google's Bigtable and Amazon's Dynamo]
Cassandra Technical Advantages
Key technical attributes of Cassandra include:
• Big Data scalability
• Fast, linearly scaling performance
• No single point of failure
• Enterprise, multi-data center, and Cloud data distribution
• Read/write-anywhere capability
• Flexible schema
• Tunable data consistency
• Data compression
• Familiar SQL-like query language (CQL)
• Easy setup
• No special hardware needed
• No special caching layer needed
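Tunable consistency means each request chooses how many replicas must respond. A common rule of thumb: if the replicas written (W) plus the replicas read (R) exceed the replication factor (RF), every read overlaps at least one replica holding the latest write. The sketch below checks that rule for the levels ONE, QUORUM, and ALL in a single data center; the helper names are hypothetical and it models only the arithmetic, not a Cassandra client API.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replica acknowledgements a consistency level needs (single data center, simplified)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read set must overlap every write set, i.e. R + W > RF."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


if __name__ == "__main__":
    rf = 3
    for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        ok = is_strongly_consistent(read_level, write_level, rf)
        print(f"RF={rf} read={read_level} write={write_level} -> overlap guaranteed: {ok}")
```

With RF=3, ONE/ONE trades consistency for latency, while QUORUM/QUORUM (2 + 2 > 3) or ONE reads with ALL writes (1 + 3 > 3) guarantee overlap.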
Cassandra Architecture Overview
• Cassandra was designed with the understanding that system and hardware failures can and do occur
• Peer-to-peer, distributed system
• All nodes are the same (no master node)
• Data is partitioned among all nodes in the cluster
• Customizable data replication ensures fault tolerance
• Read/write-anywhere design
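To make "data partitioned among all nodes" and "replication" concrete, here is a simplified consistent-hashing ring: each node owns a token, a row key is hashed onto the ring, and the first node clockwise plus the next RF - 1 nodes hold the replicas. Assumptions: MD5 tokens (Cassandra's older RandomPartitioner also hashed keys with MD5), hypothetical node names, one token per node, and no rack or data center awareness.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring as a 128-bit integer (MD5 digest)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Simplified sketch of a token ring; not Cassandra's actual implementation."""

    def __init__(self, nodes: list[str]):
        # Sort nodes by token so the ring can be walked clockwise.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, row_key: str, rf: int) -> list[str]:
        """Return the rf nodes that store row_key, primary owner first."""
        if rf > len(self.entries):
            raise ValueError("replication factor exceeds cluster size")
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self.tokens, token(row_key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"])
    for key in ["customer:1001", "customer:1002", "customer:1003"]:
        print(key, "->", ring.replicas(key, rf=3))
```

Because any node can compute this mapping, any node can act as the coordinator for a read or write, which is what makes the read/write-anywhere design possible.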
Cassandra Architecture Overview
• Nodes communicate with one another through the gossip protocol, which exchanges state information across the cluster every second
• Each node records write activity in a commit log, which ensures durability
• Writes are also applied to an in-memory structure (the memtable), which is flushed to disk as an SSTable once it is full
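The write path above can be sketched as a toy single-node store: each write is appended to a commit log first, then applied to the memtable; when the memtable reaches a size threshold it is flushed as a sorted, immutable SSTable, and reads check the memtable before the SSTables (newest first). The class name, the threshold, and keeping the log and SSTables as in-memory lists are all simplifying assumptions, not Cassandra's implementation.

```python
from typing import Optional


class ToyNode:
    """Hypothetical single-node store illustrating commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log: list[tuple[str, str]] = []      # append-only durability record
        self.memtable: dict[str, str] = {}                # mutable, in memory
        self.sstables: list[list[tuple[str, str]]] = []   # flushed, sorted, never modified
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush once the memtable is full

    def flush(self) -> None:
        # Persist the memtable as a sorted SSTable and start a fresh memtable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # A real commit log lives on disk and its segments are recycled after a
        # flush; this toy version simply clears the in-memory list.
        self.commit_log.clear()

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            for k, v in table:
                if k == key:
                    return v
        return None


if __name__ == "__main__":
    node = ToyNode(memtable_limit=2)
    node.write("a", "1")
    node.write("b", "2")  # memtable full: flushed to the first SSTable
    node.write("a", "3")  # newer value lives in the memtable
    print(node.read("a"), node.read("b"), len(node.sstables))  # 3 2 1
```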
Cassandra Architecture Overview
• Cassandra's schema is modeled after Google's Bigtable: a row-oriented, column-based structure that can store structured, semi-structured, and unstructured data
• A keyspace is akin to a database in the RDBMS world
• A column family is similar to an RDBMS table but is more flexible and dynamic
• Each row in a column family is indexed by its row key; other columns can be indexed as well (secondary indexes)

[Figure: the Portfolio keyspace, containing a Customer column family with columns ID, Name, SSN, and DOB]
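The keyspace / column family / row hierarchy can be pictured as nested maps: keyspace to column family to row key to column name/value pairs. The sketch reuses the Portfolio keyspace and Customer column family from the figure, treating ID as the row key; the row keys, values, and the `Email` column are made-up sample data, and the helper is hypothetical. Note that the two rows do not share the same set of columns.

```python
# Hypothetical sample data: keyspace -> column family -> row key -> columns.
portfolio = {
    "Customer": {
        "1001": {"Name": "Customer One", "SSN": "000-00-0001", "DOB": "1970-01-01"},
        # A row with a different set of columns; no schema change is required.
        "1002": {"Name": "Customer Two", "Email": "two@example.com"},
    }
}


def get_column(keyspace: dict, column_family: str, row_key: str, column: str):
    """Look up one column by row key; returns None if the row or column is absent."""
    return keyspace.get(column_family, {}).get(row_key, {}).get(column)


if __name__ == "__main__":
    print(get_column(portfolio, "Customer", "1001", "DOB"))  # 1970-01-01
    print(get_column(portfolio, "Customer", "1002", "SSN"))  # None
```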
Transparent Elasticity
Nodes can be added to and removed from a Cassandra cluster online, with no downtime.

[Figure: a six-node ring, and the same cluster grown online to a twelve-node ring]
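One reason a node can join without disrupting the cluster is that, on a consistent-hashing ring, the new node takes over only part of one neighbor's range; every other key stays where it was. The sketch below (MD5 tokens, hypothetical node names, one token per node, no replication) counts how many of 10,000 sample keys change owner when a seventh node joins a six-node ring, and checks that every moved key moved to the new node.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring as a 128-bit integer (MD5 digest)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def owner(sorted_ring: list[tuple[int, str]], key: str) -> str:
    """Primary owner: first node clockwise whose token is >= the key's token."""
    tokens = [t for t, _ in sorted_ring]
    i = bisect.bisect_left(tokens, token(key)) % len(sorted_ring)
    return sorted_ring[i][1]


def moved_keys(nodes: list[str], new_node: str, keys: list[str]) -> dict[str, tuple[str, str]]:
    """Map each key whose owner changes when new_node joins to (old_owner, new_owner)."""
    before = sorted((token(n), n) for n in nodes)
    after = sorted((token(n), n) for n in nodes + [new_node])
    moves = {}
    for k in keys:
        old, new = owner(before, k), owner(after, k)
        if old != new:
            moves[k] = (old, new)
    return moves


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 7)]
    keys = [f"row:{i}" for i in range(10_000)]
    moves = moved_keys(nodes, "node7", keys)
    print(f"{len(moves)} of {len(keys)} keys moved")
    print("all moved to node7:", all(new == "node7" for _, new in moves.values()))
```

In a real cluster the joining node then streams the affected range from existing replicas; the sketch only identifies which keys would be affected.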
Transparent Scalability
Adding Cassandra nodes increases performance linearly, along with the ability to manage terabytes to petabytes of data.

[Figure: a six-node ring with performance throughput = N, and a twelve-node ring with performance throughput = N x 2]
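Under the linear-scaling claim, aggregate throughput is per-node throughput times node count, so doubling the nodes doubles throughput (N to N x 2). A small capacity-planning sketch follows; the per-node rate is a hypothetical figure you would measure for your own workload, and real clusters scale only approximately linearly.

```python
import math


def cluster_throughput(nodes: int, per_node_ops: float) -> float:
    """Aggregate throughput, assuming perfectly linear scaling."""
    return nodes * per_node_ops


def nodes_needed(target_ops: float, per_node_ops: float) -> int:
    """Smallest node count whose linear-scaling throughput meets target_ops."""
    return math.ceil(target_ops / per_node_ops)


if __name__ == "__main__":
    per_node = 10_000.0  # hypothetical measured writes/sec per node
    print(cluster_throughput(6, per_node), cluster_throughput(12, per_node))  # 60000.0 120000.0
    print(nodes_needed(1_000_000, per_node))  # 100
```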
Transparent Scalability


Over 1 million writes/sec!

Source: Netflix Tech Blog, http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
High Availability
With its peer-to-peer architecture, Cassandra has no single point of failure.
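Because any node can coordinate a request and each row lives on several replicas, a request keeps succeeding as long as enough of its replicas are up. The sketch below checks that condition for a hypothetical replica placement with RF=3; the function name is illustrative, and it ignores timeouts and hinted handoff.

```python
def request_succeeds(replicas: list[str], down: set[str], required_acks: int) -> bool:
    """A coordinator can complete the request if enough replicas are still up."""
    live = [r for r in replicas if r not in down]
    return len(live) >= required_acks


if __name__ == "__main__":
    replicas = ["node1", "node2", "node3"]  # RF = 3 (hypothetical placement)
    quorum = len(replicas) // 2 + 1         # 2 of 3
    print(request_succeeds(replicas, {"node2"}, quorum))           # True: one node down
    print(request_succeeds(replicas, {"node2", "node3"}, quorum))  # False: two nodes down
```

With QUORUM on RF=3 a single replica can fail without affecting the request; a level of ONE would tolerate two failures, at the cost of weaker consistency.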
Multi-Geography/Zone Aware
A single logical Cassandra database can span one or more geographically dispersed data centers. Cassandra also supports hybrid on-premise/Cloud deployments.
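In a multi-data center deployment, replication is typically configured per data center (in Cassandra, through NetworkTopologyStrategy), and a consistency level such as LOCAL_QUORUM waits only for a majority of the replicas in the coordinator's own data center, so a slow link between sites does not stall local requests. The sketch below does that arithmetic for a hypothetical hybrid layout; the data center names and replica counts are assumptions.

```python
def local_quorum(replicas_per_dc: dict[str, int], local_dc: str) -> int:
    """Acks needed for LOCAL_QUORUM: a majority of the replicas in local_dc only."""
    return replicas_per_dc[local_dc] // 2 + 1


def total_replicas(replicas_per_dc: dict[str, int]) -> int:
    """Copies of each row across all data centers."""
    return sum(replicas_per_dc.values())


if __name__ == "__main__":
    # Hypothetical hybrid layout: an on-premise data center plus a Cloud region.
    layout = {"onprem-dc1": 3, "cloud-us-east": 2}
    print(total_replicas(layout))                 # 5 copies of every row
    print(local_quorum(layout, "onprem-dc1"))     # 2 acks, all from onprem-dc1
    print(local_quorum(layout, "cloud-us-east"))  # 2 acks, all from cloud-us-east
```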
Data Redundancy
Cassandra's data replication is configurable, so each piece of data can be stored on multiple nodes to protect it against loss. Cassandra also supports rack awareness: replicas can be placed on different racks to guard against machine and rack failures.
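Rack awareness can be illustrated by walking the ring clockwise from a row's position and preferring nodes on racks that do not yet hold a replica, falling back to already-used racks only when there are not enough distinct racks. This is a simplified sketch in the spirit of Cassandra's topology-aware replication, not its exact algorithm; the ring order, node names, and rack assignments are hypothetical.

```python
def place_replicas(ring_order: list[str], rack_of: dict[str, str], start: int, rf: int) -> list[str]:
    """Pick rf nodes walking clockwise from index start, spreading replicas across racks."""
    n = len(ring_order)
    if rf > n:
        raise ValueError("replication factor exceeds cluster size")
    walk = [ring_order[(start + i) % n] for i in range(n)]  # ring order from start
    chosen: list[str] = []
    used_racks: set[str] = set()
    # First pass: at most one replica per rack.
    for node in walk:
        if len(chosen) == rf:
            break
        if rack_of[node] not in used_racks:
            chosen.append(node)
            used_racks.add(rack_of[node])
    # Second pass: if racks ran out, fill remaining slots in ring order.
    for node in walk:
        if len(chosen) == rf:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


if __name__ == "__main__":
    ring_order = ["n1", "n2", "n3", "n4", "n5", "n6"]
    rack_of = {"n1": "rack1", "n2": "rack1", "n3": "rack2",
               "n4": "rack2", "n5": "rack3", "n6": "rack3"}
    print(place_replicas(ring_order, rack_of, start=0, rf=3))  # ['n1', 'n3', 'n5']
```

Losing any single rack here still leaves two of the three replicas available.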
Dynamic Schema
Cassandra’s data model, based on Google’s Bigtable, allows users to store structured, semi-structured, and unstructured data with ease.

[Figure: the Portfolio keyspace, containing a Customer column family with columns ID, Name, SSN, and DOB]
Simple Manageability
AMI (Amazon Machine Image) installers set up and configure an entire multi-node Cloud deployment in minutes, and the whole cluster can be managed and monitored through a Web-based console.
Cloud Provider/Software Support
Cassandra is supported on popular Cloud provider
platforms and operating systems.
Low Cost
Cassandra is open source software and is freely
available. Commercial/advanced versions of Cassandra
are available from DataStax along with support and
other services.
How Does Cassandra Stack Up?
Cloud Database Attribute          Info
Transparent elasticity            Nodes can be added/removed online with automatic load balancing
Transparent scalability           Performance increases linearly with node additions; Big Data capable
High availability                 No single point of failure; offers a high degree of availability
Multi-geo/zone                    Supports multiple data centers, geographies, and Cloud zones; read/write anywhere
Data redundancy                   Customizable data replication/redundancy
Dynamic schema                    Able to manage all key types of data
Simple manageability              Easy to install and set up; managed via Web console
Cloud provider/software support   Supports all key providers and operating systems
Low cost                          Free with the community edition; very low cost with DataStax for advanced functionality and/or support
Next Steps
Download Cassandra and try it in your own
environment or on your Cloud provider’s platform.

• Go to www.datastax.com/download
• Downloads are available both for on-premise Cassandra installations and as an AMI for Amazon EC2
For More Information
Questions?
