NoSQL, NewSQL, and Beyond
The answer to SPRAINed relational databases

Matthew Aslett
Research Manager, Data Management and Analytics




                    © 2012 by The 451 Group. All rights reserved
451 Research
 Matthew Aslett
  • Research manager, data management and analytics
  • With The 451 Group since 2007
  • www.twitter.com/maslett



Information Management
  • Operational databases
  • Data warehousing
  • Data caching
  • Event processing

Commercial Adoption of Open Source (CAOS)
  • Open source projects
  • Adoption of open source software
  • Vendor strategies




The 451 Group








Relevant reports
 NoSQL, NewSQL and Beyond
  • Assessing the drivers behind the development and adoption
    of NoSQL and NewSQL databases, as well as data
    grid/caching technologies

  • Released April 2011

  • Role of open source in driving innovation

  • sales@the451group.com




NoSQL, NewSQL and Beyond
 NoSQL
  • New breed of non-relational database products
  • Rejection of fixed table schema and join operations
  • Designed to meet scalability requirements of distributed
    architectures
  • And/or schema-less data management requirements

 NewSQL
  • New breed of relational database products
  • Retain SQL and ACID
  • Designed to meet scalability requirements of distributed
    architectures
  • Or improve performance so horizontal scalability is no longer
    a necessity

 … and Beyond
  • In-memory data grid/cache products
  • Potential primary platform for distributed data management


The NoSQL landscape
 Key Value Store
  • Citrusleaf, HandlerSocket*, Redis, Riak, Voldemort, Couchbase,
    Membrain, Oracle NoSQL, Castle, RethinkDB, LevelDB

 Document
  • RavenDB, MongoDB, CouchDB

 Graph
  • InfiniteGraph, Neo4j, DEX, OrientDB

 Big Tables
  • Cassandra, DataStax EE, Acunu, HBase, Hypertable

 -as-a-Service
  • DynamoDB, Redis-to-go, SimpleDB, Mongo Labs, Mongo HQ, Cloudant,
    Iris Couch, NuvolaBase, App Engine Datastore

The NewSQL ecosystem
 Categories
  • -as-a-Service
  • New databases
  • Storage engines
  • Clustering/sharding

 Products
  • Xeround, Tokutek, Drizzle, Akiban, GenieDB, ScaleDB, MySQL Cluster,
    Zimory Scale, VoltDB, JustOne DB, ParElastic, Continuent, Galera,
    NuoDB, SQLFire, Translattice, Clustrix, Schooner SQL, ScaleBase,
    ScaleArc, CodeFutures




SPRAINED RELATIONAL DATABASES




Photo credit: Foxtongue on Flickr
http://www.flickr.com/photos/foxtongue/4844016087/



SPRAIN

   Scalability - Hardware economics
    Example project/service/vendor:
   • BigTable, HBase, Riak, MongoDB, Couchbase, Hadoop
   • Xeround, NuoDB
   • Data grid/cache


    Associated use case:
   • Large-scale distributed data storage
   • Analysis of continuously updated data
   • Multi-tenant PaaS data layer




SPRAIN

   Scalability
    Netflix:
   • 37X growth in requests Jan 2010-Jan 2011
   • “We had to get out of the datacenter business”
    Dachis Group:
   • Specifically wanted a horizontally scalable data store
    Spotify:
   • Importance (and challenges) of cross-datacenter replication
    Tellybug:
   • Peak scalability – mission critical for two hours per week. Elastic
     scalability still not solved.




SPRAIN

   Performance – RDBMS limitations
    Example project/service/vendor:
   • Hypertable, Couchbase, Riak, Membrain, MongoDB, Redis
   • Data grid/cache
   • VoltDB, Clustrix


    Associated use case:
   • Real-time data processing of mixed read/write workloads
   • Data caching
   • Large-scale data ingestion




SPRAIN

   Performance
    Tellybug:
   • Having won the contract to deliver the app for Britain’s Got Talent,
     it realised that its MySQL/Django/Python stack couldn’t handle the
     anticipated load


    Spotify:
   • Major upgrades without service interruptions, not possible with
     sharded SQL databases


    Rackspace:
   • Ability to monitor a million different things, ability to withstand the
     failure of 1/3 data centers


SPRAIN

   Relaxed consistency - CAP Theorem
    Example project/service/vendor:
   • Dynamo, Voldemort, Cassandra, Riak
   • Amazon DynamoDB


    Associated use case:
   • Multi-data center replication
   • Service availability
   • Non-transactional data off-load




SPRAIN

   Relaxed consistency
    Netflix:
   • “We value availability over consistency. We don’t need full
     consistency.”


    Tellybug:
   • Soft production data – the ability to ignore failures


    Spotify:
   • Flexibility to combine different consistency levels for different
     column families in a single application
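
A rough sketch of what mixing consistency levels per column family can mean in a Dynamo-style store. With N replicas, a read that waits for R replies is guaranteed to overlap a write acknowledged by W replicas only when R + W > N. The column family names and replica counts below are hypothetical and are not taken from any of the case studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyPolicy:
    """Replica settings for one column family (all values are illustrative)."""
    replicas: int      # N: copies kept of each row
    write_acks: int    # W: replicas that must acknowledge a write
    read_replies: int  # R: replicas that must answer a read

    def reads_see_latest_write(self) -> bool:
        # The read set and the write set share at least one replica
        # exactly when R + W > N.
        return self.read_replies + self.write_acks > self.replicas


# Hypothetical column families in a single application.
POLICIES = {
    "playlists": ConsistencyPolicy(replicas=3, write_acks=2, read_replies=2),
    "play_counts": ConsistencyPolicy(replicas=3, write_acks=1, read_replies=1),
}

for name, policy in POLICIES.items():
    print(name, policy.reads_see_latest_write())
```

The first family trades latency for overlap between reads and writes; the second accepts possibly stale reads in exchange for availability and speed.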




SPRAIN

   Agility - polyglot persistence, schema-less
    Example project/service/vendor:
   • MongoDB, CouchDB, Cassandra, Riak
   • Google App Engine, SimpleDB


    Associated use case:
   • Mobile/remote device synchronization
   • Agile development
   • Data caching




SPRAIN

   Agility
    Tellybug:
   • Had 4-5 weeks to ship a production system to meet contract
   • Ability to re-build entire infrastructure from scratch in 10-15 mins


    Netflix:
   • Single SQL database had to be brought down to change the schema.
   • Put the logic into the Web services and employ distributed key
     stores to enable agile development and schema changes


    Spotify:
   • Cassandra is a platform for quickly developing new applications


SPRAIN

   Intricacy – volume, velocity, variety
    Example project/service/vendor:
   • Neo4j, GraphDB, InfiniteGraph
   • Apache Cassandra, Hadoop, Riak
   • VoltDB, Clustrix


    Associated use case:
   • Social networking applications
   • Geo-locational applications
   • Configuration management database




SPRAIN

   Intricacy
    Dachis Group:
   • Combining social media data for 2,000 brands with sentiment,
     relationship, conversations, social graph
    Spotify:
   • More than half a billion playlists, about 10 thousand requests per
     second at peak
    Rackspace:
   • One row per customer with thousands of columns – potentially
     millions of columns
    Tellybug:
   • Ability to cope with 10,000 interactions/sec and integrate that with
     the social graph for thousands of concurrent users
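
The "one row per customer" layout mentioned above can be pictured as a row key that maps to a large, sorted set of columns, read back in contiguous slices. The sketch below is an in-memory illustration of that shape only; the class, the column naming scheme and the sample values are assumptions, not a description of the system discussed in the talk.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class WideRow:
    """One row whose columns are kept sorted by name (illustrative only)."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def put(self, column, value):
        if column not in self._values:
            insort(self._names, column)  # keep the names in sorted order
        self._values[column] = value

    def slice(self, start, end):
        # Return every column whose name falls within [start, end].
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


# Hypothetical data: one row per customer, one column per metric sample.
rows = defaultdict(WideRow)
rows["customer-42"].put("cpu:2012-03-28T10:00", 0.71)
rows["customer-42"].put("cpu:2012-03-28T10:01", 0.64)
rows["customer-42"].put("disk:2012-03-28T10:00", 0.30)

cpu_samples = rows["customer-42"].slice("cpu:", "cpu:~")
print(cpu_samples)
```

Because the columns are sorted, all samples for one metric sit next to each other and come back from a single range read, however many columns the row accumulates.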


SPRAIN

   Necessity - open source
    The failure of existing suppliers to address emerging
     requirements

    Example projects:
   • BigTable: Google
   • Dynamo: Amazon
   • Cassandra: Facebook
   • HBase: Powerset
   • Voldemort: LinkedIn
   • Hypertable: Zvents
   • Neo4j: Windh Technologies


SPRAIN

   Necessity
    Spotify:
   • Created own backup and restore technology
    Rackspace:
   • Distributed secondary indexing, blob storage, many other projects
    Tellybug:
   • Created sharded counter in memcached, wrote several tools to
     figure out closeness to the ‘truth’
    Netflix:
   • Developed multiple tools including multi-datacenter replication,
     automated configuration and backup, provisioning interface, etc.
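
A minimal sketch of the sharded-counter idea mentioned for Tellybug: increments are spread over several cache keys so that no single key takes every write, and the total is the sum of the shards. The `InMemoryCache` class is a hypothetical stand-in for a memcached client, and unlike memcached it creates a key on its first increment. The key naming and shard count are also assumptions.

```python
import random


class InMemoryCache:
    """Tiny stand-in for a cache client (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}

    def incr(self, key, delta=1):
        # A real cache server would apply this increment atomically.
        self._data[key] = self._data.get(key, 0) + delta
        return self._data[key]

    def get(self, key):
        return self._data.get(key)


class ShardedCounter:
    def __init__(self, cache, name, shards=8):
        self.cache = cache
        self.keys = [f"{name}:shard:{i}" for i in range(shards)]

    def increment(self, delta=1):
        # Pick a shard at random so writes are spread across keys.
        self.cache.incr(random.choice(self.keys), delta)

    def total(self):
        # Sum every shard; a shard that was never written counts as zero.
        return sum(self.cache.get(key) or 0 for key in self.keys)


counter = ShardedCounter(InMemoryCache(), "votes")
for _ in range(1000):
    counter.increment()
print(counter.total())
```

With a real cache, shards can be evicted or lost, so the total is only an approximation, which is one reason extra tooling is needed to judge how close the count is to the 'truth'.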




Relevant reports
 MySQL, NoSQL and NewSQL
  • Assessing the competitive dynamic between the MySQL
    ecosystem, NoSQL and NewSQL technologies

  • Due May 2012

  • Including market sizing of the three
    database segments

  • Survey of 200+ database users

  • sales@the451group.com




Thank you. Questions? Comments?
   matt.aslett@451research.com
             @maslett





Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 Research
